
Split data contained in one column into 3 columns in R

I have a dataset containing character vectors (that are really numbers) that I want to split into 3 different columns. These 3 columns need to hold the 3 numbers contained in the original column.

Data <- data.frame(c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"))

colnames(Data) <- "values"

Data

        values
    1.50 (1.30 to 1.70)
    1.30 (1.20 to 1.50)

The result I expect is this.

value1       value2        value3
 1.50          1.30          1.70
 1.30          1.20          1.50

One way of doing this is to use separate() from the tidyr package. From the documentation: "Separate a character column into multiple columns with a regular expression or numeric locations".

Adapting from the example in the documentation, allowing for decimal numbers, and using extra = "drop" to drop the discarded pieces without warnings:

library(tidyr)

Data <- data.frame(c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"))
colnames(Data) <- "values"
Data

# Split on every run of characters that cannot be part of a number
separate(Data, col = values, into = paste0("value", 1:3),
         sep = "[^[:digit:]?\\.]+", extra = "drop")

# output
#  value1 value2 value3
#1   1.50   1.30   1.70
#2   1.30   1.20   1.50
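To see what the separator pattern is doing, the same split can be reproduced in base R with strsplit(). This is only a minimal sketch: the names `parts` and `result` are illustrative, and the pattern is a simplified form of the one above (digits and dots only). It also converts the pieces to numbers, since the split itself returns character strings.

```r
# Illustrative sketch: split on every run of characters that is
# neither a digit nor a dot, similar to the `sep` pattern above.
values <- c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)")
parts <- strsplit(values, "[^[:digit:].]+")

# Bind the pieces row-wise and convert them from character to numeric.
result <- as.data.frame(do.call(rbind, lapply(parts, as.numeric)))
colnames(result) <- paste0("value", 1:3)
result
```

Unlike separate(), strsplit() does not return a trailing empty piece for the closing parenthesis, so nothing has to be dropped here.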

We can also use extract(), specifying a regex pattern with capture groups to pull out the data.

tidyr::extract(Data, values, paste0("value",1:3), 
             regex = '(\\d+\\.\\d+)\\s\\((\\d+\\.\\d+)\\sto\\s(\\d+\\.\\d+)\\)')

#  value1 value2 value3
#1   1.50   1.30   1.70
#2   1.30   1.20   1.50

(\\d+\\.\\d+) is used to extract a decimal value.

\\s is whitespace.

We use capture groups to extract the values into three different columns.
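For readers without tidyr, the capture-group idea can be sketched in base R with regexec() and regmatches(). This is an illustration under the assumption that every entry matches the pattern; the names `pattern`, `m` and `groups` are made up for the example.

```r
# Illustrative sketch: the same capture-group pattern, applied with base R.
values <- c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)")
pattern <- "(\\d+\\.\\d+)\\s\\((\\d+\\.\\d+)\\sto\\s(\\d+\\.\\d+)\\)"

# For each string, regmatches() returns the full match followed by
# the text captured by each of the three groups.
m <- regmatches(values, regexec(pattern, values, perl = TRUE))

# Drop the full match (first element) and keep the groups as numbers.
groups <- do.call(rbind, lapply(m, function(x) as.numeric(x[-1])))
colnames(groups) <- paste0("value", 1:3)
groups <- as.data.frame(groups)
groups
```

An entry that does not match the pattern yields an empty match, so real data would need a check for that case before binding the rows.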

You can try this code:

library(easyr)
x = data.frame(c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"))
colnames(x)[1] = "val"
x$val1 = left(x$val, 4)
x$val2 = mid(x$val, 7,4)
x$val3 = mid(x$val, 15,4)
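The same fixed-position approach can be sketched in base R with substr(), which takes start and stop positions rather than a start and a length. Note the assumption this relies on: every entry must have exactly the same layout (four characters per number), so a value such as "10.50 (...)" would be cut in the wrong place.

```r
# Illustrative sketch: fixed-position extraction with base R only.
# Assumes every entry looks exactly like "d.dd (d.dd to d.dd)".
x <- data.frame(val = c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"),
                stringsAsFactors = FALSE)

x$val1 <- as.numeric(substr(x$val, 1, 4))    # characters 1 to 4
x$val2 <- as.numeric(substr(x$val, 7, 10))   # characters 7 to 10
x$val3 <- as.numeric(substr(x$val, 15, 18))  # characters 15 to 18
x
```

If the widths can vary, a regex-based split or extraction is the safer choice.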
