Split data contained in one column into 3 columns in R

Question

I have a dataset containing character vectors (that are really numbers) that I want to split into 3 different columns. These 3 columns need to hold the 3 numbers contained in the original column.

Data<-data.frame(c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"))

colnames(Data)<- "values"

Data

        values
    1.50 (1.30 to 1.70)
    1.30 (1.20 to 1.50)

The result I expect is this:

value1       value2        value3
 1.50          1.30          1.70
 1.30          1.20          1.50

Answer 1

One way to do this is the separate function in the tidyr package. From the documentation: "Separate a character column into multiple columns with a regular expression or numeric locations".

Adapting the example from the documentation to handle decimals, and using extra="drop" to discard extra pieces without a warning:

Data <- data.frame(c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"))
colnames(Data) <- "values"
Data

library(tidyr)
# split on runs of characters that are not digits, "?" or "."
separate(Data, col = values, into = paste0("value", 1:3),
         sep = "[^[:digit:]?\\.]+", extra = "drop")

# output
  value1 value2 value3
1   1.50   1.30   1.70
2   1.30   1.20   1.50
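By default separate leaves the new columns as character strings, while the question wants numbers. A minimal sketch, assuming tidyr is installed, that adds convert = TRUE so the pieces are type-converted; the simplified separator pattern here is an illustrative choice, not the one from the answer:

```r
library(tidyr)

Data <- data.frame(values = c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"))

# Split on runs of characters that are neither digits nor dots.
# extra = "drop" discards the empty piece left after the closing ")".
# convert = TRUE asks tidyr to type-convert the resulting columns.
result <- separate(Data, col = values, into = paste0("value", 1:3),
                   sep = "[^[:digit:].]+", extra = "drop", convert = TRUE)

str(result)
```

Printing a numeric column shows 1.5 rather than 1.50, since trailing zeros are only a display matter once the values are numbers.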

Answer 2

We can also use extract, specifying a regex pattern to extract the data.

tidyr::extract(Data, values, paste0("value",1:3), 
             regex = '(\\d+\\.\\d+)\\s\\((\\d+\\.\\d+)\\sto\\s(\\d+\\.\\d+)\\)')

#  value1 value2 value3
#1   1.50   1.30   1.70
#2   1.30   1.20   1.50

(\\d+\\.\\d+) is used to extract a decimal value.

\\s matches a whitespace character.

We use capture groups to extract the values into three different columns.
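The same capture-group idea can be tried without tidyr. The following is a rough base R sketch, not part of the original answer, that reuses the pattern with regexec and regmatches; the variable names are illustrative:

```r
values <- c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)")
pattern <- "(\\d+\\.\\d+)\\s\\((\\d+\\.\\d+)\\sto\\s(\\d+\\.\\d+)\\)"

# regexec() reports the full match plus one entry per capture group
matches <- regmatches(values, regexec(pattern, values, perl = TRUE))

# Drop the full match (first element) and keep the three groups per row
groups <- t(vapply(matches, function(m) m[-1], character(3)))

result <- as.data.frame(groups, stringsAsFactors = FALSE)
names(result) <- paste0("value", 1:3)

# Convert the captured strings to numbers
result[] <- lapply(result, as.numeric)
result
```

A row that does not match the pattern would make vapply fail, so this sketch assumes every value follows the "x (y to z)" layout.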

Answer 3

You can try this code, which picks the values out by their fixed character positions:

library(easyr)
x = data.frame(c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"))
colnames(x)[1] = "val"
x$val1 = left(x$val, 4)
x$val2 = mid(x$val, 7,4)
x$val3 = mid(x$val, 15,4)
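For readers without easyr, the positional approach can be approximated with base R's substr. This is a sketch under the assumption that every number is exactly four characters wide and sits at the same offsets in each row; a value such as 10.50 would shift the positions and break it:

```r
x <- data.frame(val = c("1.50 (1.30 to 1.70)", "1.30 (1.20 to 1.50)"),
                stringsAsFactors = FALSE)

# substr(text, start, stop) uses inclusive, 1-based character positions
x$val1 <- as.numeric(substr(x$val, 1, 4))    # first number
x$val2 <- as.numeric(substr(x$val, 7, 10))   # lower bound, after " ("
x$val3 <- as.numeric(substr(x$val, 15, 18))  # upper bound, after " to "
x
```

When the widths can vary, a regex-based split is the safer choice.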

Answer 1
1 (accepted) 2020-10-18 02:54:45

Answer 2
0 2020-10-18 03:43:22

Answer 3
0 2020-10-18 04:32:49