How to create new columns by extracting numbers from a vector column using regex?

Question

I need to extract numbers from a vector column using regex and create one column for each number. The length of the vector is not the same for each row.

For other purposes, such as counting the number of elements in the vector, I have used:

str_count(DATA$vectorCOL, '[0-9.+]+')

This is the data column:

vectorCOL
63.
11., 36., 45+1., 79., 90+1.

45., 80., 87.

Expected output:

vectorCOL                    col1  col2 col3  col4 col5
63.                          63    NA   NA    NA   NA
11., 36., 45+1., 79., 90+1.  11    36   45+1  79   90+1
                             NA    NA   NA    NA   NA
45., 80., 87.                45    80   87    NA   NA

Answer 1

We can use cSplit from the splitstackshape package:

splitstackshape::cSplit(df, "vectorCOL", sep = ",", drop = FALSE)
#                vectorCOL vectorCOL_1 vectorCOL_2 vectorCOL_3 vectorCOL_4 vectorCOL_5
#1:                     63.          63          NA        <NA>          NA        <NA>
#2: 11.,36.,45+1.,79.,90+1.          11          36       45+1.          79       90+1.
#3:                                  NA          NA        <NA>          NA        <NA>
#4:             45.,80.,87.          45          80         87.          NA        <NA>

If we don't want "." in the output, we can remove it with gsub before calling cSplit:

df$vectorCOL <- gsub("\\.", "", df$vectorCOL)

Data:

df <-  structure(list(vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.", 
"", "45., 80., 87.")), row.names = c(NA, -4L), class = "data.frame")
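Since the question asks for a regex-based approach, here is a minimal base R sketch that matches the numbers directly instead of splitting on the separator. The pattern `[0-9]+(\\+[0-9]+)?` and the names `tokens`, `padded`, and `result` are assumptions chosen for illustration; the pattern treats values such as 45+1 as a single token and leaves out the trailing ".".

```r
# Same data as above, rebuilt here so the sketch is self-contained
df <- data.frame(
  vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.", "", "45., 80., 87."),
  stringsAsFactors = FALSE
)

# One character vector of matches per row; rows without a match give character(0)
tokens <- regmatches(df$vectorCOL,
                     gregexpr("[0-9]+(\\+[0-9]+)?", df$vectorCOL))

# Pad every row with NA up to the longest row, then bind the rows into a matrix
n_max <- max(lengths(tokens))
padded <- do.call(rbind, lapply(tokens, function(x) {
  c(x, rep(NA_character_, n_max - length(x)))
}))
colnames(padded) <- paste0("col", seq_len(n_max))

# Original column followed by col1 ... col5 (all character)
result <- cbind(df, as.data.frame(padded, stringsAsFactors = FALSE))
result
```

The new columns stay character because values such as 45+1 are not plain numbers; convert individual columns with as.numeric only where every value is numeric.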

Answer 2

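Using data.table:

library(data.table)
setDT(df)  # df is a plain data.frame; convert it by reference so the [, ...] syntax works
# split on ","; a pattern such as ",\\s*" would also drop the space after each comma
df <- df[, c(vectorCOL = list(vectorCOL), tstrsplit(vectorCOL, ","))]
# the unnamed split columns are auto-named V2, V3, ...; rename them to col2, col3, ...
setnames(df, names(df), sub("V", "col", names(df)))
df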
#                      vectorCOL col2 col3   col4 col5   col6
# 1:                         63.  63. <NA>   <NA> <NA>   <NA>
# 2: 11., 36., 45+1., 79., 90+1.  11.  36.  45+1.  79.  90+1.
# 3:                             <NA> <NA>   <NA> <NA>   <NA>
# 4:               45., 80., 87.  45.  80.    87. <NA>   <NA>

Answer 1 (accepted, 1 vote): 2019-09-12 11:03:54

Answer 2 (0 votes): 2019-09-12 11:52:13
