
How to create new columns by extracting numbers from a vector column with regex?

I need to extract numbers from a vector column using regex and create one column per number. The vector length differs from row to row.

For other purposes, such as counting the number of elements within the vector, I have used:

str_count(DATA$vectorCOL, '[0-9.+]+')
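For readers without stringr installed, the same count can be sketched in base R. This is only an illustration, not the asker's code: the vector `x` simply copies the sample values from this question, and `n_elements` is a name chosen here for the example.

```r
# Sample values copied from the question (the third entry is an empty string)
x <- c("63.", "11., 36., 45+1., 79., 90+1.", "", "45., 80., 87.")

# gregexpr() locates every run of digits, dots and plus signs;
# regmatches() extracts those runs, and lengths() counts them per row
n_elements <- lengths(regmatches(x, gregexpr("[0-9.+]+", x)))
print(n_elements)
# [1] 1 5 0 3
```

The empty row yields a count of 0 because `regmatches()` returns an empty character vector when nothing matches.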

This is the data column:

vectorCOL
63.
11., 36., 45+1., 79., 90+1.

45., 80., 87.

Expected output:

vectorCOL                    col1  col2 col3  col4 col5
63.                          63    NA   NA    NA   NA
11., 36., 45+1., 79., 90+1.  11    36   45+1  79   90+1
                             NA    NA   NA    NA   NA
45., 80., 87.                45    80   87    NA   NA

We can use cSplit from the splitstackshape package:

splitstackshape::cSplit(df, "vectorCOL", sep = ",", drop = FALSE)
#                vectorCOL vectorCOL_1 vectorCOL_2 vectorCOL_3 vectorCOL_4 vectorCOL_5
#1:                     63.          63          NA        <NA>          NA        <NA>
#2: 11.,36.,45+1.,79.,90+1.          11          36       45+1.          79       90+1.
#3:                                  NA          NA        <NA>          NA        <NA>
#4:             45.,80.,87.          45          80         87.          NA        <NA>

If we don't want "." in the output, we can remove the dots with gsub before splitting.

df$vectorCOL <- gsub("\\.", "",df$vectorCOL)

Data:

df <-  structure(list(vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.", 
"", "45., 80., 87.")), row.names = c(NA, -4L), class = "data.frame")

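Using data.table:

library(data.table)
setDT(df)  # the [, j] expression below needs a data.table, not a plain data.frame
# keep the original column and append one column per comma-separated piece
df <- df[, c(vectorCOL = list(vectorCOL), tstrsplit(vectorCOL, ","))]
# the unnamed split columns are auto-named V2..V6; rename them to col2..col6
setnames(df, names(df), sub("V", "col", names(df)))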
df
#                      vectorCOL col2 col3   col4 col5   col6
# 1:                         63.  63. <NA>   <NA> <NA>   <NA>
# 2: 11., 36., 45+1., 79., 90+1.  11.  36.  45+1.  79.  90+1.
# 3:                             <NA> <NA>   <NA> <NA>   <NA>
# 4:               45., 80., 87.  45.  80.    87. <NA>   <NA>
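Since the question asks specifically for a regex, a base R variant is also possible. The sketch below is not taken from either answer: it assumes each value is a run of digits optionally followed by "+digits" (so the trailing "." is dropped), and the names `tokens`, `n_cols`, `padded` and `result` are chosen here for illustration.

```r
# Example data, copied from the question
df <- data.frame(
  vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.", "", "45., 80., 87."),
  stringsAsFactors = FALSE
)

# Assumed token shape: digits, optionally followed by "+" and more digits
pattern <- "[0-9]+(\\+[0-9]+)?"
tokens <- regmatches(df$vectorCOL, gregexpr(pattern, df$vectorCOL))

# Indexing past the end of a vector yields NA, which pads short rows
n_cols <- max(lengths(tokens))
padded <- do.call(rbind, lapply(tokens, `[`, seq_len(n_cols)))
colnames(padded) <- paste0("col", seq_len(n_cols))

# Bind the new columns next to the original one
result <- cbind(df, as.data.frame(padded, stringsAsFactors = FALSE))
print(result)
```

The new columns stay character because values such as "45+1" are not plain numbers. If the real data contains decimals, the pattern would need to be adjusted.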
