How to create new columns by extracting numbers from a vector column with regex?
I need to extract numbers from a vector column using regex and create a column for each number. The length of the vector will not be the same for each row.
For other purposes, such as counting the number of elements within the vector, I have used
str_count(DATA$vectorCOL, '[0-9.+]+')
This is the data column
vectorCOL
63.
11., 36., 45+1., 79., 90+1.
45., 80., 87.
Expected output
vectorCOL                    col1 col2 col3 col4 col5
63.                          63   NA   NA   NA   NA
11., 36., 45+1., 79., 90+1.  11   36   45+1 79   90+1
                             NA   NA   NA   NA   NA
45., 80., 87.                45   80   87   NA   NA
We can use cSplit from the splitstackshape package:
splitstackshape::cSplit(df, "vectorCOL", sep = ",", drop = FALSE)
#                 vectorCOL vectorCOL_1 vectorCOL_2 vectorCOL_3 vectorCOL_4 vectorCOL_5
#1:                     63.          63          NA        <NA>          NA        <NA>
#2: 11.,36.,45+1.,79.,90+1.          11          36       45+1.          79       90+1.
#3:                                  NA          NA        <NA>          NA        <NA>
#4:             45.,80.,87.          45          80         87.          NA        <NA>
If we don't want "." in the output, we can remove them first using gsub:
df$vectorCOL <- gsub("\\.", "",df$vectorCOL)
Data
df <- structure(list(vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.",
"", "45., 80., 87.")), row.names = c(NA, -4L), class = "data.frame")
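Since the question asks for a regex-based extraction, here is a minimal base R sketch of that route, assuming the sample data above. The pattern "[0-9+]+" and the names parts, n, mat and result are illustrative choices, not part of the original answer. The extracted values stay as character strings because entries such as "45+1" are not numeric.

```r
# Same sample data as in the answer above
df <- data.frame(
  vectorCOL = c("63.", "11., 36., 45+1., 79., 90+1.", "", "45., 80., 87."),
  stringsAsFactors = FALSE
)

# Extract every run of digits and "+" signs; the trailing "." is never matched
parts <- regmatches(df$vectorCOL, gregexpr("[0-9+]+", df$vectorCOL))

# Pad each row's matches with NA up to the length of the longest row
n <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))

# Bind the padded vectors as rows and name the columns col1, col2, ...
mat <- do.call(rbind, padded)
colnames(mat) <- paste0("col", seq_len(n))

result <- cbind(df, as.data.frame(mat, stringsAsFactors = FALSE))
result
```

Unlike splitting on the separator, this keeps only the characters the pattern matches, so no separate gsub step is needed to drop the dots or the spaces after the commas.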
Using data.table:
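library(data.table)
setDT(df)  # convert the data.frame to a data.table by reference
# keep the original column and append one column per comma-separated piece
df <- df[, c(vectorCOL = list(vectorCOL), tstrsplit(vectorCOL, ","))]
# the unnamed split columns get default names V2, V3, ...; rename them to col2, col3, ...
setnames(df, names(df), sub("V", "col", names(df)))
df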
#                      vectorCOL col2 col3  col4 col5  col6
# 1:                         63.  63. <NA>  <NA> <NA>  <NA>
# 2: 11., 36., 45+1., 79., 90+1.  11.  36. 45+1.  79. 90+1.
# 3:                             <NA> <NA>  <NA> <NA>  <NA>
# 4:               45., 80., 87.  45.  80.   87. <NA>  <NA>