R: Splitting one column (different lengths) into new columns
I have a column of data that I would like to split on commas (I have no problem with this part). The problem is that I would like the pieces to go into new columns in the data frame, and the original column has a different number of comma-separated values in each row. For example:
  Column1
1 AAA, BBB, CCC
2 AA232B
3 A, B, C, DDD
4 52 AJD 23
Given this set of data, I would have four columns:
  Col1      Col2 Col3 Col4
1 AAA       BBB  CCC
2 AA232B
3 A         B    C    DDD
4 52 AJD 23
Thanks!
Here is another option using cSplit:
library(splitstackshape)
cSplit(df, "x", ",")
#          x_1 x_2 x_3 x_4
#1:        AAA BBB CCC  NA
#2:     AA232B  NA  NA  NA
#3:          A   B   C DDD
#4:  52 AJD 23  NA  NA  NA
### data
df <- data.frame(x=c("AAA, BBB, CCC","AA232B","A, B, C, DDD","52 AJD 23"))
Use the tidyr library.
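library(tidyr)
df <- data.frame(col1 = c('AAA, BBB, CCC',
                          'AA232B',
                          'A, B, C, DDD',
                          '52 AJD 23'))
# Assign the result: separate() returns a new data frame and does not modify df in place.
# sep = ',\\s*' also drops the space after each comma; fill = 'right' pads short rows with NA.
df <- df %>% separate(col1, paste0('col', 1:4), sep = ',\\s*', remove = TRUE, fill = 'right')
df
##        col1 col2 col3 col4
## 1       AAA  BBB  CCC <NA>
## 2    AA232B <NA> <NA> <NA>
## 3         A    B    C  DDD
## 4 52 AJD 23 <NA> <NA> <NA>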
Hopefully the code below works, where a, b, c, d are the new column names. You can replace NA with whatever you like.
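library(tidyr)
df <- data.frame(x = c("AAA, BBB, CCC", "AA232B", "A, B, C, DDD", "52 AJD 23"))
# The default sep splits on any run of non-alphanumeric characters, so "52 AJD 23" is split as well.
# fill = "right" puts the NA padding on the right, which matches the output shown here.
df %>% separate(x, c("a", "b", "c", "d"), extra = "merge", fill = "right")
#        a    b    c    d
# 1    AAA  BBB  CCC <NA>
# 2 AA232B <NA> <NA> <NA>
# 3      A    B    C  DDD
# 4     52  AJD   23 <NA>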
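Replacing the NA values, as mentioned above, could look like the following minimal sketch in base R. The data frame `res` is a hand-built stand-in for the result of the separate() call (an assumption, so that the snippet runs without tidyr), and the empty string is only one possible replacement; it happens to match the blank cells in the question's desired output.

```r
# Stand-in for the result of the separate() call above (assumed, built by hand)
res <- data.frame(
  a = c("AAA", "AA232B", "A", "52"),
  b = c("BBB", NA, "B", "AJD"),
  c = c("CCC", NA, "C", "23"),
  d = c(NA, NA, "DDD", NA),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix, which can be used
# to overwrite every missing cell at once
res[is.na(res)] <- ""
```

Any other placeholder string can be used in place of `""`.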
Just for comparison, here is a way using only base functions, which also makes the case for tidyr:
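# Split each row on commas and trim the space left after each comma
test <- apply(df, 1, function(i) trimws(unlist(strsplit(i, split = ","))))
# Pad every row with NA up to 4 values (4 is the longest row in this data)
test <- lapply(test, function(i) c(i, rep(NA, 4 - length(i))))
# Bind the padded rows into a 4-column data frame
test <- data.frame(matrix(unlist(test), ncol = 4, byrow = TRUE))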
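The base approach above hard-codes 4 as the number of columns. A possible generalization, sketched here under the assumption that the input is a single character (or factor) vector, works out the column count from the data. `split_to_columns` and its `prefix` argument are hypothetical names introduced for illustration, not part of any package.

```r
# Hypothetical helper (not from any package): split a vector on a literal
# separator and pad shorter rows with NA
split_to_columns <- function(x, sep = ",", prefix = "Col") {
  # Split each element and trim surrounding whitespace from the pieces
  parts <- lapply(strsplit(as.character(x), sep, fixed = TRUE), trimws)
  # The longest row determines the number of columns
  n <- max(lengths(parts))
  # Pad each row on the right with NA up to n values
  padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(n))
  out
}

df <- data.frame(x = c("AAA, BBB, CCC", "AA232B", "A, B, C, DDD", "52 AJD 23"),
                 stringsAsFactors = FALSE)
result <- split_to_columns(df$x)
```

Here `result` has one row per input row and as many columns as the longest row needs. It can be attached to the original data with `cbind(df, result)` if the source column should be kept.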