Split character column into multiple columns
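What I want to do is split one character column into multiple columns without losing the other data in the df, where the number of resulting columns varies with the input. I think an example makes this easier: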
df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
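I want to split column c into multiple columns using sep = "\n".
I tried separate(df$c, "\n", 10)
but it does not work because I am using a character as the separator. The 10 is just a guess; I would rather have more columns than needed than lose information.
I also tried str_split_fixed(df$c, "\n", 10)
which works, but it drops columns a and b, and I do not know why or how to fix that.
Additional information: in the end I want to use the command on a list.
Edit: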
df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))
map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))
[[1]]
  a         b  c1   c2      c3     c4
1 1       bla one  two   three   <NA>
2 2      word bla  why morebla helpme
3 3 otherword bla  bla    <NA>   <NA>
4 4      nice bla <NA>    <NA>   <NA>
[[2]]
  a         b  c1   c2      c3     c4    c5   c6
1 1       bla one  two   three   <NA>  <NA> <NA>
2 2      word bla  why morebla helpme  <NA> <NA>
3 3 otherword bla  bla ghfdghf  hdhdh hjgfj   td
4 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
df <- data.frame(unlist(list))
I think this may cause problems because the data frames in the list have different numbers of columns. Expected result:
  a         b  c1   c2      c3     c4    c5   c6
1 1       bla one  two   three   <NA>  <NA> <NA>
2 2      word bla  why morebla helpme  <NA> <NA>
3 3 otherword bla  bla    <NA>   <NA>  <NA> <NA>
4 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
5 1       bla one  two   three   <NA>  <NA> <NA>
6 2      word bla  why morebla helpme  <NA> <NA>
7 3 otherword bla  bla ghfdghf  hdhdh hjgfj   td
8 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
If you are comfortable with tidyverse/dplyr pipe syntax, you can use separate from tidyr together with stringr::str_count, which does exactly what you want.
df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
library(tidyverse)
df %>% separate(c, into = paste0('c', seq_len(max(str_count(df$c, '\n')+1))), sep = '\n', fill = 'right')
  a         b  c1   c2      c3     c4
1 1       bla one  two   three   <NA>
2 2      word bla  why morebla helpme
3 3 otherword bla  bla    <NA>   <NA>
4 4      nice bla <NA>    <NA>   <NA>
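One detail worth illustrating: the sample strings have spaces around each "\n", so splitting on the bare newline leaves values such as "one " and " two ". Because the sep argument of separate is treated as a regular expression, the surrounding whitespace can be absorbed into the separator. The following is a minimal sketch under that assumption; the names n_cols and out are only illustrative.

```r
library(tidyr)
library(stringr)

df <- data.frame(
  a = 1:4,
  b = c("bla", "word", "otherword", "nice"),
  c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"),
  stringsAsFactors = FALSE
)

# Number of pieces in the longest entry: newline count plus one
n_cols <- max(str_count(df$c, "\n")) + 1

# "\\s*\\n\\s*" matches the newline together with any whitespace around it,
# so the resulting pieces carry no leading or trailing spaces
out <- separate(
  df, c,
  into = paste0("c", seq_len(n_cols)),
  sep  = "\\s*\\n\\s*",
  fill = "right"
)
out
```

Whitespace at the very start or end of a whole entry would not be removed by this pattern; stringr::str_trim on the new columns could cover that case if it occurs.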
To do this on a list of data.frames, do it like this:
df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))
map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))
[[1]]
  a         b  c1   c2      c3     c4
1 1       bla one  two   three   <NA>
2 2      word bla  why morebla helpme
3 3 otherword bla  bla    <NA>   <NA>
4 4      nice bla <NA>    <NA>   <NA>
[[2]]
  a         b  c1   c2      c3     c4    c5   c6
1 1       bla one  two   three   <NA>  <NA> <NA>
2 2      word bla  why morebla helpme  <NA> <NA>
3 3 otherword bla  bla ghfdghf  hdhdh hjgfj   td
4 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
Further edit in light of the revised question: use map_dfr in place of map.
map_dfr(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))
  a         b  c1   c2      c3     c4    c5   c6
1 1       bla one  two   three   <NA>  <NA> <NA>
2 2      word bla  why morebla helpme  <NA> <NA>
3 3 otherword bla  bla    <NA>   <NA>  <NA> <NA>
4 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
5 1       bla one  two   three   <NA>  <NA> <NA>
6 2      word bla  why morebla helpme  <NA> <NA>
7 3 otherword bla  bla ghfdghf  hdhdh hjgfj   td
8 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
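The reason map_dfr copes with the differing column counts is that it row-binds the pieces in the manner of dplyr::bind_rows, which matches columns by name and fills the missing ones with NA. A small sketch of that behaviour on two made-up data frames (x and y are illustrative, not taken from the question):

```r
library(dplyr)

# Two data frames with different sets of columns
x <- data.frame(a = 1:2, c1 = c("one", "bla"), stringsAsFactors = FALSE)
y <- data.frame(a = 3L, c1 = "bla", c2 = "td", stringsAsFactors = FALSE)

# bind_rows() matches columns by name; x has no c2, so its rows get NA there
combined <- bind_rows(list(x, y))
combined
```

This is also why data.frame(unlist(list)) is not needed here. In recent purrr releases map_dfr is, as far as I know, marked as superseded, so calling map() and then bind_rows() on the resulting list may be the more durable spelling.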
Or simply do it without map*:
df1 %>% rbind(df2) %>% separate(c, into = paste0('c', seq_len(max(str_count(.$c, '\n')+1))), sep = '\n', fill = 'right')
  a         b  c1   c2      c3     c4    c5   c6
1 1       bla one  two   three   <NA>  <NA> <NA>
2 2      word bla  why morebla helpme  <NA> <NA>
3 3 otherword bla  bla    <NA>   <NA>  <NA> <NA>
4 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
5 1       bla one  two   three   <NA>  <NA> <NA>
6 2      word bla  why morebla helpme  <NA> <NA>
7 3 otherword bla  bla ghfdghf  hdhdh hjgfj   td
8 4      nice bla <NA>    <NA>   <NA>  <NA> <NA>
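If a newer tidyr is available (1.3.0 or later, to the best of my knowledge), separate_wider_delim may remove the need to count separators by hand, since it can derive the number of output columns from the data. The sketch below assumes that version and assumes the names_sep and too_few arguments behave as described in the comments; treat it as an alternative to try, not as the method used in the answer above.

```r
library(tidyr)

df <- data.frame(
  a = 1:4,
  b = c("bla", "word", "otherword", "nice"),
  c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"),
  stringsAsFactors = FALSE
)

# names_sep = "" builds the new names from the input column name plus a
# running index (c1, c2, ...); too_few = "align_start" pads short rows with NA
# on the right. The delimiter is matched literally, so the spaces around each
# newline stay in the values.
out <- separate_wider_delim(
  df, c,
  delim     = "\n",
  names_sep = "",
  too_few   = "align_start"
)
out
```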
# Base R alternative: split on "\n", pad every row with NA up to the longest length, then bind
cc = strsplit(df$c, "\n")
l = max(lengths(cc))
CC = lapply(cc, function(x) c(x, rep(NA, l-length(x))))
CC = do.call(rbind, CC)
cbind(df[c('a', 'b')], CC)
a         b   1    2       3      4
1       bla one  two   three   <NA>
2      word bla  why morebla helpme
3 otherword bla  bla    <NA>   <NA>
4      nice bla <NA>    <NA>   <NA>
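The base R steps above can be wrapped in a function and applied to a list, which is what the question ultimately asks for. The following is a sketch only: split_col and bind_fill are hypothetical helper names, the trimming of spaces is an added assumption about the desired output, and every data frame is assumed to have at least one row.

```r
# Hypothetical helper: split one column on a fixed separator and pad with NA
split_col <- function(df, col = "c", sep = "\n") {
  pieces <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  pieces <- lapply(pieces, trimws)            # drop spaces around each piece
  n <- max(lengths(pieces))                   # the longest row sets the column count
  padded <- lapply(pieces, function(x) c(x, rep(NA_character_, n - length(x))))
  mat <- do.call(rbind, padded)
  colnames(mat) <- paste0(col, seq_len(n))
  cbind(df[setdiff(names(df), col)], as.data.frame(mat, stringsAsFactors = FALSE))
}

# Hypothetical helper: row-bind data frames, adding missing columns as NA first
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  dfs <- lapply(dfs, function(d) {
    for (nm in setdiff(all_cols, names(d))) d[[nm]] <- NA_character_
    d[all_cols]
  })
  do.call(rbind, dfs)
}

df1 <- data.frame(a = 1:4, b = c("bla", "word", "otherword", "nice"),
                  c = c("one \n two \n three", "bla \n why \n morebla \n helpme",
                        "bla \n bla", "bla"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(a = 1:4, b = c("bla", "word", "otherword", "nice"),
                  c = c("one \n two \n three", "bla \n why \n morebla \n helpme",
                        "bla \n bla \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"),
                  stringsAsFactors = FALSE)

result <- bind_fill(lapply(list(df1, df2), split_col))
result
```

Padding the columns before rbind is needed because base rbind refuses data frames whose column names differ.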