Splitting a character column into multiple columns

Question

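I want to split one character column into multiple columns without losing the other data in the df, where the number of resulting columns varies with the input. An example probably makes this easier: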

df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))

I want to split column c into multiple columns using sep = "\n".

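I tried separate(df$c, "\n", 10), but it doesn't work because I'm using a character as the separator. The 10 is just a guess: I would rather end up with more columns than needed than drop information.
I also tried str_split_fixed(df$c, "\n", 10), which works fine, but it drops columns a and b, and I don't know why or how to fix that.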

Additional info: in the end I want to apply the command to a list.

Edit:

df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla  \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))

map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))

[[1]]
  a         b   c1    c2        c3      c4
1 1       bla one   two      three    <NA>
2 2      word bla   why   morebla   helpme
3 3 otherword bla    bla      <NA>    <NA>
4 4      nice  bla  <NA>      <NA>    <NA>

[[2]]
  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA> 

df <- data.frame(unlist(list))

I think this may cause problems because the data frames in the list have different numbers of columns. Expected result:

  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one   two      three    <NA>     <NA>  <NA>
2 2      word bla   why   morebla   helpme     <NA>  <NA>
3 3 otherword bla    bla      <NA>    <NA>     <NA>  <NA>
4 4      nice  bla  <NA>      <NA>    <NA>     <NA>  <NA>
5 1       bla one    two      three    <NA>    <NA> <NA>
6 2      word bla    why   morebla   helpme    <NA> <NA>
7 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
8 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>

Answer 1

If you're OK with a bit of tidyverse/dplyr pipe syntax, you can use separate from tidyr combined with stringr::str_count, which does exactly what you want.

df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))

library(tidyverse)
df %>% separate(c, into = paste0('c', seq_len(max(str_count(df$c, '\n')+1))), sep = '\n', fill = 'right')

  a         b   c1    c2        c3      c4
1 1       bla one   two      three    <NA>
2 2      word bla   why   morebla   helpme
3 3 otherword bla    bla      <NA>    <NA>
4 4      nice  bla  <NA>      <NA>    <NA>
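
Note that the values above still carry the spaces around each newline ("one ", " two "). Since the sep argument of separate is treated as a regular expression, one way to strip them is to let the separator swallow the surrounding whitespace. A minimal sketch, assuming the same df as above (the names n_cols and out are only for illustration):

```r
library(tidyr)
library(stringr)

df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))

# The longest entry has (number of newlines + 1) pieces
n_cols <- max(str_count(df$c, "\n")) + 1

# "\\s*\\n\\s*" matches the newline plus any whitespace on either side,
# so the pieces come out already trimmed
out <- separate(df, c, into = paste0("c", seq_len(n_cols)),
                sep = "\\s*\\n\\s*", fill = "right")
out
```

The column count is still computed from the data, so nothing is dropped; only the separator changes.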

To do this on a list of data.frames, do this:

df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla  \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))

map(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))

[[1]]
  a         b   c1    c2        c3      c4
1 1       bla one   two      three    <NA>
2 2      word bla   why   morebla   helpme
3 3 otherword bla    bla      <NA>    <NA>
4 4      nice  bla  <NA>      <NA>    <NA>

[[2]]
  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>

Further edit in view of the revised question

Use map_dfr instead:

map_dfr(list(df1, df2), ~.x %>% separate(c, into = paste0('c', seq_len(max(str_count(.x$c, '\n')+1))), sep = '\n', fill = 'right'))

  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla     bla      <NA>    <NA>    <NA> <NA>
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>
5 1       bla one    two      three    <NA>    <NA> <NA>
6 2      word bla    why   morebla   helpme    <NA> <NA>
7 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
8 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>
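
This works even though the two results have different numbers of columns because map_dfr row-binds with dplyr::bind_rows, which matches columns by name and fills the missing ones with NA. Base rbind, by contrast, refuses data frames whose column counts differ. A small sketch with two made-up frames (x and y are illustrative, not the data from the question):

```r
library(dplyr)

# Two toy data frames with different numbers of columns
x <- data.frame(a = 1:2, c1 = c("p", "q"))
y <- data.frame(a = 3:4, c1 = c("r", "s"), c2 = c("t", NA))

# bind_rows matches columns by name and pads the missing c2 in x with NA
combined <- bind_rows(x, y)
combined

# Base rbind stops with an error here because the column counts do not match
rbind_failed <- inherits(try(rbind(x, y), silent = TRUE), "try-error")
rbind_failed
```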

That said, I don't see why you would run this on the individual list items and then row-bind, rather than row-binding first and then simply running it once without map*:

df1 %>% rbind(df2) %>% separate(c, into = paste0('c', seq_len(max(str_count(.$c, '\n')+1))), sep = '\n', fill = 'right')

  a         b   c1     c2        c3      c4      c5   c6
1 1       bla one    two      three    <NA>    <NA> <NA>
2 2      word bla    why   morebla   helpme    <NA> <NA>
3 3 otherword bla     bla      <NA>    <NA>    <NA> <NA>
4 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>
5 1       bla one    two      three    <NA>    <NA> <NA>
6 2      word bla    why   morebla   helpme    <NA> <NA>
7 3 otherword bla   bla    ghfdghf   hdhdh   hjgfj    td
8 4      nice  bla   <NA>      <NA>    <NA>    <NA> <NA>
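
As a side note, newer tidyr releases mark separate as superseded in favour of separate_wider_delim. The sketch below assumes tidyr 1.3.0 or later is installed; with names_sep the column names are generated automatically, so the str_count step is not needed. The name wide is only for illustration, and the values keep their surrounding spaces just as in the outputs above.

```r
library(tidyr)

df1 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))
df2 <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla  \n ghfdghf \n hdhdh \n hjgfj \n td", "bla"))

# Row-bind first, then split c on "\n"; names_sep = "" yields c1, c2, ...
# and too_few = "align_start" pads shorter rows with NA on the right
wide <- separate_wider_delim(rbind(df1, df2), c, delim = "\n",
                             names_sep = "", too_few = "align_start")
wide
```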

Answer 2

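# Split each entry of column c on newlines (as.character guards against factor columns in R < 4.0)
cc = strsplit(as.character(df$c), "\n")
# Number of pieces in the longest entry
l = max(lengths(cc))
# Pad shorter entries with NA so every row has l pieces
CC = lapply(cc, function(x) c(x, rep(NA, l-length(x))))
# Stack the padded vectors into a matrix and put columns a and b back in front
CC = do.call(rbind, CC)
cbind(df[c('a', 'b')], CC)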

  a         b    1     2         3       4
1 1       bla one   two      three    <NA>
2 2      word bla   why   morebla   helpme
3 3 otherword bla    bla      <NA>    <NA>
4 4      nice  bla  <NA>      <NA>    <NA>
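
Since the question mentions applying this to a list, the same base R steps can be wrapped in a small helper. This is only a sketch: split_col is a hypothetical name, not an existing function, and it additionally trims whitespace and names the new columns c1, c2, and so on.

```r
# Hypothetical helper: split column `col` of `df` on `sep`, keeping all other columns
split_col <- function(df, col = "c", sep = "\n") {
  # Split each entry and trim the whitespace left around the separator
  pieces <- lapply(strsplit(as.character(df[[col]]), sep, fixed = TRUE), trimws)
  n <- max(lengths(pieces))
  # Pad shorter entries with NA so every row has n pieces
  padded <- lapply(pieces, function(x) c(x, rep(NA_character_, n - length(x))))
  mat <- do.call(rbind, padded)
  colnames(mat) <- paste0(col, seq_len(n))
  # Keep every column except the one that was split
  data.frame(df[setdiff(names(df), col)], mat,
             check.names = FALSE, stringsAsFactors = FALSE)
}

df <- data.frame(a = c(1:4), b = c("bla", "word", "otherword", "nice"), c = c("one \n two \n three", "bla \n why \n morebla \n helpme", "bla \n bla", "bla"))

res <- split_col(df)
res

# On a list of data frames, apply it per element
res_list <- lapply(list(df, df), split_col)
```

Because the helper keeps every column except the one being split, columns a and b are not lost, which was the problem with calling str_split_fixed on df$c alone.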

2 answers

Answer 1 (score 2, accepted), 2021-04-29 11:55:07

Answer 2 (score 1), 2021-04-29 11:55:11