Separate rows in multiple columns with different numbers of commas

Question

I use R and I have a data frame with 3 columns that contain values separated by ",".

Here's what it looks like:

col_A	col_B	col_C
first_name,last_name,age	John,Appleseed,23	Steve,Jobs, 33

I want each comma-separated value to go into its own row. So it should look like this:

col_A	col_B	col_C
first_name	John	Steve
last_name	Appleseed	Jobs
age	23	33

I managed to do it like this:

col_A <- strsplit(df$col_A, split = ",")
col_B <- strsplit(df$col_B, split = ",")
col_C <- strsplit(df$col_C, split = ",")

df2 <- data.frame(col_A = unlist(col_A),
                  col_B = unlist(col_B),
                  col_C = unlist(col_C))

The problem is that the table is messy: the number of commas varies, so strsplit() returns lists with different numbers of elements, and data.frame() fails when its columns do not all have the same length. To illustrate, sometimes col_A has 3 comma-separated elements while col_B and col_C have an extra trailing comma, and vice versa. Here's an example:

col_A	col_B	col_C
first_name,last_name,age	John,Appleseed,23,	Steve,Jobs, 33,

How can I get rid of this formatting problem? Adding commas before using str_split doesn't seem like a good solution to me.

Answer 1

You can use str_remove() across all columns to remove the trailing comma. Then you can use separate_rows() to get what you want. Rows without a trailing comma are left unchanged.

library(tidyverse)

df1 <- tibble::tribble(
                        ~col_A,              ~col_B,           ~col_C,
    "first_name,last_name,age", "John,Appleseed,23", "Steve,Jobs, 33"
    )

df2 <- tibble::tribble(
                        ~col_A,               ~col_B,            ~col_C,
    "first_name,last_name,age", "John,Appleseed,23,", "Steve,Jobs, 33,"
    )

df1 %>% 
    mutate(across(everything(), ~str_remove(.x, ",$"))) %>% 
    separate_rows(everything(), sep = ",")
#> # A tibble: 3 x 3
#>   col_A      col_B     col_C  
#>   <chr>      <chr>     <chr>  
#> 1 first_name John      "Steve"
#> 2 last_name  Appleseed "Jobs" 
#> 3 age        23        " 33"

df2 %>% 
    mutate(across(everything(), ~str_remove(.x, ",$"))) %>% 
    separate_rows(everything(), sep = ",")
#> # A tibble: 3 x 3
#>   col_A      col_B     col_C  
#>   <chr>      <chr>     <chr>  
#> 1 first_name John      "Steve"
#> 2 last_name  Appleseed "Jobs" 
#> 3 age        23        " 33"

Created on 2021-03-02 by the reprex package (v0.3.0)
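
Note that ",$" strips only a single trailing comma, and the space in " 33" survives the split. As a possible variation (a sketch, not part of the original answer), the pattern can be widened to remove any run of trailing commas and whitespace, and the separator can absorb the spaces around each comma. The data frame df3 and the result name out are made up for illustration.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical input: several trailing commas and a stray space before "33"
df3 <- tibble::tribble(
  ~col_A,                     ~col_B,                 ~col_C,
  "first_name,last_name,age", "John,Appleseed,23,,,", "Steve,Jobs, 33,"
)

out <- df3 %>%
  # drop any run of commas/whitespace at the end of each string
  mutate(across(everything(), ~ str_remove(.x, "[,\\s]+$"))) %>%
  # split on commas, swallowing whitespace around each one
  separate_rows(everything(), sep = "\\s*,\\s*")

out
```

This still assumes every column holds the same number of real values in a given row; separate_rows() will complain otherwise.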

Answer 2

Maybe you can use regmatches, like below:

list2DF(lapply(df, function(x) unlist(regmatches(x, gregexpr("\\w+", x)))))

which gives

       col_A     col_B col_C
1 first_name      John Steve
2  last_name Appleseed  Jobs
3        age        23    33

Data

> dput(df)
structure(list(col_A = "first_name,last_name,age", col_B = "John,Appleseed,23,,,,",
    col_C = "Steve,Jobs, 33"), row.names = c(NA, -1L), class = "data.frame")

> df
                     col_A                 col_B          col_C
1 first_name,last_name,age John,Appleseed,23,,,, Steve,Jobs, 33
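
One caveat: "\\w+" would also break up values that contain spaces, dots or apostrophes (for example "3.5" or "O'Neil"). If that matters, a base R alternative might look like the sketch below. It splits on commas only, trims whitespace, drops empty fields, and pads shorter columns with NA so that the columns always have equal length. The helper split_clean and the names cols, padded and result are hypothetical, and dropping every empty field (not just trailing ones) is an assumption about what the data should look like.

```r
df <- data.frame(
  col_A = "first_name,last_name,age",
  col_B = "John,Appleseed,23,,,,",
  col_C = "Steve,Jobs, 33",
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on literal commas, trim, drop empty pieces
split_clean <- function(x) {
  parts <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  parts[nzchar(parts)]
}

cols <- lapply(df, split_clean)

# Pad every column to the longest length; `length<-` fills with NA
n <- max(lengths(cols))
padded <- lapply(cols, function(v) {
  length(v) <- n
  v
})

result <- as.data.frame(padded, stringsAsFactors = FALSE)
result
```

Because of the padding step, this should also cope with the case where one column really does contain fewer values than the others, at the cost of NA entries in that column.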
