
Separate rows in multiple columns with differing number of commas

I use R and I have a data frame with 3 columns whose values are separated by ",".

Here's what it looks like:

col_A                     col_B              col_C
first_name,last_name,age  John,Appleseed,23  Steve,Jobs, 33

I want each comma-separated value to go into its own row. So it should look like this:

col_A       col_B      col_C
first_name  John       Steve
last_name   Appleseed  Jobs
age         23         33

I managed to do this as follows:

col_A <- strsplit(df$col_A, split = ",")
col_B <- strsplit(df$col_B, split = ",")
col_C <- strsplit(df$col_C, split = ",")

df2 <- data.frame(col_A = unlist(col_A),
                  col_B = unlist(col_B),
                  col_C = unlist(col_C))

The problem is that the table is messy: sometimes the cells contain different numbers of commas, so after splitting, my lists do not have the same number of elements, and data.frame() fails when the numbers of elements differ. To illustrate, sometimes I have 3 elements separated by commas in col_A, while there are 4 commas in col_B and col_C, and vice versa. Here's an example:

col_A                     col_B               col_C
first_name,last_name,age  John,Appleseed,23,  Steve,Jobs, 33,

How can I get rid of this formatting problem? Adding commas before using str_split doesn't seem like a good solution to me.
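The length mismatch described in the question can be reproduced with a small sketch. It assumes stringr::str_split(), which keeps the empty string after a trailing comma (base strsplit() drops a single trailing empty field, so it would hide the problem for this particular input). The two input strings are taken from the example table; everything else is illustrative.

```r
library(stringr)

# Split one clean cell and one cell with a trailing comma
col_A <- str_split("first_name,last_name,age", ",")[[1]]
col_B <- str_split("John,Appleseed,23,", ",")[[1]]

# The trailing comma yields an extra empty element in col_B
lens <- lengths(list(col_A = col_A, col_B = col_B))

# data.frame() refuses vectors whose lengths cannot be recycled
result <- tryCatch(
  data.frame(col_A = col_A, col_B = col_B),
  error = function(e) e
)
failed <- inherits(result, "error")
```

Here `lens` holds the two element counts and `failed` records whether data.frame() raised an error.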

You can use str_remove() across all columns to get rid of the trailing commas. Then you can use separate_rows() to get what you want. This will not affect the output for rows without trailing commas.

library(tidyverse)

df1 <- tibble::tribble(
                        ~col_A,              ~col_B,           ~col_C,
    "first_name,last_name,age", "John,Appleseed,23", "Steve,Jobs, 33"
    )

df2 <- tibble::tribble(
                        ~col_A,               ~col_B,            ~col_C,
    "first_name,last_name,age", "John,Appleseed,23,", "Steve,Jobs, 33,"
    )

# Strip a trailing comma from every column, then split each cell into rows
df1 %>% 
    mutate(across(everything(), ~str_remove(.x, ",$"))) %>% 
    separate_rows(everything(), sep = ",")
#> # A tibble: 3 x 3
#>   col_A      col_B     col_C  
#>   <chr>      <chr>     <chr>  
#> 1 first_name John      "Steve"
#> 2 last_name  Appleseed "Jobs" 
#> 3 age        23        " 33"

# Same pipeline on the data with trailing commas
df2 %>% 
    mutate(across(everything(), ~str_remove(.x, ",$"))) %>% 
    separate_rows(everything(), sep = ",")
#> # A tibble: 3 x 3
#>   col_A      col_B     col_C  
#>   <chr>      <chr>     <chr>  
#> 1 first_name John      "Steve"
#> 2 last_name  Appleseed "Jobs" 
#> 3 age        23        " 33"

Created on 2021-03-02 by the reprex package (v0.3.0)
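Two details of the output above may matter in practice: the value " 33" keeps its leading space, and the pattern ",$" removes only one trailing comma. A possible variation, offered as a sketch rather than as part of the original answer, uses ",+$" to strip any run of trailing commas and splits on a comma followed by optional whitespace. The name df3 and its contents (borrowed from the messier example further down the page) are illustrative.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Illustrative data: several trailing commas and a space after one comma
df3 <- tibble::tribble(
                        ~col_A,                  ~col_B,           ~col_C,
    "first_name,last_name,age", "John,Appleseed,23,,,,", "Steve,Jobs, 33"
    )

out <- df3 %>% 
    # ",+$" removes one or more commas at the end of the string
    mutate(across(everything(), ~str_remove(.x, ",+$"))) %>% 
    # ",\\s*" treats a comma plus any following whitespace as the separator
    separate_rows(everything(), sep = ",\\s*")
```

This still requires every column to contain the same number of values once the trailing commas are gone; separate_rows() raises an error otherwise.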

Maybe you can use regmatches as shown below:

list2DF(lapply(df, function(x) unlist(regmatches(x, gregexpr("\\w+", x)))))

which gives

       col_A     col_B col_C
1 first_name      John Steve
2  last_name Appleseed  Jobs
3        age        23    33

Data

> dput(df)
structure(list(col_A = "first_name,last_name,age", col_B = "John,Appleseed,23,,,,",
    col_C = "Steve,Jobs, 33"), row.names = c(NA, -1L), class = "data.frame")

> df
                     col_A                 col_B          col_C
1 first_name,last_name,age John,Appleseed,23,,,, Steve,Jobs, 33
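One caveat with the "\\w+" pattern is that it splits on anything that is not a word character, so a value containing a space or an apostrophe would be broken into several pieces. A base R sketch that splits only on commas, trims whitespace, drops empty fields, and pads shorter columns with NA is given below. The helper name split_clean is hypothetical, and padding with NA is an assumption about what should happen when the columns end up with different numbers of values.

```r
df <- data.frame(
  col_A = "first_name,last_name,age",
  col_B = "John,Appleseed,23,,,,",
  col_C = "Steve,Jobs, 33",
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on literal commas, trim, drop empty fields
split_clean <- function(x) {
  parts <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  parts[nzchar(parts)]
}

cols <- lapply(df, split_clean)

# Pad every column to the longest length; `length<-` fills with NA
n <- max(lengths(cols))
padded <- lapply(cols, function(v) {
  length(v) <- n
  v
})

res <- as.data.frame(padded, stringsAsFactors = FALSE)
```

With the data above all three columns have three values after cleaning, so no NA padding is actually applied; it only comes into play when one column genuinely holds more values than another.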


 