
Split delimited strings in a column and insert as new rows for 2 columns

I have a data frame that looks like this:

col1    col2        col3
a        1,2,3      A,B,C 
b        ["1","2"]  A,C
c        4          D,E

Desired output:

col1  col2 col3
a      1    A
a      2    B
a      3    C 
b      1    A
b      2    C
c      4    D
c      4    E

I have tried this:

df %>% 
  mutate((col2 = strsplit(as.character(col2), ","))&(col3 = strsplit(as.character(col3), ","))) %>% 
  unnest((col2)&(col3))

It didn't work. Any help will be appreciated.

Is there any other way to do this without the development version of dplyr?

Using dplyr and tidyr, we can do the following (see the note below):

 library(dplyr)

 df %>% 
   # strip the brackets and quotes from columns 2 and 3
   mutate(across(2:3, ~ gsub('\\[|\\]|"', "", as.character(.)))) %>% 
   tidyr::separate_rows(2:3, sep = ",") # 2:3 is hard-coded; a selector such as -1 would generalize it
  col1 col2 col3
1    a    1    A
2    a    2    B
3    a    3    C
4    b    1    A
5    b    2    C
6    c    4    D
7    c    4    E

NOTE:

  1. I'm using the development version of dplyr (0.8.9.9000). If your version lacks across(), you can use mutate_at instead of mutate(across(...)).
  2. The column positions 2:3 are hard-coded purely for this example. To generalize, use -1 or another selector instead.
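
The two notes can be combined into a short sketch, assuming only released versions of dplyr and tidyr. Here mutate_at stands in for mutate(across(...)), and the selector -col1 replaces the hard-coded 2:3. The data.frame() call is a plain rebuild of the example data for illustration, not the dput output from the answer.

```r
library(dplyr)
library(tidyr)

# Illustrative rebuild of the example data (plain character columns).
df <- data.frame(
  col1 = c("a", "b", "c"),
  col2 = c("1,2,3", "[\"1\",\"2\"]", "4"),
  col3 = c("A,B,C", "A,C", "D,E"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Remove [, ] and " from every column except col1.
  mutate_at(vars(-col1), ~ gsub('\\[|\\]|"', "", as.character(.))) %>%
  # Split every column except col1 on commas, one row per element.
  separate_rows(-col1, sep = ",")

print(result)
```

Selecting by exclusion means the same pipeline keeps working if more delimited columns are added later, as long as col1 remains the only identifier column.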

Data:

df<- structure(list(col1 = structure(1:3, .Label = c("a", "b", "c"
), class = "factor"), col2 = structure(c(2L, 1L, 3L), .Label = c("[\"1\",\"2\"]", 
"1,2,3", "4"), class = "factor"), col3 = structure(1:3, .Label = c("A,B,C", 
"A,C", "D,E"), class = "factor")), class = "data.frame", row.names = c(NA, 
-3L))
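
For readers who would rather avoid dplyr and tidyr altogether, the same reshaping can be sketched in base R. This is an alternative technique, not the approach used in the answer above. The helper name split_col is hypothetical, and the data frame is rebuilt with data.frame() for illustration.

```r
# Illustrative rebuild of the example data, with factor columns as in the post.
df <- data.frame(
  col1 = c("a", "b", "c"),
  col2 = c("1,2,3", "[\"1\",\"2\"]", "4"),
  col3 = c("A,B,C", "A,C", "D,E"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: strip [, ] and ", then split on commas.
split_col <- function(x) {
  strsplit(gsub('\\[|\\]|"', "", as.character(x)), ",")
}

col2 <- split_col(df$col2)
col3 <- split_col(df$col3)

# Number of output rows per input row: the longer of the two splits.
n <- pmax(lengths(col2), lengths(col3))

result <- data.frame(
  col1 = rep(as.character(df$col1), times = n),
  # rep_len recycles a shorter split (such as "4") up to the row count.
  col2 = unlist(Map(rep_len, col2, n)),
  col3 = unlist(Map(rep_len, col3, n)),
  stringsAsFactors = FALSE
)

print(result)
```

One design caveat: rep_len recycles any shorter split silently, so a row with, say, two values in col2 and three in col3 would be padded by repetition rather than rejected. Add an explicit length check if that case should be an error.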


 