
Splitting a single column into multiple rows

I need to split one column of a data frame so that each cell contains only one value, with the other columns repeated on each new row. The current data frame looks like this:

projectid | count | Name | Type                            | count    |
----------+-------+------+---------------------------------+----------|
ABC       | 211   | jack | abc(Apple, Orange, Water melon) | Multiple |
DBG       | 90    | jill | Plum                            | single   |

The new data frame should look like this:

projectid | count | Name | Type        | count    |
----------+-------+------+-------------+----------|
ABC       | 211   | jack | Apple       | Multiple |
ABC       | 211   | jack | Orange      | Multiple |
ABC       | 211   | jack | Water melon | Multiple |
DBG       | 90    | jill | Plum        | single   |

I can split the single cell with a regular expression, using "()" and "," as separators. However, I can't figure out how to spread the resulting pieces across multiple rows.

One way is to extract everything between the brackets and then use tidyr's separate_rows():

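library(dplyr)
library(tidyr)

# Sample data from the question; R renames the duplicated "count" column to count.1
df <- data.frame(projectid = c("ABC", "DBG"), count = c(211, 90), Name = c("jack", "jill"),
                 Type = c("abc(Apple, Orange, Water melon)", "Plum"), count.1 = c("Multiple", "single"))

df %>%
  # Keep only the text inside the brackets; "Plum" has none and is left unchanged
  mutate(Type = sub(".*\\((.*)\\).*", "\\1", Type)) %>%
  # One row per value; ",\\s*" also drops the space that follows each comma
  separate_rows(Type, sep = ",\\s*")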

#  projectid count Name         Type  count.1
#1       ABC   211 jack        Apple Multiple
#2       ABC   211 jack       Orange Multiple
#3       ABC   211 jack  Water melon Multiple
#4       DBG    90 jill         Plum   single

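The key part is the regex that extracts everything between the round brackets: sub() replaces the whole string with the captured group, and values without brackets, such as "Plum", do not match and are returned unchanged. Once that is done, any of the methods from this link can be used to split the comma-separated values into separate rows.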
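As a sketch of one such alternative, assuming only base R (no dplyr or tidyr) and the sample data from the question, the same two steps can be done with sub(), strsplit() and rep(). The idea is to split each cell into a list of pieces, repeat every original row once per piece, and then overwrite Type with the flattened pieces. The object names inner, pieces and out are illustrative only.

```r
# Sample data from the question (duplicated "count" column named count.1)
df <- data.frame(projectid = c("ABC", "DBG"), count = c(211, 90),
                 Name = c("jack", "jill"),
                 Type = c("abc(Apple, Orange, Water melon)", "Plum"),
                 count.1 = c("Multiple", "single"),
                 stringsAsFactors = FALSE)

# Step 1: keep only the text between the round brackets ("Plum" is unchanged)
inner <- sub(".*\\((.*)\\).*", "\\1", df$Type)

# Step 2: split each cell on a comma followed by optional whitespace
pieces <- strsplit(inner, ",\\s*")

# Step 3: repeat each original row once per piece, then replace Type
out <- df[rep(seq_len(nrow(df)), lengths(pieces)), ]
out$Type <- unlist(pieces)
rownames(out) <- NULL
out
```

This relies on lengths(pieces) lining up with the rows of df, so the repeated rows and unlist(pieces) stay in the same order.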
