I need to split one column of a dataframe into multiple rows so that each cell contains only one value. The current dataframe looks like this:
projectid | count | Name | Type                            | count
ABC       | 211   | jack | abc(Apple, Orange, Water melon) | Multiple
DBG       | 90    | jill | Plum                            | single
The new dataframe should look like this:
projectid | count | Name | Type        | count
ABC       | 211   | jack | Apple       | Multiple
ABC       | 211   | jack | Orange      | Multiple
ABC       | 211   | jack | Water melon | Multiple
DBG       | 90    | jill | Plum        | single
I can split the single cell with a regular expression, using "()" and "," as separators. However, I can't figure out how to spread the resulting values across multiple rows.
One way is to extract everything between the round brackets and then use separate_rows():
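library(dplyr)
library(tidyr)

df %>%
  # Keep only the text inside the round brackets; values without
  # brackets (e.g. "Plum") don't match and are left unchanged
  mutate(Type = sub(".*\\((.*)\\).*", "\\1", Type)) %>%
  # One row per comma-separated value; ",\\s*" also drops the space after each comma
  separate_rows(Type, sep = ",\\s*")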
# projectid count Name Type count.1
#1 ABC 211 jack Apple Multiple
#2 ABC 211 jack Orange Multiple
#3 ABC 211 jack Water melon Multiple
#4 DBG 90 jill Plum single
The key step is the regex that extracts everything between the round brackets. Once that is done, any of the methods from this link for splitting comma-separated values into separate rows will work.
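If you'd rather not depend on dplyr/tidyr, the same two steps can be sketched in base R: extract the bracketed text with sub(), split it with strsplit(), then repeat each original row once per piece. This is a minimal sketch, not the answer's method; the df below is rebuilt by hand from the question's example, and renaming the duplicate count column to count.1 is an assumption (it matches what data.frame() or read.table() would do).

```r
# Sample data rebuilt from the question (assumption: the second "count"
# column is named count.1, as data.frame()/read.table() would rename it)
df <- data.frame(
  projectid = c("ABC", "DBG"),
  count = c(211, 90),
  Name = c("jack", "jill"),
  Type = c("abc(Apple, Orange, Water melon)", "Plum"),
  count.1 = c("Multiple", "single"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the text inside the round brackets
# (values without brackets, like "Plum", are returned unchanged)
inner <- sub(".*\\((.*)\\).*", "\\1", df$Type)

# Step 2: split each value on a comma plus any following whitespace
parts <- strsplit(inner, ",\\s*")

# Step 3: repeat each original row once per piece, then fill in the pieces
out <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
out$Type <- unlist(parts)
rownames(out) <- NULL

print(out)
```

Because rep() uses lengths(parts) as the repeat counts, rows and pieces stay aligned, so unlist(parts) can be assigned straight back to the Type column.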