Using tidyr::separate with quoted values containing delimiter

I have a fairly straightforward question, and I'm hoping there's a very simple answer that I just haven't stumbled upon yet.

I'm attempting to use tidyr::separate() to create two columns within a data.frame from a single character string column (using a comma as the delimiter). The issue is that the data contains multiple commas; however, the left-most value is wrapped in quotes. Is there a way to separate this string into two columns while respecting the contents within the quotes?

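library(tidyr)
library(dplyr)  # provides %>% (and mutate(), used in an answer below)

# trying to re-create the issue: a quoted member list, a comma, then the band
band_members <- data.frame(col = '"Paul,George,John,Ringo",Beatles')


# trying to separate (this splits on every comma, including those inside the quotes)
new_dat <- band_members %>% tidyr::separate(col = col, into = c('members', 'band'), sep = ',')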

 members    band  
--------- --------
  "Paul    George 

^ This is not ideal. What I'd like (below):

         members             band   
-------------------------- ---------
 "Paul,George,John,Ringo"   Beatles 

Any help would be greatly appreciated!

If the format is always "members",band, using sep = '",' instead of ',' may help. The closing quote is consumed as part of the separator, so it has to be pasted back on afterwards:

band_members %>% 
  tidyr::separate(col = col, into = c('members', 'band'), sep = '",') %>%
  dplyr::mutate(members = paste0(members, "\""))  # restore the closing quote

                   members    band
1 "Paul,George,John,Ringo" Beatles
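A variation on the same idea, shown here as a minimal base R sketch rather than a definitive solution: split only on a comma that has no double quote anywhere after it, so the closing quote never needs to be pasted back. This assumes the quoted field always comes first and the band name itself contains no quotes; the names x, parts, and split_dat are just for illustration. The same lookahead pattern may also work as the sep argument of separate(), but that depends on the regex engine your tidyr version uses, so test it before relying on it.

```r
# Assumed input: one string in the same shape as the question's data.
x <- '"Paul,George,John,Ringo",Beatles'

# ,(?=[^"]*$) matches a comma followed only by non-quote characters up to
# the end of the string, i.e. the comma after the closing quote.
# Lookahead needs perl = TRUE in base R.
parts <- strsplit(x, ',(?=[^"]*$)', perl = TRUE)[[1]]

split_dat <- data.frame(members = parts[1], band = parts[2],
                        stringsAsFactors = FALSE)
print(split_dat)
```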

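You can use tidyr::extract() rather than separate(), and then it's just a matter of finding the right regex. Here the first group captures everything between the quotes and the second captures everything after the comma that follows them: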

band_members %>% 
  tidyr::extract(col, into = c("members", "band"), regex = "^\"(.*?)\",(.*?)$")


                 members    band
1 Paul,George,John,Ringo Beatles
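Since the strings are effectively CSV rows, another option is to hand them to a quote-aware CSV parser instead of writing a regex. The sketch below uses base R's read.csv() with its text argument; it assumes every row has exactly two fields and that the quoting follows normal CSV rules. The name parsed is only illustrative. Like the extract() result above, the surrounding quotes are dropped from members.

```r
# Re-create the question's data (stringsAsFactors = FALSE keeps col as
# character on R versions before 4.0).
band_members <- data.frame(col = '"Paul,George,John,Ringo",Beatles',
                           stringsAsFactors = FALSE)

# read.csv() treats commas inside double quotes as part of the field.
parsed <- read.csv(text = band_members$col, header = FALSE,
                   col.names = c("members", "band"),
                   stringsAsFactors = FALSE)
print(parsed)
```

Note that this returns a new data frame containing only the parsed columns, so any other columns in the original would need to be bound back on (for example with cbind()).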

