
str_split for column values and then turn it into vector in R

This is somewhat similar to my previous question, "Split data frame string column and count items (dplyr and R)", but what I would like to know is how to split the column items and get the result back as a vector instead of a list.

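library(tidyverse)

dat <- data.frame(ID = c("A", "B"),
                  gene_ids = c("101739/20382/13006/212377/114714/66622/140917",
                               "75717/103573/14852/18141/12567/26429/20842/17975/12545"))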

tmp <- dat %>% mutate(ids = str_split(gene_ids, "/"))
tmp$ids
#> [[1]]
#> [1] "101739" "20382"  "13006"  "212377" "114714" "66622"  "140917"
#> 
#> [[2]]
#> [1] "75717"  "103573" "14852"  "18141"  "12567"  "26429"  "20842"  "17975" 
#> [9] "12545"
tmp
#>   ID                                               gene_ids
#> 1  A          101739/20382/13006/212377/114714/66622/140917
#> 2  B 75717/103573/14852/18141/12567/26429/20842/17975/12545
#>                                                              ids
#> 1            101739, 20382, 13006, 212377, 114714, 66622, 140917
#> 2 75717, 103573, 14852, 18141, 12567, 26429, 20842, 17975, 12545

dat %>% mutate(please_be_vector = str_split(gene_ids, "/") %>% unlist())
#> Error: Problem with `mutate()` input `please_be_vector`.
#> x Input `please_be_vector` can't be recycled to size 2.
#> ℹ Input `please_be_vector` is `str_split(gene_ids, "/") %>% unlist()`.
#> ℹ Input `please_be_vector` must be size 2 or 1, not 16.
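
The error follows from the lengths involved: mutate() needs one value per row (or a single value to recycle), while unlist() flattens both rows into one long vector. A minimal sketch, assuming the stringr package is installed and using the same two strings as in dat, makes the mismatch visible:

```r
library(stringr)

gene_ids <- c("101739/20382/13006/212377/114714/66622/140917",
              "75717/103573/14852/18141/12567/26429/20842/17975/12545")

# str_split() returns a list with one character vector per input string
pieces <- str_split(gene_ids, "/")

lengths(pieces)         # number of ids per row: 7 and 9
length(unlist(pieces))  # 16 ids in total, which cannot fill a 2-row column
```

This is why the list column is the only shape that fits inside the data frame when rows hold different numbers of ids.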

I would like tmp$ids to give plain character vectors rather than a list, like the output below. Is this possible using dplyr?

"101739" "20382"  "13006"  "212377" "114714" "66622"  "140917"
"75717"  "103573" "14852"  "18141"  "12567"  "26429"  "20842"  "17975" "12545"

Is it possible?

Update: Maybe this one:

dat %>% 
  separate_rows(gene_ids) %>% 
  arrange(ID, gene_ids) %>% 
  group_by(ID) %>% 
  mutate(id = row_number()) %>% 
  pivot_wider(
    names_from = ID,
    values_from = gene_ids
  ) %>% 
  pull(A) # alternative pull(B)
[1] "101739" "114714" "13006"  "140917" "20382"  "212377" "66622"  NA      
[9] NA   

First answer:


dat %>% mutate(ids = str_split(gene_ids, "/")) %>% 
  unnest(ids) %>% 
  pull(ids)  # extract the unnested column as a plain character vector

 [1] "101739" "20382"  "13006"  "212377" "114714" "66622"  "140917" "75717" 
 [9] "103573" "14852"  "18141"  "12567"  "26429"  "20842"  "17975"  "12545" 
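
If the ids of a single row are wanted rather than all of them at once, the same unnested data can be filtered before pulling. This is a sketch under the assumption that dplyr, tidyr and stringr are installed; the names ids_long and a_ids are illustrative and do not come from the original answer.

```r
library(dplyr)
library(tidyr)
library(stringr)

dat <- data.frame(
  ID = c("A", "B"),
  gene_ids = c("101739/20382/13006/212377/114714/66622/140917",
               "75717/103573/14852/18141/12567/26429/20842/17975/12545"),
  stringsAsFactors = FALSE
)

# One row per (ID, id) pair
ids_long <- dat %>%
  mutate(ids = str_split(gene_ids, "/")) %>%
  unnest(ids)

# Character vector holding only the ids that belong to ID "A"
a_ids <- ids_long %>%
  filter(ID == "A") %>%
  pull(ids)
a_ids
```

Keeping the data in long form and filtering by ID avoids the NA padding that appears when vectors of different lengths are forced into columns of a wide table.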


temp <- dat %>% mutate(ids = str_split(gene_ids, "/"))
unlist(temp$ids)  # flatten the list column into one character vector

 [1] "101739" "20382"  "13006"  "212377" "114714" "66622"  "140917" "75717" 
 [9] "103573" "14852"  "18141"  "12567"  "26429"  "20842"  "17975"  "12545" 

tmp$ids is a list of two character vectors, one for each row of the data. When you subset a list using [, you get a list back. To extract the character vector itself, use [[ instead:

> tmp$ids[[1]]
[1] "101739" "20382"  "13006"  "212377" "114714" "66622"  "140917"

A good resource for understanding this better is the chapter on subsetting in Advanced R.
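
The difference between [ and [[ can be checked on a small list in base R. The sketch below uses a hypothetical shortened list named ids, not the full tmp$ids column:

```r
# A small list of character vectors, shaped like a list column
ids <- list(c("101739", "20382"), c("75717", "103573", "14852"))

class(ids[1])    # "list": single brackets keep the list wrapper
class(ids[[1]])  # "character": double brackets return the element itself

length(ids[1])   # 1, a list holding one vector
length(ids[[1]]) # 2, the vector's own length
```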

We can simply use unclass on the nested data to get a list of vectors:


dat %>% separate_rows(everything(), sep = "/") %>%
        pivot_wider(names_from = ID, values_from = gene_ids) %>%
        unclass()

[1] "101739" "20382"  "13006"  "212377" "114714" "66622"  "140917"

[1] "75717"  "103573" "14852"  "18141"  "12567"  "26429"  "20842"  "17975"  "12545" 
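
For comparison, a named list with one vector per ID can also be built without reshaping. This is an alternative sketch in base R, not the approach from the answer above; the name ids_by_id is illustrative.

```r
dat <- data.frame(
  ID = c("A", "B"),
  gene_ids = c("101739/20382/13006/212377/114714/66622/140917",
               "75717/103573/14852/18141/12567/26429/20842/17975/12545"),
  stringsAsFactors = FALSE
)

# Split every string on "/" and name each resulting vector after its ID
ids_by_id <- setNames(strsplit(dat$gene_ids, "/", fixed = TRUE), dat$ID)

ids_by_id$A       # character vector for ID "A"
ids_by_id[["B"]]  # character vector for ID "B"
```

Because each element keeps its own length, no NA padding is needed, and a vector is retrieved by name with $ or [[.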
