
R: Tidying multiple columns containing data in lists

I have a dataset arranged such that the data is stored as a list of multiple observations within each 'cell'. See below:

partID | Var 1 | Var 2
1      | 1,2,3 | 4,5,6
2      | 7,8,9 | 1,2,3

I would like to get the data in a format more like this:

partID | Var 1 | Var 2
1      | 1     | 4
1      | 2     | 5
1      | 3     | 6

I've been trying various combinations of melt, unlist, and data.table, but I haven't had much luck expanding the lists while preserving multiple columns and their names. Am I reduced to looping through the dataset and binding the columns together?

If the cells are strings and, within each row, every cell holds the same number of entries, you can do this with data.table:

library(data.table)
DT <- data.table(partID = c(1, 2), Var1 = c("1,2,3", "7,8,9"), Var2 = c("4,5,6", "1,2,3"))

# Split each string on the commas, one group of rows per partID
DT2 <- DT[, list(Var1 = unlist(strsplit(Var1, ",")), Var2 = unlist(strsplit(Var2, ","))), by = partID]

strsplit() splits each string at the commas and returns a list; unlist() turns that list into a plain vector. The resulting columns are still character, so wrap them in as.numeric() if you need numbers.
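
To see what each of those two calls contributes, here is a small base R sketch on two cells' worth of data. The variable names (cells, pieces, flat, nums) are made up for illustration and are not part of the answer above.

```r
# Two "cells", each holding several comma-separated values as one string
cells <- c("1,2,3", "7,8,9")

# strsplit() returns a list with one character vector per input string
pieces <- strsplit(cells, ",", fixed = TRUE)
str(pieces)

# unlist() flattens that list into a single character vector
flat <- unlist(pieces)

# The values are still strings at this point; convert if numbers are needed
nums <- as.numeric(flat)
```

Inside DT[, ..., by = partID] the same thing happens once per group, so each partID contributes one string per column and gets back one row per value.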

If, on the other hand, each cell is already a list, then all you need to do is unlist():

library(data.table)
DT3 <- data.table(partID = c(1, 2), Var1 = list(c(1, 2, 3), c(7, 8, 9)), Var2 = list(c(4, 5, 6), c(1, 2, 3)))

DT4 <- DT3[, list(Var1 = unlist(Var1), Var2 = unlist(Var2)), by = partID]

Either way, you get this:

   partID Var1 Var2
      1    1    4
      1    2    5
      1    3    6
      2    7    1
      2    8    2
      2    9    3
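
On the question of whether a loop is needed: for list columns, the same reshaping can be sketched in base R without any packages. This is only one possible approach, assuming every row holds the same number of values in Var1 and Var2; the names df, n1, n2 and long are illustrative.

```r
# Same list-column data as DT3, built as a plain data.frame
df <- data.frame(partID = c(1, 2))
df$Var1 <- list(c(1, 2, 3), c(7, 8, 9))
df$Var2 <- list(c(4, 5, 6), c(1, 2, 3))

# Number of values held in each cell
n1 <- lengths(df$Var1)
n2 <- lengths(df$Var2)

# Pairing values row by row only makes sense if both columns agree
stopifnot(identical(n1, n2))

# Repeat each partID once per value, then flatten the list columns
long <- data.frame(
  partID = rep(df$partID, times = n1),
  Var1   = unlist(df$Var1),
  Var2   = unlist(df$Var2)
)
long
```

rep() with times = n1 is what keeps partID aligned with the flattened values, so no explicit loop or rbind() is required.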

We can do this easily with cSplit from the splitstackshape package:

library(splitstackshape)
cSplit(DT, c("Var1", "Var2"), ",", "long")
#    partID Var1 Var2
#1:      1    1    4
#2:      1    2    5
#3:      1    3    6
#4:      2    7    1
#5:      2    8    2
#6:      2    9    3

Data:

DT <- data.frame(partID = c(1, 2), Var1 = c("1,2,3", "7,8,9"), Var2 = c("4,5,6", "1,2,3"))

The separate_rows() function in tidyr is the boss for observations with multiple delimited values...

# create data (tibble() replaces the deprecated data_frame())
library(tidyverse)
d <- tibble(
  partID = c(1, 2),
  Var1 = c("1,2,3", "7,8,9"),
  Var2 = c("4,5,6", "1,2,3")
)
d
# # A tibble: 2 x 3
#   partID  Var1  Var2
#    <dbl> <chr> <chr>
# 1      1 1,2,3 4,5,6
# 2      2 7,8,9 1,2,3

# tidy data
separate_rows(d, Var1, Var2, convert = TRUE)
# # A tibble: 6 x 3
#   partID  Var1  Var2
#    <dbl> <int> <int>
# 1      1     1     4
# 2      1     2     5
# 3      1     3     6
# 4      2     7     1
# 5      2     8     2
# 6      2     9     3

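You can also use dplyr and tidyr: split the strings into list columns with strsplit(), then expand them with unnest():

library(dplyr)
library(tidyr)
# df is assumed to be the original data frame with columns partID, Var.1 and Var.2
# tidyr >= 1.0.0 expects the columns to be named via cols; older versions used a bare unnest()
df %>%
  mutate(Var.1 = strsplit(Var.1, ","), Var.2 = strsplit(Var.2, ",")) %>%
  unnest(cols = c(Var.1, Var.2))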

Source: local data frame [6 x 3]

  partID Var.1 Var.2
   (dbl) (chr) (chr)
1      1     1     4
2      1     2     5
3      1     3     6
4      2     7     1
5      2     8     2
6      2     9     3

