I have a dataset arranged such that the data is stored as a list of multiple observations within each 'cell'. See below:
partID | Var 1 | Var 2
1 | 1,2,3 | 4,5,6
2 | 7,8,9 | 1,2,3
I would like to get the data in a format more like this:
partID | Var 1 | Var 2
1 | 1 | 4
1 | 2 | 5
1 | 3 | 6
I've been trying various combinations of melt
, unlist
, and data.table
but I haven't had much luck applying the various ways to expand the lists while simultaneously preserving multiple columns and their names. Am I reduced to looping through the dataset and binding the columns together?
If for each row, the cells have the same number of entries and they are strings, then this is what you can do, using data.table.
require(data.table)
DT<-data.table(partID=c(1,2),Var1=c("1,2,3","7,8,9"),Var2=c("4,5,6","1,2,3"))
DT2<-DT[,list(Var1=unlist(strsplit(Var1,",")),Var2=unlist(strsplit(Var2,","))),by=partID]
You use strsplit()
to split the strings by the commas. You use unlist()
to make the entries into a vector, not a list.
If, on the other hand, each cell is already a list, then all you need to do is unlist()
.
require(data.table)
DT3<-data.table(partID=c(1,2),Var1=list(c(1,2,3),c(7,8,9)),Var2=list(c(4,5,6),c(1,2,3)))
DT4<-DT3[,list(Var1=unlist(Var1),Var2=unlist(Var2)),by=partID]
Either way, you get this:
partID Var1 Var2
1 1 4
1 2 5
1 3 6
2 7 1
2 8 2
2 9 3
We can do this easily with cSplit
library(splitstackshape)
cSplit(DT, c("Var1", "Var2"), ",", "long")
# partID Var1 Var2
#1: 1 1 4
#2: 1 2 5
#3: 1 3 6
#4: 2 7 1
#5: 2 8 2
#6: 2 9 3
DT<-data.frame(partID=c(1,2),Var1=c("1,2,3","7,8,9"),Var2=c("4,5,6","1,2,3"))
The separate_rows()
function in tidyr
is the boss for observations with multiple delimited values...
# create data
library(tidyverse)
d <- data_frame(
partID = c(1, 2),
Var1 = c("1,2,3", "7,8,9"),
Var2 = c("4,5,6","1,2,3")
)
d
# # A tibble: 2 x 3
# partID Var1 Var2
# <dbl> <chr> <chr>
# 1 1 1,2,3 4,5,6
# 2 2 7,8,9 1,2,3
# tidy data
separate_rows(d, Var1, Var2, convert = TRUE)
# # A tibble: 6 x 3
# partID Var1 Var2
# <dbl> <int> <int>
# 1 1 1 4
# 2 1 2 5
# 3 1 3 6
# 4 2 7 1
# 5 2 8 2
# 6 2 9 3
You can also use dplyr
and tidyr
which provides the unnest
function to expand the columns:
library(dplyr); library(tidyr);
df %>% mutate(Var.1 = strsplit(Var.1, ","), Var.2 = strsplit(Var.2, ",")) %>% unnest()
Source: local data frame [6 x 3]
partID Var.1 Var.2
(dbl) (chr) (chr)
1 1 1 4
2 1 2 5
3 1 3 6
4 2 7 1
5 2 8 2
6 2 9 3
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.