Split a data.frame into a list based on row values across columns

I would like to split a data.frame into a list based on row values/characters across all columns of the data.frame.

I wrote lists of data.frames to a file using write.list() from the erer package.

So now when I read them in again, they look like this:

Dummy data:

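set.seed(1)
# Each column: 4 random letters, a repeated header row ("col1", ...), then 7 more letters
df <- data.frame(col1 = c(sample(LETTERS, 4), "col1", sample(LETTERS, 7)),
                 col2 = c(sample(LETTERS, 4), "col2", sample(LETTERS, 7)),
                 col3 = c(sample(LETTERS, 4), "col3", sample(LETTERS, 7)))
# The letters below come from the sampler used before R 3.6.0; newer R draws different ones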
   col1 col2 col3
1     G    E    Q
2     J    R    D
3     N    J    G
4     U    Y    I
5  col1 col2 col3
6     F    M    A
7     W    R    J
8     Y    X    U
9     P    I    H
10    N    Y    K
11    B    T    M
12    E    E    Y
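Because set.seed(1) no longer reproduces these letters in current R, here is a sketch that types the printed data in directly so the examples below can be followed on any R version. The `stringsAsFactors = FALSE` argument is my assumption (it matches the R >= 4.0 default); older R would have produced factor columns from the original code, which `grepl()` and `==` both handle.

```r
# The dummy data typed in from the printed output (no dependence on the RNG)
df <- data.frame(
  col1 = c("G", "J", "N", "U", "col1", "F", "W", "Y", "P", "N", "B", "E"),
  col2 = c("E", "R", "J", "Y", "col2", "M", "R", "X", "I", "Y", "T", "E"),
  col3 = c("Q", "D", "G", "I", "col3", "A", "J", "U", "H", "K", "M", "Y"),
  stringsAsFactors = FALSE  # assumption: character columns, as in R >= 4.0
)
```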

I would like to split it into a list wherever a row equals c("col1","col2","col3"), producing:

[[1]]
  col1 col2 col3
1    G    E    Q
2    J    R    D
3    N    J    G
4    U    Y    I

[[2]]
  col1 col2 col3
1    F    M    A
2    W    R    J
3    Y    X    U
4    P    I    H
5    N    Y    K
6    B    T    M
7    E    E    Y

It feels like this should be straightforward with split(), but my attempts so far have failed. Also, as you can see, I can't split by a fixed row interval, because the chunks have different numbers of rows.

Any pointers would be highly appreciated, thanks!

Try

# Rows whose col1 is "col1" start new groups (via cumsum); each piece then drops its header row
lapply(split(df, cumsum(grepl(names(df)[1], df$col1))), function(x) x[!grepl(names(df)[1], x$col1), ])
#$`0`
#  col1 col2 col3
#1    G    E    Q
#2    J    R    D
#3    N    J    G
#4    U    Y    I

#$`1`
#   col1 col2 col3
#6     F    M    A
#7     W    R    J
#8     Y    X    U
#9     P    I    H
#10    N    Y    K
#11    B    T    M
#12    E    E    Y
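To see why the split works, here is the grouping vector on a hypothetical six-row toy data frame (named `toy` so it does not overwrite `df`). This sketch swaps `grepl()` for `==`, which is my own choice: `grepl("col1", ...)` would also match a value such as "col10", whereas `==` only matches the exact header.

```r
# Hypothetical toy data: one repeated header row between two chunks
toy <- data.frame(col1 = c("G", "J", "col1", "F", "W", "Y"),
                  col2 = c("E", "R", "col2", "M", "R", "X"),
                  stringsAsFactors = FALSE)

# Step 1: mark rows that repeat the first column's name
is_header <- toy$col1 == names(toy)[1]
# Step 2: cumsum() turns the markers into group ids; each header row bumps the id by 1
grp <- cumsum(is_header)
grp  # 0 0 1 1 1 1
# Step 3: split by those ids, then drop the header row from every piece
pieces <- lapply(split(toy, grp), function(x) x[x$col1 != names(toy)[1], ])
```

The group ids come out as "0" and "1", which is why the answer's list elements are named `0` and `1`.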

This should work in general if you want to split wherever a row is exactly equal to the column names:

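# Flag rows in which every cell equals its column name, then number the chunks
dfSplit <- split(df, cumsum(Reduce("&", Map("==", df, colnames(df)))))
# Drop the repeated header row at the top of every chunk except the first
for (i in 2:length(dfSplit)) dfSplit[[i]] <- dfSplit[[i]][-1, ]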
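The first line packs two steps into `Reduce("&", Map("==", ...))`. Breaking it apart on a hypothetical three-row data frame shows what each step returns. The `apply()` version is only an alternative I am adding for comparison, not part of the original answer, and it assumes there are no `NA` cells.

```r
# Hypothetical three-row example with one repeated header row
toy <- data.frame(col1 = c("G", "col1", "F"),
                  col2 = c("E", "col2", "M"),
                  stringsAsFactors = FALSE)

# Map() compares each column with its own name: one logical vector per column
hits <- Map("==", toy, colnames(toy))
# Reduce("&", ...) ANDs those vectors, so a row is TRUE only if every cell matches
is_header <- Reduce("&", hits)

# Row-wise alternative: apply() converts to a character matrix, then checks each row
is_header_rowwise <- apply(toy, 1, function(r) all(r == colnames(toy)))
```

The `Map`/`Reduce` form works column by column, so each comparison is vectorized over all rows and no type coercion of the whole data frame is needed; the `apply()` form is perhaps easier to read but coerces every column to character first.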

As @DavidArenburg suggested in the comments, the second line can be written in a more R-like style:

# Drop the first (header) row of every chunk except the first one
dfSplit[-1] <- lapply(dfSplit[-1], function(x) x[-1, ])

It also has the added benefit of doing nothing when dfSplit has length 1, unlike my original second line, which would throw an error in that case because 2:1 evaluates to c(2, 1).
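Putting the pieces together, here is a minimal sketch of a reusable helper. The name `split_on_header()` is hypothetical, and the sketch assumes the data contain no `NA` cells. Unlike the answers above, it removes the header rows before splitting and renumbers the rows of each piece, so the result has the unnamed `[[1]]`/`[[2]]` layout with rows numbered from 1 that the question asks for.

```r
# Hypothetical helper: split a data frame wherever a row repeats the column names
split_on_header <- function(df) {
  # TRUE for rows in which every cell equals its own column name
  is_header <- Reduce("&", Map("==", df, names(df)))
  # Each header row starts a new group id; header rows themselves are dropped
  pieces <- split(df[!is_header, , drop = FALSE], cumsum(is_header)[!is_header])
  # Renumber rows from 1 in every piece
  pieces <- lapply(pieces, function(x) { rownames(x) <- NULL; x })
  # Drop the "0", "1", ... names so the list prints as [[1]], [[2]], ...
  unname(pieces)
}
```

Calling `split_on_header(df)` on the question's data should give a 4-row and a 7-row data frame. Because the header rows are removed before splitting, it also copes with a header row on the first line or with no header rows at all (one piece is returned), without any special-casing of the first chunk.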

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM