
R: how to “split” on combined subsets?

Let's say I have a factor with many levels, and I have grouped some of those levels as represented by the grps variable.

I can use split() to split my data frame by each level, but is it possible to split it so that all the levels combined in one element of grps end up in the same piece?

set.seed(42)
fooLevels <- c(115,119,156,120,158,219)
foo <- fooLevels[round(runif(20, min=1, max=6))]
doo <- rnorm(20)
df <- data.frame(foo,doo)

grps <- c("{115}","{119}","{156}{120}{158}{219}")
splits <- split(df, f = df$foo)

I'd like the output to look like:

>splits
$`{115}`
   foo         doo
8  115  0.08983289
9  115 -2.99309008
12 115  0.18523056

$`{119}`
   foo        doo
2  119 -0.7838389
7  119  0.6792888
13 119  0.5818237

$`{156}{120}{158}{219}`
   foo        doo
1  120  0.3219253
4  120  0.6428993
6  120  0.2765507
11 120 -0.3672346
18 120  1.0385061
20 120  0.7208782
3  156  1.5757275
10 156  0.2848830
17 156  0.3358481
5  158  0.08976065
16 158  1.30254263
19 158  0.92072857
14 219  1.3997368
15 219 -0.7272921

The order of the rows within each data frame in the list does not matter.

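One option is to parse each element of grps into its individual levels with strsplit and then subset df by each set of levels. You can set the names of the list and change the regular expression passed to strsplit to whatever works for you.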

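lapply(
    strsplit(
        grps,
        '}\\{|\\{|}'    # split on "}{", "{" or "}" to get the individual levels
    ),
    function(x) {
        df[df$foo %in% x,]    # keep the rows whose foo is in this group
    }
)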
[[1]]
[1] foo doo
<0 rows> (or 0-length row.names)

[[2]]
   foo       doo
3  119 -1.388861
8  119 -2.656455
14 119  1.214675
18 119 -1.763163

[[3]]
   foo        doo
1  219  1.3048697
2  219  2.2866454
4  158 -0.2787888
5  120 -0.1333213
6  120  0.6359504
7  158 -0.2842529
9  120 -2.4404669
10 158  1.3201133
11 156 -0.3066386
12 158 -1.7813084
13 219 -0.1719174
15 156  1.8951935
16 219 -0.4304691
17 219 -0.2572694
19 156  0.4600974
20 120 -0.6399949
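Picking up the suggestion to set the names of the list: a minimal sketch, assuming the question's data and grps, that stores the parsed level sets in a helper variable (levelSets is a hypothetical name introduced here) and labels each piece with its grps string via setNames(), so the result is indexed like the desired output.

```r
# Rebuild the question's example data
set.seed(42)
fooLevels <- c(115, 119, 156, 120, 158, 219)
foo <- fooLevels[round(runif(20, min = 1, max = 6))]
doo <- rnorm(20)
df <- data.frame(foo, doo)
grps <- c("{115}", "{119}", "{156}{120}{158}{219}")

# Split each grps string into its level tokens;
# the opening "{" leaves a leading "" token, which matches no row
levelSets <- strsplit(grps, '}\\{|\\{|}')

# One data frame per group, named after the grps string it came from
splits <- setNames(
    lapply(levelSets, function(x) df[df$foo %in% x, ]),
    grps
)

splits$`{156}{120}{158}{219}`
```

Each piece can then be looked up by its label, for example splits[["{119}"]].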

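If your grps object does not already exist, you could do something like this, which splits by every value of foo and then merges all groups above 120 into a single data frame named after the values it contains: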

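x = split(df, df$foo)                          # one data frame per value of foo
big = as.numeric(names(x)) > 120               # compare numerically; names(x) are character
y = Reduce(`rbind`, x[big])                    # stack the groups above 120 into one data frame
o = c(x[!big], setNames(list(y), paste(unique(y$foo), collapse = ' ')))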

#> o
#$`119`
#   foo       doo
#3  119 -1.388861
#8  119 -2.656455
#14 119  1.214675
#18 119 -1.763163

#$`120`
#   foo        doo
#5  120 -0.1333213
#6  120  0.6359504
#9  120 -2.4404669
#20 120 -0.6399949

#$`156 158 219`
#   foo        doo
#11 156 -0.3066386
#15 156  1.8951935
#19 156  0.4600974
#4  158 -0.2787888
#7  158 -0.2842529
#10 158  1.3201133
#12 158 -1.7813084
#1  219  1.3048697
#2  219  2.2866454
#13 219 -0.1719174
#16 219 -0.4304691
#17 219 -0.2572694
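Another route, sketched here as an assumption rather than taken from either answer: since split() groups rows by whatever vector it is given, you can translate each value of foo into the grps label that contains it and call split() once. The names levelSets, lookup and group are hypothetical, and the levels are extracted with regmatches()/gregexpr() instead of strsplit() so that no empty tokens appear.

```r
# Rebuild the question's example data
set.seed(42)
fooLevels <- c(115, 119, 156, 120, 158, 219)
foo <- fooLevels[round(runif(20, min = 1, max = 6))]
doo <- rnorm(20)
df <- data.frame(foo, doo)
grps <- c("{115}", "{119}", "{156}{120}{158}{219}")

# Pull the numbers out of each grps string:
# list("115", "119", c("156", "120", "158", "219"))
levelSets <- regmatches(grps, gregexpr("[0-9]+", grps))

# Named lookup vector: individual level -> grps label that contains it
lookup <- setNames(rep(grps, lengths(levelSets)), unlist(levelSets))

# One label per row; using grps as the factor levels keeps their order
# and yields an empty data frame for a group that has no rows
group <- factor(lookup[as.character(df$foo)], levels = grps)
splits <- split(df, group)
```

Rows whose foo value does not appear in any group get an NA label and are silently dropped by split(), so it may be worth checking anyNA(group) first.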

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM