R: how to "split" on combined subsets?

Let's say I have a factor with a number of levels, and I have grouped some of those levels as represented by the grps variable.

I can use split() to split my data frame, but is it possible to split it so that the combined levels represented by grps end up in the same split?

set.seed(42)
fooLevels <- c(115,119,156,120,158,219)
foo <- fooLevels[round(runif(20, min=1, max=6))]
doo <- rnorm(20)
df <- data.frame(foo,doo)

grps <- c("{115}","{119}","{156}{120}{158}{219}")
splits <- split(df, f = df$foo)

I'd like the output to look like:

>splits
$`{115}`
   foo         doo
8  115  0.08983289
9  115 -2.99309008
12 115  0.18523056

$`{119}`
   foo        doo
2  119 -0.7838389
7  119  0.6792888
13 119  0.5818237

$`{156}{120}{158}{219}`
   foo        doo
1  120  0.3219253
4  120  0.6428993
6  120  0.2765507
11 120 -0.3672346
18 120  1.0385061
20 120  0.7208782
3  156  1.5757275
10 156  0.2848830
17 156  0.3358481
5  158  0.08976065
16 158  1.30254263
19 158  0.92072857
14 219  1.3997368
15 219 -0.7272921

The order of the rows within each data frame in the list is of no consequence.

You can set the names of the list and change the expression in strsplit to whatever works for you.

lapply(
    strsplit(
        grps, 
        '}\\{|\\{|}'
    ), 
    function(x) {
        df[df$foo %in% x,]
    }
)
[[1]]
[1] foo doo
<0 rows> (or 0-length row.names)

[[2]]
   foo       doo
3  119 -1.388861
8  119 -2.656455
14 119  1.214675
18 119 -1.763163

[[3]]
   foo        doo
1  219  1.3048697
2  219  2.2866454
4  158 -0.2787888
5  120 -0.1333213
6  120  0.6359504
7  158 -0.2842529
9  120 -2.4404669
10 158  1.3201133
11 156 -0.3066386
12 158 -1.7813084
13 219 -0.1719174
15 156  1.8951935
16 219 -0.4304691
17 219 -0.2572694
19 156  0.4600974
20 120 -0.6399949

If your grps object does not already exist, you could do something like this:

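x <- split(df, df$foo)
lv <- as.numeric(names(x))  # names from split() are character; compare them as numbers
y <- Reduce(rbind, x[lv > 120])  # stack every level above 120 into one data frame
o <- c(x[lv <= 120], setNames(list(y), paste(unique(y$foo), collapse = ' ')))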

#> o
#$`119`
#   foo       doo
#3  119 -1.388861
#8  119 -2.656455
#14 119  1.214675
#18 119 -1.763163

#$`120`
#   foo        doo
#5  120 -0.1333213
#6  120  0.6359504
#9  120 -2.4404669
#20 120 -0.6399949

#$`156 158 219`
#   foo        doo
#11 156 -0.3066386
#15 156  1.8951935
#19 156  0.4600974
#4  158 -0.2787888
#7  158 -0.2842529
#10 158  1.3201133
#12 158 -1.7813084
#1  219  1.3048697
#2  219  2.2866454
#13 219 -0.1719174
#16 219 -0.4304691
#17 219 -0.2572694
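
If grps does exist, another option is to turn it into a lookup from level to group label and call split() once. This is a sketch rather than a definitive solution: it assumes each label in grps is a run of {level} tokens with digit-only levels, and the names members, lookup and f are introduced here for illustration.

```r
# Rebuild the question's data so the snippet runs on its own.
set.seed(42)
fooLevels <- c(115, 119, 156, 120, 158, 219)
foo <- fooLevels[round(runif(20, min = 1, max = 6))]
df <- data.frame(foo, doo = rnorm(20))
grps <- c("{115}", "{119}", "{156}{120}{158}{219}")

# Assumption: levels are digit runs inside the braces.
members <- regmatches(grps, gregexpr("[0-9]+", grps))

# Named lookup vector: level (as character) -> group label.
lookup <- setNames(rep(grps, lengths(members)), unlist(members))

# Map every row to its group label; levels = grps fixes the order of the result.
f <- factor(unname(lookup[as.character(df$foo)]), levels = grps)

# A single split() on the grouping factor.
splits <- split(df, f)

names(splits)
```

Rows whose foo value appears in no group get NA in f and are dropped by split(); groups with no matching rows come back as zero-row data frames.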
