
Split data.frame by value

How can I split the following data.frame

df <- data.frame(var1 = c("a", 1, 2, 3, "a", 1, 2, 3, 4, 5, 6, "a", 1, 2), var2 = 1:14)

into a list of groups like these:

a 1
1 2
2 3
3 4

a 5
1 6
2 7
3 8
4 9
5 10
6 11

a 12
1 13
2 14

So basically, the value "a" in column 1 is the tag/identifier I want to split the data frame on. I know about the split function, but that means I have to add another column, and since the size of the groups can vary (as my example shows), I don't know how to automatically create such a dummy column to fit my needs.

Any ideas on that?

Cheers,

Sven

You could find which values of the indexing vector (the first column) equal "a", create a grouping variable based on that with cumsum(), and then use split.

df[,1] == "a"
# [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
#[13] FALSE FALSE
cumsum(df[,1] == "a")
# [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3
split(df, cumsum(df[,1] == "a"))
#$`1`
#  var1 var2
#1    a    1
#2    1    2
#3    2    3
#4    3    4
#
#$`2`
#   var1 var2
#5     a    5
#6     1    6
#7     2    7
#8     3    8
#9     4    9
#10    5   10
#11    6   11
#
#$`3`
#   var1 var2
#12    a   12
#13    1   13
#14    2   14
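The same cumsum() + split() idea can be wrapped in a small reusable function. This is a minimal sketch: the name `split_on_tag()` and its arguments are hypothetical, and it assumes the first row is a tag row (any rows before the first tag would end up in a group named "0"). The `drop_tag` option shows how to leave the "a" rows out of each piece.

```r
# Hypothetical helper (not from the original answer): split a data frame into
# pieces that each start at a row where `col` equals `tag`.
split_on_tag <- function(df, col, tag, drop_tag = FALSE) {
  is_tag <- as.character(df[[col]]) == tag  # as.character() also handles factors
  groups <- cumsum(is_tag)                  # 1 for the first block, 2 for the second, ...
  if (drop_tag) {
    df <- df[!is_tag, , drop = FALSE]       # remove the tag rows themselves
    groups <- groups[!is_tag]
  }
  split(df, groups)
}

df <- data.frame(var1 = c("a", 1, 2, 3, "a", 1, 2, 3, 4, 5, 6, "a", 1, 2),
                 var2 = 1:14)

pieces <- split_on_tag(df, "var1", "a")
sapply(pieces, nrow)
# 1 2 3
# 4 7 3

# Second group without its "a" row
split_on_tag(df, "var1", "a", drop_tag = TRUE)[["2"]]
```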

You could create a loop that goes through the entire first column of the data frame and saves the positions of the non-numeric values in a vector. Thus, you'd have something like:

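data <- as.character(df$var1) # the values you'll sort through (as.character() also handles factors)

positions <- c()

for (i in seq_along(data)) {
    # var1 is a character vector, so is.numeric() would be FALSE for every element;
    # instead, test whether the value converts to a number (as.numeric("a") is NA)
    if (is.na(suppressWarnings(as.numeric(data[i])))) {
        positions <- append(positions, i) # saves the positions of the non-numeric values
    }
}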

With those positions, you shouldn't have a problem splitting up the data frame from there. It's just a matter of taking the sequence of rows between consecutive values in the positions vector.
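To finish that approach, each group can be taken as the rows from one tag position up to the row just before the next tag (or the last row). This is a minimal sketch, assuming the first row is a tag; it swaps the loop for an equivalent vectorised which() call, and the names `starts`, `ends` and `groups` are illustrative only.

```r
df <- data.frame(var1 = c("a", 1, 2, 3, "a", 1, 2, 3, 4, 5, 6, "a", 1, 2),
                 var2 = 1:14)

# Vectorised equivalent of the loop: rows whose var1 does not parse as a number
positions <- which(is.na(suppressWarnings(as.numeric(as.character(df$var1)))))
# positions is c(1, 5, 12)

# Each group runs from one tag row to the row just before the next tag
starts <- positions
ends   <- c(positions[-1] - 1, nrow(df))

# Build one data frame per (start, end) pair
groups <- Map(function(s, e) df[s:e, ], starts, ends)
sapply(groups, nrow)
```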
