
Split single column data frame in R at NA

I have a large data set that I want to split into individual units. Right now, the boundaries between units are marked by NA, but how do I split them? Sample set:

df=matrix(c(1,2,3,4,NA,6,7,8,NA,10,11,12),ncol=1,byrow=TRUE)

gives us

       [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]   NA
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]   NA
[10,]   10
[11,]   11
[12,]   12

I would like these three stored in separate variables, such that

a
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
b
      [,1]
 [1,]    6
 [2,]    7
 [3,]    8
c
      [,1]
 [1,]   10
 [2,]   11
 [3,]   12

Does this make sense? Thanks.

A one-line solution using split and cumsum after removing the missing values:

 split(df[!is.na(df)],cumsum(is.na(df))[!is.na(df)])
$`0`
[1] 1 2 3 4

$`1`
[1] 6 7 8

$`2`
[1] 10 11 12
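
To see why the one-liner works, it may help to build the grouping vector step by step. This is only a sketch on the sample data from the question; the names x, is_gap, group and parts are made up for illustration, and the labels a, b, c are taken from the desired output in the question.

```r
# Sample column from the question, as a plain vector
x <- c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12)

is_gap <- is.na(x)        # TRUE at each NA separator
group  <- cumsum(is_gap)  # 0 0 0 0 1 1 1 1 2 2 2 2: goes up by one at each NA

# Drop the separator positions from both the values and the group ids, then split
parts <- split(x[!is_gap], group[!is_gap])

# Optional: relabel the pieces (assumes exactly three pieces, as in the sample)
names(parts) <- c("a", "b", "c")
parts$b
```

Keeping the pieces in a named list is usually easier than creating separate variables. If separate variables really are needed, base R's list2env() can copy the elements of a named list into an environment.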

I wasn't sure whether by "data set" you meant a true matrix or a data.frame. Here's a data.frame example; a matrix would be similar.

df <- data.frame(a=c(1,2,3,4,NA,6,7,8,NA,10,11,12))
gg <- ifelse(is.na(df$a),NA, cumsum(is.na(df$a)))
split(df, gg)

We use gg as a grouping variable that counts up every time we see an NA, so the sections fall into separate groups. We also set gg to NA on the NA rows themselves so that split() drops those rows. Finally, split() with this grouping variable does what we want:

$`0`
  a
1 1
2 2
3 3
4 4

$`1`
  a
6 6
7 7
8 8

$`2`
    a
10 10
11 11
12 12
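
For the matrix case mentioned above, one possible variant is to split the row indices rather than the values, so that each piece stays a one-column matrix. This is a sketch under the assumption that the data is a single-column matrix like the one in the question; the names m, keep, group and pieces are hypothetical.

```r
# Single-column matrix from the question
m <- matrix(c(1, 2, 3, 4, NA, 6, 7, 8, NA, 10, 11, 12), ncol = 1)

keep  <- !is.na(m[, 1])                 # rows that hold real values
group <- cumsum(is.na(m[, 1]))[keep]    # group id for each kept row

# Split the kept row numbers by group, then subset the matrix;
# drop = FALSE keeps each piece as a matrix instead of a plain vector
pieces <- lapply(split(which(keep), group),
                 function(rows) m[rows, , drop = FALSE])

pieces[["1"]]
```

The list elements are named "0", "1" and "2" after the running NA count, just like the data.frame output above.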
