Remove rows with leading missing values in a specific column, by group, using data.table

Question

I have a data.table like this:

library(data.table)

DT <- data.table(id = c(rep("a", 3), rep("b", 3)),
                 col1 = c(NA,1,2,NA,3,NA), col2 = c(NA,NA,5,NA,NA,NA))
   id col1 col2
1:  a   NA   NA
2:  a    1   NA
3:  a    2    5
4:  b   NA   NA
5:  b    3   NA
6:  b   NA   NA

For each id, I would like to use zoo::na.trim to remove the rows with leading NAs in 'col1'. Here's the result I'm expecting:

   id col1 col2
1:  a    1   NA
2:  a    2    5
3:  b    3   NA
4:  b   NA   NA

Here's what I have tried so far. This does remove the leading NAs in 'col1', but it drops 'col2' from the result:

library(zoo)
DT[ , na.trim(col1), by = id]
   id V1
1:  a  1
2:  a  2
3:  b  3

This doesn't work either:

DT[ , .SD[na.trim(col1)], by = id]
   id col1 col2
1:  a   NA   NA
2:  a    1   NA
3:  b   NA   NA

Answer 1

A possible solution without the zoo package:

DT[DT[, .I[!!cumsum(!is.na(col1))], by = id]$V1]

You get:

   id col1 col2
1:  a    1   NA
2:  a    2    5
3:  b    3   NA
4:  b   NA   NA

What this does:

With DT[, .I[!!cumsum(!is.na(col1))], id]$V1 you create a vector of the row numbers to keep. Using !!cumsum(!is.na(col1)) ensures that only the leading missing values of col1 are dropped: the running count of non-missing values stays at zero until the first non-missing value in the group appears.
Next, you use that vector to subset the data.table.
!!cumsum(!is.na(col1)) does the same as cumsum(!is.na(col1)) != 0. The double negation !! converts every number greater than zero to TRUE and every zero to FALSE.
.I isn't strictly necessary; you can also use DT[DT[, !!cumsum(!is.na(col1)), by = id]$V1], which subsets the data.table with a logical vector.
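The sketch below, assuming the data.table package is installed, rebuilds the example table and prints the intermediate values the explanation refers to: the row numbers returned by the inner query, and the `!!` versus `!= 0` mask on group b's col1. The names `keep` and `x` are introduced only for illustration.

```r
library(data.table)

DT <- data.table(id = c(rep("a", 3), rep("b", 3)),
                 col1 = c(NA, 1, 2, NA, 3, NA),
                 col2 = c(NA, NA, 5, NA, NA, NA))

# Inner query: for each id, the table-wide row numbers (.I) from the
# first non-NA col1 value onwards
keep <- DT[, .I[!!cumsum(!is.na(col1))], by = id]$V1
print(keep)                       # 2 3 5 6

# The same mask on group b's col1 alone: cumsum() counts the non-NA
# values seen so far, so it is 0 only across the leading NAs
x <- c(NA, 3, NA)
print(cumsum(!is.na(x)))          # 0 1 1
print(!!cumsum(!is.na(x)))        # FALSE TRUE TRUE
print(cumsum(!is.na(x)) != 0)     # FALSE TRUE TRUE

# Outer query: subset DT with those row numbers
print(DT[keep])
```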

Two alternatives using cummax, by @lmo from the comments:

# alternative 1:
DT[DT[, !!(cummax(!is.na(col1))), by = id]$V1]

# alternative 2:
DT[as.logical(DT[, cummax(!is.na(col1)), by = id]$V1)]
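The logical-vector variants (the `.I`-free version and both `cummax` alternatives) rely on each id's rows being contiguous, as they are in the example: `by = id` returns the mask in group order, not in the table's row order. The sketch below, assuming data.table is installed, uses a small made-up table `DT2` with interleaved ids to show the difference; the `.I` version is unaffected because it returns actual row numbers.

```r
library(data.table)

# Hypothetical table whose ids are interleaved rather than grouped
DT2 <- data.table(id = c("a", "b", "a", "b"), col1 = c(NA, NA, 1, 2))

# Row-number version: keeps rows 3 and 4, as intended
good <- DT2[DT2[, .I[!!cumsum(!is.na(col1))], by = id]$V1]

# Logical-vector version: the mask comes back as a's rows followed by
# b's rows (FALSE TRUE FALSE TRUE), so it lands on the wrong rows
mask <- DT2[, !!cumsum(!is.na(col1)), by = id]$V1
bad <- DT2[mask]

print(good)   # a 1 / b 2
print(bad)    # b NA / b 2
```

Sorting first, e.g. with `setkey(DT2, id)`, would make the logical-vector forms line up again.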

Another alternative by @jogo:

DT[, .SD[!!cumsum(!is.na(col1))], by = id]

Another alternative by @Frank:

DT[, .SD[ rleid(col1) > 1L | !is.na(col1) ], by = id]

Answer 2

This is how na.trim would be used with data.table. See ?na.trim for more information on its arguments.

DT[, na.trim(.SD, sides = "left", is.na = "all"), by = id]

giving:

   id col1 col2
1:  a    1   NA
2:  a    2    5
3:  b    3   NA
4:  b   NA   NA
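The `sides` argument is what keeps group b's trailing NA. A minimal sketch on a plain vector, assuming the zoo package is installed: with the default `sides = "both"`, `na.trim()` also drops trailing NAs, which is also why the question's first attempt returned only `3` for group b.

```r
library(zoo)

x <- c(NA, 3, NA)                   # col1 values of group b

both <- na.trim(x)                  # default sides = "both": leading and trailing NAs removed
left <- na.trim(x, sides = "left")  # only leading NAs removed

print(both)   # 3
print(left)   # 3 NA
```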

ADDED:

In a comment, the poster clarified that na.trim should only look at the NAs in column 1 (col1). In that case, append a column of row numbers, .I, and after invoking na.trim, subset DT using those row numbers.

DT[DT[, na.trim(data.table(col1, .I), "left"), by = id]$.I, ]

Answer 3

We can use 1:.N >= which.max(...) to select the required rows in each group:

> DT[, .SD[1:.N >= which.max(!is.na(col1))], id]
   id col1 col2
1:  a    1   NA
2:  a    2    5
3:  b    3   NA
4:  b   NA   NA
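`which.max()` on a logical vector returns the position of the first `TRUE`, i.e. the first non-NA value of col1, so `1:.N >= which.max(...)` keeps that row and every row after it. In this base-R sketch, `seq_along()` stands in for `1:.N` outside data.table, and the all-NA vector `y` is hypothetical. It shows one behavioural difference: for a group whose col1 is entirely NA, `which.max()` returns 1 and all of that group's rows are kept, whereas the `cumsum` approach from Answer 1 drops them all.

```r
x <- c(NA, 3, NA)                 # group b's col1
first  <- which.max(!is.na(x))    # position of the first non-NA value: 2
keep_x <- seq_along(x) >= first   # FALSE TRUE TRUE

y <- c(NA, NA)                    # hypothetical group with col1 all NA
keep_y_which  <- seq_along(y) >= which.max(!is.na(y))  # which.max gives 1, so TRUE TRUE
keep_y_cumsum <- cumsum(!is.na(y)) > 0                 # FALSE FALSE

print(keep_x)
print(keep_y_which)
print(keep_y_cumsum)
```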

Question

3 answers

Answer 1
4 votes, accepted, 2017-05-17 13:11:39

Answer 2
4 votes, 2017-05-17 14:19:00

Answer 3
0 votes, 2022-09-09 22:29:14
