Subset rows in data.table (by id)

Question

I feel there must be a simple data.table solution for this problem. I have the following data:

library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

For each unique id, I want to create a new column income that takes the value in income_year0 (if not NA), otherwise the value in income_year1 (if not NA), otherwise the value in income_year2. If all are NA, then income is NA.

That is, I want one row per id with one income column, like so:

data_want <- data.table(
    id = c(1,2,3),
    income = c(100,200,255)
)

Answer 1

You can unlist the columns and select the first non-NA value.

library(data.table)
data[, .(income = na.omit(unlist(.SD))[1]), id]

#   id income
#1:  1    100
#2:  2    200
#3:  3    255
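As a rough check of why this respects the income_year0, income_year1, income_year2 priority: unlist() on a group's columns concatenates them column by column, so every income_year0 value comes before any income_year1 value. The sketch below prints the flattened vector for id 1 and also tries an all-NA group. The extra row with id 4 is a hypothetical addition for illustration and is not part of the original data.

```r
library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

# Flatten the income columns of id 1 the way unlist(.SD) does within a group:
# all income_year0 values first, then income_year1, then income_year2
flat <- unlist(data[id == 1, .(income_year0, income_year1, income_year2)])
print(flat)

# The approach from this answer
result <- data[, .(income = na.omit(unlist(.SD))[1]), id]

# Hypothetical id 4 with no income in any year (assumption, for illustration)
extra <- rbind(
    data,
    data.table(id = 4, income_year0 = NA_real_,
               income_year1 = NA_real_, income_year2 = NA_real_)
)
result_extra <- extra[, .(income = na.omit(unlist(.SD))[1]), id]
print(result_extra)
```

For a group where every value is NA, na.omit() leaves an empty vector and indexing it with [1] gives NA, which matches the behaviour asked for in the question.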

Answer 2

Another option uses as.matrix + is.na:

> data[, .(income = first(as.matrix(.SD)[!is.na(.SD)])), id]
   id income
1:  1    100
2:  2    200
3:  3    255

Answer 3

We can reshape wide-to-long, then get the first non-NA row:

melt(data, id.vars = "id", na.rm = TRUE)[, .(income = first(value)), id]
#    id income
# 1:  1    100
# 2:  2    200
# 3:  3    255
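Note that this relies on the income columns already appearing in priority order in the table, because melt() stacks the measure columns in the order it receives them. A minimal sketch, assuming you would rather state the priority explicitly: pass the columns through measure.vars. The names priority and shuffled are introduced here for illustration only.

```r
library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

# Explicit priority order (illustrative helper, not from the original answer)
priority <- c("income_year0", "income_year1", "income_year2")

# Same data with the columns deliberately out of order
shuffled <- data[, .(id, income_year2, income_year0, income_year1)]

# measure.vars controls the stacking order, so the first row per id
# is still the highest-priority non-NA value
long <- melt(shuffled, id.vars = "id", measure.vars = priority, na.rm = TRUE)
res <- long[, .(income = first(value)), id]
print(res)
```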

Answer 4

I know this answer has nothing to do with data.table. However, since I like to challenge myself with alternative solutions, here is another one that might be of interest to you:

library(dplyr)
library(tidyr)

data %>%
  pivot_longer(-id, values_to = "income") %>%
  drop_na() %>%
  arrange(id, name) %>%
  group_by(id) %>%
  slice_head(n = 1) %>%
  select(- name)

# A tibble: 3 x 2
# Groups:   id [3]
     id income
  <dbl>  <dbl>
1     1    100
2     2    200
3     3    255

Answer 5

For the sake of completeness, here is a version which uses fcoalesce():

data[, .(income = do.call(fcoalesce, as.list(unlist(.SD)))), by = id]

   id income
1:  1    100
2:  2    200
3:  3    255
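fcoalesce() is vectorised across its arguments, so another way to use it (a sketch, not the answerer's method) is in two steps: first reduce each income column to its first non-NA value per id, then coalesce across the resulting columns. The intermediate name per_id is illustrative.

```r
library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

# Step 1: first non-NA value of each column within each id
# (indexing an empty vector with [1] yields NA when a column is all NA)
per_id <- data[, lapply(.SD, function(x) x[!is.na(x)][1]), by = id]

# Step 2: take the first non-NA across columns, in priority order
res <- per_id[, .(id, income = fcoalesce(income_year0, income_year1, income_year2))]
print(res)
```

This calls fcoalesce() once on whole columns rather than once per group.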

5 answers

Answer 1: score 7, accepted, 2021-03-26 09:01:39
Answer 2: score 5, 2021-03-26 09:11:21
Answer 3: score 5, 2021-03-26 09:23:46
Answer 4: score 1, 2021-03-26 10:07:43
Answer 5: score 1, 2021-03-26 15:24:07
