Subset rows in data.table by id

I feel there must be a simple data.table solution for this problem. I have the following data:

library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

For each unique id, I want to create a new column income that takes the value in income_year0 if that id has a non-NA value there, otherwise the value in income_year1 (if not NA), otherwise the value in income_year2. If all three are NA for that id, income should be NA.

That is, I want one row per id with a single income column, like so:

data_want <- data.table(
    id = c(1,2,3),
    income = c(100,200,255)
)

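You can unlist the columns within each id and select the first non-NA value. Because unlist() goes through the columns one after another, income_year0 takes priority over income_year1, which takes priority over income_year2.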

library(data.table)
data[, .(income = na.omit(unlist(.SD))[1]), id]

#   id income
#1:  1    100
#2:  2    200
#3:  3    255
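
To see why the column order matters, here is a minimal sketch that flattens the rows for a single id by hand. The helper name flat is only for illustration and is not part of the answer above.

```r
library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

# Roughly what .SD looks like for id == 1, flattened column by column:
# all income_year0 values first, then income_year1, then income_year2
flat <- unlist(data[id == 1, .(income_year0, income_year1, income_year2)])

# Drop the NAs and keep the first remaining value
first_income <- unname(na.omit(flat)[1])
```

For id 1 the only non-NA value in income_year0 is 100, so it comes before the 105 and 102 found in the later columns.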

Another option, using as.matrix + is.na:

data[, .(income = first(as.matrix(.SD)[!is.na(.SD)])), id]
#    id income
# 1:  1    100
# 2:  2    200
# 3:  3    255

We can reshape from wide to long, dropping the NAs, and then take the first remaining row per id:

melt(data, id.vars = "id", na.rm = TRUE)[, .(income = first(value)), id]
#    id income
# 1:  1    100
# 2:  2    200
# 3:  3    255
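
As a rough illustration of why first(value) picks the right column, the intermediate long table can be inspected on its own. The name long is just a placeholder for this sketch.

```r
library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

# melt() stacks the measure columns in their original order, so every
# income_year0 row comes before any income_year1 row, and so on
long <- melt(data, id.vars = "id", na.rm = TRUE)

# One row per non-NA cell; within each id the earliest year column comes first
result <- long[, .(income = first(value)), id]
```

Note that this relies on the columns already being in priority order in the wide table.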

I know this answer has nothing to do with data.table, but since I like to challenge myself to find alternative solutions, here is another one that might be of interest to you:

library(dplyr)
library(tidyr)

data %>%
  pivot_longer(-id, values_to = "income") %>%
  drop_na() %>%
  arrange(id, name) %>%
  group_by(id) %>%
  slice_head(n = 1) %>%
  select(- name)

# A tibble: 3 x 2
# Groups:   id [3]
#      id income
#   <dbl>  <dbl>
# 1     1    100
# 2     2    200
# 3     3    255

For the sake of completeness, here is a version which uses fcoalesce():

data[, .(income = do.call(fcoalesce, as.list(unlist(.SD)))), by = id]
#    id income
# 1:  1    100
# 2:  2    200
# 3:  3    255
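
A possible variant, sketched here under the assumption that the three income columns keep the names used in the question, first collapses each column to its first non-NA value per id and then lets fcoalesce() work across whole columns. The intermediate name firsts is only illustrative.

```r
library(data.table)

data <- data.table(
    id = c(1,1,1,2,2,3),
    income_year0 = c(NA,NA,100,NA,200,NA),
    income_year1 = c(NA, 105, NA, 202,NA, 255),
    income_year2 = c(102, NA,NA,NA,NA,NA)
)

# Step 1: one row per id, each column reduced to its first non-NA value
# (indexing an empty vector with [1] yields NA, so all-NA columns stay NA)
firsts <- data[, lapply(.SD, function(x) x[!is.na(x)][1]), by = id]

# Step 2: take the first non-NA across the columns, in priority order
result <- firsts[, .(id, income = fcoalesce(income_year0, income_year1, income_year2))]
```

This calls fcoalesce() once on full columns rather than once per group, which may matter on larger tables, though that has not been measured here.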
