Subset row in data.table by id
I feel there must be a simple data.table solution for this problem. I have the following data:
library(data.table)
data <- data.table(
id = c(1,1,1,2,2,3),
income_year0 = c(NA,NA,100,NA,200,NA),
income_year1 = c(NA, 105, NA, 202,NA, 255),
income_year2 = c(102, NA,NA,NA,NA,NA)
)
For each unique id, I want to create a new column income that takes the value in income_year0 if it is not NA, otherwise the value in income_year1 if it is not NA, otherwise the value in income_year2. If all three are NA, then income is NA.
That is, I want one row per id with a single income column, like so:
data_want <- data.table(
id = c(1,2,3),
income = c(100,200,255)
)
You can unlist the columns and select the first non-NA value.
library(data.table)
data[, .(income = na.omit(unlist(.SD))[1]), id]
# id income
#1: 1 100
#2: 2 200
#3: 3 255
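To make the ordering explicit: unlist(.SD) concatenates the columns one after another (all of income_year0, then income_year1, then income_year2), so the first non-NA element comes from the highest-priority column that has a value. The sketch below checks the all-NA case as well; the table data2 and its extra id 4 are hypothetical additions for illustration, not part of the question's data.

```r
library(data.table)

# Hypothetical extension of the example data: id 4 has no income in any year.
data2 <- data.table(
  id = c(1, 1, 1, 2, 2, 3, 4),
  income_year0 = c(NA, NA, 100, NA, 200, NA, NA),
  income_year1 = c(NA, 105, NA, 202, NA, 255, NA),
  income_year2 = c(102, NA, NA, NA, NA, NA, NA)
)

# Within each id, unlist(.SD) stacks the columns in their original order,
# na.omit() drops the missing values, and [1] takes the first survivor.
# Indexing an empty vector with [1] yields NA, which covers the all-NA case.
res <- data2[, .(income = na.omit(unlist(.SD))[1]), by = id]
print(res)
```

With this input, id 4 should end up with income equal to NA, while ids 1 to 3 keep the values shown above.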
Another option with as.matrix + is.na:
data[, .(income = first(as.matrix(.SD)[!is.na(.SD)])), id]
#    id income
# 1:  1    100
# 2:  2    200
# 3:  3    255
We can reshape wide-to-long, then get the first non-NA row per id:
melt(data, id.vars = "id", na.rm = TRUE)[, .(income = first(value)), id]
# id income
# 1: 1 100
# 2: 2 200
# 3: 3 255
I know this answer has nothing to do with data.table. However, since I like to challenge myself to find alternative solutions, here is another one that might be of interest to you:
library(dplyr)
library(tidyr)
data %>%
pivot_longer(-id, values_to = "income") %>%
drop_na() %>%
arrange(id, name) %>%
group_by(id) %>%
slice_head(n = 1) %>%
select(- name)
# A tibble: 3 x 2
# Groups: id [3]
#      id income
#   <dbl>  <dbl>
# 1     1    100
# 2     2    200
# 3     3    255
For the sake of completeness, here is a version which uses fcoalesce():
data[, .(income = do.call(fcoalesce, as.list(unlist(.SD)))), by = id]
#    id income
# 1:  1    100
# 2:  2    200
# 3:  3    255
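As a possible variation (a sketch, not part of the original answers), fcoalesce() can also be used in its vectorized form: first collapse each income column to its first non-NA value per id, then coalesce across the three collapsed columns in priority order. The helper first_non_na is a hypothetical name introduced here for illustration.

```r
library(data.table)

data <- data.table(
  id = c(1, 1, 1, 2, 2, 3),
  income_year0 = c(NA, NA, 100, NA, 200, NA),
  income_year1 = c(NA, 105, NA, 202, NA, 255),
  income_year2 = c(102, NA, NA, NA, NA, NA)
)

# Hypothetical helper: first non-NA element of x, or NA if there is none.
first_non_na <- function(x) x[!is.na(x)][1]

# Step 1: one row per id, each income column reduced to its first non-NA value.
collapsed <- data[, lapply(.SD, first_non_na), by = id]

# Step 2: coalesce across columns; earlier arguments take priority.
collapsed[, income := fcoalesce(income_year0, income_year1, income_year2)]

result <- collapsed[, .(id, income)]
print(result)
```

This keeps the column priority visible in the fcoalesce() call itself rather than relying on the order in which unlist() flattens .SD.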