Create a summarizing variable for multiple columns in data.table (R)
I have the following data.table:
library(data.table)
dt <- data.table(id=c(1,2,2,2,3,3,4),
date=c("2019-09-13", "2018-12-06", "2017-12-14", "2018-02-08", "2015-12-06", "2012-12-14", "2011-02-08"),
variable_1=c("a","b",NA,NA,"b","c",NA),
variable_2=c(NA,NA,"a",NA,"a","c",NA),
variable_3=c(NA,NA,NA,"b","c","c",NA))
dt
id date variable_1 variable_2 variable_3
1: 1 2019-09-13 a <NA> <NA>
2: 2 2018-12-06 b <NA> <NA>
3: 2 2017-12-14 <NA> a <NA>
4: 2 2018-02-08 <NA> <NA> b
5: 3 2015-12-06 b a c
6: 3 2012-12-14 c c c
7: 4 2011-02-08 <NA> <NA> <NA>
I want to create a variable y that summarizes all the variable columns. Every row that has at least one !is.na() value among the variables should be 0. Every row that has only NA among all the variables should be 1. Like this:
id date variable_1 variable_2 variable_3 y
1: 1 2019-09-13 a <NA> <NA> 0
2: 2 2018-12-06 b <NA> <NA> 0
3: 2 2017-12-14 <NA> a <NA> 0
4: 2 2018-02-08 <NA> <NA> b 0
5: 3 2015-12-06 b a c 0
6: 3 2012-12-14 c c c 0
7: 4 2011-02-08 <NA> <NA> <NA> 1
In the original data.table I have 22 variables that I am looking at among 830 total variables, so I would prefer not to check every variable from _1 to _22 separately. Is there a way to do this in data.table?
dt[, y := +(rowSums(!is.na(.SD)) == 0L), .SDcols = patterns("^variable_")]
# id date variable_1 variable_2 variable_3 y
# 1: 1 2019-09-13 a <NA> <NA> 0
# 2: 2 2018-12-06 b <NA> <NA> 0
# 3: 2 2017-12-14 <NA> a <NA> 0
# 4: 2 2018-02-08 <NA> <NA> b 0
# 5: 3 2015-12-06 b a c 0
# 6: 3 2012-12-14 c c c 0
# 7: 4 2011-02-08 <NA> <NA> <NA> 1
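If a name pattern would match more columns than intended (for example, when only 22 of 830 columns are relevant), .SDcols also accepts a character vector of column names. A minimal sketch, assuming the columns of interest are named variable_1 through variable_3 here (variable_22 in the real data); the cols helper and the reduced example table are introduced only for illustration:

```r
library(data.table)

dt <- data.table(id = c(1, 2, 2, 2, 3, 3, 4),
                 variable_1 = c("a", "b", NA, NA, "b", "c", NA),
                 variable_2 = c(NA, NA, "a", NA, "a", "c", NA),
                 variable_3 = c(NA, NA, NA, "b", "c", "c", NA))

# Build the column names explicitly instead of matching a pattern
# (assumption: with the real data this would be 1:22)
cols <- paste0("variable_", 1:3)

# Same expression as above, restricted to the named columns
dt[, y := +(rowSums(!is.na(.SD)) == 0L), .SDcols = cols]
```

Only the last row, where every listed column is NA, ends up with y equal to 1.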
Walk-through:
.SDcols=patterns(...) defines the columns to be processed as .SD in the j component. This doesn't remove or select columns for the output; it only determines which columns are referenced internally.
!is.na(.SD) returns a logical matrix with the same dimensions as .SD, indicating whether each value is not NA.
rowSums(...) returns the count of non-NA values in each row.
By using the inverted logic of "number of non-NA values in a row", we don't need to care about the number of columns being processed; this is what allows me to use == 0L.
+(...) is a shorthand trick for converting logical to 0/1.
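The walk-through can also be followed outside of the j expression by computing each intermediate value separately. This is only an illustrative sketch: the names cols, present, n_present and all_missing are not part of the original answer, and grep() stands in for what patterns() selects inside .SDcols.

```r
library(data.table)

dt <- data.table(id = c(1, 2, 2, 2, 3, 3, 4),
                 variable_1 = c("a", "b", NA, NA, "b", "c", NA),
                 variable_2 = c(NA, NA, "a", NA, "a", "c", NA),
                 variable_3 = c(NA, NA, NA, "b", "c", "c", NA))

# Columns that .SD would contain with .SDcols = patterns("^variable_")
cols <- grep("^variable_", names(dt), value = TRUE)

# Step 1: logical matrix, TRUE where a value is present (not NA)
present <- !is.na(dt[, ..cols])

# Step 2: number of non-NA values per row
n_present <- rowSums(present)

# Step 3: TRUE only for rows where every selected column is NA
all_missing <- n_present == 0L

# Step 4: unary plus coerces the logical vector to integer 0/1
y <- +all_missing
```

Because step 3 compares against zero rather than against the number of columns, the same code works unchanged whether three or twenty-two columns are selected.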