Replacing all NA with 0 in a data.table in R
I have a data.table with many columns. There are 4 columns where I want to replace NA with 0.
I have a working solution:
claimsMonthly[is.na(claim9month),claim9month := 0
][is.na(claim10month),claim10month := 0
][is.na(claim11month),claim11month := 0
][is.na(claim12month),claim12month := 0]
However, this is quite repetitive, and I wanted to shorten it by using a loop (not sure if that is the smartest idea, though):
for (i in 9:12){
claimsMonthly[is.na(paste0("claim", i, "month")), paste0("claim", i, "month") := 0]
}
When I run this loop, nothing happens. I guess it is due to the fact that paste0() returns "claim12month", so I get is.na("claim12month"). The result of that is FALSE despite the fact that there are NA values in my data. I guess this has something to do with the quotes?
This is not the first time I have had issues with using paste0() or running loops with data.table, so I must be missing something important here.
Any ideas how to fix this?
We can either specify .SDcols with the names of the columns (nm1), loop over .SD (Subset of Data.table) and replace NA with 0 (replace_na from tidyr):
library(data.table)
library(tidyr)
nm1 <- paste0("claim", 9:12, "month")
setDT(claimsMonthly)[, (nm1) := lapply(.SD, replace_na, 0), .SDcols = nm1]
Or, as @jangorecki mentioned in the comments, nafill from data.table would be better:
setDT(claimsMonthly)[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]
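To make the effect of the nafill approach concrete, here is a minimal sketch on a toy table. The table contents and the id column are made up for illustration; only the four claim column names come from the question.

```r
library(data.table)

# Toy stand-in for claimsMonthly (values are hypothetical)
claimsMonthly <- data.table(
  id           = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA),
  claim11month = c(5, 6, NA),
  claim12month = c(NA, NA, 9)
)

nm1 <- paste0("claim", 9:12, "month")

# Replace NA with 0 in the selected columns only; := updates by reference
claimsMonthly[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]

# Count of remaining NA values per column
print(colSums(is.na(claimsMonthly)))
```

Columns not listed in nm1 (here, id) are left untouched.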
Or, using a loop with set, assign 0 to the columns of interest based on the NA values in each column, by specifying i (the row index) and j (the column index or name):
for(j in nm1){
set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j =j, value = 0)
}
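As background on why the loop in the question does nothing: is.na() is applied to the character string produced by paste0(), not to the column with that name, and a non-missing string is never NA. One possible way to keep a `[`-style loop is to look the column up with get(). The sketch below uses a small made-up table and is meant as an illustration, not the only fix.

```r
library(data.table)

# Toy table (hypothetical values), two columns only for brevity
claimsMonthly <- data.table(
  claim9month  = c(1, NA),
  claim10month = c(NA, 2)
)

# A string literal is not NA, so this condition selects no rows
print(is.na("claim9month"))

for (i in 9:10) {
  colName <- paste0("claim", i, "month")
  # get() fetches the column named by the string;
  # (colName) makes := assign to the column with that name
  claimsMonthly[is.na(get(colName)), (colName) := 0]
}

print(claimsMonthly)
```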
Or with setnafill:
setnafill(claimsMonthly, cols = nm1, fill = 0)
You can use the following (it selects columns by position, so it assumes the four claim columns are columns 9 to 12):
claimsMonthly[, 9:12][is.na(claimsMonthly[, 9:12])] <- 0
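You can also use variable names (this form assumes claimsMonthly is a plain data.frame, since a character vector in the first argument of a data.table selects rows, not columns):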
claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")][is.na(claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")])] <- 0
Or, even better, you can use a vector holding all variable names that match the "claimXXmonth" pattern.
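A minimal sketch of building that vector with grep(), assuming claimsMonthly is a plain data.frame as in the indexing above. The regular expression and the toy values are assumptions made for illustration.

```r
# Toy data.frame (hypothetical values); otherclaim should stay untouched
claimsMonthly <- data.frame(
  id           = 1:2,
  claim9month  = c(NA, 1),
  claim12month = c(2, NA),
  otherclaim   = c(NA, 5)
)

# Column names of the form claim<digits>month
claimCols <- grep("^claim[0-9]+month$", names(claimsMonthly), value = TRUE)

# Replace NA with 0 in the matched columns only
claimsMonthly[claimCols][is.na(claimsMonthly[claimCols])] <- 0

print(claimsMonthly)
```

For a data.table, the same claimCols vector could instead be passed to setnafill(claimsMonthly, cols = claimCols, fill = 0).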