
Replacing all NA with 0 in a data.table in R

I have a data.table with many columns. There are 4 columns where I want to replace NA with 0.

I have a working solution:

  claimsMonthly[is.na(claim9month),claim9month := 0
          ][is.na(claim10month),claim10month := 0
            ][is.na(claim11month),claim11month := 0
              ][is.na(claim12month),claim12month := 0]

However, this is quite repetitive and I wanted to reduce it by using a loop (not sure if that is the smartest idea though?):

  for (i in 9:12){
    claimsMonthly[is.na(paste0("claim", i, "month")), paste0("claim", i, "month") := 0]
  }

When I run this loop nothing happens. I guess it is due to the fact that paste0() returns "claim12month", so I get is.na("claim12month"). The result of that is FALSE despite the fact that there are NA values in my data. I guess this has something to do with the quotes?
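
The diagnosis above can be checked directly. This is only a minimal sketch on a made-up two-column table (`toy` and its values are hypothetical, not the real claimsMonthly): is.na() applied to the string tests the string itself, while get() looks up the column of that name.

```r
library(data.table)

# Hypothetical toy data standing in for claimsMonthly
toy <- data.table(claim9month = c(1, NA, 3), claim10month = c(NA, 5, 6))

col <- paste0("claim", 9, "month")

# is.na() is applied to the string itself, which is not missing
string_check <- is.na(col)

# get() looks up the column of that name inside the data.table
column_check <- toy[, is.na(get(col))]

print(string_check)   # FALSE
print(column_check)   # FALSE  TRUE FALSE
```

Since the condition in the loop is a single FALSE, no rows are selected and nothing is assigned.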

This is not the first time I have had issues with using paste0() or running loops with data.table, so I must be missing something important here.

Any ideas how to fix this?

We can specify the names of the columns ('nm1') in .SDcols, loop over .SD (the Subset of the Data.table) and replace each NA with 0 (using replace_na from tidyr):

library(data.table)
library(tidyr)
nm1 <- paste0("claim", 9:12, "month")
setDT(claimsMonthly)[, (nm1) := lapply(.SD, replace_na, 0), .SDcols = nm1]

Or, as @jangorecki mentioned in the comments, nafill from data.table would be better:

setDT(claimsMonthly)[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]
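
To see what this does, here is a small sketch on made-up data (the table `toy`, its values, and the extra column `other` are assumptions for illustration). Only the columns listed in .SDcols are filled; the rest are left alone.

```r
library(data.table)

# Hypothetical toy data; only the claim columns should be touched
toy <- data.table(
  id           = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 5, 6),
  other        = c(NA, 2, 3)
)
cols <- c("claim9month", "claim10month")

# Replace NA with 0 in the selected columns only, by reference
toy[, (cols) := lapply(.SD, nafill, fill = 0), .SDcols = cols]

print(toy)
```

After this, claim9month is 1, 0, 3 and claim10month is 0, 5, 6, while `other` still has its NA.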

Or use a loop with set, assigning 0 to the columns of interest based on the NA values in each column, by specifying i (the row index) and j (the column index or name):

for(j in nm1){
    set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j =j, value = 0)
 }
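
For comparison, the loop from the question can probably be kept in its original shape by wrapping the pasted name in get() for the row filter and in parentheses on the left of `:=`. This is a sketch on hypothetical toy data, not a claim that it is preferable to set():

```r
library(data.table)

# Hypothetical toy data with the four column names from the question
toy <- data.table(
  claim9month  = c(1, NA),
  claim10month = c(NA, 5),
  claim11month = c(NA, 4),
  claim12month = c(7, 8)
)

for (i in 9:12) {
  col <- paste0("claim", i, "month")
  # get(col) evaluates to the column itself; (col) on the left of :=
  # means "the column whose name is stored in col"
  toy[is.na(get(col)), (col) := 0]
}

print(toy)
```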

Or with setnafill:

setnafill(claimsMonthly, cols = nm1, fill = 0)
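
Note that setnafill changes the table in place, so no assignment is needed. A small sketch on made-up data (`toy` and `backup` are hypothetical names) that keeps an untouched copy for comparison:

```r
library(data.table)

# Hypothetical toy data
toy <- data.table(claim9month = c(1, NA), claim10month = c(NA, 5))
backup <- copy(toy)   # a real copy, unaffected by later in-place changes

# No assignment needed: setnafill() fills the columns by reference
setnafill(toy, cols = c("claim9month", "claim10month"), fill = 0)

print(toy)      # NAs replaced by 0
print(backup)   # still contains the NAs
```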

You can use (this selects by position, so it assumes the four columns sit at positions 9 to 12):

claimsMonthly[, 9:12][is.na(claimsMonthly[, 9:12])] <- 0

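You can also use variable names (this form selects columns on a plain data.frame; a data.table would treat a character vector in that position as a row lookup):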

claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")][is.na(claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")])] <- 0

Or, even better, you can use a vector with all the variables that match the "claimXXmonth" pattern.
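
One way to build such a vector is with grep() on the column names. This is a sketch on a hypothetical data.frame (`df`, its values, and the regular expression are assumptions for illustration), using the same base R replacement as above:

```r
# Hypothetical toy data.frame; "claimant" must not be matched
df <- data.frame(
  id           = 1:2,
  claim9month  = c(1, NA),
  claim10month = c(NA, 5),
  claimant     = c("a", NA)
)

# Select every column named claim<digits>month
claim_cols <- grep("^claim[0-9]+month$", names(df), value = TRUE)

# Replace NA with 0 in those columns only
df[claim_cols][is.na(df[claim_cols])] <- 0

print(df)
```

Anchoring the pattern with ^ and $ keeps similarly named columns such as `claimant` out of the selection.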
