Replace all NA with 0 in a data.table in R

Question

I have a data.table with many columns. There are 4 columns where I want to replace NA with 0.

I have a working solution:

  claimsMonthly[is.na(claim9month),claim9month := 0
          ][is.na(claim10month),claim10month := 0
            ][is.na(claim11month),claim11month := 0
              ][is.na(claim12month),claim12month := 0]

However, this is quite repetitive and I wanted to reduce it by using a loop (not sure if that is the smartest idea though?):

  for (i in 9:12){
    claimsMonthly[is.na(paste0("claim", i, "month")), paste0("claim", i, "month") := 0]
  }

When I run this loop nothing happens. I guess it is due to the fact that paste0() returns "claim12month", so I get is.na("claim12month"). The result of that is FALSE despite the fact that there are NA in my data. I guess this has something to do with the quotes?

This is not the first time I have had issues with using paste0() or running loops with data.table, so I must be missing something important here.

Any ideas how to fix this?

Answer 1

We can specify .SDcols with the names of the columns (nm1), loop over .SD (Subset of Data.table) and replace NA with 0 (replace_na from tidyr):

library(data.table)
library(tidyr)
nm1 <- paste0("claim", 9:12, "month")
setDT(claimsMonthly)[, (nm1) := lapply(.SD, replace_na, 0), .SDcols = nm1]

Or, as @jangorecki mentioned in the comments, nafill from data.table would be better:

setDT(claimsMonthly)[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]
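To make the .SDcols approach easier to try out, here is a minimal sketch on toy data. The values and the id column are made up for illustration; only the claim column names come from the question. It assumes a data.table version that ships nafill, and numeric columns.

```r
library(data.table)

# Toy data (hypothetical values); the claim column names follow the question.
claimsMonthly <- data.table(
  id           = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA),
  claim11month = c(5, 6, 7),
  claim12month = c(NA, NA, 9)
)

nm1 <- paste0("claim", 9:12, "month")

# .SD holds only the columns listed in .SDcols; lapply() runs nafill on each
# of them, and (nm1) := writes the results back by reference.
claimsMonthly[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]

print(claimsMonthly)
```

The parentheses around nm1 matter: without them, data.table would create a single column literally named nm1.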

Or use a loop with set: for each column of interest, assign 0 to the rows where that column is NA, specifying i (the row index) and j (the column index or name):

for (j in nm1) {
    set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j = j, value = 0)
}
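The set loop can be sketched on toy data as follows. The values are hypothetical and only two of the four columns are used to keep it short; na_rows is a helper name introduced here for readability.

```r
library(data.table)

# Toy data (hypothetical values) with two of the question's columns.
claimsMonthly <- data.table(
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA)
)
nm1 <- c("claim9month", "claim10month")

for (j in nm1) {
  # [[j]] extracts the column by its name, so is.na() sees the actual values.
  na_rows <- which(is.na(claimsMonthly[[j]]))
  # set() updates the selected rows of column j by reference.
  set(claimsMonthly, i = na_rows, j = j, value = 0)
}

print(claimsMonthly)
```

Because set takes the column name as an ordinary character value, there is no need to build an expression with paste0 inside the brackets.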

Or with setnafill:

setnafill(claimsMonthly, cols = nm1, fill = 0)
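As for why the loop in the question does nothing: is.na() there receives the character string returned by paste0(), not the column of that name, so the row filter is FALSE and no row is updated. A minimal sketch on toy data (hypothetical values), showing the difference and one way to keep the loop by looking the column up with get():

```r
library(data.table)

# Toy data (hypothetical values) with the question's column names.
claimsMonthly <- data.table(
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA),
  claim11month = c(5, 6, 7),
  claim12month = c(NA, NA, 9)
)

# The string itself is not NA, so this is a single FALSE.
string_check <- is.na("claim9month")

# Looking the column up by name tests the actual values.
column_check <- is.na(claimsMonthly[["claim9month"]])

for (col in paste0("claim", 9:12, "month")) {
  # get(col) resolves the name to the column inside i;
  # (col) := assigns to the column whose name is stored in col.
  claimsMonthly[is.na(get(col)), (col) := 0]
}

print(claimsMonthly)
```

This keeps the structure of the original loop, although the set and setnafill variants above avoid the name lookup entirely.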

Answer 2

You can use the following, provided the four columns sit at positions 9 to 12:

claimsMonthly[, 9:12][is.na(claimsMonthly[, 9:12])] <- 0

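You can also use the column names (with the leading comma, so that data.table treats the vector as a column selection rather than a key lookup):

claimsMonthly[, c("claim9month", "claim10month", "claim11month", "claim12month")][is.na(claimsMonthly[, c("claim9month", "claim10month", "claim11month", "claim12month")])] <- 0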

Or, even better, you can use a vector of all the variables that match the "claimXXmonth" pattern.
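One way to build such a vector is to match the column names against a regular expression. The sketch below uses toy data (hypothetical values and an extra column named other) and the pattern is an assumption about what "claimXXmonth" means. For the replacement itself it uses setnafill from Answer 1 rather than the `<-` form shown in this answer.

```r
library(data.table)

# Toy data (hypothetical); "other" is included to show it is left untouched.
claimsMonthly <- data.table(
  id           = 1:2,
  claim9month  = c(NA, 1),
  claim12month = c(2, NA),
  other        = c(NA, 5)
)

# Select every column named "claim<digits>month" (assumed pattern).
cols <- grep("^claim[0-9]+month$", names(claimsMonthly), value = TRUE)

# Fill NA with 0 in those columns only, by reference.
setnafill(claimsMonthly, cols = cols, fill = 0)

print(claimsMonthly)
```

Anchoring the pattern with ^ and $ avoids picking up unrelated columns that merely contain the word "claim".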

Answer 1
4, accepted, 2020-05-26 19:03:21

Answer 2
0, 2020-05-26 19:28:57