
Replacing all NA with 0 in a data.table in R

I have a data.table with many columns. In 4 of those columns I want to replace NA with 0.

I have a working solution:

  claimsMonthly[is.na(claim9month),claim9month := 0
          ][is.na(claim10month),claim10month := 0
            ][is.na(claim11month),claim11month := 0
              ][is.na(claim12month),claim12month := 0]

However, this is quite repetitive, and I wanted to shorten it by using a loop (not sure if that is the smartest idea though?):

  for (i in 9:12){
    claimsMonthly[is.na(paste0("claim", i, "month")), paste0("claim", i, "month") := 0]
  }

When I run this loop, nothing happens. I guess it is due to the fact that paste0() returns "claim12month", so I get is.na("claim12month"). The result of that is FALSE despite the fact that there are NA values in my data. I guess this has something to do with the quotes?

This is not the first time I have had issues with using paste0() or running loops with data.table, so I must be missing something important here.

Any ideas how to fix this?
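
The behaviour described in the question can be reproduced with a small made-up table (the data below is an assumption for illustration; only the column names mirror the question). is.na() applied to a string tests the string itself, not the column it names, so the row filter is FALSE and no rows are updated. One possible fix, sketched here, is to look the column up with get() and to wrap the name in parentheses on the left of := so that its value is used as the column name.

```r
library(data.table)

# Hypothetical toy data; only the column names mirror the question
claimsMonthly <- data.table(
  id = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA)
)

# A character string is not missing, so this is FALSE regardless of the data
is.na(paste0("claim", 9, "month"))

for (i in 9:10) {
  col <- paste0("claim", i, "month")
  # get(col) fetches the column named by col;
  # (col) on the left of := uses the value of col as the column name
  claimsMonthly[is.na(get(col)), (col) := 0]
}

print(claimsMonthly)
```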

We can pass the names of the columns ('nm1') to .SDcols, loop over .SD (Subset of Data.table) with lapply, and replace the NA with 0 (replace_na from tidyr):

library(data.table)
library(tidyr)
nm1 <- paste0("claim", 9:12, "month")
setDT(claimsMonthly)[, (nm1) := lapply(.SD, replace_na, 0), .SDcols = nm1]

Or, as @jangorecki mentioned in the comments, nafill from data.table would be better:

setDT(claimsMonthly)[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]

Or use a for loop with set: for each column of interest, assign 0 to the rows where that column is NA, passing the row index as i and the column index or name as j:

for (j in nm1) {
  set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j = j, value = 0)
}

Or with setnafill, which fills the NA values by reference:

setnafill(claimsMonthly, cols = nm1, fill = 0)
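
To try this on something reproducible, here is a minimal sketch with a made-up table (the data and the extra column named other are assumptions, not part of the question). It uses double columns, since nafill and setnafill are documented for numeric columns, and it shows that columns not listed in cols keep their NA values.

```r
library(data.table)

# Hypothetical toy data standing in for claimsMonthly
claimsMonthly <- data.table(
  id = 1:3,
  claim9month  = c(1, NA, 3),
  claim10month = c(NA, 2, NA),
  other        = c(NA, 5, 6)   # not in nm1, so its NA is left alone
)
nm1 <- paste0("claim", 9:10, "month")

# Fill NA with 0 in place, only in the listed columns
setnafill(claimsMonthly, cols = nm1, fill = 0)

print(claimsMonthly)
```

Because setnafill modifies the table in place, there is no need to assign the result back to claimsMonthly.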

You can use:

claimsMonthly[, 9:12][is.na(claimsMonthly[, 9:12])] <- 0

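You can also use variable names. Note that this form is data.frame-style column selection; on a data.table, a character vector in i is treated as a join rather than a column selection, so apply it to a plain data.frame: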

claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")][is.na(claimsMonthly[c("claim9month", "claim10month","claim11month","claim12month")])] <- 0

Or, even better, build a vector of all variable names that match the "claimXXmonth" pattern and use it for the selection.
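
A minimal sketch of that idea on a plain data.frame, assuming the columns are named claim followed by digits followed by month (the toy data and the name claimCols are made up for illustration):

```r
# Hypothetical toy data frame; only the column names mirror the question
claimsMonthly <- data.frame(
  id = 1:3,
  claim9month  = c(1, NA, 3),
  claim12month = c(NA, 2, NA)
)

# Collect every column whose name looks like claim<digits>month
claimCols <- grep("^claim[0-9]+month$", names(claimsMonthly), value = TRUE)

# Replace NA with 0 in the selected columns via a logical matrix
claimsMonthly[claimCols][is.na(claimsMonthly[claimCols])] <- 0

print(claimsMonthly)
```

The same claimCols vector could also be passed as cols to setnafill if the object is a data.table.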

