
Replacing multiple character strings in specific data frame columns in R

I've looked all around for this but have found no answers. I have a data frame containing factor columns whose levels include values such as "Unknown", "No response" or "Refused to answer". None of these are useful for my analysis, so I want to replace them all with NA.

Note that I do not want to replace them across the entire data frame, only specific columns! There are other columns that contain values with the same names that are actually useful to me and I want to leave them alone.

I've managed to replace them one at a time by using:

data$col1 <- factor(gsub("Unknown", "NA", data$col1))

but that only works for one string at a time. If I try to add multiple strings, R throws an error. Is there a more efficient way to do this?

I'm relatively new to coding, please be gentle!

If we need to change multiple values to NA, one option is to use the na.strings argument of read.csv/read.table while reading the data. Note that na.strings applies to every column in the file:

dat <- read.csv("yourfile.csv", na.strings = c("Unknown", "No response", 
             "Refused to answer"))
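A minimal sketch of the na.strings behaviour, assuming a small inline CSV (passed through read.csv's `text` argument) in place of "yourfile.csv"; the column names q1, q2 and notes are hypothetical. It shows that the listed strings become NA in every column, including one you may want to keep intact:

```r
# Inline CSV standing in for "yourfile.csv" (hypothetical columns)
csv_text <- "q1,q2,notes
Yes,Unknown,Unknown
No response,No,fine
Refused to answer,Yes,No response"

dat <- read.csv(text = csv_text,
                na.strings = c("Unknown", "No response", "Refused to answer"))

# na.strings is applied to all columns, so `notes` is affected too
print(dat)
colSums(is.na(dat))  # q1: 2, q2: 1, notes: 2
```

This is why na.strings alone is not enough when only some columns should be cleaned.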

However, here only specific columns should change. In that case, create an index of those columns, loop through them with lapply, and replace the values using a logical index built with %in% (this assumes the values match exactly and are not substrings of longer text):

columnsOfInterest <- c(1, 4, 5) # just an example; column names also work
# in each selected column, set exact matches of the unwanted values to NA
df1[columnsOfInterest] <- lapply(df1[columnsOfInterest], function(x)
         replace(x, x %in% c("Unknown", "No response", "Refused to answer"), NA))
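A self-contained sketch of the same approach on a small hypothetical data frame (the columns q1, q3 and comment are made up for illustration). It selects columns by name, leaves `comment` untouched even though it contains the same strings, and, as an optional extra step not in the answer above, calls droplevels so factor columns no longer keep levels such as "No response":

```r
# Hypothetical survey data: clean q1 and q3, leave `comment` alone
df1 <- data.frame(
  q1      = c("Yes", "Unknown", "No", "Refused to answer"),
  q3      = factor(c("No response", "Agree", "Disagree", "Agree")),
  comment = c("Unknown", "ok", "No response", "n/a"),
  stringsAsFactors = FALSE
)

badValues <- c("Unknown", "No response", "Refused to answer")
columnsOfInterest <- c("q1", "q3")  # names instead of positions

df1[columnsOfInterest] <- lapply(df1[columnsOfInterest], function(x) {
  # %in% compares factors by their labels, so this works for both types
  x <- replace(x, x %in% badValues, NA)
  # optional: drop factor levels that are no longer used
  if (is.factor(x)) droplevels(x) else x
})

str(df1)
```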

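NOTE: replacing values with the quoted string "NA" (as the gsub call in the question does) is not useful, because R treats "NA" as ordinary text rather than a missing value. Use the bare NA instead.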
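A short sketch of the difference, using a hypothetical two-element vector: is.na does not recognise the string "NA" produced by the question's gsub call, but it does recognise a real NA:

```r
x <- c("Yes", "Unknown")

quoted <- gsub("Unknown", "NA", x)       # the string "NA", as in the question
is.na(quoted)                            # FALSE FALSE: "NA" is just text

bare <- replace(x, x == "Unknown", NA)   # a genuine missing value
is.na(bare)                              # FALSE TRUE
```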
