Replace values with NA in R

I have a character vector that looks like:

"Internet" "Internet" "-1"       "-5"       "Internet" "Internet" 

I want to replace all values that would be negative numeric values ("-1", "-5", etc.) with NA.

I did that with this code:

hintsData$WhereSeekHealthInfo[hintsData$WhereSeekHealthInfo < 0] <- NA

That seemed to work:

head(hintsData$WhereSeekHealthInfo)
# [1] "Internet" "Internet" NA         NA         "Internet" "Internet"

But then when I did

> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] NA

Basically I couldn't sum the values anymore because I changed the vector in some way?

Prior to running the NA code I was able to run the code and get this:

> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] 1691

So, how can I replace the "-1", "-5" etc values with NA, but still get:

> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] 1691

Please let me know if you have an idea. I did find other questions about replacing values with NA, but since I don't know why I can no longer count values once I replace some with NA, I'm not sure what to search for or rule out.

sum has an na.rm argument; set it to TRUE and the NA values are dropped before summing. (In general, 1 + NA is NA, so you want to remove the NA values.)
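A minimal sketch of the difference, using a short toy vector as a stand-in for hintsData$WhereSeekHealthInfo (the values are assumed for illustration):

```r
# Toy stand-in for hintsData$WhereSeekHealthInfo (assumed values)
x <- c("Internet", "Internet", "-1", "-5", "Internet", "Internet")
x[x %in% c("-1", "-5")] <- NA

sum(x == "Internet")                # NA, because NA == "Internet" is NA
sum(x == "Internet", na.rm = TRUE)  # 4
```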

That being said, your < 0 condition is slightly sneaky given that your vector is character. R converts the 0 to the string "0" and compares the strings, so it does work in this case, but I wouldn't presume it is robust.
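To see why the string comparison is fragile, here is a small sketch that contrasts it with an explicit numeric conversion. The values "+3" and "10" are made up for illustration and do not come from the original data:

```r
# Hypothetical values chosen to stress the comparison
x <- c("Internet", "-1", "+3", "10")

# String comparison against "0"; the result depends on the locale's
# collation order, and "+3" may be flagged even though it is positive
x < 0

# Explicit conversion: non-numeric strings become NA, then test the sign
num <- suppressWarnings(as.numeric(x))
neg <- !is.na(num) & num < 0
neg  # FALSE TRUE FALSE FALSE
```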

The idiomatic approach to setting NA values in R is to use is.na<-, e.g.

is.na(hintsData$WhereSeekHealthInfo) <- hintsData$WhereSeekHealthInfo < 0
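A possible variant of the same idea that does not rely on comparing strings with a number: convert to numeric first, then pass the positions to is.na<-. This is only a sketch on a toy vector, not the original data:

```r
# Toy stand-in for the survey column (assumed values)
x <- c("Internet", "Internet", "-1", "-5", "Internet", "Internet")

# Non-numeric strings become NA here; the warning is expected and suppressed
num <- suppressWarnings(as.numeric(x))

# which() keeps only positions where the test is TRUE, dropping the NAs
is.na(x) <- which(num < 0)
x
sum(x == "Internet", na.rm = TRUE)  # 4
```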

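Depending on how you read in your data, you could set this up so that the values are processed as they are read.

E.g., if you knew the valid responses before reading in a text file, you could create your own class whose conversion from character turns anything outside the listed levels into NA: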

 setAs("character", "Q1", function(from) factor(from, levels = c('Internet', 'Newspaper')))

 read.csv('mytextfile.csv', colClasses = c(WhereSeekHealthInfo = 'Q1'))

or perhaps (being more explicit about NA values and less explicit about what the valid values are):

 setAs("character", "Q1b", function(from) { is.na(from) <- suppressWarnings(as.numeric(from)) < 0; from })
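As a rough end-to-end sketch of the second converter, the following reads a tiny inline CSV instead of a file. The setClass call, the which() wrapper, and the sample text are my own additions for illustration; declaring the class first avoids the note R gives when a coerce method is set for an undefined class:

```r
library(methods)

# Declare the target class so setAs() has something to register against
setClass("Q1b")
setAs("character", "Q1b", function(from) {
  num <- suppressWarnings(as.numeric(from))  # "Internet" etc. become NA
  is.na(from) <- which(num < 0)              # blank out negative codes only
  from
})

# Inline sample data standing in for 'mytextfile.csv' (assumed contents)
csv_text <- "WhereSeekHealthInfo\nInternet\n-1\nInternet\n-5\n"
dat <- read.csv(text = csv_text,
                colClasses = c(WhereSeekHealthInfo = "Q1b"))

sum(dat$WhereSeekHealthInfo == "Internet", na.rm = TRUE)  # 2
```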

The reason for this is that x == NA returns NA for any value of x (even if x is itself NA), so every element you replaced with NA makes the comparison, and therefore the sum, NA.
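A quick illustration of that rule, plus one alternative that sidesteps it: %in% returns FALSE rather than NA for missing values, so it can be summed directly. The vector is a made-up example:

```r
r1 <- "Internet" == NA   # NA
r2 <- NA == NA           # NA
r3 <- is.na(NA)          # TRUE: is.na() is the way to test for missingness

# Made-up example vector
x <- c("Internet", NA, "Internet")
n <- sum(x %in% "Internet")  # 2; %in% never returns NA
n
```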

So you should use Arun's suggestion, sum(..., na.rm = TRUE).
