I have a character vector that looks like:
"Internet" "Internet" "-1" "-5" "Internet" "Internet"
I want to replace all values that would be negative numeric values (-1, -5, etc.) with NA.
I did that with this code:
hintsData$WhereSeekHealthInfo[hintsData$WhereSeekHealthInfo < 0] <- NA
That seemed to work:
head(hintsData$WhereSeekHealthInfo)
# [1] "Internet" "Internet" NA NA "Internet" "Internet"
But then when I did
> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] NA
Basically I couldn't sum the values anymore because I changed the vector in some way?
Prior to running the NA code I was able to run the code and get this:
> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] 1691
So, how can I replace the "-1", "-5" etc values with NA, but still get:
> sum(hintsData$WhereSeekHealthInfo == "Internet")
# [1] 1691
Please let me know if you have an idea. I did find other questions about replacing with NA but as I don't know why I can't count values anymore once I replace with NA I'm not sure what to search on or rule out.
sum has an na.rm argument; set it to TRUE and the NA values will be dropped before summing. (In general, 1 + NA is NA, so you want to remove the NA values.)
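As a small illustration of the point above, here is a sketch on a toy vector. The vector x is a hypothetical stand-in for hintsData$WhereSeekHealthInfo, not the real data.

```r
# Toy stand-in for hintsData$WhereSeekHealthInfo (hypothetical data)
x <- c("Internet", "Internet", "-1", "-5", "Internet", "Internet")
x[x %in% c("-1", "-5")] <- NA

# NA == "Internet" is NA, and a single NA makes the whole sum NA
sum(x == "Internet")                 # NA

# na.rm = TRUE drops the NA comparisons before summing
sum(x == "Internet", na.rm = TRUE)   # 4
```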
That being said, you are being slightly sneaky with your < 0 condition, given that your vector is character (it does work in this case, but I wouldn't want to presume it is robust).
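To see why the comparison is fragile: when a character value is compared with a number, R coerces the number to a string and compares the two as strings, and string ordering follows the locale's collation rules. A minimal sketch of that coercion, using made-up values:

```r
# The number is coerced to a string, so this compares "10" with "9"
# character by character, and "1" sorts before "9"
"10" < 9    # TRUE

# The purely numeric comparison gives the opposite answer
10 < 9      # FALSE
```

So a test like x < 0 on a character vector is really x < "0", and whether "-1" sorts before "0" depends on how the locale treats the minus sign.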
The idiomatic approach to setting NA values in R is to use is.na<-, e.g.
is.na(hintsData$WhereSeekHealthInfo) <- hintsData$WhereSeekHealthInfo <0
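A variant that does not rely on string comparison converts to numeric first and tests the sign of the result. This is only a sketch on a toy vector; x and num are hypothetical names, and it assumes every non-numeric string should be kept as-is.

```r
# Toy stand-in for hintsData$WhereSeekHealthInfo (hypothetical data)
x <- c("Internet", "Internet", "-1", "-5", "Internet", "Internet")

# Non-numeric strings become NA here; the coercion warning is expected
num <- suppressWarnings(as.numeric(x))

# which() drops the NA comparisons, leaving only positions of negative numbers
is.na(x) <- which(num < 0)

x   # positions 3 and 4 are now NA, the rest are unchanged
```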
Depending on how you read in your data, you could handle this at import time.
E.g., if you knew the valid responses before reading in a text file, you could create your own class:
# Values outside the listed levels (e.g. "-1") become NA
setAs("character", "Q1", function(from) factor(from, levels = c('Internet', 'Newspaper')))
read.csv('mytextfile.csv', colClasses = c(WhereSeekHealthInfo = 'Q1'))
or perhaps (being more explicit about NA values and less explicit about what the valid values are):
setAs("character","Q1b", function(from) {is.na(from) <- suppressWarnings(as.numeric(from)) <0;from})
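For a self-contained check of the import-time approach, the sketch below reads a tiny inline CSV instead of a file. The CSV contents are made up, and the setClass("Q1b") call is an addition that registers the class name before setAs is called; treat both as assumptions.

```r
library(methods)

# Register the class name so setAs has a known target (added assumption)
setClass("Q1b")
setAs("character", "Q1b", function(from) {
  # Strings that parse as negative numbers are set to NA; others are kept
  is.na(from) <- suppressWarnings(as.numeric(from)) < 0
  from
})

# Hypothetical file contents, supplied inline via the text argument
csv_text <- "WhereSeekHealthInfo\nInternet\n-1\nInternet\n-5\n"
dat <- read.csv(text = csv_text,
                colClasses = c(WhereSeekHealthInfo = "Q1b"))

# Count the "Internet" responses, ignoring the NA entries
sum(dat$WhereSeekHealthInfo == "Internet", na.rm = TRUE)
```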
The reason for this is that x == NA returns NA for any value of x (even if x is itself NA).
So you should use Arun's suggestion, sum(..., na.rm=TRUE)
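A quick sketch of that behaviour on toy values. The %in% line at the end is an alternative that neither answer mentions; it works because %in% returns FALSE rather than NA for missing values.

```r
NA == NA            # NA, not TRUE
"Internet" == NA    # NA
is.na(NA)           # TRUE: the right way to test for missingness

# Toy vector with missing values (hypothetical data)
x <- c("Internet", "Internet", NA, NA, "Internet", "Internet")

sum(x == "Internet", na.rm = TRUE)   # 4
sum(x %in% "Internet")               # 4, no na.rm needed
```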