R: mean() returns NA for a numeric variable

I am trying to determine some values based on cellular carrier. I have a main data frame that contains data from all carriers, and I have created 3 individual data frames from the main data frame by provider:

verizondf <- maindata[maindata$network == "Verizon", ]
attdf <- maindata[maindata$network == "ATT", ]
tmobiledf <- maindata[maindata$network == "TMobile", ]

I want to get the average for one of the variables, "download", which is a numerical value.

On the verizondf data frame, it works fine:

> mean(verizondf$download)
[1] 462004.4

For the other 2, I get NA:

> mean(attdf$download)
[1] NA

I wondered if the data type had changed at some point, but I checked and it is still numeric:

> str(attdf$download)
 num [1:5516] 321585 50722 400085 287968 138301 ...

What could be causing this issue?

Others have already pointed this out in the comments, but I can give a fuller explanation here.

If you open the help page with ?mean, you will see the description, including this:

Usage

mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)

Looking under the "Arguments" section, you will see this:

na.rm
a logical value indicating whether NA values should be stripped before the computation proceeds.

This tells you that, by default, mean does not strip out NAs, so the result is NA whenever your data contains at least one NA. Note that str() still reports such a vector as numeric, because NA is a valid numeric value.
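As a minimal sketch of that behaviour: the vector below is a hypothetical stand-in for attdf$download, built from the first values shown in the str() output in the question plus one NA added for illustration. The real column may contain any number of NAs in other positions.

```r
# Hypothetical stand-in for attdf$download: the first five values from the
# str() output in the question, plus one NA added for illustration.
download <- c(321585, 50722, 400085, 287968, 138301, NA)

mean(download)        # NA, because na.rm defaults to FALSE
anyNA(download)       # TRUE: at least one value is missing
sum(is.na(download))  # 1: how many values are missing
```

Running anyNA() or sum(is.na()) on the real column should confirm whether missing values are the cause.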

If you want a numeric mean computed despite the NA values, call mean with the argument na.rm = TRUE. This is only appropriate if ignoring the missing values is acceptable for your data, which depends on why they are missing and is not always the case.
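A short sketch of the fix, again using made-up data: download and the small maindata frame below are hypothetical and only mirror the column names from the question. The tapply() call is one possible way to get all per-carrier means at once without building three separate data frames. It is offered as an alternative, not as something the original code requires.

```r
# Hypothetical values; only the column names come from the question.
download <- c(321585, 50722, 400085, 287968, 138301, NA)

# Drop the NA before averaging the remaining five values.
avg <- mean(download, na.rm = TRUE)
avg  # 239732.2

# Hypothetical miniature of the main data frame.
maindata <- data.frame(
  network  = c("Verizon", "Verizon", "ATT", "ATT", "TMobile"),
  download = c(400000, 500000, 321585, NA, 50722)
)

# Mean download per carrier, ignoring NAs within each group.
by_carrier <- tapply(maindata$download, maindata$network, mean, na.rm = TRUE)
by_carrier
```

On the real data, the equivalent call would be mean(attdf$download, na.rm = TRUE).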
