I want to examine my dataset, flights, using the summary() function.
summary(flights["tailnum"])
Results:
tailnum
Length:336776
Class :character
Mode :character
In particular, it does not show that the character variable tailnum
has any NAs.
However, sum(is.na(flights$tailnum)) shows that it does have NAs:
[1] 2512
What is the best function for examining a categorical variable, i.e. one that shows its levels, missing values, total number of rows, and the frequency of each level?
Apparently the summary() method for character variables doesn't report NAs. (This does seem a bit inconsistent; it might be worth reporting or discussing on the r-devel@r-project.org mailing list.)
If you convert the variable to a factor and apply summary() to it specifically, you'll get a table of the counts of the 98 most frequent levels, followed by an "(Other)" category and the number of NAs. (The cutoff comes from the default maxsum = 100 argument of the factor method.)
summary(factor(flights$tailnum))
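To see the difference without loading the full dataset, here is a minimal sketch on a toy vector. The values in x are made up for illustration and only stand in for flights$tailnum; maxsum is lowered just to trigger the "(Other)" bucket on such a small example.

```r
# Toy stand-in for flights$tailnum (hypothetical values, not real data)
x <- c("N1", "N2", "N1", NA, "N3", "N1", NA)

# Character vector: only Length / Class / Mode are reported
summary(x)

# Factor: one count per level, plus an "NA's" entry when NAs are present
s <- summary(factor(x))
print(s)

# A small maxsum collapses the less frequent levels into "(Other)"
s_top <- summary(factor(x), maxsum = 3)
print(s_top)

# Cross-check the NA count directly
sum(is.na(x))
```

The "NA's" entry in the factor summary should agree with sum(is.na(x)).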
If you really want a full tabulation:
tt <- table(flights$tailnum, useNA = "ifany")
print(tt)
Be aware, though, that length(tt) is 4044: there are 4043 distinct non-NA values plus one entry for NA, so the full printout is long. head(table(tt)) and tail(table(tt)) show that hundreds of values occur only a few times, while a few values occur hundreds (or thousands) of times.
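The "table of a table" idea can be sketched on a toy vector as follows. The values are again hypothetical; with the real data you would use flights$tailnum in place of x.

```r
# Toy stand-in for flights$tailnum (hypothetical values, not real data)
x <- c("N1", "N2", "N1", NA, "N3", "N1", NA)

# Full tabulation, with NA counted as its own entry
tt <- table(x, useNA = "ifany")

# Number of distinct values, including the NA entry
n_distinct <- length(tt)

# Frequency of frequencies: how many values occur once, twice, ...
ff <- table(tt)
print(ff)

# Most common values first
print(sort(tt, decreasing = TRUE))
```

Here the names of ff are the occurrence counts and its entries are the number of distinct values with that count, which is what head(table(tt)) and tail(table(tt)) show for the two ends of the distribution.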
If you're using tidyverse and want to convert all character variables to factors:
flights %>% mutate(across(where(is.character), factor))
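If you'd rather not depend on the tidyverse, the same conversion can be sketched in base R. The data frame df below is a made-up example, not the flights data; the pattern should carry over to any data frame.

```r
# Hypothetical small data frame with one character and one numeric column
df <- data.frame(
  tailnum = c("N1", "N2", NA, "N1"),
  delay   = c(5, 10, NA, 0),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor, leaving other columns alone;
# assigning into df[] keeps the result a data frame
df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)

# summary() on the data frame now reports level counts and NAs for tailnum
summary(df)
```

Note that both this and the mutate() version return a modified copy, so the result has to be assigned back (as done here with df[]) for the change to persist.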