I have the following data frame with factor columns.
set.seed(1234)
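# stringsAsFactors = TRUE is needed in R >= 4.0 to get factor columns
df <- data.frame(a = sample(c("1", "2", NA), 10, replace = TRUE),
                 b = sample(c("1", "2", NA), 10, replace = TRUE),
                 c = sample(c("1", "2", "3", NA), 10, replace = TRUE),
                 stringsAsFactors = TRUE)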
which is
df
      a    b    c
1     1 <NA>    2
2     2    2    2
3     2    1    1
4     2 <NA>    1
5  <NA>    1    1
6     2 <NA> <NA>
7     1    1    3
8     1    1 <NA>
9     2    1 <NA>
10    2    1    1
Now, I want to create a new level "N" for selected columns and convert all NA values in those columns to "N". I make a vector of selected column names by
selected <- c("b", "c")
and then try to use apply in the following way:
apply(df, 2, function(x) {(if x %in% selected) x <- factor(x, levels=c(levels(x), 'N'))})
But it gives an error:
Error: unexpected symbol in "apply(df, 2, function(x) {(if x"
In my original data, I have lots of columns. So I want to avoid doing it column by column.
The levels of the 'selected' columns before the operation are:
lapply(df[selected], levels)
#$b
#[1] "1" "2"
#$c
#[1] "1" "2" "3"
We can loop over the 'selected' columns with lapply, include 'N' as one more level in each column, and replace the NA values with 'N'.
df[selected] <- lapply(df[selected], function(x) {
  levels(x) <- c(levels(x), 'N')       # add the new level first
  replace(x, which(is.na(x)), 'N')     # then fill the NA positions with it
})
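The order of the two steps matters, because a factor only accepts values that are already among its levels. A minimal sketch of that behaviour on a small made-up factor (the names `f`, `bad` and `good` are illustrative, not from the question):

```r
f <- factor(c("1", NA, "2"))

# Assigning a value that is not yet a level does not work: R raises an
# "invalid factor level" warning and the element stays NA.
# suppressWarnings() is used here only to keep the script quiet.
bad <- f
suppressWarnings(bad[is.na(bad)] <- "N")
anyNA(bad)      # TRUE, nothing was replaced

# Registering the level first makes the same assignment valid.
good <- f
levels(good) <- c(levels(good), "N")
good[is.na(good)] <- "N"
levels(good)    # "1" "2" "N"
anyNA(good)     # FALSE
```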
Another option is recode from the car package, where we can directly change NA to 'N'. It updates the levels automatically.
library(car)
df[selected] <- lapply(df[selected], recode, "NA='N'")
lapply(df[selected], levels)
#$b
#[1] "1" "2" "N"
#$c
#[1] "1" "2" "3" "N"
Another useful function is addNA, if we want to add NA itself as one of the levels:
df[selected] <- lapply(df[selected], addNA)
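As a rough illustration of what addNA does, and of how the resulting NA level could then be relabelled to 'N' if that is the end goal, here is a sketch on a small made-up factor (`f` and `g` are illustrative names, not from the question):

```r
f <- factor(c("1", NA, "2", NA))

# addNA() keeps the data as they are but makes NA an explicit level.
g <- addNA(f)
levels(g)           # "1" "2" NA

# Renaming that level turns every former NA into "N" in one step.
levels(g)[is.na(levels(g))] <- "N"
levels(g)           # "1" "2" "N"
as.character(g)     # "1" "N" "2" "N"
```

Relabelling the level avoids touching the individual elements, which may be convenient when NA should first be counted as its own category (for example in table()) and only later be given a name.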
NOTE: apply first converts the data frame to a matrix, so with non-numeric columns every column reaches the function as a 'character' vector rather than a factor. I guess that is not what you wanted.
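A small check of that conversion, using a made-up two-column data frame (`toy` is an illustrative name). Separately, the parse error in the question comes from `if x`: in R the condition of `if` has to be wrapped in parentheses, as in `if (condition)`.

```r
toy <- data.frame(b = factor(c("1", NA, "2")),
                  c = factor(c("3", "1", NA)))

# apply() works on as.matrix(toy), so the function sees character vectors.
apply(toy, 2, class)    # "character" for both b and c

# lapply()/sapply() iterate over the columns themselves, so factors survive.
sapply(toy, class)      # "factor" for both b and c
```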