
Unspecified levels in factor()

I'm working with a dataset in R that comes with a codebook, which basically tells me what the labels for the different levels of my factor variables should be. For example, with the codebook, I can see that in my "Sex" variable, the 0s are "Female" and the 1s are "Male." I'm using this information to label the values in my variables accordingly.

However, I've recently discovered, to my dismay, that the codebook is not complete. For example, it tells me for one variable that 1s are "Yes" and 2s are "No," but doesn't tell me what 7s, 8s, and 9s are, which I can see in the data. What I would like to do is label this variable as follows (or something like this):

data$variable <- factor(data$variable,
                        levels=c(1, 2, 7, 8, 9),
                        labels=c("Yes", "No", "7", "8", "9"))

Basically, I would like all levels that weren't specified in the codebook to be labeled as themselves. The problem is that the codebook is missing quite a few of these, and I would really rather not have to manually look up all the undefined values in my data to construct the above code for every variable. Plus, if I just leave out those missing levels, R automatically turns those values into NA, which I do not want.
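
A minimal sketch of the problem, using made-up values (x and f are illustrative names, not from the real dataset). It shows that any code left out of levels= ends up as NA:

```r
# Made-up codes for illustration; 7, 8 and 9 are not in the codebook
x <- c(1, 2, 2, 7, 8, 9, 1)

# Only the codebook levels are listed, so every other code becomes NA
f <- factor(x, levels = c(1, 2), labels = c("Yes", "No"))

print(f)
print(sum(is.na(f)))  # number of values lost to NA: 3
```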

Summary: I am trying to figure out how to use factor() such that instead of labeling all unspecified levels as "NA," it labels them as themselves.

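You can rename a factor's levels after creating it, so we can use that to our advantage: build the factor from the raw values first (so every value gets a level named after itself), then relabel only the levels the codebook covers.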

mydat <- c(1, 2, 3, 2, 3, 4, 3, 2, 1, 2, 4, 4, 6, 5, 7, 8, 9)
# convert to factor ignoring code book
dat <- factor(mydat)
# Create map corresponding to codebook levels
mymap <- c("1" = "Yes", "2" = "No")
# Figure out which levels are accounted for by codebook
id <- levels(dat) %in% names(mymap)
# Convert to appropriate values
levels(dat)[id] <- mymap[levels(dat)[id]]
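
Since the question asks for this across many variables, the same steps can be wrapped in a function and applied column by column. This is only a sketch: relabel_known, df, and codebook are hypothetical names, and the data are made up for illustration.

```r
# Hypothetical helper: relabel only the levels that appear in the codebook map
relabel_known <- function(x, map) {
  f <- factor(x)
  known <- levels(f) %in% names(map)
  levels(f)[known] <- map[levels(f)[known]]
  f
}

# Made-up data frame and codebook, for illustration only
df <- data.frame(smoker = c(1, 2, 7, 9, 2),
                 sex    = c(0, 1, 1, 0, 9))
codebook <- list(smoker = c("1" = "Yes", "2" = "No"),
                 sex    = c("0" = "Female", "1" = "Male"))

# Apply the matching map to each column named in the codebook
for (v in names(codebook)) {
  df[[v]] <- relabel_known(df[[v]], codebook[[v]])
}

levels(df$smoker)  # "Yes" "No" "7" "9"
levels(df$sex)     # "Female" "Male" "9"
```

Keeping the codebook as a named list, one entry per variable, means new variables only need a new list entry rather than another hand-written factor() call.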

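Alternatively (and probably a little easier):

# Alternatively, we can build the mapping from two parallel vectors:
# the coded values and their codebook labels
val <- c(1, 2)
lev <- c("Yes", "No")

dat <- factor(mydat)
# Match levels to codes by value rather than by position, so this still
# works when the codes are not 1, 2, ... (e.g. 0/1) or some are absent
idx <- match(levels(dat), val)
levels(dat)[!is.na(idx)] <- lev[idx[!is.na(idx)]]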
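
A quick way to confirm that a relabeling did what was intended is to cross-tabulate the raw codes against the new labels. The sketch below uses made-up values (raw and check are illustrative names) and pairs codes with levels by value via match(), so it does not rely on the codes coinciding with level positions:

```r
# Made-up raw codes; 1 and 2 are in the codebook, 7 and 9 are not
raw <- c(1, 2, 2, 7, 9, 1, 2)
val <- c(1, 2)
lev <- c("Yes", "No")

dat <- factor(raw)
# Match by value so the result does not depend on level order
idx <- match(levels(dat), val)
levels(dat)[!is.na(idx)] <- lev[idx[!is.na(idx)]]

# Each raw code should land in exactly one label column
check <- table(raw = raw, label = dat)
print(check)
```

If a row of the table has counts in more than one column, or a code shows up under an unexpected label, the mapping is worth a second look.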

