
Unspecified levels in factor()

I'm working with a dataset in R that comes with a codebook, which tells me what the labels for the different levels of my factor variables should be. For example, the codebook shows that in my "Sex" variable, the 0s are "Female" and the 1s are "Male." I'm using this information to label the values in my variables accordingly.

However, I've recently discovered, to my dismay, that the codebook is not complete. For example, it tells me for one variable that 1s are "Yes" and 2s are "No," but it doesn't tell me what the 7s, 8s, and 9s that I can see in the data mean. What I would like to do is label this variable as follows (or something like this):

data$variable <- factor(data$variable,
                        levels=c(1, 2, 7, 8, 9),
                        labels=c("Yes", "No", "7", "8", "9"))

Basically, I would like all levels that weren't specified in the codebook to be labeled as themselves. The problem is that the codebook is missing quite a few of these, and I would really rather not have to manually look at all undefined values in my data to construct the above code for every variable. Plus, if I just leave out those missing levels, R automatically labels them as "NA," which I do not want.
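To make the NA behavior concrete, here is a minimal sketch using made-up values (the vector `x` is an assumption, not the real variable). Strictly speaking, values that match none of the listed levels become missing (`NA`) rather than a level labeled "NA":

```r
# Toy data standing in for the real variable (illustrative values only)
x <- c(1, 2, 7, 8, 9, 2, 1)

# Only the codebook levels are listed, so 7, 8 and 9 match no level
f <- factor(x, levels = c(1, 2), labels = c("Yes", "No"))

as.character(f)            # the 7, 8 and 9 entries come back as NA
table(f, useNA = "ifany")  # counts per level, plus a count of the NAs
```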

Summary: I am trying to figure out how to use factor() such that instead of labeling all unspecified levels as "NA," it labels them as themselves.

You can convert levels after you create a factor, so we can use that to our advantage.

mydat <- c(1, 2, 3, 2, 3, 4, 3, 2, 1, 2, 4, 4, 6, 5, 7, 8, 9)
# convert to factor ignoring code book
dat <- factor(mydat)
# Create map corresponding to codebook levels
mymap <- c("1" = "Yes", "2" = "No")
# Figure out which levels are accounted for by codebook
id <- levels(dat) %in% names(mymap)
# Convert to appropriate values
levels(dat)[id] <- mymap[levels(dat)[id]]
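Since the question mentions many variables, the same steps could be wrapped in a small function and applied column by column. This is only a sketch: the helper name `relabel_known`, the data frame `df`, and the `codebook` list are all hypothetical, made up for illustration.

```r
# Hypothetical helper: relabel only the levels that the codebook defines
relabel_known <- function(x, map) {
  f <- factor(x)
  id <- levels(f) %in% names(map)
  levels(f)[id] <- map[levels(f)[id]]
  f
}

# Made-up data frame and codebook, for illustration only
df <- data.frame(smoker = c(1, 2, 7, 9, 2), sex = c(0, 1, 1, 0, 9))
codebook <- list(
  smoker = c("1" = "Yes", "2" = "No"),
  sex    = c("0" = "Female", "1" = "Male")
)

# Apply the matching codebook entry to each listed column
for (v in names(codebook)) {
  df[[v]] <- relabel_known(df[[v]], codebook[[v]])
}

levels(df$smoker)  # "Yes" "No" "7" "9"
levels(df$sex)     # "Female" "Male" "9"
```

Keying the map by the character form of each code means it lines up with `levels()`, which are always character strings.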

Alternatively (and probably a little easier):

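# alternatively we can construct the map if we have two vectors
# of the value and the codebook value
val <- c(1, 2)
lev <- c("Yes", "No")

dat <- factor(mydat)
# Look up each code's position among the levels. Indexing by val directly
# is only correct when the codes happen to equal their positions (as here).
idx <- match(val, levels(dat))
found <- !is.na(idx)  # skip codebook values that never occur in the data
levels(dat)[idx[found]] <- lev[found]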
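If a single factor() call in the shape shown in the question is preferred, the `levels` and `labels` arguments can be built from the data first. The following is a sketch under assumptions: `x` and `codebook` are made-up stand-ins, and the codebook is assumed to be a named character vector keyed by the code.

```r
x <- c(1, 2, 7, 8, 9, 2, 1)             # illustrative data
codebook <- c("1" = "Yes", "2" = "No")  # assumed codebook entries

lv <- sort(unique(x))                   # every value present in the data
lb <- as.character(lv)                  # default: label each value as itself
known <- lb %in% names(codebook)
lb[known] <- codebook[lb[known]]        # overwrite where the codebook has an entry

f <- factor(x, levels = lv, labels = lb)
levels(f)  # "Yes" "No" "7" "8" "9"
```

Because every observed value appears in `levels`, nothing is turned into NA, and only the codes the codebook knows about get a descriptive label.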


 