Change level of multiple factor variables

Everyone,

I want to preface this by saying that I already looked at this link to try to solve my problem:

Applying the same factor levels to multiple variables in an R data frame

The difference is that in that problem, the OP wanted to change the levels of factors that all had the same levels. In my case, I only want to change the first level, which is set to ' ', to something like 'Unknown' and leave the rest of the levels alone. I know I could do this in a "non-R" way with something like this:

for (i in 64:88) {
  var.name <- colnames(df[i])
  levels(eval(parse(text=paste('df$', var.name, sep=''))))[levels(eval(parse(text=paste('df$', var.name, sep='')))) == ' '] <- 'Unknown'
}

But that's an inefficient way to do it. Trying to use the method proposed in the question linked above gave me this code:

df[64:88] <- lapply(df[64:88], factor, levels=c('Unknown', ??))

I don't know what to put in place of the question marks. I tried using just "levels[-1]" but it's obvious why that didn't work. I also tried "levels(df[64:88])[-1]" but again no good. So I tried to revamp the code with the following:

df[64:88] <- lapply(df[64:88], function(x) levels(x)[levels(x) == ' '] <- 'Unknown')

but I get NULL whenever I call levels$transaction_type1 (where transaction_type1 is the column name of df[64]).

What am I missing here?

Thanks in advance for your help!

Per a couple of requests, here is an example of my data:

df$transaction_type1[1:100]
  [1]                                                                                                                                                
 [13] HOME RENEW                                                                                                                                     
 [25]                                                                                                                                                
 [37]                                                                                                                                                
 [49]                                                                                                                                                
 [61] AUTO MANAGE                                                                                     AUTO RENEW                                     
 [73]             AUTO MANAGE                                                                                     AUTO RENEW                         
 [85]                                                                                                                                                
 [97]                                                
Levels:   AUTO CLAIM AUTO MANAGE AUTO PURCHASE AUTO RENEW HOME CLAIM HOME RENEW

As you can see, there are a lot of values equal to ' ', and all 25 variables look just like this, but with different levels. My data consists of 222 variables and 24,850 rows, so I don't know what the standard is on SO for giving example data. This snippet of code might help as well:

> levels(df$transaction_type1)
#[1] " "             "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"

> levels(df$transaction_type1)[levels(df$transaction_type1) == ' '] <- 'Unknown'
> levels(df$transaction_type1)
#[1] "Unknown"       "AUTO CLAIM"    "AUTO MANAGE"   "AUTO PURCHASE" "AUTO RENEW"    "HOME CLAIM"    "HOME RENEW"   

If more information is needed, please let me know so I can provide it and also learn the SO standards of asking for help. Thanks!

Something like this?

# it seems like your original data has a structure like this
df <- data.frame(x = factor(c("a", "", "b"), levels = c("", "a", "b")),
                 y = factor(c("c", "", "d"), levels = c("", "c", "d")))

lapply(df, levels)
# $x
# [1] ""  "a" "b"
# 
# $y
# [1] ""  "c" "d"    

# change the "" level to "unknown", and return the updated vector
df[] <- lapply(df, function(x){
 levels(x)[levels(x) == ""] <- "unknown"
 x
 })

lapply(df, levels)
# $x
# [1] "unknown" "a"       "b"      
# 
# $y
# [1] "unknown" "c"       "d"
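
The same pattern should carry over to a range of columns, such as df[64:88] in the question, where the level to rename is ' ' (a single space) rather than "". Below is a minimal sketch under assumptions: the toy data frame, its column names, and the `cols` index vector are made up for illustration and only stand in for the real data.

```r
# Hypothetical toy data standing in for the real df: column "id" should be
# left alone, while columns 2:3 are factors whose first level is " ".
df <- data.frame(
  id = 1:3,
  type1 = factor(c("AUTO RENEW", " ", "HOME RENEW"),
                 levels = c(" ", "AUTO RENEW", "HOME RENEW")),
  type2 = factor(c(" ", "AUTO CLAIM", " "),
                 levels = c(" ", "AUTO CLAIM"))
)

cols <- 2:3  # stands in for 64:88 in the question

# Rename only the " " level in each selected column. The final `x` matters:
# without it the function returns the value of the assignment, the string
# "Unknown", and lapply() would hand back that string instead of the factor.
df[cols] <- lapply(df[cols], function(x) {
  levels(x)[levels(x) == " "] <- "Unknown"
  x
})

lapply(df[cols], levels)
# $type1
# [1] "Unknown"    "AUTO RENEW" "HOME RENEW"
#
# $type2
# [1] "Unknown"    "AUTO CLAIM"
```

Assigning into df[cols] rather than df[] leaves every column outside the range untouched, and the other levels of each factor keep their original names and order.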
