Change values from categorical to nominal in R

I want to replace every value in the categorical columns with its rank. The rank is the index of the value among the sorted unique elements of the column.

For instance,

> data[1:5,1] 
[1] "B2" "C4" "C5" "C1" "B5"

I want the column to contain these entries in place of the categorical values:

> data[1:5,1]  
[1] "1" "4" "5" "3" "2"

Another column:

> data[1:5,3]
[1] "Verified"        "Source Verified" "Not Verified"    "Source Verified" "Source Verified"

Then the updated column:

> data[1:5,3]
[1] "3" "2" "1" "2" "2"

I used this code for the task, but it takes a long time.

for(i in 1:ncol(data)){
  if(is.character(data[,i])){
    temp <- sort(unique(data[,i]))
    for(j in 1:nrow(data)){
      for(k in 1:length(temp)){
        if(data[j,i] == temp[k]){
          data[j,i] <- k}
      }
    }
  }
}

Please suggest a more efficient way to do this, if possible. Thanks.

Here is a solution in base R. I create a helper function that converts each column to a factor, using its sorted unique values as levels. This is similar to what you did, except that I use as.integer to get the rank values.

rank_fac <- function(col1)
   # levels must be sorted so that the integer codes are ranks
   as.integer(factor(col1, levels = sort(unique(col1))))
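
To see why the order of the levels matters, here is a small check on the first sample column. The vector `x` and the two result names are only illustrative. With levels in order of first appearance the codes just count new values, while sorted levels give the ranks asked for in the question.

```r
x <- c("B2", "C4", "C5", "C1", "B5")

# Levels in order of first appearance: codes simply number the new values
by_appearance <- as.integer(factor(x, levels = unique(x)))

# Levels in sorted order: codes are the ranks among the sorted unique values
by_rank <- as.integer(factor(x, levels = sort(unique(x))))

print(by_appearance)  # 1 2 3 4 5
print(by_rank)        # 1 4 5 3 2
```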

Some example data:

dx <- data.frame(
  col1 = c("B2", "C4", "C5", "C1", "B5"),
  col2 = c("Verified", "Source Verified", "Not Verified", "Source Verified", "Source Verified")
)

Apply it without a for loop. It is better to use lapply here to avoid side effects.

data.frame(lapply(dx, rank_fac))

Results:

#   col1 col2
# 1    1    3
# 2    4    2
# 3    5    1
# 4    3    2
# 5    2    2
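
The loop in the question only touched character columns. If the data frame also holds numeric columns, a sketch along these lines might leave them alone. The names `rank_chr` and `mixed` and the `amount` values are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: integer codes of a factor whose levels are sorted
rank_chr <- function(x) as.integer(factor(x, levels = sort(unique(x))))

# Illustrative data: one character column, one numeric column
mixed <- data.frame(
  grade  = c("B2", "C4", "C5", "C1", "B5"),
  amount = c(10.5, 3.2, 7.0, 1.1, 9.9),
  stringsAsFactors = FALSE
)

# Find the character columns, then overwrite only those
is_chr <- vapply(mixed, is.character, logical(1))
mixed[is_chr] <- lapply(mixed[is_chr], rank_chr)

print(mixed)
```

Using vapply with logical(1) guarantees one TRUE/FALSE per column, so the selection cannot silently change type.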

Using data.table syntactic sugar:

library(data.table)
setDT(dx)[,lapply(.SD,rank_fac)]
#    col1 col2
# 1:    1    3
# 2:    4    2
# 3:    5    1
# 4:    3    2
# 5:    2    2

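A simpler solution:

Using only as.integer (this works only when the columns are factors, which was the data.frame default before R 4.0.0; on character columns as.integer returns NA):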

setDT(dx)[,lapply(.SD,as.integer)]
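
as.integer behaves differently on factors and on character vectors, so the shortcut depends on how the columns are stored. A quick check, using an illustrative vector `chr`:

```r
chr <- c("B2", "C4", "C5", "C1", "B5")

# On a factor, as.integer() returns the level codes;
# factor() sorts the levels by default, so the codes are ranks
codes <- as.integer(factor(chr))

# On a plain character vector, as.integer() tries to parse numbers
# and yields NA for every non-numeric string (warning suppressed here)
parsed <- suppressWarnings(as.integer(chr))

print(codes)   # 1 4 5 3 2
print(parsed)  # NA NA NA NA NA
```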

Using match:

# df is your data.frame    
df[] <- lapply(df, function(x) match(x, sort(unique(x))))
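
As a quick check of the match approach, here it is run on the sample data from the question. The data frame is rebuilt here for illustration, with stringsAsFactors = FALSE so the columns are character regardless of R version.

```r
df <- data.frame(
  col1 = c("B2", "C4", "C5", "C1", "B5"),
  col2 = c("Verified", "Source Verified", "Not Verified",
           "Source Verified", "Source Verified"),
  stringsAsFactors = FALSE
)

# match() returns the position of each value in the sorted unique values;
# assigning into df[] replaces every column but keeps the data.frame itself
df[] <- lapply(df, function(x) match(x, sort(unique(x))))

print(df$col1)  # 1 4 5 3 2
print(df$col2)  # 3 2 1 2 2
```

The result columns are integers. If character output like "1" "4" is needed, as in the question, wrapping the match call in as.character should give that.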
