Convert a factor to numeric in the same order as the factor, from 0 to the number of unique values

I have a numeric target that I quantize into 20 bins, which gives me a factor column, new_target. I am able to convert new_target into numeric form, but because the factor labels are already numbers, I am left with those labels: the unique values are (0, 1, 3, 14, 16, 18, 19). Instead, I want the values ordered and re-numbered from 0 to the number of unique values minus 1, which here is c(0, 1, 2, 3, 4, 5, 6). The expected output is given in the new_target_expected column. How can I create new_target_expected without building it by hand? The real data frame I am dealing with is much bigger, so doing this manually is not possible.

library(data.table)

cat_var <- c("rock", "indie", "rock", "rock", "pop", "indie", "pop", "rock", "pop")
cat_var_2 <- c("blue", "green", "red", "red", "blue", "red", "green", "blue", "green")
target_var <- c(30, 10, 27, 14, 29, 25, 27, 12, 10)
df <- data.table("categorical_variable" = cat_var, "categorical_variable_2" = cat_var_2, "target_variable" =  target_var)

targetVariable <- "target_variable"

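number_of_buckets <- 20
# With a single number for `breaks`, cut() splits the range into equal-width
# intervals (not equal-count buckets), so some of the 20 levels can stay unused
a <- cut(df[[targetVariable]], breaks = number_of_buckets, labels = 0:(number_of_buckets - 1))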

df[["new_target"]] <- a
df[["new_target"]] <- as.numeric(as.character(df[["new_target"]]))
df[["new_target_expected"]] <- c(6, 0, 4, 2, 5, 3, 4, 1, 0)

We could remove the unused levels with droplevels and coerce the factor to integer. The integer codes of a factor in R start from 1, so subtract 1 to make the values start from 0.
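To see why dropping the unused levels matters, here is a minimal sketch on a toy factor. The factor `f` and its values are made up for illustration and are not part of the question's data.

```r
# Toy factor with 20 possible levels ("0" to "19"), of which only four occur
f <- factor(c(19, 0, 16, 3), levels = 0:19)

as.integer(f)                   # codes index into all 20 levels:   20 1 17 4
as.integer(droplevels(f))       # codes index into the used levels:  4 1  3 2
as.integer(droplevels(f)) - 1L  # shifted to start at 0:             3 0  2 1
```

droplevels keeps the remaining levels in their original order, so the new codes preserve the ordering of the bins. Applied to the question's data, the same idea looks like this: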

library(data.table)
# Note: this overwrites the column named in targetVariable ("target_variable")
df[, (targetVariable) := as.integer(droplevels(a)) - 1]

Output:

> df
   categorical_variable categorical_variable_2 target_variable
1:                 rock                   blue               6
2:                indie                  green               0
3:                 rock                    red               4
4:                 rock                    red               2
5:                  pop                   blue               5
6:                indie                    red               3
7:                  pop                  green               4
8:                 rock                   blue               1
9:                  pop                  green               0
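The result can be checked against the new_target_expected values from the question. The sketch below uses plain vectors so it runs without data.table. The names `via_droplevels`, `via_match`, and `expected` are hypothetical, and the `match()` variant is an alternative I am suggesting for the already-numeric new_target column, not something taken from the answer.

```r
# Rebuild the binned factor from the question's target values
target_var <- c(30, 10, 27, 14, 29, 25, 27, 12, 10)
a <- cut(target_var, breaks = 20, labels = 0:19)
new_target <- as.numeric(as.character(a))  # 19 0 16 3 18 14 16 1 0

# Approach from the answer: drop unused levels, take integer codes, shift to 0
via_droplevels <- as.integer(droplevels(a)) - 1L

# Alternative on the numeric column: position among the sorted unique values
via_match <- match(new_target, sort(unique(new_target))) - 1L

# Values the question lists in new_target_expected
expected <- c(6, 0, 4, 2, 5, 3, 4, 1, 0)
stopifnot(all(via_droplevels == expected), all(via_match == expected))
```

To keep target_variable intact in the data.table, the same expression could be written to a separate column instead, for example `df[, new_target_dense := as.integer(droplevels(a)) - 1L]`, where `new_target_dense` is a hypothetical column name.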
