
Recode with a variable number of cases in R

I am creating a function that takes a list of user-specified words and then labels each word with a number according to its position in the list. The user can specify lists of different lengths.

For example:

myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")

aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")
aa<-data.frame(aa,stringsAsFactors=FALSE)

Intended output:

new<-c(1,2,1,4,5,4,5,2,3)

Is there a way to take the positions of the words in the original list, look up where each element of the target vector falls in it, and replace the element with that position?

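new <- c()
# aa is a data.frame here, so loop over its column aa$aa rather than aa itself
for (item in aa$aa) {
  # position of the current word in the user-specified list
  new <- c(new, which(myNotableWords == item))
}
print(new)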
#[1] 1 2 1 4 5 4 5 2 3
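
The loop above grows the result one element at a time. A vectorized sketch of the same lookup, assuming the words sit in the column aa$aa as in the question, can use base R's match(), which returns the position of each element of its first argument in its second:

```r
myNotableWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")
aa <- data.frame(
  aa = c("No_IM", "IM", "No_IM", "HGD", "T1a", "HGD", "T1a", "IM", "LGD"),
  stringsAsFactors = FALSE
)

# match() gives, for each word in aa$aa, the position of its first
# occurrence in myNotableWords (NA if the word is not in the list).
new <- match(aa$aa, myNotableWords)
print(new)
#[1] 1 2 1 4 5 4 5 2 3
```

Because match() works on whole vectors, this behaves the same whatever the length of the user-supplied word list.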

You can do this with a plain data.frame too; the syntax shouldn't change. I prefer data.table, though.

library(data.table)
myWords <- c("No_IM","IM","LGD","HGD","T1a")
myIndex <- data.table(keywords = myWords, word_index = seq(1, length(myWords)))

The third line pairs each word in myWords with its position in the vector, giving a lookup table.

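aa <- data.table(keywords = c("No_IM","IM","No_IM","HGD","T1a",
                         "HGD","T1a","IM","LGD"))
# sort = FALSE keeps the rows of aa in their original order (the default sorts by keyword)
aa <- merge(aa, myIndex, by = "keywords", all.x = TRUE, sort = FALSE)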

You now have a table that pairs each keyword with its index number.
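
For comparison, the same lookup-table idea can be sketched in base R without any extra package. The name word_lookup below is only illustrative; the sketch assumes the keywords live in a column called keywords, as in the data.table example:

```r
myWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")

# Named integer vector used as a lookup table:
# names are the keywords, values are their positions in myWords.
word_lookup <- setNames(seq_along(myWords), myWords)

aa <- data.frame(
  keywords = c("No_IM", "IM", "No_IM", "HGD", "T1a", "HGD", "T1a", "IM", "LGD"),
  stringsAsFactors = FALSE
)

# Indexing the lookup vector by name returns one value per row of aa,
# in the original row order; unname() drops the keyword names again.
aa$word_index <- unname(word_lookup[aa$keywords])
print(aa$word_index)
#[1] 1 2 1 4 5 4 5 2 3
```

A keyword that is missing from myWords comes back as NA, so unmatched rows are easy to spot.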

Why not just use R's factor functionality?

A factor stores integer codes, each of which refers to a "level" (a character string) by its position in the set of levels:

myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")
aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")

aa <- as.integer(factor(aa, myNotableWords, ordered = TRUE))

aa
# [1] 1 2 1 4 5 4 5 2 3
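
Since the question asks for a function that accepts word lists of any length, the factor approach can be wrapped up as below. This is a minimal sketch; recode_words is a hypothetical helper name, not an existing R function:

```r
# recode_words() is a hypothetical helper, not part of base R.
# x:     character vector to recode
# words: user-specified words, in the order that defines their numbers
recode_words <- function(x, words) {
  # factor() maps each value to the position of its level; values that
  # are not in `words` become NA, and as.integer() keeps them as NA.
  as.integer(factor(x, levels = words))
}

res1 <- recode_words(c("IM", "T1a", "other"),
                     c("No_IM", "IM", "LGD", "HGD", "T1a"))
# res1 is 2, 5, NA

res2 <- recode_words(c("b", "a", "b"), c("a", "b"))
# res2 is 2, 1, 2
```

Words that do not appear in the supplied list come back as NA instead of raising an error, which the caller may want to check for.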


 