
Recode with a variable number of cases in R

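I am creating a function that takes a user-specified list of words and labels each word with a number according to its position in that list. The user can supply lists of different lengths.

For example: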

myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")

aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")
aa<-data.frame(aa,stringsAsFactors=FALSE)

Expected output:

new<-c(1,2,1,4,5,4,5,2,3)

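Is there a way to take the index of the original list, look up where each element of the target vector sits in that list, and replace the element with its index number?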

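# aa is a data.frame, so loop over the values in its column aa$aa
new <- c()
for (item in aa$aa) {
  # position of item in myNotableWords (nothing is appended if there is no match)
  new <- c(new, which(myNotableWords == item))
}
print(new)
#[1] 1 2 1 4 5 4 5 2 3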
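
The loop above looks up one word at a time. As a minimal sketch of the same lookup without a loop, base R's match() returns the position of each element of its first argument in its second. Unlike the which() loop, it yields NA for a word that is not in the list instead of silently dropping it.

```r
myNotableWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")
aa <- data.frame(
  aa = c("No_IM", "IM", "No_IM", "HGD", "T1a", "HGD", "T1a", "IM", "LGD"),
  stringsAsFactors = FALSE
)

# match() gives, for each word in aa$aa, its position in myNotableWords
new <- match(aa$aa, myNotableWords)
print(new)
#[1] 1 2 1 4 5 4 5 2 3
```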

You can do this with a data.frame; the syntax should not change. I prefer to use data.table.

library(data.table)
myWords <- c("No_IM","IM","LGD","HGD","T1a")
myIndex <- data.table(keywords = myWords, word_index = seq_along(myWords))

The third line simply attaches an index to the vector myWords.

aa <- data.table(keywords = c("No_IM","IM","No_IM","HGD","T1a",
                         "HGD","T1a","IM","LGD"))
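# merge() sorts by the key column by default; sort = FALSE keeps the original row order of aa
aa <- merge(aa, myIndex, by = "keywords", all.x = TRUE, sort = FALSE)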

Now you have a table showing each keyword alongside its index number.
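
For readers without data.table, here is a rough sketch of the same lookup-table join with a plain data.frame. Base merge() does not guarantee the row order of its result, so this version adds a helper column (row_id, a name chosen here purely for illustration) and uses it to restore the original order afterwards.

```r
myWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")

# lookup table: one row per keyword with its position in myWords
myIndex <- data.frame(keywords = myWords,
                      word_index = seq_along(myWords),
                      stringsAsFactors = FALSE)

aa <- data.frame(keywords = c("No_IM", "IM", "No_IM", "HGD", "T1a",
                              "HGD", "T1a", "IM", "LGD"),
                 stringsAsFactors = FALSE)

# row_id (illustrative name) records the original order of the rows
aa$row_id <- seq_len(nrow(aa))

merged <- merge(aa, myIndex, by = "keywords", all.x = TRUE)

# put the rows back in their original order
merged <- merged[order(merged$row_id), ]
print(merged$word_index)
#[1] 1 2 1 4 5 4 5 2 3
```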

Why not just use R's factor functionality?

The factor data type stores an integer that refers to a "level" (= a string) by its index number:

myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")
aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")

aa <- as.integer(factor(aa, myNotableWords, ordered = TRUE))

aa
# [1] 1 2 1 4 5 4 5 2 3
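
Since the question asks for a function that accepts word lists of any length, the factor approach could be wrapped roughly as follows. The name recode_words and its argument names are hypothetical, chosen only for this sketch. Words that do not appear in the list come back as NA.

```r
# recode_words is a hypothetical helper: it maps each word to its
# position in notable_words, whatever the length of that list
recode_words <- function(words, notable_words) {
  as.integer(factor(words, levels = notable_words))
}

five_levels <- recode_words(c("IM", "T1a", "unknown"),
                            c("No_IM", "IM", "LGD", "HGD", "T1a"))
print(five_levels)
# [1]  2  5 NA

two_levels <- recode_words(c("b", "a", "b"), c("a", "b"))
print(two_levels)
# [1] 2 1 2
```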

