Recode with a variable number of cases in R
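I am writing a function that takes a user-specified list of words and labels each word with a number according to its position in that list. The user can supply lists of different lengths.
For example: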
myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")
aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")
aa<-data.frame(aa,stringsAsFactors=FALSE)
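Expected output:
new<-c(1,2,1,4,5,4,5,2,3)
Is there a way to take the index of the original list, look up where each element of the target list falls in that index, and replace the element with its index number?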
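# aa must be the character vector here; if aa is the data frame from the question, loop over aa$aa instead
new <- c()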
for (item in aa) {
new <- c(new, which(myNotableWords == item))
}
print(new)
#[1] 1 2 1 4 5 4 5 2 3
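The lookup in the loop above can also be written without a loop using base R's match(), which returns the position of each element of its first argument in its second. The following is only a minimal sketch; the wrapper name recode_words is hypothetical and not part of the original answer.

```r
# Hypothetical helper: return the position of each element of x in words.
# Elements of x that do not appear in words come back as NA.
recode_words <- function(x, words) {
  match(x, words)
}

myNotableWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")
aa <- c("No_IM", "IM", "No_IM", "HGD", "T1a", "HGD", "T1a", "IM", "LGD")

new <- recode_words(aa, myNotableWords)
print(new)
# [1] 1 2 1 4 5 4 5 2 3
```

Because the word list is passed in as an argument, it can be of any length. If aa is stored as a data frame column, pass the column (for example aa$aa) rather than the data frame itself.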
You can do this with a data.frame; the syntax should not change. I prefer to use data.table.
library(data.table)
myWords <- c("No_IM","IM","LGD","HGD","T1a")
myIndex <- data.table(keywords = myWords, word_index = seq(1, length(myWords)))
The third line simply attaches an index to the vector myWords.
aa <- data.table(keywords = c("No_IM","IM","No_IM","HGD","T1a",
"HGD","T1a","IM","LGD"))
aa <- merge(aa, myIndex, by = "keywords", all.x = TRUE)
You now have a table showing each keyword alongside its unique number.
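One thing to watch for: merge() sorts its result by the key column by default, so the rows do not come back in the order of the input. The sketch below is a base R data.frame variant of the same idea that restores the original order; the names aa_df, row_id and merged are illustrative assumptions, not part of the original answer.

```r
myWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")
myIndex <- data.frame(keywords = myWords,
                      word_index = seq_along(myWords),
                      stringsAsFactors = FALSE)

aa_df <- data.frame(keywords = c("No_IM", "IM", "No_IM", "HGD", "T1a",
                                 "HGD", "T1a", "IM", "LGD"),
                    stringsAsFactors = FALSE)

# Remember the original row positions before merging
aa_df$row_id <- seq_len(nrow(aa_df))

# Left join on the keyword; the result is sorted by "keywords"
merged <- merge(aa_df, myIndex, by = "keywords", all.x = TRUE)

# Put the rows back in their original order
merged <- merged[order(merged$row_id), ]
merged$word_index
# [1] 1 2 1 4 5 4 5 2 3
```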
Why not just use R's factor functionality?
The factor data type stores integers that refer to "levels" (the strings) by index number:
myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")
aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")
aa <- as.integer(factor(aa, myNotableWords, ordered = TRUE))
aa
# [1] 1 2 1 4 5 4 5 2 3
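One behaviour worth knowing when the word list is user-supplied: a value that is not among the levels becomes NA rather than raising an error. The short sketch below illustrates this; the vector bb and the word "T1b" are made up for the example. The integer codes are the same whether or not ordered = TRUE is given.

```r
myNotableWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")

# "T1b" is a made-up word that is not in the list
bb <- c("IM", "T1b", "LGD")

codes <- as.integer(factor(bb, levels = myNotableWords))
codes
# [1]  2 NA  3

# A quick check for words that were not recognised
anyNA(codes)
# [1] TRUE
```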