
Recode with a variable number of cases in R

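I am creating a function that takes a user-specified list of words and labels each word with a number according to its position in that list. The user can specify lists of different lengths.

For example: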

myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")

aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")
aa<-data.frame(aa,stringsAsFactors=FALSE)

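Expected output

new<-c(1,2,1,4,5,4,5,2,3)

Is there a way to index the original list, look up where each element of the target list sits in that index, and replace the element with its index number?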

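new <- c()
# aa is a data.frame, so loop over its column rather than over aa itself
for (item in aa$aa) {
  # which() returns the position of item within myNotableWords
  new <- c(new, which(myNotableWords == item))
}
print(new)
#[1] 1 2 1 4 5 4 5 2 3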
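
As a possible vectorized alternative to the loop, base R's match() returns the position of each element of its first argument within its second. The sketch below assumes the same aa data.frame and myNotableWords vector as in the question.

```r
myNotableWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")
aa <- data.frame(aa = c("No_IM", "IM", "No_IM", "HGD", "T1a",
                        "HGD", "T1a", "IM", "LGD"),
                 stringsAsFactors = FALSE)

# match() looks up every word of the column in myNotableWords at once
new <- match(aa$aa, myNotableWords)
print(new)
#[1] 1 2 1 4 5 4 5 2 3
```

One difference worth knowing: match() gives NA for a word that is not in myNotableWords, whereas the loop silently drops such a word and so returns a shorter vector.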

You can do this with a data.frame as well; the syntax should not change. I prefer to use data.table:

library(data.table)
myWords <- c("No_IM","IM","LGD","HGD","T1a")
myIndex <- data.table(keywords = myWords, word_index = seq_along(myWords))

The third line simply attaches an index to the vector myWords.

aa <- data.table(keywords = c("No_IM","IM","No_IM","HGD","T1a",
                         "HGD","T1a","IM","LGD"))
# sort = FALSE keeps the rows of aa in their original order
aa <- merge(aa, myIndex, by = "keywords", all.x = TRUE, sort = FALSE)

Now you have a table that shows each keyword together with its unique number.
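
For readers who want to stay with a plain data.frame, here is a minimal sketch of the same lookup in base R. Base merge() sorts the result on the key column by default, so the sketch adds a helper column (row_id, a name made up for this example) to restore the input order afterwards.

```r
myWords <- c("No_IM", "IM", "LGD", "HGD", "T1a")
myIndex <- data.frame(keywords = myWords,
                      word_index = seq_along(myWords),
                      stringsAsFactors = FALSE)

aa <- data.frame(keywords = c("No_IM", "IM", "No_IM", "HGD", "T1a",
                              "HGD", "T1a", "IM", "LGD"),
                 stringsAsFactors = FALSE)

# row_id is a hypothetical helper column that remembers the input order
aa$row_id <- seq_len(nrow(aa))

merged <- merge(aa, myIndex, by = "keywords", all.x = TRUE)
merged <- merged[order(merged$row_id), ]  # put the rows back in input order

merged$word_index
#[1] 1 2 1 4 5 4 5 2 3
```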

Why not just use R's factor functionality?

The factor data type stores integers that refer to the "levels" (= character strings) by their index number:

myNotableWords<-c("No_IM","IM","LGD","HGD","T1a")
aa<-c("No_IM","IM","No_IM","HGD","T1a","HGD","T1a","IM","LGD")

aa <- as.integer(factor(aa, myNotableWords, ordered = TRUE))

aa
# [1] 1 2 1 4 5 4 5 2 3
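
Since the question asks for a function that accepts word lists of any length, the factor approach could be wrapped roughly as follows. This is only a sketch: recode_words and its argument names are made up for this example and do not come from the answer above.

```r
# recode_words is a hypothetical helper: it returns, for each element of x,
# its position in notable_words (NA if the word is not listed)
recode_words <- function(x, notable_words) {
  as.integer(factor(x, levels = notable_words))
}

# works with a five-word list ...
res5 <- recode_words(c("No_IM", "IM", "No_IM", "HGD", "T1a"),
                     c("No_IM", "IM", "LGD", "HGD", "T1a"))
res5
# [1] 1 2 1 4 5

# ... and with a shorter one; unlisted words become NA
res2 <- recode_words(c("IM", "No_IM", "LGD"), c("No_IM", "IM"))
res2
# [1]  2  1 NA
```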

