Searching and coding using a list in R
I have a list of character vectors and a data.frame of strings, as shown below:
lst <- list( c("key", "parking", "velvet"), c("sumatra", "cap"), c("sled", "card"), c("notice", "piece", "page"))
df <- c("key", "sumatra", "band", "cattle", "camp", "sled", "page", "wire", "key", "card", "cap", "page")
df <- data.frame(df, stringsAsFactors=FALSE)
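I want to add a column to the data frame df, with codes based on which vector of the list lst each string belongs to. The desired output looks like this: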
df$code <- c("G1", "G2", "", "", "", "G3", "G4", "", "G1", "G3", "G2", "G4")
df
df code
1 key G1
2 sumatra G2
3 band
4 cattle
5 camp
6 sled G3
7 page G4
8 wire
9 key G1
10 card G3
11 cap G2
12 page G4
How can I do this in R?
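# sequence() restarts at 1 for every vector in lst, so a non-positive diff marks a new group (<= 0 also covers length-1 vectors)
grp <- cumsum(c(TRUE, diff(sequence(sapply(lst, length))) <= 0))
df$code <- paste0("G", grp)[match(df$df, unlist(lst))]
df$code[is.na(df$code)] <- ''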
Here is one way:
names(lst) <- paste0('G', seq_along(lst))
transform(df, code=with(stack(lst), ind[match(df, values)]))
# df code
# 1 key G1
# 2 sumatra G2
# 3 band <NA>
# 4 cattle <NA>
# 5 camp <NA>
# 6 sled G3
# 7 page G4
# 8 wire <NA>
# 9 key G1
# 10 card G3
# 11 cap G2
# 12 page G4
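To see why this works, it may help to look at what stack(lst) produces on its own. The following is only a sketch: the names lookup, words and codes are made up for illustration, and the last step (blanking out the NAs) is an addition to match the desired output in the question.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
names(lst) <- paste0("G", seq_along(lst))

# stack() flattens the named list into a two-column data frame:
# `values` holds each string, `ind` (a factor) holds the name of its vector.
lookup <- stack(lst)
head(lookup, 4)

# Hypothetical input: two words that occur in lst and one that does not.
words <- c("key", "band", "cap")

# match() gives the row of `lookup` for each word, or NA if it is absent.
codes <- as.character(lookup$ind[match(words, lookup$values)])
codes[is.na(codes)] <- ""   # blank instead of NA
codes
```

If a word appeared in more than one vector, match() would return only its first occurrence, so the earliest group would win.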
Here is one approach using the qdapTools package:
library(qdapTools)
names(lst) <- paste0("G", 1:length(lst))
df$code <- df[, 1] %l% lst
And one more for good measure...
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
c("sled", "card"), c("notice", "piece", "page"))
d <- c("key", "sumatra", "band", "cattle", "camp",
"sled", "page", "wire", "key", "card", "cap", "page")
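DF <- data.frame(d, stringsAsFactors=FALSE)
# group number for each element of unlist(lst)
l <- rep(seq_along(lst), sapply(lst, length))
# group number for each row of DF (NA when the word is not in lst)
m <- l[match(d, unlist(lst))]
DF$code <- ifelse(is.na(m), "", paste0("G", m))
DF
## d code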
## 1 key G1
## 2 sumatra G2
## 3 band
## 4 cattle
## 5 camp
## 6 sled G3
## 7 page G4
## 8 wire
## 9 key G1
## 10 card G3
## 11 cap G2
## 12 page G4
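The same idea can probably also be written with a named character vector as the lookup table, indexing by name instead of calling match(). This is a sketch of that variation, not part of the answer above; key and code are hypothetical names, and lengths() needs R 3.2.0 or later.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
d <- c("key", "sumatra", "band", "cattle", "camp",
       "sled", "page", "wire", "key", "card", "cap", "page")

# Lookup vector: names are the words, values are the group codes.
key <- setNames(rep(paste0("G", seq_along(lst)), lengths(lst)), unlist(lst))

# Indexing by name returns NA for words that are not in the lookup.
code <- unname(key[d])
code[is.na(code)] <- ""

DF <- data.frame(d, code, stringsAsFactors = FALSE)
DF
```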
I assume you are starting with codes like this:
MyCode <- c("G1", "G2","G3", "G4", "G1", "G3", "G2", "G4")
But you need to know which rows to put them in. Try the following:
df$code<-NA
df[df$df %in% unlist(lst),]$code<-MyCode
The unlist() part turns your list into a vector. The %in% part returns every row where df$df matches something in lst. Rows with no match keep NA in df$code.
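The two steps described above can be seen separately in the sketch below. It also derives the codes from lst rather than typing MyCode by hand, which is an assumption on my part and not what the answer does; flat, hit and grp are made-up names.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
df <- data.frame(df = c("key", "sumatra", "band", "cattle", "camp", "sled",
                        "page", "wire", "key", "card", "cap", "page"),
                 stringsAsFactors = FALSE)

flat <- unlist(lst)        # all words of lst as one character vector
hit  <- df$df %in% flat    # TRUE for rows whose word occurs anywhere in lst
hit

# Assumption: build the code for each matched row from its position in lst
# instead of relying on a hand-written MyCode vector.
grp <- rep(paste0("G", seq_along(lst)), lengths(lst))
df$code <- NA
df$code[hit] <- grp[match(df$df[hit], flat)]
df
```

With a hand-written MyCode, the values have to be listed in exactly the order of the matching rows, so deriving them this way may be less error-prone when df changes.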