
Searching and coding using a list in R

I have a "list" of character "vectors" and a "data.frame" of strings, as shown below.

lst <- list( c("key", "parking", "velvet"), c("sumatra", "cap"), c("sled", "card"), c("notice", "piece", "page"))

df <-  c("key", "sumatra", "band", "cattle", "camp", "sled", "page", "wire", "key", "card", "cap", "page")
df <- data.frame(df, stringsAsFactors=FALSE)

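I want to add a column to the data frame df that codes each value according to which vector of the list lst it belongs to. The desired output looks like this.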

df$code <- c("G1", "G2", "", "", "", "G3", "G4", "", "G1", "G3", "G2", "G4")

 df
        df code
1      key   G1
2  sumatra   G2
3     band     
4   cattle     
5     camp     
6     sled   G3
7     page   G4
8     wire     
9      key   G1
10    card   G3
11     cap   G2
12    page   G4

How can I do this in R?

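# Number every element of unlist(lst) by the vector it came from, then look up each word.
# The test is <= 0 (not < 0) so that a vector of length one also starts a new group.
df$code <- paste0("G",cumsum(c(TRUE, diff(sequence(sapply(lst,length)))<=0)))[match(df$df, unlist(lst))]
# Words that are not in lst come back as NA; replace them with empty strings
df$code[is.na(df$code)] <- ''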
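
To see why this one-liner works, it can help to look at the intermediate values. The following is only a sketch that re-creates the question's lst and splits the expression into steps; the names lens, pos, starts, grp and codes are introduced here for illustration. The comparison with 0 is written as `<= 0` so that a vector of length one also starts a new group.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))

lens <- sapply(lst, length)         # length of each vector: 3 2 2 3
pos  <- sequence(lens)              # position inside each vector: 1 2 3 1 2 1 2 1 2 3

# A new vector starts wherever the position counter does not increase
starts <- c(TRUE, diff(pos) <= 0)
grp    <- cumsum(starts)            # group number per element: 1 1 1 2 2 3 3 4 4 4
codes  <- paste0("G", grp)          # one code per element of unlist(lst)

# Looking up a word: its position in unlist(lst) selects its code
codes[match("card", unlist(lst))]   # "G3"
```

Because codes is aligned with unlist(lst), indexing it with match(df$df, unlist(lst)) gives one code per row and NA for words that are not in the list.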

Here is one approach:

names(lst) <- paste0('G', seq_along(lst))
transform(df, code=with(stack(lst), ind[match(df, values)]))
#         df code
# 1      key   G1
# 2  sumatra   G2
# 3     band <NA>
# 4   cattle <NA>
# 5     camp <NA>
# 6     sled   G3
# 7     page   G4
# 8     wire <NA>
# 9      key   G1
# 10    card   G3
# 11     cap   G2
# 12    page   G4
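
The result above leaves unmatched rows as NA and stores code as a factor. If the empty strings from the question's desired output are wanted, one possible follow-up is sketched below. It re-creates the data so that it runs on its own, and it uses stack() outside of transform(), which is a choice made for this example; the name s is introduced here.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
df <- data.frame(df = c("key", "sumatra", "band", "cattle", "camp", "sled",
                        "page", "wire", "key", "card", "cap", "page"),
                 stringsAsFactors = FALSE)

names(lst) <- paste0("G", seq_along(lst))
s <- stack(lst)                               # columns: values (word), ind (list name)

# Look up each word, then convert the factor to plain character
df$code <- as.character(s$ind[match(df$df, s$values)])
df$code[is.na(df$code)] <- ""                 # unmatched words get an empty string
```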

Here is an approach using the qdapTools package:

library(qdapTools)
names(lst) <- paste0("G", 1:length(lst))
df$code <- df[, 1] %l% lst

And one more for good measure...

lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"), 
            c("sled", "card"), c("notice", "piece", "page"))
d <- c("key", "sumatra", "band", "cattle", "camp", 
        "sled", "page", "wire", "key", "card", "cap", "page")
DF <- data.frame(d, stringsAsFactors=FALSE)

l <- rep(seq_along(lst), sapply(lst, length))   # group number for each element of unlist(lst)
m <- l[match(d, unlist(lst))]                   # group number per row, NA if not found
DF$code <- ifelse(is.na(m), "", paste0("G", m))
DF
##          d code
## 1      key   G1
## 2  sumatra   G2
## 3     band     
## 4   cattle     
## 5     camp     
## 6     sled   G3
## 7     page   G4
## 8     wire     
## 9      key   G1
## 10    card   G3
## 11     cap   G2
## 12    page   G4
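
If this lookup is needed in more than one place, the same rep()/match() idea could be wrapped in a small helper. This is a sketch only: the function name code_by_list and its arguments prefix and no_match are hypothetical and do not come from the answer above.

```r
# Hypothetical helper: code each element of x by the position of the
# list element that contains it; unmatched values get `no_match`.
code_by_list <- function(x, lst, prefix = "G", no_match = "") {
  grp <- rep(seq_along(lst), lengths(lst))   # group number per element of unlist(lst)
  m   <- grp[match(x, unlist(lst))]          # group number per element of x, NA if absent
  ifelse(is.na(m), no_match, paste0(prefix, m))
}

lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
d <- c("key", "sumatra", "band", "cattle", "camp",
       "sled", "page", "wire", "key", "card", "cap", "page")

res <- code_by_list(d, lst)
```

Note that match() returns the first hit, so a word that appears in more than one vector of lst would be coded by the earliest one.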

I assume you start with codes like this:

MyCode <- c("G1", "G2", "G3", "G4", "G1", "G3", "G2", "G4")

But you need to know which rows to put them into. Try the following:

df$code <- NA
df$code[df$df %in% unlist(lst)] <- MyCode

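The unlist() part turns your list into a single vector. The %in% part returns TRUE for every row where df$df matches an element of lst. Where there is no match, df$code will contain NA.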
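
To make those two pieces concrete, the sketch below shows what unlist() and %in% return for the question's data. The names all_words and hits are introduced here for illustration. The number of TRUE values is the number of codes that MyCode has to supply, in row order.

```r
lst <- list(c("key", "parking", "velvet"), c("sumatra", "cap"),
            c("sled", "card"), c("notice", "piece", "page"))
df <- data.frame(df = c("key", "sumatra", "band", "cattle", "camp", "sled",
                        "page", "wire", "key", "card", "cap", "page"),
                 stringsAsFactors = FALSE)

all_words <- unlist(lst)        # one character vector holding all 10 words
hits <- df$df %in% all_words    # TRUE where the row's word appears anywhere in lst

which(hits)                     # rows 1 2 6 7 9 10 11 12
sum(hits)                       # 8, so MyCode needs exactly 8 elements
```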
