How to check whether all elements of a nested list are a subset of another list in R
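I have tried many different approaches for this, including ones found on Stack Overflow, but none of them works correctly.
My data frame SiteVisit (a dput of a small subset is at the bottom) consists of the columns Date (class = Date), TagID (class = numeric), SiteVisits (a list of character vectors), and NumSites (class = numeric). Each row lists all the sites where a single organism (TagID) was detected on a given date.
I want to assign, based on the sites a tag visited, whether it spent the entire day "INSIDE", "OUTSIDE", or "TRANSITING". A tag can only be "INSIDE" if it never visited an outside site, and only "OUTSIDE" if it never visited an inside site.
First, I want to determine whether all the sites for a TagID on a given date are contained in this list: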
inside <- list(c("Release","IC1", "IC2", "IC3","RGD1"))
If TRUE, SiteVisit$Location = "INSIDE"
ELSE test whether all the sites for a TagID on a given date are contained in this list:
outside <- list(c("ORS1","WC1","WC2","WC3","RGU1","ORN1","ORN2","ORS3","GL1","CVP1","CLRS"))
If TRUE, SiteVisit$Location = "OUTSIDE"
ELSE SiteVisit$Location = "TRANSITING"
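I have tried many different dplyr and base versions to achieve this, but none of them seems to be right. I think this is because I am not checking each element of SiteVisit$SiteVisits correctly.
My current attempt is: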
SiteVisit <- SiteVisit %>%
mutate(Location = ifelse(all(SiteVisits[[]] %in% inside), "INSIDE",
ifelse(all(SiteVisits[[]] %in% outside),"OUTSIDE","TRANSITING")))
which yields "INSIDE" for every row,
and
SiteVisit <- SiteVisit %>%
mutate(Location = ifelse(all(SiteVisits[] %in% inside), "INSIDE",
ifelse(all(SiteVisits[] %in% outside),"OUTSIDE","TRANSITING")))
which yields "TRANSITING" for every row.
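One likely reason both mutate() attempts return a single label everywhere: all() collapses its input to one logical value, so the test is evaluated once for the whole column (or once per group on a grouped data frame) and the result is recycled to every row. Here is a minimal base R sketch of the difference, using a made-up visits list and a plain-vector inside set (both are illustrative assumptions, not the asker's objects):

```r
# Hypothetical stand-ins for the list column and the inside set
visits <- list("Release", c("IC2", "IC1"), "WC3", c("IC1", "WC1"))
inside <- c("Release", "IC1", "IC2", "IC3", "RGD1")

# all() over the flattened column gives ONE value for every row together
whole_column <- all(unlist(visits) %in% inside)

# Applying the test to each list element gives one value per row
per_row <- vapply(visits, function(x) all(x %in% inside), logical(1))

print(length(whole_column))
print(per_row)
```

The per-element form is what the row-wise classification needs; the collapsed form can only ever produce one answer.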
In addition, trying to do this in a for loop does not quite work:
for (i in 1: nrow(SiteVisit)) {SiteVisit$Inside <-
all(SiteVisit$SiteVisits[[i]] %in% inside)}
yields all FALSE, whereas
all(SiteVisit$SiteVisits[[2]] %in% inside)
is TRUE.
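Note that the loop above assigns to the whole SiteVisit$Inside column on every iteration, so after the loop finishes every row holds the result for the last row only. A minimal sketch of an indexed version follows, assuming a small made-up data frame (site_visit is a hypothetical stand-in) and flattening the list-wrapped inside set with unlist() before comparing:

```r
# Hypothetical three-row stand-in for the asker's data frame
site_visit <- data.frame(TagID = c(1, 2, 3))
site_visit$SiteVisits <- list("Release", c("IC2", "IC1"), "WC3")

# inside is stored as a list holding one vector, as in the question
inside <- list(c("Release", "IC1", "IC2", "IC3", "RGD1"))
inside_sites <- unlist(inside)  # flatten to a plain character vector

# Preallocate the column, then fill one element per iteration
site_visit$Inside <- logical(nrow(site_visit))
for (i in seq_len(nrow(site_visit))) {
  site_visit$Inside[i] <- all(site_visit$SiteVisits[[i]] %in% inside_sites)
}

print(site_visit$Inside)
```

Indexing with [i] on the left-hand side is the key change; the unlist() step is an assumption about why the list-wrapped set misbehaves with %in%.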
Here is a dput of a small part of my data frame SiteVisit:
structure(list(Date = structure(c(15828, 15828, 15847, 15847,
15847, 15847, 15847, 15847, 15848, 15848, 15848, 15848, 15848,
15848, 15848, 15848, 15849, 15849, 15849, 15849, 15849, 15849,
15849, 15850, 15850, 15850, 15850, 15850, 15850, 15850, 15851,
15851, 15851, 15851, 15851, 15851, 15851, 15851, 15852, 15852,
15852, 15852, 15852, 15852, 15852, 15853, 15853, 15853, 15853,
15853, 15853, 15853, 15853, 15853, 15854, 15854, 15854, 15854,
15854, 15854, 15854, 15854, 15855, 15855, 15855, 15855, 15855,
15855, 15855, 15855, 15855, 15855, 15855, 15855, 15855, 15855,
15856, 15856, 15856, 15856, 15856, 15856, 15856, 15856, 15856,
15856, 15856, 15856, 15856, 15857, 15857, 15857, 15857, 15857,
15857, 15857, 15857, 15857, 15857, 15857), class = "Date"), TagID = c(5717.06,
6277.06, 5073.06, 5717.06, 11121.1, 11191.1, 11387.1, 11415.1,
5717.06, 6277.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11317.1,
11387.1, 11415.1, 5717.06, 6277.06, 11191.1, 11219.1, 11289.1,
11387.1, 11415.1, 5717.06, 6277.06, 9015.01, 9833.06, 11191.1,
11219.1, 11289.1, 11387.1, 11415.1, 5717.06, 6277.06, 9015.01,
11191.1, 11219.1, 11289.1, 11387.1, 11415.1, 5641.22, 5717.06,
6221.06, 6277.06, 7909.22, 9015.01, 9833.06, 11121.1, 11191.1,
11219.1, 11289.1, 11317.1, 11387.1, 11415.1, 5717.06, 6277.06,
6529.06, 8119.01, 8545.06, 9015.01, 9497.06, 9833.06, 11191.1,
11219.1, 11289.1, 11387.1, 11415.1, 5717.06, 6277.06, 6529.06,
9015.01, 9497.06, 9833.06, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1), SiteVisits = list("Release", "Release", c("IC2", "IC1",
"Release"), "IC3", "WC2", "RGD1", c("WC1", "WC3"), "WC3", "IC3",
"IC3", "WC2", "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3",
"WC2", "RGD1", c("IC2", "IC1"), "IC1", "WC1", "WC3", "IC3",
"WC2", "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "WC2",
"RGD1", "IC2", "IC1", "WC1", "WC1", "WC3", "IC3", "IC3",
"RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "IC3", c("IC3",
"Release"), c("IC3", "IC2", "IC1", "Release"), "RGD1", "IC2",
"IC1", "WC1", "WC3", "IC3", "IC3", c("IC3", "IC2"), "RGD1",
"IC2", "IC1", "WC1", "WC3", "Release", "IC3", "Release",
"IC3", c("RGD1", "Release"), c("IC3", "IC2"), c("IC3", "IC1"
), "WC2", "RGD1", "IC2", "IC1", "WC1", "WC1", "WC3", "IC3",
"IC3", c("RGD1", "Release"), c("RGD1", "Release"), "Release",
c("IC3", "IC2", "IC1"), "Release", c("IC3", "IC2", "IC1",
"RGD1"), "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "IC3",
"RGD1", c("IC3", "IC2", "IC1"), "RGD1", c("IC3", "IC1", "RGD1"
), "RGD1", "IC2", c("IC2", "IC1"), "WC1", "WC3"), NumSites = c(1L,
1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L,
3L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 1L,
2L, 1L, 1L)), row.names = c(NA, -100L), groups = structure(list(
Date = structure(c(15828, 15847, 15848, 15849, 15850, 15851,
15852, 15853, 15854, 15855, 15856, 15857), class = "Date"),
.rows = list(1:2, 3:8, 9:16, 17:23, 24:30, 31:38, 39:45,
46:54, 55:62, 63:76, 77:89, 90:100)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
The following works once inside and outside are stored as arrays (plain character vectors) instead of lists:
inside <- c("Release", "IC1", "IC2", "IC3", "RGD1")
outside <- c("ORS1", "WC1", "WC2", "WC3", "RGU1", "ORN1", "ORN2", "ORS3", "GL1", "CVP1", "CLRS")
df1$Location <- lapply(df1$SiteVisits, function(x) ifelse(all(x %in% inside), "INSIDE", ifelse(all(x %in% outside), "OUTSIDE", "TRANSIT")))
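Because lapply() returns a list, Location ends up as a list column in the line above. If a plain character column is preferred, a variant with vapply() is one option. This is a sketch on made-up data; classify_sites and visits are hypothetical names introduced only for illustration:

```r
inside <- c("Release", "IC1", "IC2", "IC3", "RGD1")
outside <- c("ORS1", "WC1", "WC2", "WC3", "RGU1", "ORN1", "ORN2",
             "ORS3", "GL1", "CVP1", "CLRS")

# Hypothetical helper: label one day's worth of sites for one tag
classify_sites <- function(sites) {
  if (all(sites %in% inside)) {
    "INSIDE"
  } else if (all(sites %in% outside)) {
    "OUTSIDE"
  } else {
    "TRANSIT"
  }
}

# Hypothetical stand-in for df1$SiteVisits
visits <- list("Release", c("WC1", "WC3"), c("IC1", "WC1"))

# vapply() checks that every result is a single string
location <- vapply(visits, classify_sites, character(1))
print(location)
```

Using if/else inside the helper instead of nested ifelse() is a style choice here: each call handles exactly one list element, so a scalar branch is enough.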
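Want an answer that runs at 1/100 of the speed? (Not a typo*: this is worse than manotheshark's answer, but it works with your data as it is structured.) *That was a typo! It is 1/100, not 1/10.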
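# Working copy to receive the labels (assumed; SiteVisit_test is not defined in the original answer)
SiteVisit_test <- SiteVisit
for (i in seq_len(nrow(SiteVisit))) {
  # unlist() flattens both the row's sites and the list-wrapped site sets
  SiteVisit_test$Location[i] <- if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(inside))) {
    "INSIDE"
  } else if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(outside))) {
    "OUTSIDE"
  } else {"TRANSITIONING"}
}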
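Benchmark of the two approaches (inside2 and outside2 are presumably the plain-vector versions of inside and outside; they are not defined in the answer):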
microbenchmark(
for_statement = for (i in 1:nrow(SiteVisit)) {
SiteVisit_test$Location[i] <- if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(inside))) {
"INSIDE"
} else if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(outside))) {
"OUTSIDE"
} else {"TRANSITIONING"}
},
lapply_statemnt = lapply(SiteVisit$SiteVisits, function(x) ifelse(all(x %in% inside2), "INSIDE", ifelse(all(x %in% outside2), "OUTSIDE", "TRANSIT")))
)
Unit: microseconds
            expr     min      lq      mean  median       uq     max neval
   for_statement 28874.4 30082.0 32411.968 31008.3 33108.90 48878.1   100
 lapply_statemnt   268.4   284.2   346.201   295.5   310.85  4114.9   100
I don't really understand why the lapply approach is so much faster here... possibly because I am unlisting for every i in the loop.
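One plausible contributor, beyond the repeated unlist() calls, is that SiteVisit[i, ] builds a one-row data frame on every iteration, which tends to be far more expensive than indexing the list column directly. The sketch below keeps an explicit loop but pulls the column out once and flattens the site sets once. It uses made-up data (visits stands in for SiteVisit$SiteVisits) and has not been benchmarked against the figures above:

```r
# Site sets stored as lists, as in the question
inside <- list(c("Release", "IC1", "IC2", "IC3", "RGD1"))
outside <- list(c("ORS1", "WC1", "WC2", "WC3", "RGU1", "ORN1", "ORN2",
                  "ORS3", "GL1", "CVP1", "CLRS"))

# Flatten once, outside the loop
inside_sites <- unlist(inside)
outside_sites <- unlist(outside)

# Hypothetical stand-in for the list column, extracted once
visits <- list("Release", c("WC1", "WC3"), c("IC1", "WC1"), "IC3")

location <- character(length(visits))  # preallocate the result
for (i in seq_along(visits)) {
  sites <- visits[[i]]  # direct list indexing, no row subsetting
  location[i] <- if (all(sites %in% inside_sites)) {
    "INSIDE"
  } else if (all(sites %in% outside_sites)) {
    "OUTSIDE"
  } else {
    "TRANSITIONING"
  }
}

print(location)
```

Wrapping both versions in microbenchmark() on the real data would show how much of the gap this closes.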