[英]nested loops through a structured list in R
我有一個示例數據集garden
,如下所示。 真正的東西是數千行。 我也有一個示例列表。 productFruit
。 我想知道每個fruit
的calories
,考慮到garden
報告的usage
。 我基本上想遍歷表中的所有行,檢查使用情況是否recorded
在productFruit
列表中,並返回calories
或以下錯誤消息之一:
productFruit
列表中沒有發現usage
則“使用范圍外”productFruit
列表中未發現任何usage
則為“超出范圍的水果” garden
:
fruit = c("Apple", "Kiwi", "Banana", "Orange", "Blueberry")
usage = c("cooking", "cooking", "NA", "drinking", "medicine")
reported = c(200, 500, 77, 520, 303)
garden <- cbind(fruit, usage, reported)
garden <- as.data.table(garden)
productFruit
:
productFruit <- list(Basket = c('DUH'),
type = list (
Apple = list(ID = 1,
color = "poor",
usage = list(eating = list(ID = 1,
quality = "good",
calories = 500),
medicine = list(ID = 2,
quality = "poor",
calories = 300))),
Orange = list(ID = c(1,2,3),
color = c(3,4,5),
usage = list(eating = list(ID = 1,
quality = "poor",
calories = 420),
cooking = list(ID = 2,
quality = "questionable",
calories = 600),
drinking = list(ID = 3,
quality = "good",
calories = 800),
medicine = list(ID = 4,
quality = "good",
calories = 0))),
Banana = list(ID = c(1,2,3),
color = c(3,4,5),
usage = list(cooking = list(ID = 1,
quality = "good",
calories = 49),
drinking = list(ID = 2,
quality = "questionable",
calories = 11),
medicine = list(ID = 3,
quality = "poor",
calories = 55)))))
我試圖將其分解為更小的步驟並使用循環來執行此操作,但是我對lists
經驗很少,並且遇到了很多錯誤。 任何想法如何以有效和可讀的方式解決這個問題? 下面是我嘗試匹配fruits
眾多嘗試之一。 我知道該字段不匹配,我只是想讓循環運行...
for (i in seq_len(nrow(garden))){
if (garden$fruit[i] == productFruit$type){
garden$calories = productFruit$type[[i]]$ID
}
garden$calories = "error"
}
所需的輸出是這樣的:
fruit = c("Apple", "Kiwi", "Banana", "Orange", "Blueberry")
usage = c("cooking", "cooking", "NA", "drinking", "medicine")
reported = c(200, 500, 77, 520, 303)
calories = c("usage out of scope", "fruit out of scope", "erroneous data", 800, "fruit out of scope")
garden_with_calories <- cbind(fruit, usage, reported, calories)
garden_with_calories <- as.data.table(garden)
對於大型數據集,不建議使用for
循環。 以下代碼是替代方案
步驟 1 檢查產品列表中是否存在水果
fruitExist <- fruit %in% names(productFruit$type)
Step 2 針對每個水果,檢查產品列表中是否存在相應的用法
usageExist <- sapply(fruit, function(f){
sapply(usage, `%in%`, x = names(productFruit$type[[f]][["usage"]]))})
usageExist <- as.data.frame(unique(sapply(usageExist[sapply(usageExist, is.logical)], colSums)))
usageExist$usage <- row.names(usageExist)
步驟 3 提取卡路里
calories <- data.frame(unique(
sapply(fruit, function(f){
sapply(usage, function(u){productFruit$type[[f]][["usage"]][[u]][["calories"]]})}
)))
calories <- unlist(as.data.frame(unique(
sapply(fruit, function(f){
sapply(usage, function(u){productFruit$type[[f]][["usage"]][[u]][["calories"]]})}
))))
calories <- as.data.frame(calories)
names(calories) <- "cal"
calories$fruitUsage <- row.names(calories)
第 4 步合並並完成
library(tidyverse)
garden %>%
mutate(fruitExist = fruitExist) %>%
left_join(usageExist %>% pivot_longer(-usage, names_to = "fruit", values_to = "usageExist")) %>%
left_join(calories %>% separate(fruitUsage, c("fruit","usage"))) %>%
mutate(calories = case_when(
fruit == "NA" | usage == "NA" ~ "erroneous data",
usageExist == FALSE ~ "usage out of scope",
fruitExist == FALSE ~ "fruit out of scope",
TRUE ~ as.character(cal))) %>%
select(fruit, usage, reported, calories)
輸出
garden
# fruit usage reported calories
# 1 Apple cooking 200 usage out of scope
# 2 Kiwi cooking 500 fruit out of scope
# 3 Banana NA 77 erroneous data
# 4 Orange drinking 520 800
# 5 Blueberry medicine 303 fruit out of scope
以前的代碼
嘗試這個:
cal <- as.character()
for(i in 1:length(fruit)){
fruitName <- fruit[i]
usageName <- usage[i]
if(fruitName == "NA" | usageName == "NA") {
out <- "erroneous data"
} else if(!(fruitName %in% names(productFruit[["type"]]))){
out <- "fruit out of scope"
} else if(!(usageName %in% names(productFruit[["type"]][[fruitName]][["usage"]]))){
out <- "usage out of scope"
} else {
out <- productFruit[["type"]][[fruitName]][["usage"]][[usageName]][["calories"]]
}
cal <- c(cal, out)
}
garden$calories <- cal
garden
# fruit usage reported calories
# 1: Apple cooking 200 usage out of scope
# 2: Kiwi cooking 500 fruit out of scope
# 3: Banana NA 77 erroneous data
# 4: Orange drinking 520 800
# 5: Blueberry medicine 303 fruit out of scope
從嵌套列表中提取數據可能非常乏味。 以下是一些適用於您提供的示例的代碼,但如果您的條目與示例數據不同,可能仍然會遇到困難。 您可能必須使其更加健壯並檢查數據是否具有您期望的class
等。
library(tidyverse)
我們創建了一些代碼,一次提取一個水果:
# this creates a tibble with a column for each usage entry (eating, drinking,
# etc.)
type_df <- as.tibble(productFruit$type[[1]]$usage)
# With map*() we apply as.tibble() to each column to get a one-row data frame
# per "usage" case. We use map_dfr() in order to bind togeter the resulting
# rows into one dataframe. This is the line that might need to be made more
# robust in order to not fail on unexpected input.)
res <- map_dfr(type_df, as.tibble, .id = "usage")
# When there is no usage entry, `res` will be empty and we create a dummy
# dataframe for that case that has `NA` for the "colories" column.
if (nrow(res) < 1)
tibble(calories = NA)
else
res
現在我們把前面的幾行放到一個函數中,這樣我們就可以把它應用到所有的水果上。
extract_fruit_data <-
function(fruit) {
type_df <- as.tibble(fruit$usage)
res <- map_dfr(type_df, as.tibble, .id = "usage")
if (nrow(res) < 1)
tibble(calories = NA)
else
res
}
我們將extract_fruit_data
應用於每個水果的條目,並使用map_dfr()
將結果行綁定在一起。 然后我們刪除並重命名一些變量,為下一步做准備。
fruits_df <-
map_dfr(productFruit$type, extract_fruit_data, .id = "type") %>%
select(-ID, -quality) %>%
rename(fruit = type)
我們使用left_join()
將兩個數據集連接left_join()
,這樣花園中的每個條目都會保留下來,而在fruits_df 中不匹配的條目在卡路里列中獲得NA
。 使用case_when()
我們根據您的規格對每一列進行分類
left_join(garden, fruits_df) %>%
mutate(calories = case_when(
usage == "NA" ~ "erroneous data",
!fruit %in% fruits_df$fruit ~ "fruit out of scope",
is.na(calories) ~ "usage out of scope",
TRUE ~ as.character(calories)
))
我在Base R
中制作了這段代碼,它只查找和報告實際存在的水果及其各自的用法。 我知道這不完全是你所要求的,但當我意識到這有點太晚了。 它與其他提議的解決方案截然不同。
FruitNames <- unlist(lapply(productFruit,names)[2])
UsageByFruit <- lapply(FruitNames, function(X) names(productFruit[["type"]][[X]][["usage"]]))
LengthByFruit<- lapply(UsageByFruit, length)
gardenlength <- sum(unlist(LengthByFruit))
garden <- data.frame(matrix(ncol=3,nrow=gardenlength, dimnames=list(NULL, c("Fruit", "Usage", "Calories"))))
garden[,2] <- unlist(UsageByFruit)
garden[,1] <- unlist(lapply(1:length(FruitNames), function(X) replicate(LengthByFruit[[X]],FruitNames[X])))
garden[,3] <- unlist(lapply(1:length(FruitNames), function(X) unlist(lapply(unlist(UsageByFruit[X]), function(Y) productFruit[["type"]][[FruitNames[X]]][["usage"]][[Y]][["calories"]] ))))
輸出:
> garden
Fruit Usage Calories
1 Apple eating 500
2 Apple medicine 300
3 Orange eating 420
4 Orange cooking 600
5 Orange drinking 800
6 Orange medicine 0
7 Banana cooking 49
8 Banana drinking 11
9 Banana medicine 55
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.