R: add a column whose cell values are based on values in a different row
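I have a data.frame in which each row records whether a particular animal was found at a particular location.
I want to add a new column labelled "prey" to this example data.frame. Its value should be 1 or 0, depending on whether the predator's prey was found at the same location (each location has a unique ID).
The problem is that each animal has its own row, so the information about whether the prey is present is not on the same row as the predator. The two predators are lions and cheetahs.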
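For this example, the lion's prey are antelope and zebra, so:
ID 1: an antelope and a lion were both found at that location, so the prey column in the lion row should be 1.
ID 2: neither antelope nor zebra was found, so the prey column in the lion row should be 0.
The cheetah's prey are antelope, gazelles, and impala.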
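To make the rule concrete, here is a small sketch that checks it for a single predator and location. The names prey_of, present_at, and has_prey are hypothetical, introduced only for illustration; like the description above, it only checks whether any of the predator's prey is present, not whether the predator itself is.

```r
# Example data from the question
df <- data.frame(ID = rep(c(1, 2), each = 6),
                 species = rep(c("lion", "antelope", "zebra",
                                 "cheetah", "impala", "gazelles"), 2),
                 present = c(1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1),
                 stringsAsFactors = FALSE)

# Hypothetical lookup: prey of each predator, as listed in the question
prey_of <- list(lion = c("antelope", "zebra"),
                cheetah = c("antelope", "gazelles", "impala"))

# Hypothetical helper data: species recorded as present, split by location ID
present_at <- split(df$species[df$present == 1], df$ID[df$present == 1])

# Hypothetical helper: 1 if at least one of the predator's prey is present at that ID, else 0
has_prey <- function(predator, id) {
  as.integer(any(prey_of[[predator]] %in% present_at[[as.character(id)]]))
}

has_prey("lion", 1)     # 1: antelope is present at ID 1
has_prey("lion", 2)     # 0: neither antelope nor zebra at ID 2
has_prey("cheetah", 2)  # 1: gazelles and impala are present at ID 2
```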
Below is the example data.frame. The solution I came up with is very inefficient, and I'm looking for something faster/tidier.
df <- data.frame(ID=c(1,1,1,1,1,1, 2, 2, 2, 2, 2, 2),
species=c("lion", "antelope", "zebra", "cheetah", "impala", "gazelles", "lion", "antelope", "zebra", "cheetah", "impala", "gazelles"),
present=c(1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1),
stringsAsFactors=FALSE)
k=list(list())
for (i in 1:2) { ### for loop over the 2 unique IDs
  k[[i]]=df[which(df$ID == unique(df$ID)[i]),]
  k[[i]]$antelope=0
  k[[i]]$zebra=0
  k[[i]]$impala=0
  k[[i]]$gazelle=0
  k[[i]]$lionprey=0
  k[[i]]$cheetahprey=0
  k[[i]]$antelope[[1]]=ifelse(k[[i]]$present[[2]]==1, 1, 0)
  k[[i]]$zebra[[1]]=ifelse(k[[i]]$present[[3]]==1, 1, 0)
  k[[i]]$lionprey[[1]]=ifelse(k[[i]]$antelope[[1]] == 1 ||
                                k[[i]]$zebra[[1]] == 1, 1, 0)
  k[[i]]$antelope[[4]]=ifelse(k[[i]]$present[[2]]==1, 1, 0)
  k[[i]]$gazelle[[4]]=ifelse(k[[i]]$present[[6]]==1, 1, 0)
  k[[i]]$impala[[4]]=ifelse(k[[i]]$present[[5]]==1, 1, 0)
  k[[i]]$cheetahprey[[4]]=ifelse(k[[i]]$antelope[[4]] == 1 ||
                                   k[[i]]$gazelle[[4]] == 1 || k[[i]]$impala[[4]]==1, 1, 0)
}
k=do.call("rbind", k)
k$antelope=NULL
k$zebra=NULL
k$impala=NULL
k$gazelle=NULL
k$prey=k$lionprey+k$cheetahprey
k$lionprey=NULL
k$cheetahprey=NULL
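Consider using tidyr::spread to simplify the structure of the first data frame. (In current tidyr, spread() and gather() are superseded by pivot_wider() and pivot_longer(), but they still work.)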
library(dplyr)
library(tidyr)

df %>% spread(species, present)
#>   ID antelope cheetah gazelles impala lion zebra
#> 1  1        1       1        0      1    1     0
#> 2  2        0       1        1      1    1     0
Then carry on with dplyr:
df %>%
  spread(species, present) %>%
  mutate(lion_prey = case_when(antelope == 1 | zebra == 1 ~ 1,
                               TRUE ~ 0),
         cheetah_prey = case_when(antelope == 1 | gazelles == 1 | impala == 1 ~ 1,
                                  TRUE ~ 0)) %>%
  gather(species, present, -ID, -lion_prey, -cheetah_prey) %>%
  mutate(prey = case_when(species == "lion" ~ lion_prey,
                          species == "cheetah" ~ cheetah_prey,
                          TRUE ~ 0)) %>%
  select(-lion_prey, -cheetah_prey)
#> ID species present prey
#> 1 1 antelope 1 0
#> 2 2 antelope 0 0
#> 3 1 cheetah 1 1
#> 4 2 cheetah 1 1
#> 5 1 gazelles 0 0
#> 6 2 gazelles 1 0
#> 7 1 impala 1 0
#> 8 2 impala 1 0
#> 9 1 lion 1 1
#> 10 2 lion 1 0
#> 11 1 zebra 0 0
#> 12 2 zebra 0 0
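The case_when() calls above hard-code each predator's prey. A minimal base-R sketch of the same wide-then-long idea that reads the prey from a list instead: tapply() stands in for spread() so it runs without tidyr, and the predators_prey list has the same shape as the one in the loop-based answer. Like the dplyr version, it does not check whether the predator itself is present.

```r
# Example data from the question
df <- data.frame(ID = rep(c(1, 2), each = 6),
                 species = rep(c("lion", "antelope", "zebra",
                                 "cheetah", "impala", "gazelles"), 2),
                 present = c(1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1),
                 stringsAsFactors = FALSE)

# Prey of each predator; adding a predator only means adding an entry here
predators_prey <- list(lion = c("antelope", "zebra"),
                       cheetah = c("antelope", "gazelles", "impala"))

# Wide layout without tidyr: rows are IDs, columns are species (sorted)
wide <- tapply(df$present, list(df$ID, df$species), max)

# One column per predator: 1 if any of its prey columns is 1 for that ID
prey_flags <- sapply(predators_prey, function(prey) {
  as.integer(rowSums(wide[, prey, drop = FALSE]) > 0)
})
rownames(prey_flags) <- rownames(wide)

# Back to the long layout: predator rows take their flag, other rows get 0
is_pred <- df$species %in% colnames(prey_flags)
df$prey <- 0L
df$prey[is_pred] <- prey_flags[cbind(as.character(df$ID[is_pred]),
                                     df$species[is_pred])]
df
```

Indexing prey_flags with a two-column character matrix looks up each (ID, predator) pair by row and column name, so the row order of df does not matter.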
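For the reasons you describe, this involves some messy logical expressions, but here is one way to do it. It has the advantage of being generalizable: to add predators, just add them to predators and then add their prey to predators_prey. predators_prey is a list so that it can hold predators with different numbers of prey (as here):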
# define the predators
predators <- c("lion", "cheetah")
# create a list of their prey from which to programmatically extract
predators_prey <- list(lion = c("antelope", "zebra"), cheetah = c("antelope", "gazelles", "impala"))
# initialize the $prey column
df$prey <- 0
# use a for loop because we're assigning into df in the global environment
for (predator in predators) {
  for (id in unique(df$ID)) {
    # is the predator here? (any() returns FALSE if the predator has no row for this ID)
    predator_here <- any(df$present[df$ID == id & df$species == predator] == 1)
    # is that predator's prey here?
    prey_here <- any(df$species[df$ID == id & df$present == 1] %in% predators_prey[[predator]])
    # if both, then set $prey to 1
    if (predator_here && prey_here) {
      df$prey[df$ID == id & df$species == predator] <- 1
    }
  }
}
# let's look at the result
df
# ID species present prey
# 1 1 lion 1 1
# 2 1 antelope 1 0
# 3 1 zebra 0 0
# 4 1 cheetah 1 1
# 5 1 impala 1 0
# 6 1 gazelles 0 0
# 7 2 lion 1 0
# 8 2 antelope 0 0
# 9 2 zebra 0 0
# 10 2 cheetah 1 1
# 11 2 impala 1 0
# 12 2 gazelles 1 0
Data:
df <- data.frame(ID=c(1,1,1,1,1,1, 2, 2, 2, 2, 2, 2),
species=c("lion", "antelope", "zebra", "cheetah", "impala", "gazelles", "lion", "antelope", "zebra", "cheetah", "impala", "gazelles"),
present=c(1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1),
stringsAsFactors=FALSE)
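If the data gets large, the nested loop can be replaced by a join. This is a minimal base-R sketch, not part of the answer above, assuming the same df and predators_prey: stack() turns the list into a long predator/prey lookup table, merge() joins it to the species observed at each ID, and %in% maps the hits back onto the rows. Like the loop, it also requires the predator itself to be present. The names lookup, seen, hits, and hit_keys are illustrative.

```r
# Example data from the question
df <- data.frame(ID = rep(c(1, 2), each = 6),
                 species = rep(c("lion", "antelope", "zebra",
                                 "cheetah", "impala", "gazelles"), 2),
                 present = c(1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1),
                 stringsAsFactors = FALSE)

predators_prey <- list(lion = c("antelope", "zebra"),
                       cheetah = c("antelope", "gazelles", "impala"))

# Long lookup table: one row per (prey species, predator) pair
lookup <- stack(predators_prey)
names(lookup) <- c("prey_species", "predator")
lookup$predator <- as.character(lookup$predator)

# Species actually observed at each ID
seen <- df[df$present == 1, c("ID", "species")]

# Inner join: (predator, ID) combinations where at least one prey was seen
hits <- merge(lookup, seen, by.x = "prey_species", by.y = "species")
hit_keys <- unique(paste(hits$ID, hits$predator))

# prey = 1 for a present predator whose prey was seen at the same ID
row_keys <- paste(df$ID, df$species)
df$prey <- as.integer(df$present == 1 & row_keys %in% hit_keys)
df
```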