How to sum values from one column based on specific conditions in another column in R?
I have a dataset that looks something like this:
df <- data.frame(plot = c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
                 species = c("Fagus","Fagus","Quercus","Picea","Abies","Fagus","Fagus","Quercus","Picea","Abies","Fagus","Fagus","Quercus","Picea","Abies"),
                 value = sample(100, size = 15, replace = TRUE))
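Because `sample()` draws random numbers, the `value` column changes on every run, so the figures printed in the outputs below cannot be reproduced exactly. A minimal sketch of a reproducible version fixes the random seed first. The seed `42` is an arbitrary choice (an assumption, not from the question) and will not recreate the numbers shown on this page; `rep()` just builds the same `plot` and `species` vectors more compactly.

```r
# Fix the random number generator so `value` is identical on every run
# (seed 42 is arbitrary; it does NOT recreate the values printed below)
set.seed(42)

df <- data.frame(
  # five rows per plot: A, A, A, A, A, B, ..., C
  plot    = rep(c("A", "B", "C"), each = 5),
  # the same five species appear in every plot
  species = rep(c("Fagus", "Fagus", "Quercus", "Picea", "Abies"), times = 3),
  value   = sample(100, size = 15, replace = TRUE)
)

head(df)
```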
head(df)
plot species value
1 A Fagus 53
2 A Fagus 48
3 A Quercus 5
4 A Picea 25
5 A Abies 12
6 B Fagus 12
Now I want to create a new data frame that contains, for each plot, the values share.conifers and share.broadleaves, obtained by summing value with conditions applied to species. I thought about using case_when, but I am not sure how to write the syntax:
df1 <- df %>% share.broadleaves = case_when(plot = plot & species = "Fagus" or species = "Quercus" ~ FUN="sum")
df1 <- df %>% share.conifers = case_when(plot = plot & species = "Abies" or species = "Picea" ~ FUN="sum")
I know this is not right, but I would like something like this.
Using dplyr / tidyr:
First construct the grouping variable, compute the sums, and then spread the result into columns.
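library(dplyr)
library(tidyr)
# note: the native pipe |> requires R >= 4.1.0
df |>
  # classify each species; any species not listed here would get NA
  mutate(type = case_when(species %in% c("Fagus", "Quercus") ~ "broadleaves",
                          species %in% c("Abies", "Picea") ~ "conifers")) |>
  # sum value within each plot/type combination
  group_by(plot, type) |>
  summarise(share = sum(value)) |>
  ungroup() |>
  # one column per type: share.broadleaves, share.conifers
  pivot_wider(values_from = "share", names_from = "type", names_prefix = "share.")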
Output:
# A tibble: 3 × 3
plot share.broadleaves share.conifers
<chr> <int> <int>
1 A 159 77
2 B 53 42
3 C 204 63
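For comparison, a base-R sketch of the same per-plot sums, assuming the `df` from the question. It swaps `case_when()` for `ifelse()`, which assumes every species that is not Fagus or Quercus is a conifer (in the dplyr version, an unlisted species would get `NA` instead). `xtabs()` with a left-hand side sums `value` for each `plot`/`type` combination, and `prop.table(..., margin = 1)` turns each row into shares of the plot total, in case shares rather than sums are wanted.

```r
# Example data as in the question (random values, so the sums vary per run)
df <- data.frame(
  plot    = rep(c("A", "B", "C"), each = 5),
  species = rep(c("Fagus", "Fagus", "Quercus", "Picea", "Abies"), times = 3),
  value   = sample(100, size = 15, replace = TRUE)
)

# Classify species; assumes anything not listed as a broadleaf is a conifer
df$type <- ifelse(df$species %in% c("Fagus", "Quercus"), "broadleaves", "conifers")

# Sum `value` for every plot x type combination (rows = plot, columns = type)
sums <- xtabs(value ~ plot + type, data = df)
sums

# Divide each row by its total to get per-plot shares (each row sums to 1)
shares <- prop.table(sums, margin = 1)
shares

# Plain data frame with the column names used in the question
result <- data.frame(
  plot              = rownames(sums),
  share.broadleaves = as.vector(sums[, "broadleaves"]),
  share.conifers    = as.vector(sums[, "conifers"])
)
result
```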
I am not sure whether you want the sum or the share, but the code can easily be adapted to either goal.
One way is simply to summarize by plot and species:
library(dplyr)
df |>
group_by(plot, species) |>
summarize(share = sum(value))
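The same long table (one row per plot and species) can also be sketched in base R with `aggregate()` and a formula, assuming the `df` from the question. This is an alternative technique, not part of the answer above; note that the two Fagus rows in each plot collapse into a single summed row.

```r
# Example data as in the question (random values)
df <- data.frame(
  plot    = rep(c("A", "B", "C"), each = 5),
  species = rep(c("Fagus", "Fagus", "Quercus", "Picea", "Abies"), times = 3),
  value   = sample(100, size = 15, replace = TRUE)
)

# Sum `value` for every plot/species combination:
# 3 plots x 4 distinct species = 12 rows
per_species <- aggregate(value ~ plot + species, data = df, FUN = sum)
per_species
```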
If you want the share of specific species per plot, you could also do:
df |>
group_by(plot) |>
summarize(share_certain_species = sum(value[species %in% c("Fagus", "Quercus")]) / sum(value))
which gives:
# A tibble: 3 × 2
plot share_certain_species
<chr> <dbl>
1 A 0.546
2 B 0.583
3 C 0.480
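Building on the same logical-subsetting idea, both columns the question asks for can be computed in one `summarise()` call, with no pivoting step. A minimal sketch, assuming the `df` from the question; the vectors `broadleaves` and `conifers` are hypothetical helpers describing how the species are grouped. Dividing each sum by `sum(value)` would give shares instead of sums.

```r
library(dplyr)

# Hypothetical grouping of the species into the two classes
broadleaves <- c("Fagus", "Quercus")
conifers    <- c("Abies", "Picea")

# Example data as in the question (random values)
df <- data.frame(
  plot    = rep(c("A", "B", "C"), each = 5),
  species = rep(c("Fagus", "Fagus", "Quercus", "Picea", "Abies"), times = 3),
  value   = sample(100, size = 15, replace = TRUE)
)

df1 <- df |>
  group_by(plot) |>
  summarise(
    # within each plot, sum only the rows whose species is a broadleaf
    share.broadleaves = sum(value[species %in% broadleaves]),
    # likewise for conifers
    share.conifers    = sum(value[species %in% conifers])
  )
df1
```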