Averaging values in R using if statements
For an example data frame:
df1 <- structure(list(name = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "x", "y", "z"), amount = c(5.5, 5.4, 5.2, 5.3, 5.1,
5.1, 5, 5, 4.9, 4.5, 6, 5.9, 5.7, 5.4, 5.3, 5.1, 5.6, 5.4, 5.3,
5.6, 4.6, 4.2, 4.5, 4.2, 4, 3.8, 6, 5.8, 5.7, 5.6, 5.3, 5.6,
5.4, 5.5, 5.4, 5.1, 9, 8.8, 8.6, 8.4, 8.2, 8, 7.8, 7.6, 7.4,
7.2, 6, 5.75, 5.5, 5.25, 5, 4.75, 10, 8.9, 7.8, 6.7, 5.6, 4.5,
3.4, 2.3, 1.2, 0.1, 6, 5.8, 5.7, 5.6, 5.5, 5.5, 5.4, 5.6, 5.8,
5.1, 6, 5.5, 5.4, 5.3, 5.2, 5.1), decile = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L), time = c(2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("name", "amount",
"decile", "time"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-78L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character",
"collector")), amount = structure(list(), class = c("collector_double",
"collector")), decile = structure(list(), class = c("collector_integer",
"collector")), time = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("name", "amount", "decile", "time"
)), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
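I ultimately want to produce a ggplot chart showing the average 'amount' by quintile for each year (i.e., five small bars for each year's data).
To do this, I need to be able to calculate quintiles (averaging all values in deciles 1 and 2, 3 and 4, 5 and 6, 7 and 8, and 9 and 10) and also include a 95% CI.
I have filtered my data in the past, but I am struggling to conceptualize this using if statements.
Any help would be much appreciated.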
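You can do this in a pipe with dplyr functions by converting deciles to quintiles: divide by 2 and round up. Here I've just used a very quick-and-dirty confidence interval of 2 x the standard deviation, but you may want a different method.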
library(dplyr)
library(ggplot2)

plot_data <- df1 %>%
  # Deciles 1-2 -> quintile 1, 3-4 -> quintile 2, ..., 9-10 -> quintile 5
  mutate(quintile = ceiling(decile / 2)) %>%
  group_by(time, quintile) %>%
  summarize(average_amount = mean(amount),
            sd_amount = sd(amount),
            # Rough interval: mean +/- 2 standard deviations
            ci_min = average_amount - 2 * sd_amount,
            ci_max = average_amount + 2 * sd_amount)
Here's an (ugly) ggplot with bars broken out by year and quintile.
ggplot(plot_data, aes(x = quintile, y = average_amount)) +
  geom_col() +
  geom_errorbar(aes(ymin = ci_min, ymax = ci_max)) +
  facet_wrap(~ time)
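A note on the interval: mean +/- 2 standard deviations describes the spread of the individual values, not the uncertainty of the mean itself. If what you want is a 95% CI for the mean, one common option is a t-based interval built from the standard error. The following is only a minimal sketch under that assumption; `toy` is a made-up stand-in for `df1` with the same column names, and `n_obs`, `se_amount` and `t_crit` are names chosen here for illustration.

```r
library(dplyr)

# Hypothetical toy data with the same columns as df1 (values are made up)
toy <- data.frame(
  amount = c(5.5, 5.4, 5.2, 5.3, 6.0, 5.9, 5.7, 5.4),
  decile = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  time   = rep(2016L, 8)
)

ci_data <- toy %>%
  mutate(quintile = ceiling(decile / 2)) %>%
  group_by(time, quintile) %>%
  summarise(
    n_obs          = n(),                        # observations per group
    average_amount = mean(amount),
    se_amount      = sd(amount) / sqrt(n_obs),   # standard error of the mean
    t_crit         = qt(0.975, df = n_obs - 1),  # two-sided 95% critical value
    ci_min         = average_amount - t_crit * se_amount,
    ci_max         = average_amount + t_crit * se_amount,
    .groups = "drop"
  )

print(ci_data)
```

The resulting `ci_min` and `ci_max` columns can be passed to `geom_errorbar()` exactly as above. With only a handful of observations per group the t interval will be wide, and it assumes the values within a group are roughly independent and normally distributed.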
If you are just looking for the averages, try this:
library(tidyverse)

df1 %>%
  mutate(quintile = floor((decile - 1) / 2) + 1) %>%
  group_by(time, quintile) %>%
  summarise(AvgAmount = mean(amount)) %>%
  ggplot(aes(quintile, AvgAmount)) +
  geom_bar(stat = "identity") +
  facet_grid(time ~ .)
If you'd like a better picture of the distribution within each quintile, you could use box plots:
df1 %>%
  mutate(quintile = floor((decile - 1) / 2) + 1) %>%
  ggplot(aes(quintile, amount, group = quintile)) +
  geom_boxplot() +
  facet_grid(time ~ .)
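On the original question about if statements: none are needed, because the decile-to-quintile mapping is plain vectorized arithmetic. As a quick check, the sketch below compares the two formulas used in this thread, `ceiling(decile / 2)` and `floor((decile - 1) / 2) + 1`, over deciles 1 to 10. The integer-division variant is an extra alternative added here for illustration, not something taken from the answers.

```r
decile <- 1:10

q_ceiling <- ceiling(decile / 2)           # formula from the first answer
q_floor   <- floor((decile - 1) / 2) + 1   # formula from the second answer
q_intdiv  <- (decile + 1L) %/% 2L          # integer-division alternative

# All three agree for every decile from 1 to 10
stopifnot(identical(q_ceiling, q_floor))
stopifnot(all(q_intdiv == q_ceiling))

print(data.frame(decile, q_ceiling, q_floor, q_intdiv))
```

Any of the three can go inside `mutate()`; pick whichever reads most clearly to you.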