Comparing values for different frequencies using graphs in R
My data looks like this:
ID Date Reduction Collected Provided Freq Gender
1 AAA016000 2018-04-10 0 0 7 1 <NA>
2 AAA059717 2017-03-21 1 0 45 10 Female
3 AAA059717 2017-04-22 0 0 10 10 Female
4 AAA059717 2017-05-09 0 0 10 2 Female
5 AAA059717 2017-06-09 1 0 40 6 Female
6 AAA059717 2018-07-03 NA 180 200 35 Female
7 AAA059717 2018-09-26 NA 10 30 15 Female
8 AAA059717 2018-09-26 1 NA NA NA Female
9 AAA059717 2018-10-12 NA 0 20 3 Female
10 AAA059717 2018-11-07 NA 30 50 20 Female
11 AAA059717 2018-11-07 0 NA NA NA Female
12 AAA059717 2018-11-08 NA 2 20 10 Female
'data.frame': 190122 obs. of 7 variables:
$ ID : chr "AAA016000" "AAA059717" "AAA059717" "AAA059717" ...
$ Date : Date, format: "2018-04-10" "2017-03-21" "2017-04-22" "2017-05-09" ...
$ Reduction : num 0 1 0 0 1 NA NA 1 NA NA ...
$ Collected : num 0 0 0 0 0 180 10 NA 0 30 ...
$ Provided : num 7 45 10 10 40 200 30 NA 20 50 ...
$ Freq : num 1 10 10 2 6 35 15 NA 3 20 ...
$ Gender : chr NA "Female" "Female" "Female" ...
To find out whether higher frequencies also come with higher Provided values, I did the following:
ggplot(data = df, aes(x = Freq, y = Provided)) +
geom_point()+
geom_line()
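How can I make a better graph to show whether higher frequencies have higher Provided values than lower frequencies? Finally, how can I visualize whether frequencies of 10 or above provide more than frequencies below 10? Thanks for your replies, I appreciate it.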
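There is a strong, significant linear correlation between Freq and Provided in the example data (Pearson, effect size R = 0.89, p < 0.001, computed on the 10 rows where both values are present).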
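The same Pearson test can be run without the plotting layer. This is a minimal base-R sketch that assumes only the 12 example rows shown in the question (not the full 190,122-row data frame); cor.test() drops incomplete pairs on its own, so the two all-NA rows are ignored.

```r
# Freq and Provided for the 12 example rows in the question
# (rows 8 and 11 are NA in both columns)
freq     <- c(1, 10, 10, 2, 6, 35, 15, NA, 3, 20, NA, 10)
provided <- c(7, 45, 10, 10, 40, 200, 30, NA, 20, 50, NA, 20)

# Pearson correlation test on the complete pairs (10 rows, df = 8)
ct <- cor.test(freq, provided, method = "pearson")

round(unname(ct$estimate), 2)  # 0.89, the R that stat_cor() prints
ct$p.value                     # below 0.001
```

Running the test directly makes it easy to report the confidence interval (ct$conf.int) alongside R.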
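Frequencies of 10 or above do not have significantly higher Provided values (Wilcoxon rank-sum test, p = 0.16). Keep in mind that discretizing the Freq variable into two binary categories (high and low) is often arbitrary, and the significance can depend heavily on the chosen threshold (here 10).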
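The threshold caveat can be checked directly by rerunning the rank-sum test for several cut-offs. This is a minimal base-R sketch, again assuming only the 12 example rows from the question; wilcox_p_at() is a hypothetical helper written for illustration, and the cut-offs other than 10 are arbitrary choices. Because the values 10 and 20 occur in both groups, an exact p-value is not available, so the sketch uses the normal approximation with continuity correction, which is what wilcox.test() falls back to in that case.

```r
freq     <- c(1, 10, 10, 2, 6, 35, 15, NA, 3, 20, NA, 10)
provided <- c(7, 45, 10, 10, 40, 200, 30, NA, 20, 50, NA, 20)

# Hypothetical helper: two-sided Wilcoxon rank-sum p-value comparing
# Provided for Freq >= threshold against Freq < threshold
wilcox_p_at <- function(threshold) {
  keep <- !is.na(freq) & !is.na(provided)
  high <- freq[keep] >= threshold
  # exact = FALSE: ties make an exact p-value impossible, so use the
  # normal approximation (with the default continuity correction)
  wilcox.test(provided[keep][high], provided[keep][!high],
              exact = FALSE)$p.value
}

# Arbitrary cut-offs; each leaves at least two rows in both groups
thresholds <- c(3, 6, 10, 15, 20)
p_by_threshold <- sapply(thresholds, wilcox_p_at)
names(p_by_threshold) <- thresholds
print(round(p_by_threshold, 3))  # the "10" entry is about 0.16
```

With only 10 complete rows, comparing these p-values shows how much the conclusion rests on where the cut-off is placed.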
library(tidyverse)
library(ggpubr)  # provides stat_cor() and stat_compare_means()

# The 12 example rows from the question
df <- tribble(
  ~row_id, ~ID, ~Date, ~Reduction, ~Collected, ~Provided, ~Freq, ~Gender,
  1, "AAA016000", "2018-04-10", 0, 0, 7, 1, NA,
  2, "AAA059717", "2017-03-21", 1, 0, 45, 10, "Female",
  3, "AAA059717", "2017-04-22", 0, 0, 10, 10, "Female",
  4, "AAA059717", "2017-05-09", 0, 0, 10, 2, "Female",
  5, "AAA059717", "2017-06-09", 1, 0, 40, 6, "Female",
  6, "AAA059717", "2018-07-03", NA, 180, 200, 35, "Female",
  7, "AAA059717", "2018-09-26", NA, 10, 30, 15, "Female",
  8, "AAA059717", "2018-09-26", 1, NA, NA, NA, "Female",
  9, "AAA059717", "2018-10-12", NA, 0, 20, 3, "Female",
  10, "AAA059717", "2018-11-07", NA, 30, 50, 20, "Female",
  11, "AAA059717", "2018-11-07", 0, NA, NA, NA, "Female",
  12, "AAA059717", "2018-11-08", NA, 2, 20, 10, "Female"
) %>%
  mutate(Date = as.Date(Date))  # store Date as a Date, as in the original data
# Scatter plot with a linear fit and the Pearson correlation coefficient
df %>%
  ggplot(aes(Freq, Provided)) +
  geom_point() +
  stat_smooth(method = "lm") +
  stat_cor(method = "pearson")
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 2 rows containing non-finite values (stat_smooth).
#> Warning: Removed 2 rows containing non-finite values (stat_cor).
#> Warning: Removed 2 rows containing missing values (geom_point).
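# Split Freq at 10 and compare Provided between the two groups;
# for two groups stat_compare_means() uses a Wilcoxon rank-sum test by default
df %>%
  mutate(high_Freq = Freq >= 10) %>%
  filter(!is.na(high_Freq)) %>%
  ggplot(aes(high_Freq, Provided)) +
  geom_boxplot() +
  stat_compare_means()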
Created on 2021-11-10 by the reprex package (v2.0.1)
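With only a few observations per group, a boxplot on its own can hide how little data sits behind each box. One option, sketched here with ggplot2 and assuming only the 12 example rows from the question, is to overlay the individual points; the Freq_group labels are illustrative names, not part of the original data.

```r
library(ggplot2)

df <- data.frame(
  Freq     = c(1, 10, 10, 2, 6, 35, 15, NA, 3, 20, NA, 10),
  Provided = c(7, 45, 10, 10, 40, 200, 30, NA, 20, 50, NA, 20)
)
df <- df[!is.na(df$Freq), ]  # drop the two rows with no Freq

# Illustrative group labels for the same cut-off at 10
df$Freq_group <- ifelse(df$Freq >= 10, "Freq >= 10", "Freq < 10")

p <- ggplot(df, aes(Freq_group, Provided)) +
  geom_boxplot(outlier.shape = NA) +    # outliers are shown by the points below
  geom_jitter(width = 0.1, height = 0) + # spread points horizontally only
  labs(x = NULL)
print(p)
```

Setting height = 0 keeps the Provided values exact, so the points can still be read off the y axis.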