Comparing values for different frequencies using graphs in R

My data looks like this:

          ID       Date Reduction Collected Provided Freq Gender
1  AAA016000 2018-04-10         0         0        7    1   <NA>
2  AAA059717 2017-03-21         1         0       45   10 Female
3  AAA059717 2017-04-22         0         0       10   10 Female
4  AAA059717 2017-05-09         0         0       10    2 Female
5  AAA059717 2017-06-09         1         0       40    6 Female
6  AAA059717 2018-07-03        NA       180      200   35 Female
7  AAA059717 2018-09-26        NA        10       30   15 Female
8  AAA059717 2018-09-26         1        NA       NA   NA Female
9  AAA059717 2018-10-12        NA         0       20    3 Female
10 AAA059717 2018-11-07        NA        30       50   20 Female
11 AAA059717 2018-11-07         0        NA       NA   NA Female
12 AAA059717 2018-11-08        NA         2       20   10 Female

'data.frame':   190122 obs. of  7 variables:
 $ ID                                         : chr  "AAA016000" "AAA059717" "AAA059717" "AAA059717" ...
 $ Date                                       : Date, format: "2018-04-10" "2017-03-21" "2017-04-22" "2017-05-09" ...
 $ Reduction                                  : num  0 1 0 0 1 NA NA 1 NA NA ...
 $ Collected                                  : num  0 0 0 0 0 180 10 NA 0 30 ...
 $ Provided                                   : num  7 45 10 10 40 200 30 NA 20 50 ...
 $ Freq                                       : num  1 10 10 2 6 35 15 NA 3 20 ...
 $ Gender                                     : chr  NA "Female" "Female" "Female" ...

To find out whether a higher Freq also goes with a higher Provided, I tried this:

ggplot(data = df, aes(x = Freq, y = Provided)) + 
  geom_point()+
  geom_line()

But the graph doesn't look right.

How do I make a better graph to visualize whether a higher Freq has a higher Provided than a lower Freq? And lastly, how do I visualize whether a Freq of 10 or over has a higher Provided than a Freq under 10? Thank you for your response, I appreciate it.

There is a strong, significant linear correlation between Freq and Provided (Pearson, effect size R = 0.89, p < 0.001).
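As a minimal sketch, the correlation can also be checked numerically with base R's `cor.test()`. This uses only the ten complete sample rows shown in the question (not the full 190122-row data frame), and the vector names `freq` and `provided` are made up for this illustration.

```r
# Complete (Freq, Provided) pairs from the sample rows in the question;
# the two rows where both values are NA are left out.
freq     <- c(1, 10, 10, 2, 6, 35, 15, 3, 20, 10)
provided <- c(7, 45, 10, 10, 40, 200, 30, 20, 50, 20)

# Pearson product-moment correlation test (two-sided by default)
res <- cor.test(freq, provided, method = "pearson")

res$estimate  # correlation coefficient
res$p.value   # p-value of the test
```

On a sample this small the estimate is driven heavily by the single point at Freq = 35, so the full data may well behave differently.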

Frequencies of 10 or above do not have significantly higher Provided values (Wilcoxon rank sum test, p = 0.16). Keep in mind that this discretization of the Freq variable into two categories (high and low) is often arbitrary, and significance can depend heavily on the threshold (here 10).
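One way to see how much the result depends on the cut-off is to repeat the rank sum test for a few thresholds. The following is only a sketch on the ten complete sample rows; the helper `p_at_threshold()`, the vector names, and the choice of thresholds are assumptions made for illustration.

```r
# Complete (Freq, Provided) pairs from the sample rows in the question
freq     <- c(1, 10, 10, 2, 6, 35, 15, 3, 20, 10)
provided <- c(7, 45, 10, 10, 40, 200, 30, 20, 50, 20)

# Hypothetical helper: p-value of the Wilcoxon rank sum test
# when Freq is split into "high" and "low" at the given threshold
p_at_threshold <- function(threshold) {
  high <- provided[freq >= threshold]
  low  <- provided[freq <  threshold]
  # exact = FALSE: the data contain ties, so use the normal approximation
  wilcox.test(high, low, exact = FALSE)$p.value
}

thresholds <- c(5, 10, 15, 20)  # arbitrary cut-offs to compare
p_values   <- sapply(thresholds, p_at_threshold)

data.frame(threshold = thresholds, p_value = round(p_values, 3))
```

If the p-values vary a lot across thresholds, a plot of the continuous relationship is usually more informative than any single high/low split.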

library(tidyverse)
library(ggpubr)

df <- tribble(
  ~row_id, ~ID, ~Date, ~Reduction, ~Collected, ~Provided, ~Freq, ~Gender,
  1, "AAA016000", "2018-04-10", 0, 0, 7, 1, NA,
  2, "AAA059717", "2017-03-21", 1, 0, 45, 10, "Female",
  3, "AAA059717", "2017-04-22", 0, 0, 10, 10, "Female",
  4, "AAA059717", "2017-05-09", 0, 0, 10, 2, "Female",
  5, "AAA059717", "2017-06-09", 1, 0, 40, 6, "Female",
  6, "AAA059717", "2018-07-03", NA, 180, 200, 35, "Female",
  7, "AAA059717", "2018-09-26", NA, 10, 30, 15, "Female",
  8, "AAA059717", "2018-09-26", 1, NA, NA, NA, "Female",
  9, "AAA059717", "2018-10-12", NA, 0, 20, 3, "Female",
  10, "AAA059717", "2018-11-07", NA, 30, 50, 20, "Female",
  11, "AAA059717", "2018-11-07", 0, NA, NA, NA, "Female",
  12, "AAA059717", "2018-11-08", NA, 2, 20, 10, "Female"
)

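# Scatter plot with a linear fit and the Pearson correlation annotated
df %>%
  ggplot(aes(Freq, Provided)) +
  geom_point() +
  stat_smooth(method = "lm") +   # least-squares regression line
  stat_cor(method = "pearson")   # ggpubr: prints R and p on the plot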
#> `geom_smooth()` using formula 'y ~ x'
#> Warning: Removed 2 rows containing non-finite values (stat_smooth).
#> Warning: Removed 2 rows containing non-finite values (stat_cor).
#> Warning: Removed 2 rows containing missing values (geom_point).

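# Box plot comparing Provided for Freq >= 10 versus Freq < 10
df %>%
  mutate(high_Freq = Freq >= 10) %>%
  filter(!is.na(high_Freq)) %>%   # drop rows where Freq is missing
  ggplot(aes(high_Freq, Provided)) +
  geom_boxplot() +
  stat_compare_means()            # ggpubr: Wilcoxon test by default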

Created on 2021-11-10 by the reprex package (v2.0.1)
