
ggplot2: two discrete variables + percentage of cases + continuous variable

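I am trying to produce a plot with one discrete variable on the x axis and another discrete variable on the y axis, where the colour of each point is determined by the mean of val and its size by the proportion of cases within x.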

The data look like this:

df1 <- data.frame(y=c("a","b","c","a","d","a","a","c","d","a","b","c","a","d","a","a","c","d","d","a","b","c","a","d","a","a","c","d"),
                  x=c("x","y","z","t","r","x","x","x","y","z","t","r","r","x","y","z","t","r","x","x","y","z","t","r","r","x","r","x"),
                  val=c(1,4,1,6,3,6,2,7,8,2,5,7,2,8,5,8,6,4,2,4,5,7,6,5,4,4,3,3))

I tried geom_count and the following:

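library(ggplot2)

# after_stat(prop) is the current spelling of the older ..prop.. notation
ggplot(data = df1, aes(x = x, y = y, fill = val)) +
  stat_sum(aes(size = after_stat(prop), group = x)) +
  scale_size_area(max_size = 10)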


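But there must be some odd overlaying going on that I am not aware of. The prop values mapped to size are not correct: if I remove the fill variable from the plot, they come out different. Can anyone help me? I searched Google thoroughly but did not find a solution.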

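One option is to compute the counts, the percentages, and the mean of the fill variable outside of ggplot, and then use geom_point to plot the aggregated data: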

library(ggplot2)
library(dplyr)

df2 <- df1 |> 
  group_by(x, y) |> 
  summarise(n = n(), val = mean(val)) |> 
  mutate(pct = n / sum(n)) |> 
  ungroup()
#> `summarise()` has grouped output by 'x'. You can override using the `.groups`
#> argument.

ggplot(df2, aes(x, y, size = pct, fill = val)) +
  geom_point(shape = 21) +
  scale_size_area(max_size = 10)
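
As a rough sanity check on the aggregation, the sketch below rebuilds df2 and confirms that the shares add up to 1 within each x. The `.groups = "drop_last"` argument and the `check` table are my own additions for illustration, not part of the code above; `.groups = "drop_last"` should keep the same grouping by x that summarise() applies by default, while silencing the message.

```r
library(dplyr)

# Same data as in the question
df1 <- data.frame(
  y = c("a","b","c","a","d","a","a","c","d","a","b","c","a","d",
        "a","a","c","d","d","a","b","c","a","d","a","a","c","d"),
  x = c("x","y","z","t","r","x","x","x","y","z","t","r","r","x",
        "y","z","t","r","x","x","y","z","t","r","r","x","r","x"),
  val = c(1,4,1,6,3,6,2,7,8,2,5,7,2,8,5,8,6,4,2,4,5,7,6,5,4,4,3,3)
)

df2 <- df1 |>
  group_by(x, y) |>
  # Result stays grouped by x only, so the next step works within each x
  summarise(n = n(), val = mean(val), .groups = "drop_last") |>
  mutate(pct = n / sum(n)) |>  # share of each y within its x
  ungroup()

# Hypothetical helper table: total share per x, which should be 1 everywhere
check <- df2 |>
  group_by(x) |>
  summarise(total = sum(pct), .groups = "drop")

stopifnot(isTRUE(all.equal(check$total, rep(1, nrow(check)))))
```

If the shares should instead be relative to all cases rather than to each x, calling ungroup() before mutate() would be one way to get that.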

