ggplot: color, shape and size by factor variables in a dataframe, across several regions with legends

Question

I have the following dataframe:

structure(list(PS_position = c(54733745L, 54736536L, 54734312L, 54735312L, 54733745L, 54736536L, 54734312L, 54735312L),
           chr_key = c(19L,19L, 19L, 19L, 19L, 19L, 19L, 19L),
           hit_count = c(20L, 1L, 5L,15L, 20L, 1L, 5L, 15L),
           pconvert = c(0.448, 0.55, 0.8, 0.92, 0.448, 0.55, 0.8, 0.92),
           probe_type = c("Non_polymorphic", "preselected", "unvalidated", "validated", "Non_polymorphic", "preselected", "unvalidated", "validated"),
           region_name = c("DL1", "DL1", "DL1", "DL1", "DL2", "DL2", "DL2", "DL2"),
           start = c(54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L),
           stop = c(54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L)),
      row.names = c(NA, -8L), class = c("data.table",   "data.frame"))

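I want to plot PS_position on the x-axis for each region_name, colored by probe_type, with the shape based on the pconvert category (0.3-0.5, 0.51-0.7, 0.71-0.9, > 0.9) and the size of the shapes based on hit_count, for all unique region_names in the dataframe, with legends describing each of these. The xlim of the x-axis would come from the dataframe's start / stop.

Something like this:

Of course, the actual values will differ for each unique region_name. Any ideas on how best to achieve this? Thanks!

Edit: I developed something in base R, without hitcount or pconvert: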

region = unique(df$region_name)
for(i in seq_along(region))
{
probes = df$PS_position
probe_type = factor(df$probe_type)
df$cols = as.numeric(as.factor(df$probe_type))
legend.cols = as.numeric(as.factor(levels(df$probe_type)))


#should also send the start and stop into PS_position 
cols = c("black", "blue", "green", "yellow")
#Use logarithmic scale
par(xpd = T)

plot(1, 1, ylim = c(0.5, length(probes)), xlim = c(min(probes) - 20, max(probes)+10),#, main = paste("Probes ", region, sep = ""), 
     xlab = "PS_position", bty="n", type = "n", yaxt = "n", ylab = "")

title(region[i], line=0)

begin = min(probes)
end = max(probes)
n = length(probes)

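I then plotted the probes sequentially, one after another, but I no longer need that. I just want to plot all PS_position values at once, and they should reflect the actual start-stop range and their relative positions within it. Note that the base R code above and below is a single block; please copy and paste the two parts together.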

for(i in 1:length(probes))
{
  lines(x = c(begin, end), y = c(n+1-i, n+1-i), col = "blue", lwd = .8)
  xs = probes[1:i]
  #cols_i = cols[probe_type[1:i]]
  points(x = xs, y = rep(n+1-i, length(xs)), pch = 18, cex = 1.0, col = df$cols)
  text(i, x = -50, y = n+1-i, adj = 1.5)
 
}
add_legend("topright", "Probe_Type", levels(probe_type), fill = legend.cols, horiz=T)

}

dev.off()

I am trying to convert this to ggplot2.

Answer 1

How about this:

I have taken your data and added a categorical pconvert_cat variable:

# comparison of the two variables:
> df[, c(4, 9)]
  pconvert pconvert_cat
1    0.448      0.3-0.5
2    0.550     0.51-0.7
3    0.800     0.71-0.9
4    0.920         >0.9
5    0.448      0.3-0.5
6    0.550     0.51-0.7
7    0.800     0.71-0.9
8    0.920         >0.9

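I have tried to use ggplot2 to get what you describe in your question. Essentially, you want to facet by region_name and then map all the other variables to the aesthetics (aes) you mention in your question.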

ggplot(df, aes(x = PS_position, y = 0,
               colour = probe_type, shape = pconvert_cat, size = hit_count)) +
        geom_point() +
        scale_shape_manual(values = c(3, 15, 16, 17)) +
        coord_cartesian(xlim = c(min(df$start), max(df$stop))) +
        facet_wrap(~ region_name, nrow = 2) +
        theme_minimal() + theme(panel.grid = element_blank(),
                                axis.title.y = element_blank(),
                                axis.text.y = element_blank(),
                                axis.ticks.y = element_blank())

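This is what it looks like:

This might not be ideal. I don't know of any geom_...() function that would simply plot the "x differences" between points without bothering with a y-axis. SO community, can we do something like that? Of course, this depends on whether you also need some variable on the y-axis.

Assuming you want everything on the same horizontal level, I set y to a constant (0). Maybe you could set y = chr_key, since I noticed it is constant (at least in this small dataset)?

Also, setting xlim = c(min(df$start), max(df$stop)) means all your points end up on the right-hand side, as you can see above. Unless you specifically want this, consider dropping coord_cartesian():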

ggplot(df, aes(x = PS_position, y = 0,
               colour = probe_type, shape = pconvert_cat, size = hit_count)) +
        geom_point() +
        scale_shape_manual(values = c(3, 15, 16, 17)) +
        facet_wrap(~ region_name, nrow = 2) +
        theme_minimal() + theme(panel.grid = element_blank(),
                                axis.title.y = element_blank(),
                                axis.text.y = element_blank(),
                                axis.ticks.y = element_blank())

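To get this:

The differences between the x values of the points are clearer here.

Some things to consider:

Will you assign some variable to the y-axis? Will it always be constant?
Will there be more than one observation for a given combination of probe_type and pconvert_cat values? If so, the colour and shape aesthetics will play a bigger role.
Do you need a specific x range? You want the x differences to be as clear as possible.

Finally, I very much agree with Rémi's comment that you should let us know what you have already tried. Then I wouldn't have to guess so much in my answer.

Edit

In reply to your comment: using facet_wrap() does not mean the scales are fixed. In your case, you can change the scales argument to "free_x" so that each region_name can have its own start and stop values. For more information on different facet scales, take a look here. You might want to use geom_blank() as discussed on that page. You will have to decide which of the approaches listed there best suits your data. Note that as you add more facets for more region_names and keep a single column of facets, they should sit closer together, and the issue of having a y scale there becomes less important because there won't be as much empty space. (So, for example, you have five different region_names and you set nrow = 5.)

In summary, I think my code, together with some facet scale changes that you can decide on, is good to go.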
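A minimal sketch of the scales = "free_x" plus geom_blank() idea, assuming each region has its own start and stop. The toy data frame and the helper table limits are hypothetical and not part of the original question; they only illustrate how an invisible layer can stretch each facet's x range to the region's start/stop.

```r
library(ggplot2)

# Hypothetical data: two regions with different start/stop ranges
toy <- data.frame(
  PS_position = c(120, 180, 150, 1020, 1090, 1050),
  region_name = rep(c("DL1", "DL2"), each = 3),
  start       = rep(c(100, 1000), each = 3),
  stop        = rep(c(200, 1100), each = 3)
)

# One row per region and limit (start and stop stacked into a single x column)
limits <- unique(rbind(
  data.frame(region_name = toy$region_name, x = toy$start),
  data.frame(region_name = toy$region_name, x = toy$stop)
))

p <- ggplot(toy, aes(x = PS_position, y = 0)) +
  geom_point() +
  # geom_blank() draws nothing, but its data still trains the x scale of each facet
  geom_blank(data = limits, aes(x = x, y = 0)) +
  facet_wrap(~ region_name, ncol = 1, scales = "free_x")

print(p)
```

With free x scales, each panel should then cover at least its own start-to-stop interval, while the points keep their relative positions inside it.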

Data

df <- structure(list(PS_position = c(54733745L, 54736536L, 54734312L, 54735312L, 54733745L, 54736536L, 54734312L, 54735312L),
               chr_key = c(19L,19L, 19L, 19L, 19L, 19L, 19L, 19L),
               hit_count = c(20L, 1L, 5L,15L, 20L, 1L, 5L, 15L),
               pconvert = c(0.448, 0.55, 0.8, 0.92, 0.448, 0.55, 0.8, 0.92),
               probe_type = c("Non_polymorphic", "preselected", "unvalidated", "validated", "Non_polymorphic", "preselected", "unvalidated", "validated"),
               region_name = c("DL1", "DL1", "DL1", "DL1", "DL2", "DL2", "DL2", "DL2"),
               start = c(54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L, 54724479L),
               stop = c(54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L, 54736536L)),
          row.names = c(NA, -8L), class = c("data.table",   "data.frame"))
df$pconvert_cat <- as.factor(ifelse(df$pconvert >= 0.3 & df$pconvert <= 0.5, "0.3-0.5",
                                    ifelse(df$pconvert > 0.5 & df$pconvert <= 0.7, "0.51-0.7",
                                           ifelse(df$pconvert > 0.7 & df$pconvert <= 0.9, "0.71-0.9", ">0.9"))))
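As an alternative to the nested ifelse() above, the same binning can probably be written with cut(). This is only a sketch under the assumption that the four categories from the question are the ones wanted; unlike the ifelse() version, values below 0.3 become NA instead of falling into ">0.9". The factor levels follow the order of the breaks, so shapes passed to scale_shape_manual() are assigned in numeric order, whereas as.factor() on character labels sorts them alphabetically (and that order depends on the locale).

```r
# Values copied from the pconvert column of the question's data
pconvert <- c(0.448, 0.55, 0.8, 0.92)

pconvert_cat <- cut(
  pconvert,
  breaks = c(0.3, 0.5, 0.7, 0.9, Inf),
  labels = c("0.3-0.5", "0.51-0.7", "0.71-0.9", ">0.9"),
  include.lowest = TRUE,  # a value of exactly 0.3 goes into the first bin
  right = TRUE            # intervals are closed on the right, e.g. (0.5, 0.7]
)

# Levels are ordered by the breaks, not alphabetically
levels(pconvert_cat)
```

To use it on the data frame, assign the result of cut(df$pconvert, ...) to df$pconvert_cat.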

Solution 1
1 Accepted 2020-06-25 09:15:32
