
Venn diagram from a contingency table in R

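I have data in the form of a contingency table that shows abundance counts, and I would like to draw a Venn diagram from this data frame.

My data structure: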

species_abundance<-data.frame(Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus", "Paucisalibacillus", "Kytococcus", "Salinibacterium", "Acinetobacter baumanni","Marinococcus","Bacillus"),
               S3 = c(0, 0, 1, 1, 0, 0, 1,0,4,0),
               S5 = c(0, 0, 0, 1, 1, 0, 1,0,3,5),
               S7 = c(3, 1, 0, 2, 0, 1, 0,0,3,1),
               S9 = c(0, 1, 0, 3, 0, 0, 0,1,2,0))

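How can I draw a Venn diagram from this data frame to find the unique and shared species across the different sites (S3, S5, S7, ...)?

If I transform the data as shown below and feed it to Venny2, I get an image like the one that follows. I would like to produce a similar image in R. Please help.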

species_abundance1<-data.frame(S3 = c("", "", "Cytobacillus", "Paracoccus", "", "", "Salinibacterium","", "Marinococcus", ""),
                          S5 = c("", "", "", "Paracoccus", "Paucisalibacillus", "", "Salinibacterium","", "Marinococcus","Bacillus"),
                          S7 = c("Parasphingorhabdus", "Loktanella", "", "", "", "Kytococcus", "","", "Marinococcus","Bacillus"),
                          S9 = c("", "Loktanella", "", "", "", "", "","Acinetobacter baumanni", "Marinococcus",""))

[Image: Venn diagram produced with Venny2]

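There are several ways to get a 4-set Venn diagram in R, but Venn diagrams with more categories than that become very complex and are not a good way to visualize data. Here is an example of a 5-set Venn diagram from Wikimedia Commons:

[Image: 5-set Venn diagram from Wikimedia Commons]

A 7-set Venn diagram cannot even be drawn with ellipses and requires complex flower-like shapes, as shown in the linked article.

In any case, you can see that even a 5-set Venn diagram is not a very user-friendly way to represent data.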
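To put a number on that complexity: a Venn diagram of n sets has to show every non-empty combination of the sets, which is 2^n - 1 regions. A quick sketch of how fast that grows (the variable names are only illustrative):

```r
# Number of regions (non-empty combinations of sets) in an n-set Venn diagram
n_sets <- 2:7
n_regions <- 2^n_sets - 1

# Label each count with its number of sets
setNames(n_regions, n_sets)
#   2   3   4   5   6   7
#   3   7  15  31  63 127
```

So a 5-set diagram already has 31 regions to label and read, and a 7-set diagram has 127.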

In your case, the natural way to present this kind of data is a heatmap. You first need to reshape the data into long format.

library(tidyverse)

species_abundance %>%
  # Reshape to long format: one row per Genus/Site pair
  pivot_longer(-Genus, names_to = 'Site', values_to = 'Count') %>%
  # Keep the sites in their original column order
  mutate(Site = factor(Site, unique(Site))) %>%
  ggplot(aes(Site, Genus, fill = factor(Count))) +
  geom_tile(color = 'black') +
  geom_text(aes(label = ifelse(Count == 0, '', Count))) +
  coord_equal() +
  # One colour per distinct count; the data above has six (0 to 5)
  scale_fill_manual(guide = 'none', 
                    values = c('white', 'lightyellow', 'yellow', 'gold',
                               'orange', 'darkorange')) +
  theme_minimal(base_size = 16)

[Image: heatmap of counts by site and genus]
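If "long format" is unfamiliar, here is a minimal base-R sketch of the same idea, using the four-site data frame from the question. The names `sites` and `long` are only illustrative, and the rows come out site by site rather than in the order `pivot_longer` would produce:

```r
species_abundance <- data.frame(
  Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus",
            "Paucisalibacillus", "Kytococcus", "Salinibacterium",
            "Acinetobacter baumanni", "Marinococcus", "Bacillus"),
  S3 = c(0, 0, 1, 1, 0, 0, 1, 0, 4, 0),
  S5 = c(0, 0, 0, 1, 1, 0, 1, 0, 3, 5),
  S7 = c(3, 1, 0, 2, 0, 1, 0, 0, 3, 1),
  S9 = c(0, 1, 0, 3, 0, 0, 0, 1, 2, 0),
  stringsAsFactors = FALSE
)

# Site columns (everything except Genus)
sites <- names(species_abundance)[-1]

# Long format: one row per Genus/Site pair, with the count in its own column
long <- data.frame(
  Genus = rep(species_abundance$Genus, times = length(sites)),
  Site  = rep(sites, each = nrow(species_abundance)),
  Count = unlist(species_abundance[sites], use.names = FALSE)
)

head(long, 3)
```

Each of the 10 genera now appears once per site, so `long` has 40 rows, and a plotting call can map `Site`, `Genus` and `Count` directly to aesthetics.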


Addendum

If you really want a 5-set Venn diagram showing the number of species shared among the 5 sites, you can do it like this:

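library(VennDiagram)

grid::grid.newpage()

# Note: this call expects a fifth site column, S10, which is not in the
# data frame shown in the question; it runs only on data with that column.
# sign() turns the counts into 0/1 presence flags; the arguments are the five
# set sizes, then every 2-, 3-, 4- and 5-way intersection size, in that order.
with(sign(species_abundance[-1]),
     draw.quintuple.venn(sum(S3), sum(S5), sum(S7), sum(S9), sum(S10),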
        sum(S3 == 1 & S5 == 1),  sum(S3 == 1 & S7 == 1),
        sum(S3 == 1 & S9 == 1),  sum(S3 == 1 & S10 == 1),
        sum(S5 == 1 & S7 == 1),  sum(S5 == 1 & S9 == 1),
        sum(S5 == 1 & S10 == 1), sum(S7 == 1 & S9 == 1),
        sum(S7 == 1 & S10 == 1), sum(S9 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1),
        sum(S3 == 1 & S5 == 1 & S9 == 1),
        sum(S3 == 1 & S5 == 1 & S10 == 1),
        sum(S3 == 1 & S7 == 1 & S9 == 1),
        sum(S3 == 1 & S7 == 1 & S10 == 1),
        sum(S3 == 1 & S9 == 1 & S10 == 1),
        sum(S5 == 1 & S7 == 1 & S9 == 1),
        sum(S5 == 1 & S7 == 1 & S10 == 1),
        sum(S5 == 1 & S9 == 1 & S10 == 1),
        sum(S7 == 1 & S9 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1 & S9 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S9 == 1 & S10 == 1),
        sum(S3 == 1 & S7 == 1 & S9 == 1 & S10 == 1),
        sum(S5 == 1 & S7 == 1 & S9 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1 & S9 == 1 & S10 == 1),
        category = c("S3", "S5", "S7", "S9", "S10"),
        fill = c("orange", "red", "green", "blue", "yellow"),
        cex = 2,
        cat.cex = 2,
        cat.col = 'black'
))

[Image: 5-set Venn diagram of the sites]
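Writing every intersection by hand is error-prone. As a rough sketch, the same kind of counts can be generated for the four sites in the question's data frame with `combn()`. The names `presence`, `combos` and `counts` are only illustrative, and this is one possible approach rather than part of the VennDiagram package:

```r
species_abundance <- data.frame(
  Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus",
            "Paucisalibacillus", "Kytococcus", "Salinibacterium",
            "Acinetobacter baumanni", "Marinococcus", "Bacillus"),
  S3 = c(0, 0, 1, 1, 0, 0, 1, 0, 4, 0),
  S5 = c(0, 0, 0, 1, 1, 0, 1, 0, 3, 5),
  S7 = c(3, 1, 0, 2, 0, 1, 0, 0, 3, 1),
  S9 = c(0, 1, 0, 3, 0, 0, 0, 1, 2, 0),
  stringsAsFactors = FALSE
)

sites <- names(species_abundance)[-1]

# Logical matrix: TRUE where a genus was observed at a site
presence <- species_abundance[sites] > 0

# Every non-empty combination of sites: singles, then pairs, triples, all four
combos <- unlist(
  lapply(seq_along(sites), function(k) combn(sites, k, simplify = FALSE)),
  recursive = FALSE
)

# For each combination, count the genera present at all of its sites
counts <- vapply(
  combos,
  function(s) sum(rowSums(presence[, s, drop = FALSE]) == length(s)),
  numeric(1)
)
names(counts) <- vapply(combos, paste, character(1), collapse = "&")

counts
```

These are overlap sizes (a genus found at all four sites is counted in every combination), which is the kind of input the `draw.*.venn` functions take. They are not the exclusive region counts printed on the finished diagram.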

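Although it is much harder to read and understand, it also contains less information than the heatmap. For example, I can see from the Venn diagram that there is one genus shared only by S3 and S5, but I can see that just as clearly from the heatmap. With the heatmap I can also tell you which genus it is (Paracoccus) and how many times it was observed at each site. You cannot do that with the Venn diagram. A Venn diagram is simply the wrong tool for presenting the data you have.

You could consider my nVennR package: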

library(nVennR)
species_abundance<-data.frame(Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus", "Paucisalibacillus", "Kytococcus", "Salinibacterium", "Acinetobacter baumanni","Marinococcus","Bacillus"),
               S3 = c(0, 0, 1, 1, 0, 0, 1,0,4,0),
               S5 = c(0, 0, 0, 1, 1, 0, 1,0,3,5),
               S7 = c(3, 1, 0, 2, 0, 1, 0,0,3,1),
               S9 = c(0, 1, 0, 3, 0, 0, 0,1,2,0))

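# Site columns (everything except Genus)
ct <- colnames(species_abundance)[-1]
# Build a named list with one vector of genera per site
r <- vector("list", length = length(ct))
names(r) <- ct
for (v in ct){
  # Keep the genera with a non-zero count at site v
  r[[v]] <- species_abundance$Genus[species_abundance[, v] != 0]
}
myV <- plotVenn(r)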

[Image: Venn diagram from the data frame]

You can also explore the result, as shown in the vignette:

> getVennRegion(myV, c("S7"))
[1] "Parasphingorhabdus" "Kytococcus"   
> getVennRegion(myV, c("S3", "S7", "S9", "S5"))
[1] "Paracoccus"   "Marinococcus"
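As a cross-check that does not depend on any package, the exclusive regions can also be sketched in base R by labelling each genus with the exact set of sites where it occurs. The names `membership` and `regions` are only illustrative:

```r
species_abundance <- data.frame(
  Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus",
            "Paucisalibacillus", "Kytococcus", "Salinibacterium",
            "Acinetobacter baumanni", "Marinococcus", "Bacillus"),
  S3 = c(0, 0, 1, 1, 0, 0, 1, 0, 4, 0),
  S5 = c(0, 0, 0, 1, 1, 0, 1, 0, 3, 5),
  S7 = c(3, 1, 0, 2, 0, 1, 0, 0, 3, 1),
  S9 = c(0, 1, 0, 3, 0, 0, 0, 1, 2, 0),
  stringsAsFactors = FALSE
)

sites <- names(species_abundance)[-1]

# Logical matrix: TRUE where a genus was observed at a site
presence <- species_abundance[sites] > 0

# Label each genus with the exact combination of sites it occurs in
membership <- apply(presence, 1, function(p) paste(sites[p], collapse = "&"))

# Group the genera by that label: one list element per Venn region
regions <- split(species_abundance$Genus, membership)

regions[["S7"]]
# [1] "Parasphingorhabdus" "Kytococcus"
regions[["S3&S5&S7&S9"]]
# [1] "Paracoccus"   "Marinococcus"
```

For this data the two regions agree with the `getVennRegion()` output above. Combinations that contain no genus simply do not appear in `regions`.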
