
How can I get a stacked bar plot for a list of data.frames while keeping or removing duplicated rows?

I have a list of data.frames that need to be categorized by a threshold, and I want a stacked bar plot showing the categories for each file. However, some rows in my data.frame list are duplicated. I need to show these duplicated rows in one plot and remove them for another plot, because keeping or removing the duplicates in different categories gives different insight into the result. Which category a stacked bar shows determines whether the duplicated rows are kept or removed. I am having a hard time getting the plot I want. Can anyone point me to an easy way to do this? How should I prepare the plot data to get the desired plot?

Reproducible data:

Qualified <- list(
    hotan = data.frame( begin=c(7,13,19,25,31,37,43,49,55,67,79,103,31,49,55,67), 
                        end=  c(10,16,22,28,34,40,46,52,58,70,82,106,34,52,58,70), 
                        pos.score=c(11,19,8,2,6,14,25,10,23,28,15,17,6,10,23,28)),
    aksu = data.frame( begin=c(12,21,30,39,48,57,66,84,111,30,48,66,84), 
                       end=  c(15,24,33,42,51,60,69,87,114,33,51,69,87), 
                       pos.score=c(5,11,15,23,9,13,2,10,16,15,9,2,10)),
    korla = data.frame( begin=c(6,14,22,30,38,46,54,62,70,78,6,30,46,70), 
                        end=c(11,19,27,35,43,51,59,67,75,83,11,35,51,75), 
                        pos.score=c(9,16,12,3,20,7,11,13,14,17,9,3,7,14))
)

unQualified <- list(
    hotan = data.frame( begin=c(21,33,57,69,81,117,129,177,225,249,333,345,33,81,333), 
                        end=  c(26,38,62,74,86,122,134,182,230,254,338,350,38,86,338), 
                        pos.score=c(7,34,29,14,23,20,11,30,19,17,6,4,34,23,6)),
    aksu = data.frame( begin=c(13,23,33,43,53,63,73,93,113,123,143,153,183,33,63,143), 
                       end=  c(19,29,39,49,59,69,79,99,119,129,149,159,189,39,69,149), 
                       pos.score=c(5,13,32,28,9,11,22,12,23,3,6,8,16,32,11,6)),
    korla = data.frame( begin=c(23,34,45,56,67,78,89,122,133,144,166,188,56,89,144), 
                        end=c(31,42,53,64,75,86,97,130,141,152,174,196,64,97,152), 
                        pos.score=c(3,10,19,17,21,8,18,14,4,9,12,22,17,18,9))
)

Edit:

I categorized my data this way:

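library(dplyr)  # bind_rows(), mutate(), arrange(), %>%

# Stack all data.frames into one; the id looks like "Qualified.hotan",
# which separate() splits into group ("Qualified") and list ("hotan")
singleDF <- 
    bind_rows(c(Qualified = Qualified, Unqualified = unQualified), .id = "id") %>% 
    tidyr::separate(id, c("group", "list")) %>%
    mutate(elm = ifelse(pos.score >= 10, "valid", "invalid")) %>% 
    arrange(list, group, desc(elm))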

res <- singleDF %>% split(list(.$list, .$elm, .$group))

This is my desired plot:

(image: desired plot)

Note that for the valid and invalid categories I need duplicates removed from the data.frame, while for the Qualified and Unqualified categories I will keep the repeated rows.

How can I achieve my desired plot using the ggplot2 package? Any ideas? Thanks in advance :)

Something like this, perhaps:

library(tidyverse)
library(cowplot)
theme_set(theme_grey())

p1 <- ggplot(filter(singleDF, list == "aksu"), 
             aes(group, fill = elm)) +
  geom_bar() +
  ylim(0, 16) +
  theme(legend.position = 'top', legend.title = element_blank(), axis.title.x = element_blank())

p2 <- ggplot(filter(singleDF, list == "aksu") %>% distinct(), 
             aes(elm, fill = group)) +
  geom_bar() +
  scale_fill_discrete(h.start = 90) +
  ylim(0, 16) +
  theme(legend.position = 'top', legend.title = element_blank(), axis.title.x = element_blank())

plot_grid(p1, p2, align = 'v', nrow = 1)

(image: plot produced by the code above)
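To see what the `distinct()` call changes in the second panel, the counts can be tabulated by hand. The following is a minimal sketch, assuming only the Qualified rows for aksu from the question and using base R's `duplicated()` in place of `dplyr::distinct()` so that it runs without any packages. The object names (`aksu_unique`, `with_dups`, `without_dups`) are illustrative and do not appear in the original code.

```r
# Qualified rows for "aksu", copied from the question's data
aksu <- data.frame(
  begin     = c(12, 21, 30, 39, 48, 57, 66, 84, 111, 30, 48, 66, 84),
  end       = c(15, 24, 33, 42, 51, 60, 69, 87, 114, 33, 51, 69, 87),
  pos.score = c(5, 11, 15, 23, 9, 13, 2, 10, 16, 15, 9, 2, 10)
)

# Same threshold as the question: pos.score >= 10 counts as "valid"
aksu$elm <- ifelse(aksu$pos.score >= 10, "valid", "invalid")

# Counts with duplicated rows kept
with_dups <- table(aksu$elm)

# Counts after dropping exact duplicate rows (illustrative stand-in for distinct())
aksu_unique  <- aksu[!duplicated(aksu), ]
without_dups <- table(aksu_unique$elm)

print(nrow(aksu))         # 13 rows including duplicates
print(nrow(aksu_unique))  # 9 rows after de-duplication
print(with_dups)          # invalid 5, valid 8
print(without_dups)       # invalid 3, valid 6
```

The four repeated rows account for the difference between the two tables, which is why the bars in the two panels can have different heights for the same data.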

If you want to do this for each element of a list, you can use the tidyverse packages and wrap @Axeman's answer in a function. I modified @Axeman's code to get the appearance you want; I don't use cowplot, so I substituted gridExtra.

EDIT: An easy fix to get your desired plot is to grid.arrange the results of the map in a single row. I also tweaked the plot to match your desired output more closely. I used geom_label with stat = "count" and the ..count.. special variable to label the counts (ggplot2 3.4.0 and later deprecate this spelling in favor of after_stat(count)). You can swap it for geom_text if you wish.

library(tidyverse)
library(grid) #for grid.draw
library(gridExtra) #for grid.arrange

split_plot <- function(x) {

  p1 <- ggplot(x, aes(x = group)) +
    geom_bar(aes(fill = elm), color = "black") +
    geom_label(aes(label = ..count.., color = elm), stat = "count", position = position_stack()) +
    ylim(0, 16) +
    labs(y = NULL, x = NULL) +
    theme_minimal() +
    theme(legend.position = 'none',
          panel.grid = element_blank(),
          legend.title = element_blank(),
          axis.ticks.y = element_blank(),
          axis.text.y = element_blank())

  p2 <- ggplot(distinct(x), aes(x = elm)) +
    geom_bar(aes(fill = group), color = "black") +
    geom_label(aes(label = ..count.., color = group), stat = "count", position = position_stack()) +
    scale_fill_discrete(h.start = 90) +
    scale_color_discrete(h.start = 90) +
    labs(y = NULL, x = NULL) +
    ylim(0, 16) +
    theme_minimal() +
    theme(legend.position = 'none',
          panel.grid = element_blank(),
          legend.title = element_blank(),
          axis.ticks.y = element_blank(),
          axis.text.y = element_blank())

  # Combine both panels into one grob, titled with the list element's name
  arrangeGrob(p1, p2, nrow = 1, top = unique(x$list))
}

# Call the function over `singleDF`, split by list and plot each

res <- singleDF %>% 
  split(.$list) %>% 
  map(~split_plot(.x))

# Use grid.arrange to draw the grobs
grid.arrange(grobs = res, nrow = 1)

(image: plot produced by the code above)
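The label layers in the function above use the `..count..` notation, which recent ggplot2 releases flag as deprecated. As a rough sketch of the same count-label technique written with `after_stat(count)`, assuming ggplot2 3.3.0 or later, the snippet below builds one stacked bar chart from a small made-up data frame (`toy` is hypothetical and not taken from the question) and pulls out the computed labels with `layer_data()` instead of drawing anything.

```r
library(ggplot2)

# Hypothetical toy data, only to illustrate the labelling technique
toy <- data.frame(
  group = c("Qualified", "Qualified", "Qualified", "Unqualified", "Unqualified"),
  elm   = c("valid", "valid", "invalid", "valid", "invalid")
)

p <- ggplot(toy, aes(x = group)) +
  # One stacked bar per group, segments filled by valid/invalid
  geom_bar(aes(fill = elm), color = "black") +
  # stat = "count" computes the row count of each segment;
  # after_stat(count) maps that computed value to the label text
  geom_label(aes(label = after_stat(count), group = elm),
             stat = "count", position = position_stack())

# Inspect the data computed for the label layer (layer 2) without plotting
label_data <- layer_data(p, 2)
print(label_data$label)
```

Swapping `..count..` for `after_stat(count)` inside `split_plot()` should leave the plot unchanged while avoiding the deprecation warning, though that has not been checked against the original figure here.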


