
ggplot2 and dplyr, visualize a column which is a character

I am currently trying to make a beautiful geom_col plot on a huge sample size. The names of the samples (which should be on the x-axis) are a mix of numbers and characters, since I include "N" for the negative control.

sample_names <- c(100,22,4,5,6,"N")
size <- c(3,2,3,4,2,3)

Now I would like to have the bars in ascending order of sample name, from the lowest to the highest (sample 4, then 5, 6, 22, and 100), ending with the N. Since the values in the column are stored as characters, the order always starts with sample 100 (because "1-0-0" sorts before "2-2").

library(dplyr)

d <- data.frame(sample_names, size) %>%
     arrange(sample_names)
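The character ordering described above can be reproduced without a plot. This is a minimal sketch, not part of the original question; the vector `x` is a stand-in for the `sample_names` column, and `method = "radix"` is used only so the result does not depend on the local collation settings.

```r
# Character vectors are compared string-wise, one character at a time,
# so "100" sorts before "22" because "1" comes before "2".
x <- c("100", "22", "4", "5", "6", "N")

sorted <- sort(x, method = "radix")  # radix sorting uses C-locale ordering
print(sorted)
# "100" "22" "4" "5" "6" "N"
```

A discrete x-axis built from a character column follows this same string ordering, which is why sample 100 appears first.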


This leads to the problem that the data in the plot is not ordered nicely.

It would be more pleasing to have it in ascending order with the N at the end.

I already tried to transform this column into a numeric and replace the resulting NA (which takes the place of the "N") with a 0.

The issue with that is that the plot includes huge gaps between the samples:

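library(tidyr)  # for replace_na(); dplyr is assumed to be loaded already

d <- data.frame(sample_names, size) %>%
   arrange(sample_names) %>%
   mutate(sample_names = as.numeric(sample_names)) %>%  # "N" becomes NA, with a warning
   replace_na(list(sample_names = 0))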


So my question is: do you know how to either sort a character column in the "correct" ascending order, or how to close the gaps on the x-axis in ggplot2? Thank you.

The order of the bars is controlled by the factor levels in the data. To automate generating the factor levels, you can use a regex to extract the values that consist only of digits, convert them to numeric, sort them, and append the non-numeric values at the end.

num <- grep('^\\d+$', d$sample_names)

d$sample_names <- factor(d$sample_names, 
                 c(sort(unique(as.numeric(d$sample_names[num]))), 
                        unique(d$sample_names[-num])))

library(ggplot2)

ggplot(d, aes(sample_names, size)) + geom_col()

A simpler approach, as suggested by @Rui Barradas, is to use stringr::str_sort or gtools::mixedsort:

d$sample_names <- factor(d$sample_names, stringr::str_sort(unique(d$sample_names), numeric = TRUE))

d$sample_names <- factor(d$sample_names, gtools::mixedsort(unique(d$sample_names)))
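For readers who prefer not to add a package, the regex approach can be wrapped in a small base R function. This is only a sketch: `order_mixed_levels` is a hypothetical name, not something from the answer or from any package. It uses a logical mask instead of negative indexing, so it still behaves when no value is purely numeric.

```r
# Hypothetical helper (not from the original answer): return a factor whose
# levels are the digit-only labels in ascending numeric order, followed by
# all remaining labels in alphabetical order.
order_mixed_levels <- function(x) {
  x <- as.character(x)
  is_num <- grepl("^\\d+$", x)               # TRUE for labels made only of digits
  num_labels <- x[is_num]
  # Order the original strings by their numeric value, keeping them as text
  num_levels <- unique(num_labels[order(as.numeric(num_labels))])
  other_levels <- sort(unique(x[!is_num]))
  factor(x, levels = c(num_levels, other_levels))
}

# Data from the question
sample_names <- c(100, 22, 4, 5, 6, "N")
size <- c(3, 2, 3, 4, 2, 3)
d <- data.frame(sample_names, size)

d$sample_names <- order_mixed_levels(d$sample_names)
print(levels(d$sample_names))
# "4" "5" "6" "22" "100" "N"
```

Because the column is now a factor, ggplot(d, aes(sample_names, size)) + geom_col() draws one evenly spaced bar per level, which also avoids the gaps seen with the numeric conversion.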
