Visualize the relationship between a categorical variable and the frequency of another variable in one graph?

In R, how should I draw a histogram with a categorical variable on the x-axis and the frequency of a continuous variable on the y-axis? Is this correct?

There are a couple of ways one could interpret "one graph" in the title of the question. That said, using the ggplot2 package, there are at least two ways to render histograms by group on a single page of results.

First, we'll create a data frame that contains a normally distributed random variable with a mean of 100 and a standard deviation of 20. We also include a group variable that takes one of four values: A, B, C, or D.

library(ggplot2) # needed for the ggplot() calls that follow
set.seed(950141237) # for reproducibility of results
df <- data.frame(group = rep(c("A","B","C","D"),200),
                 y_value = rnorm(800,mean=100,sd = 20))

The resulting data frame has 800 rows of randomly generated values from a normal distribution, assigned into 4 groups of 200 observations.
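As a quick sanity check on that description, one might tabulate the groups and summarize each one. This is only a sketch in base R; it rebuilds the same data frame so that it runs on its own.

```r
set.seed(950141237) # same seed as in the answer
df <- data.frame(group = rep(c("A","B","C","D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

nrow(df)          # number of rows in the data frame
table(df$group)   # number of observations in each group

# per-group mean and standard deviation of y_value
aggregate(y_value ~ group, data = df,
          FUN = function(v) c(mean = mean(v), sd = sd(v)))
```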

Next, we will render this in ggplot2::ggplot() as a histogram, where the color of the bars is based on the value of group.

ggplot(data = df,aes(x = y_value, fill = group)) + geom_histogram()

...and the resulting chart looks like this:

[Figure: histogram of y_value with bars colored by group]

In this style of histogram the values from each group are stacked atop each other (i.e., the frequency of group A is added to B, etc. before rendering the chart), which might not be what the original poster intended.
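If stacking is not wanted, one possible alternative (not part of the original answer) is to overlay the groups instead. The sketch below assumes that geom_histogram()'s position = "identity" and a partial alpha are acceptable for the reader's purpose; the name p_overlay and the choice of bins = 30 are illustrative only.

```r
library(ggplot2)

set.seed(950141237) # same data as in the answer
df <- data.frame(group = rep(c("A","B","C","D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# position = "identity" draws every group's bars from zero instead of
# stacking them; alpha makes the overlapping bars semi-transparent
p_overlay <- ggplot(data = df, aes(x = y_value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30)

p_overlay
```

With four groups the overlapping bars can still be hard to read, which is one reason to consider facets.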

We can verify the "stacking" behavior by removing the fill = group argument from aes().

# verify the stacking behavior
ggplot(data = df,aes(x = y_value)) + geom_histogram()

...and the output, which looks just like the first chart, but drawn in a single color.

[Figure: the same histogram of y_value drawn in a single color]
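The visual comparison can also be checked numerically. The following sketch uses ggplot2's layer_data() to pull the computed bin counts out of both charts and compares the per-bin totals; the object names are hypothetical, and bins = 30 is set explicitly so that both charts use the same binning.

```r
library(ggplot2)

set.seed(950141237) # same data as in the answer
df <- data.frame(group = rep(c("A","B","C","D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

p_fill   <- ggplot(df, aes(x = y_value, fill = group)) + geom_histogram(bins = 30)
p_single <- ggplot(df, aes(x = y_value)) + geom_histogram(bins = 30)

by_group <- layer_data(p_fill)    # computed counts per bin and group
overall  <- layer_data(p_single)  # computed counts per bin

# total bar height per bin in the chart that uses fill = group
stacked_totals <- aggregate(count ~ xmin, data = by_group, FUN = sum)

# compare the summed group counts with the single-color chart's counts
all.equal(stacked_totals$count, overall$count)
```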

Another way to render the data is to use group with facet_wrap(), where each distribution appears in a different facet on one chart.

ggplot(data = df,aes(x = y_value)) + geom_histogram() + facet_wrap(~group)

The resulting chart looks like this:

[Figure: four histograms of y_value, one facet per group]

The facet approach makes it easier to see differences in the frequency of y values between the groups.
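As a small variation on the facet approach, the panels can be stacked in a single column so that all four histograms share one x-axis, which may make shifts between groups easier to spot. This is a sketch rather than part of the original answer; ncol = 1 is a standard facet_wrap() argument, while p_facets and bins = 30 are illustrative choices.

```r
library(ggplot2)

set.seed(950141237) # same data as in the answer
df <- data.frame(group = rep(c("A","B","C","D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# one facet per group, arranged in a single column
p_facets <- ggplot(data = df, aes(x = y_value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~group, ncol = 1)

p_facets
```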
