
Generating "2D" histogram in R

I am new to R and would like to know how to generate a histogram for the following situation:

I initially have a regular frequency table with 2 columns: Col A is the category (or bin) and Col B is the number of cases that fall in that category.

Col A    Col B
1-10       7
11-20      4
21-30      5

From this initial frequency table, I create a table with 3 columns. Col A is again the category (or bin), but Col B is now the "fraction of total cases", so for the category 1-10, Col B has the value 7/(7+4+5) = 7/16. There is also a third column, Col C, which is the "fraction of the cases falling in the categories 1-20", so for 1-10 the value of Col C is 7/(7+4) = 7/11, and for 21-30, which lies outside that range, it is 0. The complete table looks like this:

Col A    Col B    Col C
1-10      7/16     7/11
11-20     4/16     4/11
21-30     5/16      0

How do I generate a histogram from the 3-column table above? My X axis should be the bin (1-10, 11-20, etc.) and my Y axis should be the fraction. However, every bin has two fractions (Col B and Col C), so there will be two fraction "bars" for every bin in the histogram.

Any help would be greatly appreciated.

The data:

dat <- data.frame(A = c("1-10", "11-20", "21-30"), B = c(7, 4, 5))

Now, calculate the proportions and create a new object:

dat2 <- rbind(B = dat$B/sum(dat$B), C = c(dat$B[1:2]/sum(dat$B[1:2]), 0))
colnames(dat2) <- dat$A

Plot:

barplot(dat2, beside = TRUE, legend.text = rownames(dat2))

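For readers who prefer ggplot2, the same side-by-side bars can be sketched roughly as follows. This is only an illustration, not part of the original answer: the names `counts`, `long`, `series` and `fraction` are made up here, and it assumes a ggplot2 version that provides `geom_col()`.

```r
library(ggplot2)

# The question's frequency table (bin labels and counts).
counts <- data.frame(bin = c("1-10", "11-20", "21-30"), n = c(7, 4, 5))

# Col B: fraction of all cases. Col C: fraction of the cases in bins 1-20 (0 for 21-30).
frac_all  <- counts$n / sum(counts$n)
frac_1_20 <- c(counts$n[1:2] / sum(counts$n[1:2]), 0)

# Long format: one row per (bin, series) pair.
# The factor levels keep the bins in their original order on the x axis.
long <- data.frame(
  bin      = factor(rep(counts$bin, times = 2), levels = counts$bin),
  series   = rep(c("B", "C"), each = nrow(counts)),
  fraction = c(frac_all, frac_1_20)
)

# position = "dodge" draws the two series next to each other within each bin.
p <- ggplot(long, aes(x = bin, y = fraction, fill = series)) +
  geom_col(position = "dodge")
print(p)
```

The long layout makes it easy to add a further series later: append its rows to `long` and the dodge position allocates another bar per bin.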

Your title should say "dodged bar chart" rather than "2D histogram". Unlike a bar chart, a histogram has a continuous scale on the x axis, and histograms are mainly used to compare distributions of univariate data, or distributions of univariate data split by a factor. You are trying to compare colB against colC, which can be visualized effectively with a 2D scatter plot but not with a bar chart. A better way to compare the distributions of colB and colC with histograms is to plot the two histograms separately and check how the location of the data points changes.

If you want to compare the distributions of colB and colC, try the following code. I rounded the values to get reasonable data that matches your description. Note that the code samples at random without replacement, so each run produces a slightly different distribution, but that does not affect the comparison between colB and colC.

library("ggplot2")
# colB: 100 simulated values spread over all three bins
# 44 datapoints between 1-10
b1 <- rep(1:10, 4)
b1 <- c(b1, sample(b1, size=4, replace=FALSE))
# 25 datapoints between 11-20
b2 <- rep(11:20, 2)
b2 <- c(b2, sample(b2, size=5, replace=FALSE))
# 31 datapoints between 21-30
b3 <- rep(21:30, 3)
b3 <- c(b3, sample(b3, size=1, replace=FALSE))
colB <- c(b1, b2, b3)
# colC: 100 simulated values restricted to bins 1-20
# 64 datapoints between 1-10
c1 <- rep(1:10, 6)
c1 <- c(c1, sample(c1, size=4, replace=FALSE))
# 36 datapoints between 11-20
c2 <- rep(11:20, 3)
c2 <- c(c2, sample(c2, size=6, replace=FALSE))
colC <- c(c1, c2)
# both columns have 100 values, so they fit in one data frame
sim <- data.frame(colB=colB, colC=colC)
sim
# one histogram per column, with counts on the y axis
ggplot(data=sim, aes(x=colB, xmin=1, xmax=30)) + stat_bin(binwidth = 1)
ggplot(data=sim, aes(x=colC, xmin=1, xmax=30)) + stat_bin(binwidth = 1)

# if you want density distribution, then you can try something like this
# (after_stat() needs ggplot2 >= 3.3.0; older versions used y = ..density..):
ggplot(data=sim, aes(x=colB, y = after_stat(density), xmin=1, xmax=30)) + stat_bin(binwidth = 1)
ggplot(data=sim, aes(x=colC, y = after_stat(density), xmin=1, xmax=30)) + stat_bin(binwidth = 1)
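As a quick sanity check on the simulated colB and colC, the sketch below rebuilds them and compares the fraction of values per bin with the fractions from the question. The helpers `make_bin()` and `bin_fractions()` are hypothetical names introduced here for illustration. Only the values inside a bin are random, so the per-bin counts are the same on every run.

```r
# Fixed number of values per bin; only which values get repeated is random.
make_bin <- function(values, times, extra) {
  base <- rep(values, times)
  c(base, sample(base, size = extra, replace = FALSE))
}

colB <- c(make_bin(1:10, 4, 4), make_bin(11:20, 2, 5), make_bin(21:30, 3, 1))
colC <- c(make_bin(1:10, 6, 4), make_bin(11:20, 3, 6))

# Fraction of values in each of the question's bins, as a named numeric vector.
bin_fractions <- function(x) {
  bins <- cut(x, breaks = c(0, 10, 20, 30),
              labels = c("1-10", "11-20", "21-30"))
  fr <- prop.table(table(bins))
  setNames(as.numeric(fr), names(fr))
}

fracB <- bin_fractions(colB)
fracC <- bin_fractions(colC)

# Targets from the question next to the simulated fractions.
print(round(rbind(
  target_B    = c(7, 4, 5) / 16,
  simulated_B = unname(fracB),
  target_C    = c(7, 4, 0) / 11,
  simulated_C = unname(fracC)
), 3))
```

The simulated fractions are rounded versions of the targets (for example 44/100 against 7/16), which is why the histograms only approximate the question's table.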

HTH -Sathish
