How to combine two vectors (I want frequency of values) into a dataframe for a side by side bar plot in R

I can see there are similar questions about the plotting itself, but I'm really struggling to get the data organised correctly. I have two vectors storing the goals scored in 100,000 simulated football matches for two teams (Home and Away). My end goal is a side-by-side bar plot showing the frequency of each number of goals.

I've used table() to get the frequencies and then merged the two tables, replacing NA with 0 so that they end up the same length. But when I try to plot the result with ggplot2 I run into a lot of issues, because the merged data has HomeGoals (as in 0, 1, 2, 3, 4, 5), Freq.x and Freq.y (the frequencies for Home/Away) as column headings.

Is there a better way to do this? Any help appreciated!

It's hard to understand what kind of issues you are running into. Here's an attempt with simulated data. You may want to consider how you organize your data. I'm assuming you have a data frame (or two vectors).

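set.seed(1)

# Simulated goals: one row per match, one column per team
df = data.frame(home = sample(0:8, size = 1000, replace = TRUE),
                away = sample(0:6, size = 1000, replace = TRUE))

library(ggplot2)
library(gridExtra)

# binwidth = 1 gives one bar per number of goals
p1 = ggplot(df) +
    geom_histogram(aes(x = home), binwidth = 1) + ggtitle("Home") + xlab("Goals")

p2 = ggplot(df) +
    geom_histogram(aes(x = away), binwidth = 1) + ggtitle("Away") + xlab("Goals")

# Draw the two histograms next to each other
grid.arrange(p1, p2, ncol = 2)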


To get bars side by side:

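# Reshape to long format: one row per match and team
df2 = reshape2::melt(df, measure.vars = c("home", "away"),
                     variable.name = "team", value.name = "score")

# Count how often each score occurs for each team (Var1 = score, Var2 = team)
df3 = as.data.frame(table(df2$score, df2$team))

ggplot(df3, aes(x = Var1, y = Freq, fill = Var2)) +
    geom_bar(position = "dodge", stat = "identity")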
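
If the starting point is two separate vectors rather than a data frame, the merge step described in the question can probably be avoided by tabulating each vector over a shared set of goal values. What follows is only a sketch under assumptions: home_goals and away_goals are hypothetical names, and they are filled with Poisson-simulated scores here, not with the asker's data.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-ins for the two vectors of simulated goals
home_goals <- rpois(1000, lambda = 1.5)
away_goals <- rpois(1000, lambda = 1.1)

# Shared set of goal values, so both tables have the same length and
# goal counts that never occur show up as 0 instead of NA
goal_levels <- 0:max(home_goals, away_goals)

# Long format: one row per (goals, team) pair
counts <- data.frame(
  goals = factor(rep(goal_levels, times = 2), levels = goal_levels),
  team  = factor(rep(c("home", "away"), each = length(goal_levels)),
                 levels = c("home", "away")),
  freq  = c(as.vector(table(factor(home_goals, levels = goal_levels))),
            as.vector(table(factor(away_goals, levels = goal_levels))))
)

# position = "dodge" places the two teams side by side for each goal count
p <- ggplot(counts, aes(x = goals, y = freq, fill = team)) +
  geom_col(position = "dodge")
p
```

Because every goal value is present for both teams (with a frequency of 0 where needed), all bars keep the same width.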


Try using position_dodge() with ggplot2. Keep in mind that the data has to be in long format, as in the example data.

ggplot(df1) + geom_bar(aes(values, fill = ind), position = position_dodge())
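
The values and ind column names in the example data match what base R's stack() returns, so one plausible way to get two vectors into this long format is sketched here. The vectors home and away are made-up sample data, and whether the example data was actually built this way is an assumption.

```r
library(ggplot2)

set.seed(1)
# Hypothetical goal vectors standing in for the simulated matches
home <- sample(0:8, size = 200, replace = TRUE)
away <- sample(0:6, size = 200, replace = TRUE)

# stack() turns a named list of vectors into a long data frame with
# two columns: `values` (the goals) and `ind` (a factor naming the source)
long <- stack(list(home = home, away = away))

# geom_bar() counts the rows itself, so no table() step is needed;
# factor(values) gives one discrete x position per number of goals
p <- ggplot(long, aes(x = factor(values), fill = ind)) +
  geom_bar(position = position_dodge()) +
  labs(x = "Goals", y = "Frequency", fill = "Team")
p
```

Where one team never reaches a given score, the remaining bar fills the whole slot; position_dodge(preserve = "single") keeps the bar widths equal in that case.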


Data

df1 <- structure(list(values = c(3L, 7L, 4L, 8L, 10L, 5L, 7L, 9L, 1L, 
8L, 7L, 0L, 8L, 7L, 0L, 10L, 6L, 9L, 9L, 2L, 3L, 10L, 9L, 8L, 
5L, 4L, 1L, 6L, 0L, 2L, 5L, 7L, 2L, 9L, 10L, 9L, 2L, 8L, 9L, 
4L, 4L, 3L, 8L, 0L, 5L, 10L, 9L, 9L, 7L, 4L, 10L, 1L, 2L, 7L, 
1L, 4L, 5L, 10L, 5L, 8L, 8L, 2L, 0L, 9L, 1L, 7L, 3L, 5L, 3L, 
10L, 6L, 8L, 6L, 1L, 3L, 7L, 4L, 10L, 0L, 9L, 5L, 0L, 0L, 10L, 
9L, 0L, 5L, 1L, 4L, 9L, 3L, 8L, 4L, 6L, 4L, 8L, 9L, 1L, 6L, 8L, 
3L, 2L, 5L, 5L, 5L, 0L, 0L, 0L, 10L, 7L, 0L, 3L, 3L, 10L, 4L, 
8L, 6L, 3L, 0L, 10L, 1L, 2L, 4L, 5L, 7L, 10L, 1L, 9L, 7L, 4L, 
9L, 2L, 5L, 9L, 0L, 5L, 9L, 0L, 8L, 6L, 10L, 5L, 0L, 4L, 6L, 
2L, 0L, 2L, 9L, 7L, 9L, 4L, 9L, 9L, 0L, 9L, 2L, 9L, 5L, 0L, 10L, 
0L, 3L, 0L, 7L, 3L, 3L, 1L, 6L, 0L, 4L, 6L, 2L, 3L, 4L, 1L, 7L, 
10L, 6L, 1L, 9L, 7L, 2L, 3L, 1L, 7L, 3L, 10L, 10L, 1L, 5L, 2L, 
1L, 3L, 8L, 0L, 8L, 6L, 1L, 8L, 7L, 4L, 4L, 5L, 2L, 2L, 7L, 4L, 
8L, 4L, 4L, 7L, 3L, 8L, 8L, 4L, 7L, 4L, 10L, 2L, 4L, 1L, 0L, 
8L, 5L, 3L, 2L, 0L, 0L, 5L, 8L, 6L, 6L, 9L, 7L, 1L, 1L, 10L, 
10L, 5L, 8L, 10L, 2L, 0L, 2L, 10L, 3L, 10L, 4L, 7L, 1L, 1L, 7L, 
1L, 8L, 8L, 4L, 0L, 9L, 3L, 2L, 3L, 3L, 10L, 3L, 5L, 0L, 2L, 
2L, 2L, 10L, 2L, 7L, 8L, 4L, 10L, 4L, 6L, 3L, 9L, 0L, 9L, 6L, 
5L, 5L, 8L, 3L, 1L, 7L, 4L, 3L, 9L, 6L, 10L, 6L, 8L, 1L, 9L, 
10L, 0L, 1L, 6L, 6L, 8L, 10L, 2L, 8L, 5L, 3L, 8L, 4L, 9L, 10L, 
1L, 8L, 4L, 10L, 5L, 10L, 0L, 6L, 1L, 7L, 5L, 5L, 10L, 8L, 8L, 
7L, 10L, 4L, 4L, 7L, 10L, 10L, 7L, 7L, 8L, 6L, 3L, 5L, 3L, 5L, 
10L, 1L, 5L, 10L, 3L, 4L, 0L, 9L, 7L, 2L, 9L, 1L, 3L, 10L, 9L, 
3L, 4L, 9L, 0L, 2L, 3L, 1L, 10L, 9L, 10L, 0L, 0L, 2L, 8L, 10L, 
10L, 5L, 4L, 1L, 10L, 10L, 5L, 0L, 8L, 6L, 8L, 7L, 1L, 6L, 7L, 
5L, 1L, 3L, 2L, 2L, 8L, 7L, 9L, 9L, 5L, 0L, 9L), ind = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), class = "factor", .Label = c("home", 
"away"))), class = "data.frame", row.names = c(NA, -400L))
