
2D Histogram in R: Converting from Count to Frequency within a Column

I would appreciate help with generating a 2D histogram of frequencies, where the frequencies are calculated within each column. My main issue is converting from counts to column-based frequencies.

Here's my starting code:

# expected packages
library(ggplot2)
library(plyr)

# generate example data corresponding to expected data input
x_data = sample(101:200,10000, replace = TRUE)
y_data = sample(1:100,10000, replace = TRUE)
my_set = data.frame(x_data,y_data)

# define x and y interval cut points
x_seq = seq(100,200,10)
y_seq = seq(0,100,10)

# label samples as belonging within x and y intervals
my_set$x_interval = cut(my_set$x_data,x_seq)
my_set$y_interval = cut(my_set$y_data,y_seq)

# determine count for each x,y block
xy_df = ddply(my_set, c("x_interval","y_interval"),"nrow") # still need to convert for use with dplyr

# convert from count to frequency based on formula: freq = count/sum(count in given x interval)
################ TRYING TO FIGURE OUT #################

# plot results
fig_count <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = nrow)) # count
fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) + geom_tile(aes(fill = freq)) # frequency

I would appreciate any help with how to calculate the frequency within a column.

Thanks! jac

EDIT: I think the solution will require the following steps: 1) calculate and store the overall count for each x-interval factor; 2) divide each individual bin count by its corresponding x-interval count to obtain the frequency.

I'm not sure how to carry this out, though.

If you want to normalize over the x_interval values, you can create a column holding the total count per interval and then divide by it. I must admit I'm not a ddply wizard, so there may be an easier way, but I would do:

xy_df$xnrows<-with(xy_df, ave(nrow, x_interval, FUN=sum))

then

fig_freq <- ggplot(xy_df, aes(x = x_interval, y = y_interval)) +
    geom_tile(aes(fill = nrow/xnrows))
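To check that this normalization behaves as intended, here is a minimal base-R sketch of the same idea. It makes a few assumptions that are not in the original post: a fixed seed for reproducibility, `table()` as a stand-in for the `ddply(..., "nrow")` count, and a stored `freq` column (plus a `col_sums` check) instead of computing the ratio inside `aes()`.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

# same example data and cut points as in the question
my_set <- data.frame(
  x_data = sample(101:200, 10000, replace = TRUE),
  y_data = sample(1:100, 10000, replace = TRUE)
)
my_set$x_interval <- cut(my_set$x_data, seq(100, 200, 10))
my_set$y_interval <- cut(my_set$y_data, seq(0, 100, 10))

# count per (x, y) block; base-R stand-in for the ddply(..., "nrow") call
xy_df <- as.data.frame(
  table(x_interval = my_set$x_interval, y_interval = my_set$y_interval),
  responseName = "nrow"
)

# total count per x interval, repeated on every row of that interval
xy_df$xnrows <- with(xy_df, ave(nrow, x_interval, FUN = sum))

# freq = count / sum(count in given x interval)
xy_df$freq <- xy_df$nrow / xy_df$xnrows

# sanity check: frequencies within each x interval should sum to 1
col_sums <- tapply(xy_df$freq, xy_df$x_interval, sum)
```

With `freq` stored as a column, the `geom_tile(aes(fill = freq))` call from the question should work unchanged. One difference from `ddply`: `table()` also keeps empty (x, y) blocks as rows with a count of 0.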
