R - graph frequency of observations over time with small value range
I'm trying to graph the frequency of observations over time. I have a dataset where hundreds of laws are coded 0-3. I'd like to know if outcomes 2-3 are occurring more often as time progresses. Here is a sample of mock data:
Data <- data.frame(
year = sample(1998:2004, 200, replace = TRUE),
score = sample(1:4, 200, replace = TRUE)
)
If I plot
plot(Data$year, Data$score)
I get a checkered grid where every single spot is filled in, but I can't tell which numbers occur more often. Is there a way to color or resize each point by the number of observations for a given score in a given year?
A few notes that may help in answering the question:
1) I don't know how to sample data where certain numbers occur more frequently than others. My sample procedure samples equally from all numbers. If there is a better way to create my reproducible data so that it reflects more observations in later years, I would like to know how.
2) This seemed like it would be best visualized as a scatter plot, but I could be wrong. I'm open to other visualizations.
Thanks!
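Regarding note 1: sample() accepts a prob argument of weights (they do not need to sum to 1; R normalizes them), so unequal frequencies are easy to simulate. A minimal sketch, assuming hypothetical weights that favor later years and a hypothetical linear rise in the chance of a law scoring 2-3:

```r
set.seed(42)
n <- 200
years <- 1998:2004

# Hypothetical weights 1..7: later years are drawn more often
year <- sample(years, n, replace = TRUE, prob = seq_along(years))

# Hypothetical trend: P(score is 2 or 3) rises from 0.2 in 1998 to 0.8 in 2004
p_high <- 0.2 + 0.6 * (year - min(years)) / diff(range(years))
high <- runif(n) < p_high

# Draw a "high" score (2 or 3) or a "low" score (0 or 1) for each law
score <- ifelse(high,
                sample(2:3, n, replace = TRUE),
                sample(0:1, n, replace = TRUE))

Data <- data.frame(year = year, score = score)
table(Data$year, Data$score)
```

The weights and the 0.2 to 0.8 trend are arbitrary illustration values; swap in whatever pattern you want the mock data to show.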
Here's how I would approach this (hope this is what you need):
Create the data (note: when using sample in questions, always use set.seed too, so the example is reproducible):
set.seed(123)
Data <- data.frame(
year = sample(1998:2004, 200, replace = TRUE),
score = sample(1:4, 200, replace = TRUE)
)
Find the frequencies of score per year using table:
Data2 <- as.data.frame.matrix(table(Data))
Data2$year <- row.names(Data2)
Use melt to convert it back to long format:
library(reshape2)
Data2 <- melt(Data2, "year")
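As a side note, as.data.frame() applied directly to a table already returns long format (one row per year/score combination, with the count in a Freq column), so the reshape2 step can be skipped. A minimal sketch on the same mock data; the name counts is just for illustration:

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# One row per year/score combination; the counts are in the Freq column
counts <- as.data.frame(table(Data))
head(counts)
```

Both year and score come back as factors, so in the plot below you would map size = Freq instead of size = value.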
Plot the data, using a different color per score and a point size relative to the frequency:
library(ggplot2)
ggplot(Data2, aes(year, variable, size = value, color = variable)) +
geom_point()
Alternatively, you could use both fill and size to describe frequency, something like:
ggplot(Data2, aes(year, variable, size = value, fill = value)) +
geom_point(shape = 21)
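Recent versions of ggplot2 also include geom_count(), which does the counting itself: it counts the rows at each location and maps that count to point size, so it works on the raw data without table() or melt(). A minimal sketch on the same mock data:

```r
library(ggplot2)

set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# geom_count() counts the rows at each (year, score) location and maps that count to point size
p <- ggplot(Data, aes(factor(year), factor(score))) +
  geom_count() +
  labs(x = "year", y = "score", size = "count")
print(p)
```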
Here's another approach:
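# after_stat(count) replaces the deprecated ..count..; binwidth = 1 gives one bar per year
ggplot(Data, aes(year)) + geom_histogram(aes(fill = after_stat(count)), binwidth = 1) + facet_wrap(~ score)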
Each facet represents one "score" value, as noted in the title of each facet. You can easily get a feeling for the counts by looking at the height of the bars and their colour (lighter blue indicating more counts).
Of course, you could also do this only for score %in% 2:3 if you don't want scores 1 and 4 included. In such a case, you could do:
ggplot(Data[Data$score %in% 2:3, ], aes(year)) +
geom_histogram(aes(fill = after_stat(count)), binwidth = 1) + facet_wrap(~ score)
So many answers... You seem to want to know whether the frequency of outcomes 2-3 is increasing over time, so why not plot that directly:
set.seed(1)
Data <- data.frame(
year = sample(1998:2004, 200, replace = TRUE),
score = sample(0:3, 200, replace = TRUE))
library(ggplot2)
ggplot(Data, aes(x=factor(year),y=score, group=(score>1)))+
stat_summary(aes(color=(score>1)),fun=length, geom="line")+
scale_color_discrete("score",labels=c("0 - 1","2 - 3"))+
labs(x="",y="Frequency")
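The same idea can be expressed as a share rather than a raw count, which is easier to compare when the number of laws differs between years. A minimal base-R sketch, using this answer's mock data (0-3 coding), that computes the proportion of scores 2-3 in each year:

```r
set.seed(1)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(0:3, 200, replace = TRUE))

# Share of laws in each year that scored 2 or 3 (the mean of a logical vector is a proportion)
share_high <- tapply(Data$score > 1, Data$year, mean)
round(share_high, 3)

plot(as.numeric(names(share_high)), share_high, type = "b",
     xlab = "year", ylab = "Proportion of scores 2-3", ylim = c(0, 1))
```

An upward slope in this line is what "2-3 occurring more often" would look like once year-to-year differences in the number of laws are factored out.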
> with(Data, round( prop.table(table(year,score), 1), 3) )
score
year 1 2 3 4
1998 0.308 0.231 0.231 0.231
1999 0.136 0.273 0.227 0.364
2000 0.281 0.250 0.219 0.250
2001 0.129 0.290 0.226 0.355
2002 0.217 0.174 0.261 0.348
2003 0.286 0.286 0.200 0.229
2004 0.387 0.129 0.194 0.290
png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()
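If you would rather stay in base graphics but still want point size to reflect frequency (as asked in the question), you can count each year/score pair and pass the counts to cex. A minimal sketch; taking the square root is an assumed design choice so that point area, rather than diameter, grows with the count:

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# Count observations for each year/score pair and keep only pairs that occur
counts <- as.data.frame(table(year = Data$year, score = Data$score))
counts <- counts[counts$Freq > 0, ]

# Factor levels hold the original values, so convert them back to numbers for plotting
x <- as.numeric(as.character(counts$year))
y <- as.numeric(as.character(counts$score))

plot(x, y, cex = sqrt(counts$Freq), pch = 16,
     xlab = "year", ylab = "score")
text(x, y, counts$Freq, pos = 4, cex = 0.7)  # optional: print each count to the right of its point
```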
There are other methods one could use if the number of points is so large that jittering doesn't let you determine counts by eye. You can use a transparent color, which lets you judge the density of points. In an 8-digit hex color preceded by an octothorpe (#), the last 2 hex digits are the alpha-transparency of the color. See ?rgb and ?col2rgb. Compare these two plots with new data that has differences in proportions:
Data <- data.frame(
year = rep(1998:2004, length=49000),
score = sample(1:7, 49000, prob=(1:7)/5, replace = TRUE)
)
png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()
png(); plot(jitter(Data$year), jitter(Data$score) ,
col="#bbbbbb11" );dev.off()
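To see what the alpha digits in "#bbbbbb11" mean, col2rgb() with alpha = TRUE decodes them: hex bb is 187 on the 0-255 scale, and hex 11 is 17, i.e. an opacity of roughly 7%. adjustcolor() from grDevices builds such colors from a color name and an opacity fraction. A short sketch:

```r
# Decode the RGBA components of the color used above (hex bb = 187, hex 11 = 17)
rgba <- col2rgb("#bbbbbb11", alpha = TRUE)
print(rgba)

# Opacity as a fraction of fully opaque (17 / 255, roughly 0.067)
rgba["alpha", 1] / 255

# Build a semi-transparent color from a name and an opacity fraction
semi_black <- adjustcolor("black", alpha.f = 0.1)
semi_black
```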
Another alternative:
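# Count each score within each year; column x holds a year-by-score matrix of counts
df <- aggregate(Data$score, by = list(Data$year), FUN = table)
# Plot one series per score value against year
matplot(df$Group.1, df[, 2])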
Hope it helps.