
R - graph frequency of observations over time with small value range

I'm trying to graph the frequency of observations over time. I have a dataset where hundreds of laws are coded 0-3. I'd like to know if outcomes 2-3 are occurring more often as time progresses. Here is a sample of mock data:

Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

If I plot

plot(Data$year, Data$score)

I get a checkered matrix where every single spot is filled in, but I can't tell which numbers occur more often. Is there a way to color or change the size of each point by the number of observations for a given row/year?

A few notes may help in answering the question:

1). I don't know how to sample data where certain numbers occur more frequently than others. My sampling procedure draws equally from all numbers. If there is a better way to create my reproducible data so that it reflects more observations in later years, I would like to know how.

2). This seemed like it would be best visualized in a scatter plot, but I could be wrong. I'm open to other visualizations.

Thanks!

Here's how I would approach this (hope this is what you need).

Create the data (note: when using sample in questions, always use set.seed too so the data are reproducible):

set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)
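On the asker's first note: sample() takes a prob argument, and the probability of a high score can be made to depend on the year. What follows is only a sketch under made-up assumptions. The name Trend and the linear weights are hypothetical, and it uses the 0-3 coding from the question text rather than the 1-4 of the mock data.

```r
set.seed(123)
years <- sample(1998:2004, 200, replace = TRUE)

# Hypothetical weights: the chance of a score of 2 or 3 grows linearly
# from 0.125 in 1998 to 0.875 in 2004
p_high <- (years - 1997) / 8
high   <- runif(200) < p_high

# Draw from 2:3 where `high` is TRUE, otherwise from 0:1
Trend <- data.frame(
  year  = years,
  score = ifelse(high,
                 sample(2:3, 200, replace = TRUE),
                 sample(0:1, 200, replace = TRUE))
)

# Share of scores 2-3 per year
round(tapply(Trend$score >= 2, Trend$year, mean), 2)
```

Data itself is left untouched, so the steps below still run on the original mock data.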

Find the frequencies of score per year using table:

Data2 <- as.data.frame.matrix(table(Data))
Data2$year <- row.names(Data2)

Use melt to convert it back to long format:

library(reshape2)
Data2 <- melt(Data2, "year")
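If reshape2 is not available, the table and melt steps can probably be collapsed into one base R call, since as.data.frame() on a table already returns long format. A minimal sketch; the name Counts is made up here, and the columns come out as year, score and Freq rather than year, variable and value, so the aes() mappings in the plots would need renaming.

```r
set.seed(123)
Data <- data.frame(
  year  = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# One row per year/score combination, with the count in Freq
Counts <- as.data.frame(table(Data), stringsAsFactors = FALSE)
head(Counts)
```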

Plot the data, using a different color per score and point size proportional to frequency:

library(ggplot2)
ggplot(Data2, aes(year, variable, size = value, color = variable)) +
  geom_point()


Alternatively, you could use both fill and size to describe frequency, something like:

ggplot(Data2, aes(year, variable, size = value, fill = value)) +
  geom_point(shape = 21)
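A shorter route that skips the table step may be ggplot2's geom_count(), which counts overlapping points itself and maps the count to point size. This is a sketch on the raw mock data, not the approach used above:

```r
library(ggplot2)

set.seed(123)
Data <- data.frame(
  year  = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# geom_count() uses stat_sum, which computes n (the number of
# observations at each x/y location) and maps it to size
p <- ggplot(Data, aes(year, score)) +
  geom_count()
print(p)
```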


Here's another approach:

ggplot(Data, aes(year)) + geom_histogram(aes(fill = after_stat(count)), binwidth = 1) + facet_wrap(~ score)


Each facet represents one "score" value, as noted in the title of each facet. You can easily get a feeling for the counts by looking at the height of the bars and their colour (lighter blue indicating higher counts).


Of course, you could also do this only for score %in% 2:3 if you don't want scores 1 and 4 included. In that case, you could do:

ggplot(Data[Data$score %in% 2:3,], aes(year)) + 
     geom_histogram(aes(fill = after_stat(count)), binwidth = 1) + facet_wrap(~ score)

So many answers... You seem to want to know if the frequency of outcomes 2-3 is increasing over time, so why not plot that directly:

set.seed(1)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(0:3, 200, replace = TRUE))
library(ggplot2)
ggplot(Data, aes(x=factor(year),y=score, group=(score>1)))+
  stat_summary(aes(color=(score>1)),fun=length, geom="line")+
  scale_color_discrete("score",labels=c("0 - 1","2 - 3"))+
  labs(x="",y="Frequency")
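Because year is sampled at random, the number of laws differs from year to year, so raw counts can rise or fall simply because a year has more observations. One possible complement is the share of scores 2-3 within each year. A base R sketch on the same mock data; the name share is made up for illustration:

```r
set.seed(1)
Data <- data.frame(
  year  = sample(1998:2004, 200, replace = TRUE),
  score = sample(0:3, 200, replace = TRUE))

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the share of scores 2-3 within each year
share <- tapply(Data$score > 1, Data$year, mean)
round(share, 3)

plot(as.integer(names(share)), share, type = "b", ylim = c(0, 1),
     xlab = "", ylab = "Share of scores 2-3")
```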

> with(Data, round( prop.table(table(year,score), 1), 3)  )

      score
year       1     2     3     4
  1998 0.308 0.231 0.231 0.231
  1999 0.136 0.273 0.227 0.364
  2000 0.281 0.250 0.219 0.250
  2001 0.129 0.290 0.226 0.355
  2002 0.217 0.174 0.261 0.348
  2003 0.286 0.286 0.200 0.229
  2004 0.387 0.129 0.194 0.290
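The same row proportions can be shown as a stacked bar chart, which may be easier to scan than the table. This is only a sketch: the seed is an assumption, so the numbers will not match the output above, and the name props is made up.

```r
set.seed(123)  # assumed seed; the table above came from a different draw
Data <- data.frame(
  year  = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# Row proportions: each year sums to 1
props <- with(Data, prop.table(table(year, score), 1))

# barplot() stacks the rows of a matrix, so transpose to get
# one bar per year, split by score
barplot(t(props), legend.text = TRUE,
        args.legend = list(title = "score"),
        xlab = "year", ylab = "proportion")
```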

png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()


There are other methods one could use if the number of points is so large that jittering doesn't let you determine counts by eye. You can use a transparent color, which lets you judge the density of points. The last 2 hex digits of an 8-digit hex number preceded by an octothorpe (#) give the alpha transparency of the color. See ?rgb and ?col2rgb. Compare these two plots, which use new data with unequal proportions:

Data <- data.frame(
   year = rep(1998:2004, length=49000),
   score = sample(1:7, 49000, prob=(1:7)/5, replace = TRUE)
 )

png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()


 png(); plot(jitter(Data$year), jitter(Data$score) ,
        col="#bbbbbb11" );dev.off()
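To see how the hex string in the col argument breaks down, rgb() can build it from numeric components and col2rgb() can take it apart again. A small sketch; the name grey_alpha is made up for illustration:

```r
# 0xbb is 187 and 0x11 is 17, so this rebuilds the colour used above
grey_alpha <- rgb(187, 187, 187, alpha = 17, maxColorValue = 255)
grey_alpha   # the same colour as "#bbbbbb11", written in upper case

# Decompose the string into red, green, blue and alpha (each 0-255)
col2rgb("#bbbbbb11", alpha = TRUE)

# adjustcolor() takes the alpha as a fraction of full opacity instead
adjustcolor("grey", alpha.f = 0.07)
```

An alpha of 17 out of 255 is roughly 7% opacity, so a spot needs many overlapping points before it looks solid.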


Another alternative:

df<-aggregate(Data$score,by= list(Data$year),table)
matplot(df$Group.1,(df[,2]))

Hope it helps.
