
R - graph frequency of observations over time with small value range

I'm trying to graph the frequency of observations over time. I have a dataset where hundreds of laws are coded 0-3. I'd like to know if outcomes 2-3 are occurring more often as time progresses. Here is a sample of mock data:

Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

If I plot

plot(Data$year, Data$score)

I get a checkered matrix where every single spot is filled in, but I can't tell which numbers occur more often. Is there a way to color or to change the size of each point by the number of observations of a given row/year?

A few notes may help in answering the question:

1) I don't know how to sample data where certain numbers occur more frequently than others. My sample procedure draws all numbers with equal probability. If there is a better way to create reproducible data with more observations in later years, I would like to know how.

2) This seemed like it would be best visualized as a scatter plot, but I could be wrong. I'm open to other visualizations.

Thanks!

Here's how I would approach this (hope this is what you need)

Create the data (Note: when using sample in questions, always use set.seed too so it will be reproducible)

set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)
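On the side question of drawing some values more often than others: sample takes a prob argument with one weight per element. A minimal sketch, in which the weights and the name Data_weighted are illustrative assumptions rather than anything from the real dataset:

```r
set.seed(123)
years  <- 1998:2004
scores <- 1:4

# Assumed weights: later years and higher scores are drawn more often.
# sample() treats prob as relative weights, so they need not sum to 1.
Data_weighted <- data.frame(
  year  = sample(years, 200, replace = TRUE, prob = seq_along(years)),
  score = sample(scores, 200, replace = TRUE, prob = c(1, 1, 2, 3))
)

# Counts per year should tend to rise toward 2004
table(Data_weighted$year)
```

The rest of this answer keeps using the uniformly sampled Data from above.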

Find frequencies of score per year using table

Data2 <- as.data.frame.matrix(table(Data))
Data2$year <- row.names(Data2)

Use melt to convert it back to long format

library(reshape2)
Data2 <- melt(Data2, "year")
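If reshape2 is not installed, base R can probably get you to the same long-format counts in one step, because as.data.frame on a table returns one row per year/score combination. A sketch under that assumption; the name Data3 is only for illustration, and note that year comes back as a factor here rather than a character column:

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# One row per (year, score) pair, with the count stored in "value"
Data3 <- as.data.frame(table(Data), responseName = "value")

# Rename "score" to "variable" to match the column names produced by melt
names(Data3)[names(Data3) == "score"] <- "variable"
head(Data3)
```

Data3 can then be passed to the ggplot calls below in place of Data2.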

Plot the data, using a different color per score and sizing each point by its frequency

library(ggplot2)
ggplot(Data2, aes(year, variable, size = value, color = variable)) +
  geom_point()


Alternatively, you could use both fill and size to describe frequency, something like

ggplot(Data2, aes(year, variable, size = value, fill = value)) +
  geom_point(shape = 21)


Here's another approach:

ggplot(Data, aes(year)) + geom_histogram(aes(fill = after_stat(count))) + facet_wrap(~ score)  # after_stat() needs ggplot2 >= 3.3.0; older versions use ..count..


Each facet represents one "score" value, as noted in the title of each facet. You can easily get a feeling for the counts by looking at the height of the bars and their colour (lighter blue indicates higher counts).


Of course, you could also restrict this to score %in% 2:3 if you don't want scores 1 and 4 included. In that case, you could do:

ggplot(Data[Data$score %in% 2:3,], aes(year)) + 
     geom_histogram(aes(fill = after_stat(count))) + facet_wrap(~ score)  # ..count.. in ggplot2 < 3.3.0

So many answers... You seem to want to know if the frequency of outcomes 2-3 is increasing over time, so why not plot that directly:

set.seed(1)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(0:3, 200, replace = TRUE))
library(ggplot2)
ggplot(Data, aes(x=factor(year),y=score, group=(score>1)))+
  stat_summary(aes(color=(score>1)),fun=length, geom="line")+  # fun replaces fun.y in ggplot2 >= 3.3.0
  scale_color_discrete("score",labels=c("0 - 1","2 - 3"))+
  labs(x="",y="Frequency")
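The numbers behind that plot can also be checked directly. A small base R sketch, reusing the same mock data; the names counts and share_high are just for illustration:

```r
set.seed(1)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(0:3, 200, replace = TRUE))

# Count of low (0-1) and high (2-3) scores in each year
counts <- table(year = Data$year, high = Data$score > 1)

# Share of scores 2-3 within each year (mean of a logical is a proportion)
share_high <- tapply(Data$score > 1, Data$year, mean)
round(share_high, 3)
```

With uniformly sampled mock data the share should hover around 0.5; with the real data, a rising share over the years would be the trend you are looking for.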

> with(Data, round( prop.table(table(year,score), 1), 3)  )

      score
year       1     2     3     4
  1998 0.308 0.231 0.231 0.231
  1999 0.136 0.273 0.227 0.364
  2000 0.281 0.250 0.219 0.250
  2001 0.129 0.290 0.226 0.355
  2002 0.217 0.174 0.261 0.348
  2003 0.286 0.286 0.200 0.229
  2004 0.387 0.129 0.194 0.290

png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()


There are other methods one could use if the number of points is so large that jittering doesn't let you determine counts by eye. You can use a transparent color, which lets you judge the density of points. The last two hex digits of an eight-digit hex color preceded by an octothorpe (#) are the alpha transparency. See ?rgb and ?col2rgb. Compare these two plots with new data that allows you to have differences in proportions:

Data <- data.frame(
   year = rep(1998:2004, length=49000),
   score = sample(1:7, 49000, prob=(1:7)/5, replace = TRUE)
 )

png(); plot(jitter(Data$year), jitter(Data$score) );dev.off()


 png(); plot(jitter(Data$year), jitter(Data$score) ,
        col="#bbbbbb11" );dev.off()
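To see what that hex string encodes, or to build a similar color without writing hex by hand, something like the following should work. The variable names are illustrative only:

```r
# Decode the color used above: the alpha row is 0x11, i.e. 17 out of 255
col2rgb("#bbbbbb11", alpha = TRUE)

# Build the same color from its red, green, blue and alpha components
grey_faint <- rgb(187, 187, 187, 17, maxColorValue = 255)

# Or make a named color transparent (alpha.f is a fraction between 0 and 1)
grey_named <- adjustcolor("grey", alpha.f = 17 / 255)
```

Either value can be passed as col= in the plot call; a lower alpha suits larger datasets, since more overlapping points are needed before a cell looks dark.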


Another alternative:

df<-aggregate(Data$score,by= list(Data$year),table)
matplot(df$Group.1,(df[,2]))
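A variant of the same idea that may be easier to read builds the year-by-score count matrix with table and draws one labelled line per score. This is only a sketch on mock data like the question's, and the plotting choices (type, symbols, legend position) are my assumptions:

```r
set.seed(123)
Data <- data.frame(
  year = sample(1998:2004, 200, replace = TRUE),
  score = sample(1:4, 200, replace = TRUE)
)

# Rows are years, columns are scores; unclass() leaves a plain integer matrix
counts <- unclass(table(Data$year, Data$score))
yrs <- as.numeric(rownames(counts))
k <- seq_len(ncol(counts))

# One line per score, with points marking each year
matplot(yrs, counts, type = "b", pch = k, lty = 1, col = k,
        xlab = "year", ylab = "count")
legend("topright", legend = colnames(counts), pch = k, lty = 1, col = k,
       title = "score")
```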

Hope it helps.

