
Plot top X categories in R/ggplot2

This is very similar to the question here:

How to use ggplot to group and show top X categories?

Except in my case I don't have a discrete value to go on. I've got data about users posting messages to a user forum. Similar to:

Year, Month, Day, User, Message

I have one entry for every message a person posted, and I want to plot the top 5 users per year by total number of messages posted. In the previous question there was a distinct list of values to key off.

In my case, I'm curious if I can do it easily in ggplot2, or if I need to do something like:

  1. Load the data into a dataframe
  2. Construct a new dataframe which is the same data collapsed & summarized by year
  3. Plot from the new frame using the same approach as the previous question

If this is the best way to do it, what's the "correct" way to do #2? That new dataframe should probably be of the form:

Year, User, Total number of Messages
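For step 2, here is a minimal sketch of collapsing the per-message rows into that Year/User/Total shape using only base R. The `posts` data frame and its contents are made up for illustration, the `Total` column name is an assumption, and `aggregate()` is just one of several ways to do this.

```r
# Hypothetical sample data; column names follow the layout described above.
posts <- data.frame(
  Year    = c(2010, 2010, 2010, 2011, 2011),
  Month   = c(1, 2, 2, 5, 6),
  Day     = c(3, 14, 20, 1, 9),
  User    = c("alice", "alice", "bob", "bob", "bob"),
  Message = c("hi", "again", "hello", "first", "second"),
  stringsAsFactors = FALSE
)

# There is one row per message, so counting rows per Year/User
# combination gives the total number of messages.
totals <- aggregate(Message ~ Year + User, data = posts, FUN = length)

# Rename the count column (assumed name: Total).
names(totals)[names(totals) == "Message"] <- "Total"
```

Only Year/User combinations that actually occur in the data appear in `totals`, so users with no posts in a given year are simply absent rather than listed with a zero.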

Any help is appreciated.

Based on Joran's comment, I found this plyr approach:

library(plyr)
ddply(posts, .(year, poster), summarise, freq = length(year))

This gives me the number of posts per user per year. From there I can trim it down, as suggested in other posts, to get the top X posters per year.
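The trimming step could look something like the sketch below. It assumes a data frame with the `year`, `poster` and `freq` columns produced by the `ddply()` call; the `counts` values and the `top_x` variable are invented for illustration. The ranking uses base R, and the plot is only drawn if ggplot2 is installed.

```r
# Hypothetical per-year totals with columns year, poster, freq.
counts <- data.frame(
  year   = c(2010, 2010, 2010, 2011, 2011, 2011),
  poster = c("alice", "bob", "carol", "alice", "bob", "carol"),
  freq   = c(12, 7, 30, 4, 25, 9),
  stringsAsFactors = FALSE
)

top_x <- 2  # hypothetical cutoff; set to 5 for the top 5 posters

# Sort by year, then by descending post count within each year.
counts <- counts[order(counts$year, -counts$freq), ]

# Number the rows within each year (1 = most posts) and keep the first top_x.
rank_in_year <- ave(counts$freq, counts$year, FUN = function(x) seq_along(x))
top <- counts[rank_in_year <= top_x, ]

# Grouped bar chart of the top posters per year, if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top, aes(x = factor(year), y = freq, fill = poster)) +
    geom_col(position = "dodge") +
    labs(x = "Year", y = "Messages posted", fill = "User")
  print(p)
}
```

Ties at the cutoff are broken by row order here, so exactly `top_x` posters are kept per year; a different tie rule would need a different ranking step.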
