Rearranging a data frame in R while summarizing values

I need to rearrange a data frame, which currently looks like this:

> counts
       year     score   freq rounded_year
    1: 1618         0     25         1620
    2: 1619         2      1         1620
    3: 1619         0     20         1620
    4: 1620         1      6         1620
    5: 1620         0     70         1620
   ---                                   
11570: 1994       107      1         1990
11571: 1994       101      2         1990
11572: 1994        10    194         1990
11573: 1994         1  30736         1990
11574: 1994         0 711064         1990

But what I need is the summed freq for each score value per decade (rounded_year). So, the data frame should look like this:

rounded_year  0       1      2   3  [...] total
1620          115     6      1   0        122
---
1990          711064  30736  0   0        741997

I've played around with aggregate and ddply, but so far without success. I hope it's clear what I mean; I don't know how to describe it better.

Any ideas?

A simple example using dplyr and tidyr:

dt = data.frame(year = c(1618,1619,1620,1994,1994,1994),
                score = c(0,1,0,2,2,3),
                freq = c(3,5,2,6,7,8),
                rounded_year = c(1620,1620,1620,1990,1990,1990))

dt

#    year score freq rounded_year
# 1 1618     0    3         1620
# 2 1619     1    5         1620
# 3 1620     0    2         1620
# 4 1994     2    6         1990
# 5 1994     2    7         1990
# 6 1994     3    8         1990


library(dplyr)
library(tidyr)

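dt %>%
  group_by(rounded_year, score) %>%
  summarise(freq = sum(freq)) %>%   # sum freq per decade and score
  mutate(total = sum(freq)) %>%     # result is still grouped by rounded_year, so this is the per-decade total
  spread(score, freq, fill = 0)     # one column per score; missing combinations become 0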


# Source: local data frame [2 x 6]
# 
#    rounded_year total     0     1     2     3
#           (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
# 1         1620    10     5     5     0     0
# 2         1990    21     0     0    13     8
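In more recent versions of tidyr, spread() is superseded by pivot_wider(). The following is a sketch of the same idea under that assumption (it needs dplyr >= 1.0.0 and tidyr >= 1.1.0 for .groups, across() and the scalar values_fill). The name wide is only illustrative, and computing total after reshaping is my own choice so that it ends up as the last column, as in the layout requested in the question.

```r
library(dplyr)
library(tidyr)

# Same toy data as above, repeated so the snippet runs on its own
dt <- data.frame(year = c(1618, 1619, 1620, 1994, 1994, 1994),
                 score = c(0, 1, 0, 2, 2, 3),
                 freq = c(3, 5, 2, 6, 7, 8),
                 rounded_year = c(1620, 1620, 1620, 1990, 1990, 1990))

wide <- dt %>%
  group_by(rounded_year, score) %>%
  summarise(freq = sum(freq), .groups = "drop") %>%   # sum freq per decade and score
  pivot_wider(names_from = score, values_from = freq,
              values_fill = 0) %>%                    # one column per score, 0 where absent
  mutate(total = rowSums(across(-rounded_year)))      # row total over all score columns

wide
```

Because total is computed from the score columns after reshaping, it stays correct even if further score columns appear in the full data.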

In case you prefer to work with data.table (the dataset you provided looks like a data.table), you can use this:

library(data.table)
library(tidyr)

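# convert to data.table by reference, then sum freq per decade and score
dt = setDT(dt)[, .(freq = sum(freq)) ,by=c("rounded_year","score")]
# add the per-decade total as a new column
dt = dt[, total:= sum(freq) ,by="rounded_year"]
# reshape to wide: one column per score, 0 where a score is absent
dt = spread(dt,score,freq, fill=0)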
dt

#    rounded_year total 0 1  2 3
# 1:         1620    10 5 5  0 0
# 2:         1990    21 0 0 13 8
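If you would rather not load tidyr at all, the reshape can also be done with data.table's own dcast(). This is a sketch rather than part of the original answer; the names wide and score_cols are illustrative only.

```r
library(data.table)

# Same toy data as above, repeated so the snippet runs on its own
dt <- data.table(year = c(1618, 1619, 1620, 1994, 1994, 1994),
                 score = c(0, 1, 0, 2, 2, 3),
                 freq = c(3, 5, 2, 6, 7, 8),
                 rounded_year = c(1620, 1620, 1620, 1990, 1990, 1990))

# Aggregate and reshape in one step: rows are decades, columns are scores.
# Combinations that never occur get sum() of an empty vector, i.e. 0.
wide <- dcast(dt, rounded_year ~ score, value.var = "freq", fun.aggregate = sum)

# Add the row total over all score columns (everything except rounded_year)
score_cols <- setdiff(names(wide), "rounded_year")
wide[, total := rowSums(.SD), .SDcols = score_cols]

wide
```

Here total ends up as the last column, which matches the layout asked for in the question.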
