How to aggregate data in R
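I would like to create a new data frame based on District, grouping the dataset by "year", "property type" and whether the property is old or new, with a count for each district.
I have tried the aggregate function, but I lose the values of the other variables. Here is the dataset: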
Property.Type Old.New Town.City District County Date
1 D N BARKING BARKING AND DAGENHAM GREATER LONDON 2012
2 D Y BARKING BARKING AND DAGENHAM GREATER LONDON 2012
3 D N BARKING BARKING AND DAGENHAM GREATER LONDON 2012
4 D N DAGENHAM BARKING AND DAGENHAM GREATER LONDON 2012
5 D N DAGENHAM BARKING AND DAGENHAM GREATER LONDON 2012
I want to rearrange the data so that District is my ID, with a separate frame for each category, for example:
by year
District 2012 2013 2014 2015
Barking 100 500 700 800
by Old.New and year
District New Old
Barking 50 70
by property type and year
District New2012 Old2012
Barking 50 70
Without the full data frame it is a bit hard to help, but here are some code snippets showing how to aggregate data with the tidyverse library.
First, recreate the data frame from the data provided:
Property.Type <- c("D","D","D","D","D")
Old.New <- c("N","Y","N","N","N")
Town.City <- c("BARKING","BARKING","BARKING","DAGENHAM","DAGENHAM")
District <- c("BARKING AND DAGENHAM","BARKING AND DAGENHAM","BARKING AND DAGENHAM","BARKING AND DAGENHAM","BARKING AND DAGENHAM")
County <- c("GREATER LONDON","GREATER LONDON","GREATER LONDON","GREATER LONDON","GREATER LONDON")
Date <- c(2012,2012,2012,2012,2012)
df <- data.frame(Property.Type,Old.New,Town.City,District,County,Date)
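Since the question mentions base R's aggregate(): it returns one row per group, so any column not named in the formula is dropped. A minimal base R sketch, assuming the sample data above, keeps the grouping columns by listing all of them on the right-hand side of the formula. Counting with FUN = length and renaming the result to n are choices made for this illustration, not part of the original post.

```r
# Recreate the sample data (same values as above)
df <- data.frame(
  Property.Type = rep("D", 5),
  Old.New = c("N", "Y", "N", "N", "N"),
  Town.City = c("BARKING", "BARKING", "BARKING", "DAGENHAM", "DAGENHAM"),
  District = rep("BARKING AND DAGENHAM", 5),
  County = rep("GREATER LONDON", 5),
  Date = rep(2012, 5)
)

# Count rows per District/Date/Property.Type/Old.New combination.
# Every variable you want to keep in the result must appear on the
# right-hand side; County is only used as the column whose length is taken.
counts <- aggregate(County ~ District + Date + Property.Type + Old.New,
                    data = df, FUN = length)

# Rename the count column to something meaningful
names(counts)[names(counts) == "County"] <- "n"
print(counts)
```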
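Then aggregate by one or more columns. The %>% pipe, group_by() and summarise() come from the dplyr package (part of the tidyverse), so load it first:
> library(dplyr)
> df %>% group_by(Town.City) %>% summarise(n = n())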
# A tibble: 2 x 2
Town.City n
<fct> <int>
1 BARKING 3
2 DAGENHAM 2
>
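For comparison, the same per-town count can be sketched in base R with table(), which needs no extra packages. This sketch recreates only the Town.City column from the sample data; the name by_town is introduced here for illustration.

```r
# Only the column needed here, copied from the sample data above
df <- data.frame(Town.City = c("BARKING", "BARKING", "BARKING",
                               "DAGENHAM", "DAGENHAM"))

# table() counts rows per value; as.data.frame() turns the result into a
# two-column data frame, and responseName names the count column "n"
by_town <- as.data.frame(table(Town.City = df$Town.City), responseName = "n")
print(by_town)
```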
> df %>% group_by(Date, Town.City) %>% summarise(n = n())
# A tibble: 2 x 3
# Groups: Date [?]
Date Town.City n
<dbl> <fct> <int>
1 2012 BARKING 3
2 2012 DAGENHAM 2
>
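The "by year" layout asked for in the question (one row per District, one column per year) is a cross-tabulation. A base R sketch using xtabs(), assuming the sample data above; since that data only contains 2012, only one year column appears. In the tidyverse, tidyr's pivot_wider() applied to the grouped counts would be the counterpart.

```r
# Only the columns needed here, copied from the sample data above
df <- data.frame(
  District = rep("BARKING AND DAGENHAM", 5),
  Date = rep(2012, 5)
)

# Cross-tabulate: one row per District, one column per year
tab <- xtabs(~ District + Date, data = df)

# Convert to an ordinary data frame; district names become row names
by_year <- as.data.frame.matrix(tab)
print(by_year)
```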
> df %>% group_by(Property.Type, Date) %>% summarise(n = n())
# A tibble: 1 x 3
# Groups: Property.Type [?]
Property.Type Date n
<fct> <dbl> <int>
1 D 2012 5
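The New2012/Old2012 style columns from the question can be built by pasting the category and the year into one label before cross-tabulating. This sketch assumes Old.New == "Y" means a new property and "N" an old one; the helper columns Age and AgeYear are hypothetical names introduced here.

```r
# Only the columns needed here, copied from the sample data above
df <- data.frame(
  Old.New = c("N", "Y", "N", "N", "N"),
  District = rep("BARKING AND DAGENHAM", 5),
  Date = rep(2012, 5)
)

# Assumption: Old.New == "Y" marks a new property, "N" an old one
df$Age <- ifelse(df$Old.New == "Y", "New", "Old")

# Combine the category and the year into one label, e.g. "New2012"
df$AgeYear <- paste0(df$Age, df$Date)

# One row per District, one column per category/year combination
by_age_year <- as.data.frame.matrix(xtabs(~ District + AgeYear, data = df))
print(by_age_year)
```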