Aggregate data for 3 months of the current year and then transpose the unique values of a column in R
I have a column that represents a date in a data frame. I simply want to create a new data frame that summarizes the Amount column by ID over 3 months (20140831, 20140930, 20141031) using the date column, and then transposes the Brand column so that each brand becomes its own column holding the summarized Amount value. What is the best approach?
The data set is as follows.
ID date Brand Amount
1001 20141031 UNIBIC 9.8
1001 20140930 UNIBIC 1.023
1002 20140831 CITRIZINE 2.019
1002 20140930 CITRIZINE 2.015
1002 20141031 CITRIZINE 1.002
1003 20140831 CHOCO 4.22
1004 20140930 SOLOSTAR 1.007
1004 20141030 SOLOSTAR 1.008
1005 20140930 DOLO 1.025
I would like the output to look like this:
ID   UNIBIC CITRIZINE CHOCO SOLOSTAR DOLO
1001 5.4115
1002        1.678
1003                  4.22
1004                        1.039
1005                                 1.025
Any help will be greatly appreciated.
You can try:
library(reshape2)
df$Brand <- factor(df$Brand, levels=unique(df$Brand))
dcast(df, ID~Brand, value.var='Amount', mean)
# ID UNIBIC CITRIZINE CHOCO SOLOSTAR DOLO
#1 1001 5.4115 NaN NaN NaN NaN
#2 1002 NaN 1.678667 NaN NaN NaN
#3 1003 NaN NaN 4.22 NaN NaN
#4 1004 NaN NaN NaN 1.0075 NaN
#5 1005 NaN NaN NaN NaN 1.025
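The NaN entries in that output come from averaging a group with no rows: mean() of a zero-length vector is NaN. If blank (NA) cells are preferred, as in the desired output, a minimal base R sketch could look like the following. The `wide` data frame here is a hypothetical stand-in for the first two rows of the dcast() result, not the actual object.

```r
# An empty group has no values to average, so mean() yields NaN
empty_mean <- mean(numeric(0))

# Hypothetical stand-in for the first two rows of the dcast() result
wide <- data.frame(
  ID = c(1001L, 1002L),
  UNIBIC = c(5.4115, NaN),
  CITRIZINE = c(NaN, 1.678667)
)

# is.nan() has no data.frame method, so apply it column by column
wide[] <- lapply(wide, function(x) replace(x, is.nan(x), NA))
print(wide)
```

The same column-wise replacement should work on the real result, since it only touches cells where is.nan() is TRUE.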
Or maybe you can try dcast.data.table, which would be faster:
library(data.table)
dcast.data.table(setDT(df), ID~Brand, value.var='Amount', mean)
Or using dplyr/tidyr:
library(dplyr)
library(tidyr)
df %>%
  group_by(ID, Brand) %>%
  summarise(Amount = mean(Amount)) %>%
  ungroup() %>%
  spread(Brand, Amount)
If you need only Aug, Sep, and Oct, you could subset the dataset before transforming.
df1 <- df[as.numeric(substr(df$date, 5,6)) %in% 8:10,]
dcast(df1, ID~Brand, value.var='Amount', mean)
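The subset above matches on month only, so rows from any year pass through. If the result should be limited to a single year as well, one possible base R sketch is shown below. It rebuilds the sample data from the question, assumes that "current year" means 2014 for this sample, and uses tapply() rather than dcast() so that it runs without extra packages.

```r
# Sample data copied from the question (dates stored as integers, YYYYMMDD)
df <- data.frame(
  ID = c(1001L, 1001L, 1002L, 1002L, 1002L, 1003L, 1004L, 1004L, 1005L),
  date = c(20141031L, 20140930L, 20140831L, 20140930L, 20141031L,
           20140831L, 20140930L, 20141030L, 20140930L),
  Brand = c("UNIBIC", "UNIBIC", "CITRIZINE", "CITRIZINE", "CITRIZINE",
            "CHOCO", "SOLOSTAR", "SOLOSTAR", "DOLO"),
  Amount = c(9.8, 1.023, 2.019, 2.015, 1.002, 4.22, 1.007, 1.008, 1.025),
  stringsAsFactors = FALSE
)

# Split the YYYYMMDD value into its year and month parts
yr <- as.integer(substr(df$date, 1, 4))
mon <- as.integer(substr(df$date, 5, 6))

# Assumption: the "current year" is 2014 for this sample
df_sub <- df[yr == 2014L & mon %in% 8:10, ]

# Mean Amount per ID/Brand; combinations with no rows come back as NA
res <- tapply(df_sub$Amount, list(ID = df_sub$ID, Brand = df_sub$Brand), mean)
print(res)
```

The result is a matrix with IDs as row names and brands as column names. The logical filter on `yr` and `mon` can equally be applied before any of the dcast() or dplyr/tidyr calls above.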
The data used in the examples above:
df <- structure(list(ID = c(1001L, 1001L, 1002L, 1002L, 1002L, 1003L,
1004L, 1004L, 1005L), date = c(20141031L, 20140930L, 20130831L,
20140930L, 20141031L, 20130831L, 20130930L, 20131030L, 20140930L
), Brand = c("UNIBIC", "UNIBIC", "CITRIZINE", "CITRIZINE", "CITRIZINE",
"CHOCO", "SOLOSTAR", "SOLOSTAR", "DOLO"), Amount = c(9.8, 1.023,
2.019, 2.015, 1.002, 4.22, 1.007, 1.008, 1.025)), .Names = c("ID",
"date", "Brand", "Amount"), class = "data.frame", row.names = c(NA, -9L))