Aggregate data for 3 months of the current year and then transpose the unique values of a column in R

I have a column that represents a date in a data frame. I simply want to create a new data frame that summarizes the Amount column by ID over 3 months (20140831, 20140930, 20141031) using the date column, and then transposes the Brand column so that each brand becomes its own column holding the summarized Amount. What is the best approach?

The data set is as below.

  ID         date       Brand     Amount
 1001   20141031    UNIBIC          9.8
 1001   20140930    UNIBIC          1.023
 1002   20140831    CITRIZINE       2.019
 1002   20140930    CITRIZINE       2.015
 1002   20141031    CITRIZINE       1.002
 1003   20140831    CHOCO           4.22
 1004   20140930    SOLOSTAR        1.007
 1004   20141030    SOLOSTAR        1.008
 1005   20140930    DOLO            1.025

I would like to have the output as below.

  ID           UNIBIC     CITRIZINE   CHOCO   SOLOSTAR      DOLO  
  1001        5.4115                
  1002                  1.678
  1003                               4.22        
  1004                                        1.039
  1005                                                    1.025

Any help will be greatly appreciated.

You can try

library(reshape2)
df$Brand <- factor(df$Brand, levels=unique(df$Brand))
dcast(df, ID~Brand, value.var='Amount', mean)
#   ID UNIBIC CITRIZINE CHOCO SOLOSTAR  DOLO
#1 1001 5.4115       NaN   NaN      NaN   NaN
#2 1002    NaN  1.678667   NaN      NaN   NaN
#3 1003    NaN       NaN  4.22      NaN   NaN
#4 1004    NaN       NaN   NaN   1.0075   NaN
#5 1005    NaN       NaN   NaN      NaN 1.025
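If reshape2 is not available, roughly the same table can be built with base R's tapply. This is only a sketch: it retypes the question's sample data by hand, and the names df and wide are just illustrative. Empty ID/Brand combinations come out as NA rather than NaN.

```r
# Sample data retyped from the question's table
df <- data.frame(
  ID     = c(1001L, 1001L, 1002L, 1002L, 1002L, 1003L, 1004L, 1004L, 1005L),
  date   = c(20141031L, 20140930L, 20140831L, 20140930L, 20141031L,
             20140831L, 20140930L, 20141030L, 20140930L),
  Brand  = c("UNIBIC", "UNIBIC", "CITRIZINE", "CITRIZINE", "CITRIZINE",
             "CHOCO", "SOLOSTAR", "SOLOSTAR", "DOLO"),
  Amount = c(9.8, 1.023, 2.019, 2.015, 1.002, 4.22, 1.007, 1.008, 1.025)
)

# Keep the brands in order of first appearance so the columns are not sorted
df$Brand <- factor(df$Brand, levels = unique(df$Brand))

# Mean Amount for each ID (rows) and Brand (columns); result is a matrix
wide <- tapply(df$Amount, list(ID = df$ID, Brand = df$Brand), mean)
wide
```

The result is a matrix with the IDs as row names, so as.data.frame(wide) may be needed if a data frame with an ID column is wanted.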

Or maybe you can try dcast.data.table, which would be faster

library(data.table) 
dcast.data.table(setDT(df), ID~Brand, value.var='Amount', mean)

Or using dplyr/tidyr

library(dplyr)
library(tidyr)

df %>%
   group_by(ID, Brand) %>% 
   summarise(Amount=mean(Amount)) %>%
   ungroup() %>%
   spread(Brand, Amount)
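In newer tidyr releases (1.0.0 and later) spread() is superseded by pivot_wider(), so the same pipeline could be sketched as below. This assumes dplyr 1.0.0 or later for the .groups argument, and it retypes the question's sample data by hand; the name wide is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data retyped from the question's table
df <- data.frame(
  ID     = c(1001L, 1001L, 1002L, 1002L, 1002L, 1003L, 1004L, 1004L, 1005L),
  date   = c(20141031L, 20140930L, 20140831L, 20140930L, 20141031L,
             20140831L, 20140930L, 20141030L, 20140930L),
  Brand  = c("UNIBIC", "UNIBIC", "CITRIZINE", "CITRIZINE", "CITRIZINE",
             "CHOCO", "SOLOSTAR", "SOLOSTAR", "DOLO"),
  Amount = c(9.8, 1.023, 2.019, 2.015, 1.002, 4.22, 1.007, 1.008, 1.025)
)

wide <- df %>%
  group_by(ID, Brand) %>%
  # one mean per ID/Brand pair; .groups = "drop" returns an ungrouped tibble
  summarise(Amount = mean(Amount), .groups = "drop") %>%
  # one column per Brand; missing combinations are filled with NA
  pivot_wider(names_from = Brand, values_from = Amount)

wide
```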

Update

If you need only Aug, Sep, and Oct, you could subset the dataset before transforming.

df1 <-  df[as.numeric(substr(df$date, 5,6)) %in% 8:10,]
dcast(df1, ID~Brand, value.var='Amount', mean)
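The subset above looks only at the month, so rows from any year pass through. If the filter should also be restricted to one year, a possible base R sketch is shown below. It uses the values from the data section of this answer (which include some 2013 dates); target_year is an assumed stand-in for "the current year", and keep and wide are illustrative names.

```r
# Values copied from the data section of this answer
df <- data.frame(
  ID     = c(1001L, 1001L, 1002L, 1002L, 1002L, 1003L, 1004L, 1004L, 1005L),
  date   = c(20141031L, 20140930L, 20130831L, 20140930L, 20141031L,
             20130831L, 20130930L, 20131030L, 20140930L),
  Brand  = c("UNIBIC", "UNIBIC", "CITRIZINE", "CITRIZINE", "CITRIZINE",
             "CHOCO", "SOLOSTAR", "SOLOSTAR", "DOLO"),
  Amount = c(9.8, 1.023, 2.019, 2.015, 1.002, 4.22, 1.007, 1.008, 1.025)
)

# Parse the yyyymmdd integers into Date objects
d <- as.Date(as.character(df$date), format = "%Y%m%d")

target_year <- 2014  # assumption: the year treated as "current" for this data

# Keep rows from Aug-Oct of the target year only
keep <- as.integer(format(d, "%Y")) == target_year &
  as.integer(format(d, "%m")) %in% 8:10
df1 <- df[keep, ]

# Mean Amount per ID and Brand on the filtered rows
wide <- tapply(df1$Amount, list(ID = df1$ID, Brand = df1$Brand), mean)
wide
```

With these values only IDs 1001, 1002 and 1005 remain, because every row for 1003 and 1004 is dated 2013.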

data

 df <- structure(list(ID = c(1001L, 1001L, 1002L, 1002L, 1002L, 1003L, 
 1004L, 1004L, 1005L), date = c(20141031L, 20140930L, 20130831L, 
 20140930L, 20141031L, 20130831L, 20130930L, 20131030L, 20140930L
 ), Brand = c("UNIBIC", "UNIBIC", "CITRIZINE", "CITRIZINE", "CITRIZINE", 
 "CHOCO", "SOLOSTAR", "SOLOSTAR", "DOLO"), Amount = c(9.8, 1.023, 
 2.019, 2.015, 1.002, 4.22, 1.007, 1.008, 1.025)), .Names = c("ID", 
 "date", "Brand", "Amount"), class = "data.frame", row.names = c(NA, -9L))
