简体   繁体   中英

Aggregating by indeces and reweighting in R

I have a ton of Price data that is indexed by State, Date, and UPC (Product Code). I want to aggregate away UPC, and combine the Prices by taking a weighted average. I will try to explain it, but you may just want to read the code below.

Each observation in the dataset is: UPC, date, state, price, and weight. I would like to aggregate away the UPC index in this way:

Take all data points with the same date and state, and multiple their prices by their weights and sum them up. This obviously creates a weighted average, which I call the priceIndex. However, for some date&state combos the weights do not add up to 1. Therefore, I want to create two additional columns: One for the total of the weights for each date&state combo. The second is for a reweighted average: Ie, if the original two weights were .5 and .3, change them to .5/(.5+.3)=.625 and .3/(.5+.3)=.375, and then recalculate the weighted average into another price index.

This is what I mean:

upc=c(1153801013,1153801013,1153801013,1153801013,1153801013,1153801013,2105900750,2105900750,2105900750,2105900750,2105900750,2173300001,2173300001,2173300001,2173300001)
date=c(200601,200602,200603,200603,200601,200602,200601,200602,200603,200601,200602,200601,200602,200603,200601)
price=c(26,28,27,27,23,24,85,84,79.5,81,78,24,19,98,47)
state=c(1,1,1,2,2,2,1,1,2,2,2,1,1,1,2)
weight=c(.3,.2,.6,.4,.4,.5,.5,.5,.45,.15,.5,.2,.15,.3,.45)

# This is what I have:
data <- data.frame(upc,date,state,price,weight)
data

# These are a few of the weighted calculations:
# .3*26+85*.5+24*.2 = 55.1
# 28*.2+84*.5+19*.15 = 50.45
# 27*.6+98*.3 = 45.6
# Etc. etc.

# Here is the reweighted calculation for date=200602 & state==1:
# 28*(.2/.85)+84*(.5/.85)+19*(.15/.85) = 50.45
# Or, equivalently:
# (28*.2+84*.5+19*.15)/.85 = 50.45

# This is what I want:
date=c(200601,200602,200603,200601,200602,200603)
state=c(1,1,1,2,2,2)
priceIndex=c(55.1,50.45,45.6,42.5,51,46.575)
totalWeight=c(1,.85,.9,1,1,.85)
reweightedIndex=c(55.1,59.35294,50.66667,42.5,51,54.79412)
index <- data.frame(date,state,priceIndex,totalWeight,reweightedIndex)
index

Also, not that it should matter, but there are about 35 states, 150 UPCs, and 84 dates in the dataset -- so there are a lot of observations.

Thanks a lot in advance.

We can use one of the group by summarise operation. With data.table , we convert the 'data.frame' to 'data.table' ( setDT(data) , grouped by 'date', 'state', we get the sum of product of 'price' and 'weight', and sum(weight) as temporary variables, and then create the 3 variables in the list based on that.

library(data.table) 
setDT(data)[, {tmp1 = sum(price*weight)
                tmp2 = sum(weight)
        list(priceIndex=tmp1, totalWeight=tmp2,
              reweigthedIndex = tmp1/tmp2)}, .(date, state)]
#    date state priceIndex totalWeight reweightedIndex
#1: 200601     1     55.100        1.00        55.10000
#2: 200602     1     50.450        0.85        59.35294
#3: 200603     1     45.600        0.90        50.66667
#4: 200603     2     46.575        0.85        54.79412
#5: 200601     2     42.500        1.00        42.50000
#6: 200602     2     51.000        1.00        51.00000

Or using dplyr , we can use summarise to create the 3 columns after doing grouping by 'date' and 'state'.

library(dplyr)
data %>% 
  group_by(date, state) %>% 
  summarise(priceIndex = sum(price*weight),
            totalWeight = sum(weight),
            reweightedIndex = priceIndex/totalWeight)
#   date state priceIndex totalWeight reweightedIndex
#   (dbl) (dbl)      (dbl)       (dbl)           (dbl)
#1 200601     1     55.100        1.00        55.10000
#2 200601     2     42.500        1.00        42.50000
#3 200602     1     50.450        0.85        59.35294
#4 200602     2     51.000        1.00        51.00000
#5 200603     1     45.600        0.90        50.66667
#6 200603     2     46.575        0.85        54.79412

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM