
Creating a new feature column on grouped data in a Pandas dataframe

I have a Pandas dataframe with the columns ['week', 'price_per_unit', 'total_units']. I want to create a new column called 'weighted_price' as follows: group by 'week', then for each row compute price_per_unit * total_units / sum(total_units), where the sum runs over all rows in that row's week. I have code that does this:
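In symbols, for row i, with the sum running over every row j that belongs to the same week as row i:

$$
\text{weighted\_price}_i = \frac{\text{price\_per\_unit}_i \cdot \text{total\_units}_i}{\sum_{j:\ \text{week}_j = \text{week}_i} \text{total\_units}_j}
$$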

import pandas as pd
import numpy as np

def create_features_by_group(df):
    # first group data
    grouped = df.groupby(['week'])
    df_temp = pd.DataFrame(columns=['weighted_price'])

    # run through the groups and create the weighted_price per group
    for name, group in grouped:
        res = (group['total_units'] * group['price_per_unit']) / np.sum(group['total_units'])
        for idx in res.index:
            df_temp.loc[idx] = [res[idx]]

    # join returns a new DataFrame; it does not modify df in place
    return df.join(df_temp['weighted_price'])

The only problem is that this is very, very slow. Is there some faster way to do this?
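If you want to keep the explicit per-group loop, much of the cost likely comes from `df_temp.loc[idx] = ...`, which grows a DataFrame one row at a time. A minimal sketch, assuming each group's result fits comfortably in memory, collects each group's Series and concatenates them once. The sample values mirror the small example in the first answer below, renamed to the question's columns.

```python
import pandas as pd


def create_features_by_group(df):
    # Same per-group arithmetic as the question's loop, but each group's
    # result is kept as a Series and concatenated once at the end, instead
    # of enlarging a DataFrame row by row with .loc.
    pieces = []
    for _, group in df.groupby('week'):
        res = group['total_units'] * group['price_per_unit'] / group['total_units'].sum()
        pieces.append(res)

    weighted_price = pd.concat(pieces).rename('weighted_price')
    # join aligns on the index, so every row receives its own group's value
    return df.join(weighted_price)


df = pd.DataFrame({
    'week': [1, 1, 2, 2, 2],
    'price_per_unit': [5.0, 7.0, 9.0, 11.0, 13.0],
    'total_units': [100, 200, 150, 250, 125],
})
print(create_features_by_group(df))
```

This is still a Python-level loop over groups, so it scales with the number of weeks; a fully vectorized `groupby(...).transform('sum')` avoids even that.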

I used the following code to generate test data for the function.

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['week', 'price_per_unit', 'total_units'])

# fill 10 rows; week cycles through 0, 1, 2
for i in range(10):
    df.loc[i] = [i % 3, 10 * np.random.rand(), round(10 * np.random.rand(), 0)]
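Filling the test frame with `df.loc[i] = ...` is itself a row-by-row enlargement, and starting from an empty frame may leave the columns with `object` dtype. A sketch that builds equivalent test data column-wise with numeric dtypes; the row count `n` and the seed are illustrative choices, not from the question.

```python
import numpy as np
import pandas as pd

# Build the test frame in one shot instead of appending rows.
n = 10                            # raise this (e.g. to 1_000_000) to benchmark
rng = np.random.default_rng(0)    # seeded for reproducibility (arbitrary seed)
df = pd.DataFrame({
    'week': np.arange(n) % 3,                          # cycles 0, 1, 2
    'price_per_unit': 10 * rng.random(n),              # floats in [0, 10)
    'total_units': np.round(10 * rng.random(n)),       # whole numbers 0..10
})
print(df.dtypes)
```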

I think you need to do it this way:

df
   price  total_units  week
0      5          100     1
1      7          200     1
2      9          150     2
3     11          250     2
4     13          125     2

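def fun(table):
    table = table.copy()  # work on a copy so the group slice is not modified in place
    table['measure'] = table['price'] * (table['total_units'] / table['total_units'].sum())
    return table

# group_keys=False keeps the original row index (as in the output below) on newer pandas
df.groupby('week', group_keys=False).apply(fun)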
   price  total_units  week   measure
0      5          100     1  1.666667
1      7          200     1  4.666667
2      9          150     2  2.571429
3     11          250     2  5.238095
4     13          125     2  3.095238
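A variant that avoids `apply` altogether (a different technique, swapped in here, not the answer's own code) uses `groupby(...).transform('sum')`, which returns each week's total aligned to the original rows. The whole column then becomes one vectorized expression. On the question's column names this would read `df['price_per_unit'] * df['total_units'] / df.groupby('week')['total_units'].transform('sum')`.

```python
import pandas as pd

df = pd.DataFrame({
    'price': [5, 7, 9, 11, 13],
    'total_units': [100, 200, 150, 250, 125],
    'week': [1, 1, 2, 2, 2],
})

# Each row gets the total_units sum of its own week, same index as df
week_total = df.groupby('week')['total_units'].transform('sum')

# Per-row weighted price, computed for all rows at once
df['measure'] = df['price'] * df['total_units'] / week_total
print(df)
```

This produces the same `measure` column as the table above without calling a Python function per group.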

I grouped the dataset by 'Week' to calculate a weighted average price for each week, i.e. sum(price_per_unit * total_units) / sum(total_units) over that week.

Then I merged the original dataset with the grouped dataset to get the result:

# importing the libraries
import pandas as pd
import numpy as np

# creating the dataset
df = {
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10],
}
df = pd.DataFrame(df)
df['price'] = df['price_per_unit'] * df['total_units']

# calculate the total sales and total number of units sold in each week
df_grouped_week = df.groupby(by='Week').agg({'price': 'sum', 'total_units': 'sum'}).reset_index()

# calculate the weighted price
df_grouped_week['wt_price'] = df_grouped_week['price'] / df_grouped_week['total_units']

# merging df and df_grouped_week
df_final = pd.merge(df, df_grouped_week[['Week', 'wt_price']], how='left', on='Week')
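Note that this answer yields one weighted average per week, which equals the sum, over that week's rows, of the per-row quantity the question asks for. A minimal sketch computing both with `transform` instead of a separate aggregate-and-merge step; the `revenue` column name is an illustrative choice, not from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10],
})
df['revenue'] = df['price_per_unit'] * df['total_units']

# Week-level sums broadcast back to every row (no merge needed)
week_units = df.groupby('Week')['total_units'].transform('sum')
week_revenue = df.groupby('Week')['revenue'].transform('sum')

# Weekly weighted average price, as in the answer above
df['wt_price'] = week_revenue / week_units

# Per-row quantity from the question; summing it within a week gives wt_price
df['weighted_price'] = df['revenue'] / week_units
print(df)
```

For this data, `wt_price` is 13.75 for week 1 and 28.5 for week 2, and the `weighted_price` values within each week add up to those numbers.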
