
Creating a new feature column on grouped data in a Pandas dataframe

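I have a Pandas dataframe with the columns ['week', 'price_per_unit', 'total_units']. I want to create a new column called 'weighted_price' as follows: first group by 'week', then for each row compute price_per_unit * total_units / sum(total_units), where the sum is taken over that row's week. I have code that does this: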

import pandas as pd
import numpy as np

def create_features_by_group(df):
    # first group data
    grouped = df.groupby(['week'])
    df_temp = pd.DataFrame(columns=['weighted_price'])

    # run through the groups and create the weighted_price per group
    for name, group in grouped:
        res = (group['total_units'] * group['price_per_unit']) / np.sum(group['total_units'])
        for idx in res.index:
            df_temp.loc[idx] = [res[idx]]

    # join() returns a new DataFrame, so the result has to be assigned back
    df = df.join(df_temp['weighted_price'])

    return df 

The only problem is that this is very, very slow. Is there a faster way to do it?

I use the following code to test the function.

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['week', 'price_per_unit', 'total_units'])


for i in range(10):
    df.loc[i] = [round(int(i % 3), 0) , 10 * np.random.rand(), round(10 * np.random.rand(), 0)]

I think you need something like this:

df
   price  total_units  week
0      5          100     1
1      7          200     1
2      9          150     2
3     11          250     2
4     13          125     2

def fun(table):
    table['measure'] = table['price'] * (table['total_units'] / table['total_units'].sum())
    return table  

df.groupby('week').apply(fun)
   price  total_units  week   measure
0      5          100     1  1.666667
1      7          200     1  4.666667
2      9          150     2  2.571429
3     11          250     2  5.238095
4     13          125     2  3.095238
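
The same numbers can probably be obtained without calling a Python function per group. The sketch below uses `groupby(...).transform('sum')`, which broadcasts each week's total back onto the rows of that week. It assumes the sample frame shown above; the helper name `add_weighted_measure` is made up for illustration and is not part of the original answer.

```python
import pandas as pd

def add_weighted_measure(df):
    # Hypothetical helper, not from the original answer.
    # Per-week sum of total_units, aligned with the original row index.
    week_total = df.groupby('week')['total_units'].transform('sum')
    out = df.copy()  # leave the caller's frame untouched
    out['measure'] = out['price'] * out['total_units'] / week_total
    return out

# Same sample data as in the answer above
df = pd.DataFrame({
    'price': [5, 7, 9, 11, 13],
    'total_units': [100, 200, 150, 250, 125],
    'week': [1, 1, 2, 2, 2],
})
result = add_weighted_measure(df)
```

Because the division is done on whole columns at once, this should scale better than a row-by-row loop, although the actual speed-up depends on the data.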

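I grouped the dataset by 'Week' to calculate the weighted price for each week.

Then I merged the original dataset with the grouped dataset to get the result: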

# importing the libraries
import pandas as pd
import numpy as np

# creating the dataset
df = {
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10]
}
df = pd.DataFrame(df)
df['price'] = df['price_per_unit'] * df['total_units']

# calculate the total sales and total number of units sold in each week
df_grouped_week = df.groupby(by = 'Week').agg({'price' : 'sum', 'total_units' : 'sum'}).reset_index()

# calculate the weighted price
df_grouped_week['wt_price'] = df_grouped_week['price']  / df_grouped_week['total_units']  

# merging df and df_grouped_week
df_final = pd.merge(df, df_grouped_week[['Week', 'wt_price']], how = 'left', on = 'Week')
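
Note that `wt_price` here is one weighted average per week, repeated on every row of that week, whereas the formula in the question gives a separate value per row. The two are related: summing the per-row values within a week yields the weekly weighted average. A small sketch of that relationship, assuming the same sample data (the intermediate name `units_per_week` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Week': [1, 1, 1, 1, 2, 2],
    'price_per_unit': [10, 11, 22, 12, 12, 45],
    'total_units': [10, 10, 10, 10, 10, 10],
})

# Total units sold in each row's week, aligned with the original rows
units_per_week = df.groupby('Week')['total_units'].transform('sum')

# Per-row value, as defined in the question
df['weighted_price'] = df['price_per_unit'] * df['total_units'] / units_per_week

# Summing the per-row values within each week gives the weekly weighted average
df['wt_price'] = df.groupby('Week')['weighted_price'].transform('sum')
```

This avoids the intermediate aggregate frame and the merge, at the cost of two `transform` calls.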
