Pandas - divide each row by a group average

I am trying to solve a fairly simple task, but it is not clear how to do it in pandas.

I have a pandas DataFrame that contains a set of columns I am interested in. The names of these columns are stored in the factors list:

import pandas as pd

# Load the data (sqlString and engine are defined elsewhere in the asker's code)
df = pd.read_sql(sql=sqlString, con=engine)

# Shuffle the rows reproducibly and renumber the index
df = df.sample(frac=1, random_state=123).reset_index(drop=True)

# Factor columns to normalize
factors = ['GRP_RANK', 'BK_YIELD', 'SALES_YIELD', 'EARNINGS_YIELD_LTM', 'CASHFLOW_YIELD', 'ROE', 'ROIC',
           'ROA', 'GROSS_MGN', '12MVT', '1MVT', 'BETA_3Y', 'BETA_1Y', 'P_TOTAL_RETURN(-1,0,USD)']

The DataFrame also has a DATE column. For each record and each of the factors, I want to divide the factor's value by the average of that factor over all records with the same date.
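In symbols (notation introduced here only for illustration): let $x_{i,f}$ be the value of factor $f$ in row $i$, $d(i)$ the DATE of row $i$, and $I_d$ the set of rows whose DATE equals $d$. The requested value is then the ratio below. Note that pandas' mean skips missing values, so in practice $I_d$ counts only the rows where factor $f$ is present.

```latex
\bar{x}_{d,f} = \frac{1}{|I_d|} \sum_{j \in I_d} x_{j,f},
\qquad
\tilde{x}_{i,f} = \frac{x_{i,f}}{\bar{x}_{d(i),f}}
```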

I managed to get the daily average of each factor:

dfGroup = df[factors + ["DATE"]].groupby('DATE')[factors].mean()

But I am not sure how to proceed. The only thing that comes to mind is to build a new, large DataFrame by left-joining df and dfGroup on the DATE field and then doing some ugly column-by-column division, but maybe there is an easier way?
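For comparison, the left-join idea does not actually require a column-by-column loop. Below is a minimal sketch on hypothetical toy data (the DATE values and numbers are made up, and only ROE and BETA_1Y stand in for the full factors list); it also shows a lookup with dfGroup.loc that avoids the merge entirely. It is an illustration, not the asker's code.

```python
import pandas as pd

# Hypothetical toy data: two of the question's factor columns, string dates.
df = pd.DataFrame({
    "DATE": ["2018-02-10", "2018-02-10", "2018-02-11", "2018-02-11", "2018-02-11"],
    "ROE": [1.0, 3.0, 2.0, 4.0, 6.0],
    "BETA_1Y": [0.5, 1.5, 1.0, 1.0, 1.0],
})
factors = ["ROE", "BETA_1Y"]

# Per-date means, one row per DATE (same expression as dfGroup in the question).
dfGroup = df[factors + ["DATE"]].groupby("DATE")[factors].mean()

# Left join the means back onto every row; mean columns get a "_mean" suffix.
merged = df.merge(dfGroup.reset_index(), on="DATE", how="left", suffixes=("", "_mean"))
mean_cols = [f + "_mean" for f in factors]

# Divide all factor columns at once; .to_numpy() makes the division positional,
# so "ROE" is divided by "ROE_mean" and "BETA_1Y" by "BETA_1Y_mean".
normalized = merged[factors] / merged[mean_cols].to_numpy()

# Same result without a merge: look up each row's DATE directly in dfGroup.
normalized_lookup = df[factors] / dfGroup.loc[df["DATE"]].to_numpy()

print(normalized)
```

Both variants work, but they still materialize a second frame of means by hand, which is the bookkeeping that groupby/transform takes care of.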

Let's look at using groupby and transform together with div:

MVCE:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': pd.date_range('2018-02-10', '2018-02-12', freq='h'),  # 49 hourly stamps
                   'A': np.random.randint(0, 100, 49),
                   'B': np.random.randint(100, 200, 49),
                   'C': np.random.random(49)})
df = df.set_index('Date')
print(df.head())

Output (the data is random, so your numbers will differ):

                      A    B         C
Date                                  
2018-02-10 00:00:00  11  131  0.474226
2018-02-10 01:00:00  35  188  0.998742
2018-02-10 02:00:00  97  182  0.683685
2018-02-10 03:00:00   0  134  0.845094
2018-02-10 04:00:00  24  173  0.238379

Use groupby, transform and div:

# transform('mean') repeats each day's mean on every row of that day, so div lines up row by row
df[['A','B','C']].div(df.groupby(df.index.floor('D')).transform('mean'))

Output of head():

                        A         B         C
Date                                             
2018-02-10 00:00:00  0.362637  0.866593  0.931739
2018-02-10 01:00:00  1.153846  1.243660  1.962284
2018-02-10 02:00:00  3.197802  1.203969  1.343275
2018-02-10 03:00:00  0.000000  0.886439  1.660404
2018-02-10 04:00:00  0.791209  1.144432  0.468357
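The same idea applied to the question's own layout, where the dates live in a DATE column rather than in the index. This is a minimal sketch on hypothetical toy data (the dates, values and the "_REL" suffix are made up; ROE and BETA_1Y stand in for the full factors list). On the real data the core line would be df[factors].div(df.groupby('DATE')[factors].transform('mean')), assuming each DATE value identifies a single day.

```python
import pandas as pd

# Hypothetical toy data in the question's layout: a DATE column plus factors.
df = pd.DataFrame({
    "DATE": ["2018-02-10", "2018-02-10", "2018-02-11", "2018-02-11", "2018-02-11"],
    "ROE": [1.0, 3.0, 2.0, 4.0, 6.0],
    "BETA_1Y": [0.5, 1.5, 1.0, 1.0, 1.0],
})
factors = ["ROE", "BETA_1Y"]

# transform('mean') returns a frame shaped like df[factors] in which every row
# holds the mean of its own DATE group, so .div aligns row by row.
daily_means = df.groupby("DATE")[factors].transform("mean")
normalized = df[factors].div(daily_means)

# Sanity check: within each date the ratios average to 1
# (true whenever that date's mean is non-zero).
check = normalized.groupby(df["DATE"]).mean()
assert (check - 1.0).abs().to_numpy().max() < 1e-12

# Keep the ratios next to the original columns (the "_REL" suffix is arbitrary).
df = df.join(normalized.add_suffix("_REL"))
print(df)
```

One thing worth checking on real data: a factor whose daily mean is zero or close to it (a return column can average near zero on some dates) produces infinite or very large ratios.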

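Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original post. For any questions, contact yoyou2525@163.com.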

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM