
High-performance apply on pandas groupby

I need to calculate a percentile of a column in a pandas DataFrame. A subset of the DataFrame is shown below:

[image: a subset of df]

I want to calculate the 20th percentile of SaleQTY for each ["Barcode","ShopCode"] group, so I defined the following function:

import numpy as np

def quant(group):
    # Attach the group's 20th percentile of SaleQTY to every row of the group
    group["Quantile"] = np.quantile(group["SaleQTY"], 0.2)
    return group

Then I apply this function to each group of my sales data, which has almost 18 million rows and roughly 3 million ["Barcode","ShopCode"] groups:

quant_sale = sales.groupby(['Barcode','ShopCode']).apply(quant)
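
To make the question reproducible, here is a minimal sketch of the same apply-based approach on a hypothetical six-row frame (the values are invented; only the column names come from the question). Two small changes are assumptions, not part of the original code: each group is copied before the assignment, and group_keys=False is passed so the result keeps the original row index. Recent pandas versions may warn that apply operates on the grouping columns; that warning does not change the Quantile values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real 18-million-row `sales` frame.
sales = pd.DataFrame({
    "Barcode":  [1, 1, 1, 2, 2, 2],
    "ShopCode": ["A", "A", "A", "A", "A", "B"],
    "SaleQTY":  [10, 20, 30, 5, 15, 7],
})

def quant(group):
    # Work on a copy so the assignment never touches a view of `sales`.
    group = group.copy()
    group["Quantile"] = np.quantile(group["SaleQTY"], 0.2)
    return group

# One Python call of quant() per group: this per-group overhead is what
# becomes expensive with millions of groups.
quant_sale = sales.groupby(["Barcode", "ShopCode"], group_keys=False).apply(quant)
print(quant_sale)
```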

That took 2 hours to complete on a Windows server with 128 GB of RAM and 32 cores. That doesn't make sense, because this is only one small part of my code. So I started searching the web for ways to improve performance and came across a "numba" solution; I tried the code below, but it didn't work:

import numpy as np
from numba import jit

# This fails: in nopython mode numba cannot compile pandas objects
# such as a DataFrame or the groupby() call below.
@jit(nopython=True)
def quant_numba(df):
    final_quant = []
    for bar_shop, group in df.groupby(['Barcode', 'ShopCode']):
        group["Quantile"] = np.quantile(group["SaleQTY"], 0.2)
        final_quant.append((bar_shop, group["Quantile"]))
    return final_quant

result = quant_numba(sales)

It seems that I cannot use pandas objects inside a function compiled with this decorator.
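
numba's nopython mode works on NumPy arrays and scalars, not on DataFrames, so any speed-up along those lines has to start by moving the data into arrays. As a hedged illustration of that idea (plain NumPy, deliberately not numba), the per-group 20th percentile can be computed with one sort instead of one Python call per group. The helper name group_quantile_numpy and the toy data are hypothetical; the sketch assumes no missing keys or values, and it uses the same linear interpolation as np.quantile's default.

```python
import numpy as np
import pandas as pd

def group_quantile_numpy(df, keys, col, q):
    """Hypothetical helper: per-group quantile with linear interpolation,
    computed on NumPy arrays. Assumes no NaN in the keys or in `col`."""
    grouped = df.groupby(keys, sort=False)
    codes = grouped.ngroup().to_numpy()          # integer group id for every row
    values = df[col].to_numpy(dtype=float)
    # Sort rows by group id first, then by value within each group.
    order = np.lexsort((values, codes))
    values_sorted = values[order]
    # Size of each group and the offset where it starts in the sorted array.
    counts = np.bincount(codes, minlength=grouped.ngroups)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    # Fractional position of the quantile inside each group: (n - 1) * q.
    pos = (counts - 1) * q
    lo = np.floor(pos).astype(np.int64)
    hi = np.minimum(lo + 1, counts - 1)
    frac = pos - lo
    lo_vals = values_sorted[starts + lo]
    hi_vals = values_sorted[starts + hi]
    per_group = lo_vals + (hi_vals - lo_vals) * frac
    # Broadcast each group's quantile back to the rows of that group.
    return pd.Series(per_group[codes], index=df.index, name="Quantile")

# Hypothetical toy data with the question's column names.
sales = pd.DataFrame({
    "Barcode":  [1, 1, 1, 2, 2, 2],
    "ShopCode": ["A", "A", "A", "A", "A", "B"],
    "SaleQTY":  [10, 20, 30, 5, 15, 7],
})
sales["Quantile"] = group_quantile_numpy(sales, ["Barcode", "ShopCode"], "SaleQTY", 0.2)
print(sales)
```

The array layout (sorted values plus group offsets) is the kind of input a numba kernel could also consume. In practice pandas' built-in grouped quantile already runs in compiled code, so this helper is mainly meant to show why array-based code avoids the per-group overhead.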

I am not sure whether I can use multiprocessing (I'm unfamiliar with the whole concept) or whether there is another way to speed up my code. Any help would be appreciated.

You can try DataFrameGroupBy.quantile:

df1 = df.groupby(['Barcode', 'ShopCode'])['SaleQTY'].quantile(0.2)
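
A small sketch of what this returns, using hypothetical toy values with the question's column names: one row per (Barcode, ShopCode) group, as a Series indexed by a MultiIndex of the two keys. It is not aligned to the original rows, so it suits a per-group summary rather than a new column.

```python
import pandas as pd

# Hypothetical toy data with the question's column names.
df = pd.DataFrame({
    "Barcode":  [1, 1, 1, 2, 2, 2],
    "ShopCode": ["A", "A", "A", "A", "A", "B"],
    "SaleQTY":  [10, 20, 30, 5, 15, 7],
})

# One value per group; the index is a MultiIndex of (Barcode, ShopCode).
df1 = df.groupby(["Barcode", "ShopCode"])["SaleQTY"].quantile(0.2)
print(df1)

# Look up a single group's 20th percentile.
print(df1.loc[(1, "A")])
```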

Or, as @Jon Clements mentioned, to fill a new column with the per-group percentiles, use GroupBy.transform:

df['Quantile'] = df.groupby(['Barcode', 'ShopCode'])['SaleQTY'].transform('quantile', q=0.2)
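
A minimal check of the transform version on hypothetical toy data: the result has one value per original row, aligned on df's index, so it can be assigned directly as a column, and it matches np.quantile computed on a single group. Passing the string 'quantile' (with q forwarded as a keyword) should let pandas use its built-in grouped quantile; a lambda such as lambda s: np.quantile(s, 0.2) would instead call Python once per group, much like the original apply.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the question's column names.
df = pd.DataFrame({
    "Barcode":  [1, 1, 1, 2, 2, 2],
    "ShopCode": ["A", "A", "A", "A", "A", "B"],
    "SaleQTY":  [10, 20, 30, 5, 15, 7],
})

# Same length as df, aligned on its index, so it can be assigned directly.
df["Quantile"] = df.groupby(["Barcode", "ShopCode"])["SaleQTY"].transform("quantile", q=0.2)

# Cross-check one group against np.quantile (both default to linear interpolation).
mask = (df["Barcode"] == 1) & (df["ShopCode"] == "A")
assert np.allclose(df.loc[mask, "Quantile"], np.quantile(df.loc[mask, "SaleQTY"], 0.2))
print(df)
```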

There is a built-in function in pandas called quantile().

quantile() returns the value at a given quantile of a column in df; for the nth percentile, pass q = n/100 (for example 0.2 for the 20th percentile).
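
A short sketch of Series.quantile on a whole column (no grouping), with hypothetical values: q is a fraction between 0 and 1, a list of fractions returns a Series indexed by those fractions, and the default interpolation="linear" agrees with NumPy's default method. Combined with groupby, the same call gives per-group percentiles.

```python
import numpy as np
import pandas as pd

# Hypothetical values for a single SaleQTY column.
sale_qty = pd.Series([10, 20, 30, 5, 15, 7], name="SaleQTY")

# The 20th percentile of the whole column: q is a fraction, so 0.2.
p20 = sale_qty.quantile(0.2)

# Several percentiles at once; the result is indexed by the requested fractions.
several = sale_qty.quantile([0.2, 0.5, 0.8])

# Default interpolation="linear" matches np.quantile's default.
assert np.isclose(p20, np.quantile(sale_qty.to_numpy(), 0.2))
print(p20)
print(several)
```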

References: pandas documentation; GeeksforGeeks example.
