Speed up apply in Pandas

東風:

Person,utility,selected,innovation
2012001153_7_E02005533_1_2012002698,130.2333,yes,0
2012001153_7_E02005533_1_2012002698,110.33,no,1
2012001153_7_E02005533_1_2012002698,83,no,2
2012001153_7_E02005533_1_2012002698,-100,no,3
2012001153_7_E02005533_1_2012002698,49,no,4

I want to create a new column that holds each row's disutility relative to the selected row, i.e. the row where selected == 'yes'.

The following approach works, but it is slow when run on millions of records:

def get_relativeUtilityToSelected(group):
    selected_utility = group[group['selected']=='yes']['utility'].values[0]
    group['relativeDisUtilityToSelected'] = group['utility'] - selected_utility
    return group

# The column is named 'Person' in the data above, so group on that exact name.
df = df.groupby('Person').apply(get_relativeUtilityToSelected)

Expected output:

Person,utility,selected,innovation,relativeDisUtilityToSelected
2012001153_7_E02005533_1_2012002698,130.2333,yes,0,0
2012001153_7_E02005533_1_2012002698,110.33,no,1,-19.9033
2012001153_7_E02005533_1_2012002698,83,no,2,-47.2333
2012001153_7_E02005533_1_2012002698,-100,no,3,-230.2333
2012001153_7_E02005533_1_2012002698,49,no,4,-81.2333

How can I speed this up?

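If speed is a problem, avoid .apply wherever you can: it calls your Python function once per group, which amounts to looping over your DataFrame.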

Note: from your function, I infer that each 'person' has exactly one row where 'selected' == 'yes'.
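That assumption is worth checking before relying on it. The snippet below is only a sketch on made-up toy data (the persons "a" and "b" and the names yes_counts and violations are mine, not from the question): it counts the 'yes' rows per person and lists anyone who does not have exactly one.

```python
import pandas as pd

# Hypothetical toy data: person "a" has one selected row, person "b" has none.
df = pd.DataFrame({
    "person": ["a", "a", "b", "b"],
    "utility": [1.0, 2.0, 3.0, 4.0],
    "selected": ["yes", "no", "no", "no"],
})

# Number of rows with selected == 'yes' for each person.
yes_counts = df["selected"].eq("yes").groupby(df["person"]).sum()

# Persons that break the "exactly one selected row" assumption.
violations = yes_counts[yes_counts != 1]
print(violations)
```

If violations is empty, the assumption holds for the whole DataFrame.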

The following should do it:

import pandas as pd
from io import StringIO

ds = """person,utility,selected,innovation
2012001153_7_E02005533_1_2012002698,130.2333,yes,0
2012001153_7_E02005533_1_2012002698,110.33,no,1
2012001153_7_E02005533_1_2012002698,83,no,2
2012001153_7_E02005533_1_2012002698,-100,no,3
2012001153_7_E02005533_1_2012002698,49,no,4"""

df = pd.read_csv(StringIO(ds))

df['relativeDisUtility'] = df['utility'] - df[df['selected'] == 'yes'][['person', 'utility']].set_index('person').loc[df['person']].values[:, 0]

df

Result:

                                person  ...  relativeDisUtility
0  2012001153_7_E02005533_1_2012002698  ...              0.0000
1  2012001153_7_E02005533_1_2012002698  ...            -19.9033
2  2012001153_7_E02005533_1_2012002698  ...            -47.2333
3  2012001153_7_E02005533_1_2012002698  ...           -230.2333
4  2012001153_7_E02005533_1_2012002698  ...            -81.2333

Breakdown:

df[
    df['selected'] == 'yes'  # pick the rows where 'selected' == 'yes'
][
    ['person', 'utility']    # choose 'person' and 'utility' columns
].set_index(
    'person'                 # make 'person' the index
).loc[
    df['person']             # expand to the shape of the original 'person' column
].values[:, 0]               # get the values
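The same lookup can probably be written more readably with Series.map. This is a sketch of an alternative, not the expression above: the two-person data (p1, p2) is invented for illustration, and it still assumes exactly one 'yes' row per person, because map needs a unique index on the lookup Series.

```python
import pandas as pd
from io import StringIO

# Hypothetical two-person sample, so the per-person lookup is visible.
ds = """person,utility,selected,innovation
p1,130.2333,yes,0
p1,110.33,no,1
p2,83,no,0
p2,-100,yes,1
p2,49,no,2"""

df = pd.read_csv(StringIO(ds))

# One entry per person: the utility of that person's selected row.
selected_utility = df.loc[df["selected"] == "yes"].set_index("person")["utility"]

# Look up each row's person in that Series, then subtract.
df["relativeDisUtility"] = df["utility"] - df["person"].map(selected_utility)
print(df)
```

Like the .loc version, this does the work in vectorized pandas operations rather than calling a Python function once per group.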
