
pandas combining dataframes optimisation

I have a time series order dataset in pandas with missing values for some dates. To correct it, I am trying to fill each missing value with the value from the most recent previous date available.

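for date in dates_missing:
    # Rows strictly before the missing date, most recent first
    df_temp = df[df.order_date < date].sort_values('order_date', ascending=False)
    # Most recent earlier value for each supplier
    supplier_map = df_temp.groupby('supplier_id')['value'].first()

    for supplier_id in supplier_map.index.values:
        # .loc writes into df itself; chained indexing would only update a temporary copy
        df.loc[(df.order_date == date) & (df.supplier_id == supplier_id), 'value'] = supplier_map.get(supplier_id)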

To explain the code: I loop over the missing dates and, for each one, fetch the rows dated before it. I then build a supplier_id-to-value map using the pandas first(), which returns the most recent earlier value per supplier because the rows are sorted by date in descending order.

The slowest part is writing the values back to the original data frame: I loop over each supplier and update its rows in the original data frame.

I need suggestions for speeding up this inner for loop.

Example:

|order_date|supplier_id |value |sku_id|
|2017-12-01| 10 | 1.0 | 1 |
|2017-12-01| 9 | 1.3 | 7 |
|2017-12-01| 3 | 1.4 | 2 |
|2017-12-02| 3 | 0 | 2 |
|2017-12-02| 9 | 0 | 7 |
|2017-12-03| 3 | 1.0 | 2 |
|2017-12-03| 10 | 1.0 | 1 |
|2017-12-03| 9 | 1.3 | 7 |

Date to fix: 2017-12-02

|2017-12-02| 3 | 0 | 2 |
|2017-12-02| 9 | 0 | 7 |

Corrected data frame:

|order_date|supplier_id |value |sku_id|
|2017-12-01| 10 | 1.0 | 1 |
|2017-12-01| 9 | 1.3 | 7 |
|2017-12-01| 3 | 1.4 | 2 |
|2017-12-02| 3 | 1.4 | 2 |
|2017-12-02| 9 | 1.3 | 7 |
|2017-12-03| 3 | 1.0 | 2 |
|2017-12-03| 10 | 1.0 | 1 |
|2017-12-03| 9 | 1.3 | 7 |

PS: The question may not be entirely clear, so I am happy to answer any doubts and edit the post as needed.
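One way the inner per-supplier loop might be avoided, while keeping the outer loop over dates, is to look up all suppliers on a given date at once with `Series.map`. This is only a sketch under assumptions: the function name `fill_from_previous_date` is hypothetical, the column names are taken from the example table, and `order_date` is assumed to be a datetime column.

```python
import pandas as pd

def fill_from_previous_date(df, dates_missing):
    """Copy each supplier's latest earlier value onto the missing dates.

    Hypothetical helper; column names follow the example table.
    """
    df = df.copy()
    for date in sorted(dates_missing):
        # Rows strictly before the missing date, oldest first
        earlier = df[df["order_date"] < date].sort_values("order_date", kind="stable")
        # Last value seen per supplier before `date`
        supplier_map = earlier.groupby("supplier_id")["value"].last()
        mask = df["order_date"] == date
        # One vectorised lookup replaces the per-supplier loop
        filled = df.loc[mask, "supplier_id"].map(supplier_map)
        # Suppliers with no earlier record keep their existing value
        df.loc[mask, "value"] = filled.fillna(df.loc[mask, "value"])
    return df

# Sample data from the example table
orders = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2017-12-01", "2017-12-01", "2017-12-01",
        "2017-12-02", "2017-12-02",
        "2017-12-03", "2017-12-03", "2017-12-03",
    ]),
    "supplier_id": [10, 9, 3, 3, 9, 3, 10, 9],
    "value": [1.0, 1.3, 1.4, 0.0, 0.0, 1.0, 1.0, 1.3],
    "sku_id": [1, 7, 2, 2, 7, 2, 1, 7],
})
fixed = fill_from_previous_date(orders, [pd.Timestamp("2017-12-02")])
print(fixed)
```

Processing the dates in ascending order means a date filled earlier can serve as the source for a later missing date.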

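You can group the dataframe by supplier_id and, within each group, replace 0 with NaN. Once the zeros are NaN, fill them with a forward fill (with the rows in date order, this copies the value from the previous date); for leading values that have no earlier value, use a backward fill.

This may reduce your run time.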

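import numpy as np

# Treat 0 as missing in the values column only (named 'values' here, 'value' in the question)
df['values'] = df['values'].replace(0, np.nan)
# Within each supplier, forward-fill gaps, then backward-fill any leading gaps
df['values'] = df.groupby('supplier_id')['values'].transform(lambda x: x.ffill().bfill())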

Out:

   order_date  sku_id  supplier_id  values
0  2017-12-01       1           10     1.0
1  2017-12-01       7            9     1.3
2  2017-12-01       2            3     1.4
3  2017-12-02       2            3     1.4
4  2017-12-02       7            9     1.3
5  2017-12-03       2            3     1.0
6  2017-12-03       1           10     1.0
7  2017-12-03       7            9     1.3
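If only specific dates are known to be missing, rather than every zero, the same forward-fill idea might be restricted to those dates. The following is a minimal sketch, assuming the question's column name `value`, a datetime `order_date` column, and a list `dates_missing`; the names `sample` and `out` are illustrative.

```python
import pandas as pd

# Sample data from the question
sample = pd.DataFrame({
    "order_date": pd.to_datetime([
        "2017-12-01", "2017-12-01", "2017-12-01",
        "2017-12-02", "2017-12-02",
        "2017-12-03", "2017-12-03", "2017-12-03",
    ]),
    "supplier_id": [10, 9, 3, 3, 9, 3, 10, 9],
    "value": [1.0, 1.3, 1.4, 0.0, 0.0, 1.0, 1.0, 1.3],
    "sku_id": [1, 7, 2, 2, 7, 2, 1, 7],
})
dates_missing = [pd.Timestamp("2017-12-02")]

# Forward fill copies from the previous row in each group,
# so the rows must be in date order first
out = sample.sort_values("order_date", kind="stable").copy()
# Blank the values only on the dates known to be missing
out["value"] = out["value"].mask(out["order_date"].isin(dates_missing))
# Fill each supplier's gaps from its most recent earlier date
out["value"] = out.groupby("supplier_id")["value"].ffill()
print(out)
```

A supplier whose first row falls on a missing date is left as NaN by this sketch; chaining a backward fill, as in the answer above, would cover that case.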

