
Creating a new pandas dataframe column by looking up values in other rows

I would like a faster way to compute the "sales 52 weeks ago" column for each product below, without using iterrows or itertuples. Any suggestions? The input is the table without the "sales 52 weeks ago" column, and the output is the entire table below.

         date  sales city product  sales 52 weeks ago
0  2020-01-01    1.5   c1      p1                 0.6
1  2020-01-01    1.2   c1      p2                 0.3
2  2019-05-02    0.5   c1      p1                 NaN
3  2019-01-02    0.3   c1      p2                 NaN
4  2019-01-02    0.6   c1      p1                 NaN
5  2019-01-01    1.2   c1      p2                 NaN
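To reproduce the problem, the input table can be built roughly as follows. This is a minimal sketch: the question does not state the dtype of `date`, so the conversion with `pd.to_datetime` is an assumption, made because the date arithmetic used later relies on a datetime column.

```python
import pandas as pd

# Input table from the question, without the "sales 52 weeks ago" column
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2019-05-02",
             "2019-01-02", "2019-01-02", "2019-01-01"],
    "sales": [1.5, 1.2, 0.5, 0.3, 0.6, 1.2],
    "city": ["c1"] * 6,
    "product": ["p1", "p2", "p1", "p2", "p1", "p2"],
})

# Assumption: dates arrive as strings and need converting to datetime64
df["date"] = pd.to_datetime(df["date"])
```

Note that 52 weeks is 364 days, so the row matching 2020-01-01 is 2019-01-02, not 2019-01-01.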

My current itertuples approach, which is really slow:

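from datetime import timedelta

for row in df.itertuples(index=True, name='Pandas'):
    # Rows for the same product dated exactly 52 weeks earlier
    mask = (df['date'] == row.date - timedelta(weeks=52)) & (df['product'] == row.product)
    match = df.loc[mask, 'sales']
    if not match.empty:
        df.at[row.Index, 'sales 52 weeks ago'] = match.iloc[0]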

Subtract a 52-week Timedelta from each date, then merge the shifted keys back onto the original frame on date and product:

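import pandas as pd

# Shift each date back 52 weeks, keeping product as the second lookup key
m = df['date'].sub(pd.Timedelta(weeks=52)).to_frame().assign(product=df['product'])
# Left-merge the shifted keys onto the original rows to pull in the earlier sales
final = df.assign(sales_52_W_ago=m.merge(df, on=['date', 'product'], how='left')['sales'])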

        date  sales city product  sales_52_W_ago
0 2020-01-01    1.5   c1      p1             0.6
1 2020-01-01    1.2   c1      p2             0.3
2 2019-05-02    0.5   c1      p1             NaN
3 2019-01-02    0.3   c1      p2             NaN
4 2019-01-02    0.6   c1      p1             NaN
5 2019-01-01    1.2   c1      p2             NaN
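The same lookup can also be written the other way around, as a hedged sketch rather than the answer's exact code: shift a copy of the sales forward by 52 weeks and left-merge it back on date and product. The helper name `add_sales_52w_ago` is hypothetical. The sketch assumes `date` is a datetime column and that each (date, product) pair appears at most once; with duplicates the merge would multiply rows, so `validate="many_to_one"` is used to raise an error instead.

```python
import pandas as pd

def add_sales_52w_ago(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'sales_52_W_ago' column (hypothetical helper)."""
    # Move every observation 52 weeks forward so it lines up with the row
    # that should see it as "sales 52 weeks ago".
    past = (
        df[["date", "product", "sales"]]
        .rename(columns={"sales": "sales_52_W_ago"})
        .assign(date=df["date"] + pd.Timedelta(weeks=52))
    )
    # validate raises MergeError if (date, product) is not unique in `past`
    out = df.merge(past, on=["date", "product"], how="left", validate="many_to_one")
    out.index = df.index  # the left merge keeps df's row order but resets the index
    return out

# Sample data from the question; parsing dates from strings is an assumption
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2019-05-02",
                            "2019-01-02", "2019-01-02", "2019-01-01"]),
    "sales": [1.5, 1.2, 0.5, 0.3, 0.6, 1.2],
    "city": ["c1"] * 6,
    "product": ["p1", "p2", "p1", "p2", "p1", "p2"],
})
final = add_sales_52w_ago(df)
```

Restoring the original index after the merge means the result does not depend on `df` having a default 0..n-1 index, which the assign-based version relies on for alignment.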
