Assign values to a column using conditional on a second pandas df

I have a pandas dataframe with dates and locations:

df1 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013', 
      '1-3-2013'], 'locations':['L1','L2','L3']}) 

and another DataFrame that has the counts of points of interest that intersect with each location:

df2 = pd.DataFrame({'dates':['1-1-2013', '1-2-2013', 
      '1-3-2013'], 'locations':['L1','L1','L1'], 'poi_cts':[23,12,23]}) 

The dates in df2 are a small subset of the dates of df1.

I want to create a column in df1 (df1['counts']) that, for each location/date row, sums the poi_cts from df2 whose dates fall within a specified window (e.g., within 14 days prior to the date in df1).

I've tried:

def ct_pts(window=14):

    Date = row.Date

    cts = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])

return cts

df1.apply(ct_pts, axis = 1)

but that doesn't work. I'm not sure how to assign the column for each row; I saw this pattern used somewhere, but it isn't working here.

I could also do this column-wise, but I'm struggling there too:

def ct_pts():
    new = pd.DataFrame()
    for location in pd.unique(df1['locations']):
        subset = df1[df1['locations']==location]
        for date in pd.unique(df1['Date']):
            df2 = df[df['Date'] == date]
            df2['spray'] = np.sum(df2[(df2['Date'] < Date) & (df2['Date'] > (Date - np.timedelta64(window,'D')))]['poi_cts'])
            new = new.append(df2)
    return new

This isn't working either.

I feel like I'm missing something very simple. Is there an easy way to do this?

I am using NumPy broadcasting to speed up the whole process:

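# Assumes df1['dates'] and df2['dates'] are already datetime64 columns,
# and that df1 is sorted by 'locations' so the results line up with its rows.
l = []
for x, y in df1.groupby('locations'):
    s = df2.loc[df2.locations == x, 'dates'].values   # POI dates at this location
    t = y['dates'].values                              # df1 dates at this location
    v = (t[:, None] - s) / np.timedelta64(1, 'D')      # days from each POI date to each df1 date
    # keep POI dates from 0 up to (but not including) 14 days before the df1 date
    l.extend(np.dot((v >= 0) & (v < 14), df2.loc[df2.locations == x, 'poi_cts'].values))

df1['cts'] = l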
df1
Out[167]: 
       dates locations  cts
0 2013-01-01        L1   23
1 2013-02-01        L2    0
2 2013-03-01        L3    0
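For readers who want to try the broadcasting idea end to end, here is a minimal self-contained sketch on the sample frames from the question. The helper name `window_counts` is hypothetical, and the `%m-%d-%Y` date format is an assumption about how the sample strings should be read. Unlike the loop above, it writes results back by index label, so it does not rely on df1 being sorted by location.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L2', 'L3']})
df2 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L1', 'L1'],
                    'poi_cts': [23, 12, 23]})

# Assumption: the sample strings are month-day-year.
for frame in (df1, df2):
    frame['dates'] = pd.to_datetime(frame['dates'], format='%m-%d-%Y')


def window_counts(left, right, window=14):
    """Hypothetical helper: for each row of `left`, sum right['poi_cts'] at the
    same location whose date is 0 to window-1 days before the row's date."""
    out = pd.Series(0, index=left.index, dtype='int64')
    for loc, grp in left.groupby('locations'):
        sub = right[right['locations'] == loc]
        if sub.empty:
            continue  # no points of interest at this location
        t = grp['dates'].to_numpy()
        s = sub['dates'].to_numpy()
        # age[i, j] = days from POI date j to df1 date i (positive means "prior")
        age = (t[:, None] - s) / np.timedelta64(1, 'D')
        mask = (age >= 0) & (age < window)
        # write back by index label so the row order of `left` does not matter
        out.loc[grp.index] = mask.astype(np.int64) @ sub['poi_cts'].to_numpy()
    return out


df1['cts'] = window_counts(df1, df2)
print(df1)
```

The pairwise matrix has one entry per (df1 row, df2 row) pair within a location, so memory grows with the product of the two group sizes; that is usually fine for modest groups but worth keeping in mind for very large ones.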

This might be a bit slower, but here's how you can do it using apply:

  1. Convert the dates to datetimes and create a start_dates column so the window is easier to filter:

     df1['dates'] = pd.to_datetime(df1['dates'])
     df2['dates'] = pd.to_datetime(df2['dates'])
     df1['start_dates'] = df1['dates'] - pd.to_timedelta(14, unit='d')

  2. Apply a function to each row of the dataframe:

     def ct_pts(row):
         # df2 rows at the same location, dated within [start_dates, dates]
         df_fil = df2[(df2['dates'] <= row['dates']) &
                      (df2['dates'] >= row['start_dates']) &
                      (df2['locations'] == row['locations'])]
         row['counts'] = sum(df_fil['poi_cts'])
         return row

     df1 = df1.apply(ct_pts, axis=1)

OUTPUT:

dates       locations   start_dates counts
2013-01-01  L1          2012-12-18  23
2013-01-02  L2          2012-12-19  0
2013-01-03  L3          2012-12-20  0
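If the row-wise apply becomes slow on larger frames, one possible alternative is to pair rows with a merge on locations and filter the pairs by date. The following is only a sketch under stated assumptions: the `%m-%d-%Y` date format, the `_poi` suffix, and the intermediate names `pairs`, `in_window`, and `totals` are illustrative choices, not part of the original answer. It uses the same inclusive 14-day window as the apply version.

```python
import pandas as pd

df1 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L2', 'L3']})
df2 = pd.DataFrame({'dates': ['1-1-2013', '1-2-2013', '1-3-2013'],
                    'locations': ['L1', 'L1', 'L1'],
                    'poi_cts': [23, 12, 23]})

# Assumption: the sample strings are month-day-year.
for frame in (df1, df2):
    frame['dates'] = pd.to_datetime(frame['dates'], format='%m-%d-%Y')

window = pd.Timedelta(days=14)

# Pair every df1 row with every df2 row at the same location.
# reset_index() keeps df1's row labels in a column named 'index'.
pairs = df1.reset_index().merge(df2, on='locations', suffixes=('', '_poi'))

# Keep pairs where the POI date lies in [date - 14 days, date].
in_window = ((pairs['dates_poi'] <= pairs['dates']) &
             (pairs['dates_poi'] >= pairs['dates'] - window))

# Sum per original df1 row; rows with no matching pair get 0.
totals = pairs[in_window].groupby('index')['poi_cts'].sum()
df1['counts'] = totals.reindex(df1.index, fill_value=0)
print(df1)
```

The merge materializes every same-location pair before filtering, so this trades memory for speed; for very large frames a sorted, windowed approach may be a better fit.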

I got my initial attempt to work using apply:

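def num_spray(row, window=14):
    # Sum poi_cts for df2 rows dated strictly within `window` days before this row's date
    date = row['dates']
    in_window = (df2['dates'] < date) & (df2['dates'] > (date - np.timedelta64(window, 'D')))
    return np.sum(df2[in_window]['poi_cts'])

df1['counts'] = df1.apply(num_spray, axis=1)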
