Pandas: Conditional Self-Join Based on Multiple Conditions

I am familiar with how to merge/join two Pandas dataframes like so:

result = pd.merge(user_usage,
                 user_device[['use_id', 'platform', 'device']],
                 on='use_id', 
                 how='right')

However, I don't know how to do a self-join on a table like this:

id    rank   ts
1     1      2015-11-01
1     2      2015-11-03
1     3      2015-11-07

where I want to compare each id-rank's timestamp with the following one.

In SQL and Scala syntax, this is easy. In SQL, I would just do something like (in pseudo-code):

SELECT *
FROM df a
LEFT JOIN df b
ON a.id = b.id AND (a.rank + 1) = b.rank;

In the pd.merge syntax, I've never seen such an example and am still unable to find one.

To be clear, I'm looking for:

id    rank   ts           ts_2         time_since_previous_obs
1     1      2015-11-01   <null>       0
1     2      2015-11-03   2015-11-01   2
1     3      2015-11-07   2015-11-03   4

Is this possible with Python Pandas merge or join syntax? Is there another smarter way?

You can shift the rank on one copy before merging, so that each row pairs with the row whose rank is one higher:

(df.merge(df.assign(rank=df['rank'] - 1),
          on=['id','rank'], how='left')
   .assign(last_obs_since=lambda x: x['ts_y'] - x['ts_x'])
)

Output:

   id  rank       ts_x       ts_y last_obs_since
0   1     1 2015-11-01 2015-11-02         1 days
1   1     2 2015-11-02 2015-11-03         1 days
2   1     3 2015-11-03        NaT            NaT
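
The merge above pairs each row with the next observation. As a minimal sketch of the opposite direction, which is what the question's desired output shows, the version below adds 1 to the rank on the right-hand copy instead. It uses the three rows from the question; the intermediate name `prev` and the choice to fill the first gap with 0 are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 1, 1],
    "rank": [1, 2, 3],
    "ts": pd.to_datetime(["2015-11-01", "2015-11-03", "2015-11-07"]),
})

# Right-hand copy: bump rank by one so rank r lines up with the row at rank r + 1,
# and rename ts so the merged frame has no _x/_y suffixes (hypothetical name: prev)
prev = df.assign(rank=df["rank"] + 1).rename(columns={"ts": "ts_2"})

# Left join keeps every original row; the first rank has no predecessor, so ts_2 is NaT
result = df.merge(prev, on=["id", "rank"], how="left")

# Difference in whole days; the missing first value is filled with 0 (an assumption)
result["time_since_previous_obs"] = (
    (result["ts"] - result["ts_2"]).dt.days.fillna(0).astype(int)
)
print(result)
```

Renaming `ts` before the merge is only a convenience; leaving it alone would produce `ts_x` and `ts_y` as in the output above.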

# Create a list from ts, shifted down by one row, to make ts2
ts2 = df["ts"][:-1].tolist()
ts2.insert(0, None)

# Add the list to the dataframe as a new column
df["ts2"] = ts2

# Calculate the difference between each timestamp and the previous one
df["diff"] = df["ts"] - df["ts2"]
print(df)

The following should also work:

df['ts2'] = df['ts'].shift(1)
df['last_obs_since'] = df['ts'] - df['ts2']
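
A plain shift treats the whole frame as one sequence, so with more than one id the first row of each id would pick up the last timestamp of the previous id. A minimal sketch of a per-id shift, assuming the frame should be ordered by id and rank, is shown below. The rows for id 2 are hypothetical and are not part of the question's data.

```python
import pandas as pd

# Question's rows for id 1, plus hypothetical rows for id 2 to show the grouping
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "rank": [1, 2, 3, 1, 2],
    "ts": pd.to_datetime([
        "2015-11-01", "2015-11-03", "2015-11-07",
        "2015-12-01", "2015-12-05",
    ]),
})

# Make sure rows are in rank order within each id before shifting
df = df.sort_values(["id", "rank"])

# Shift within each id so the first row of every id gets NaT
df["ts2"] = df.groupby("id")["ts"].shift(1)
df["last_obs_since"] = df["ts"] - df["ts2"]
print(df)
```

This avoids the merge entirely, at the cost of relying on row order rather than on an explicit rank match.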
