简体   繁体   中英

Pandas dataframe - find matching rows, based on IDs from two columns

I've got a dataframe, let's say it is defined like this:

df = pd.DataFrame({
'variable' : ["A","A","B","B","A","B","B","A"],
'id1' : [1,2,3,4,5,6,7,8],
'id2' : ['NaN','NaN',2,'NaN',4,3,'NaN',5]
})

So the result is:

   id1  id2 variable1
0    1  NaN         A
1    2  NaN         A
2    3    2         B
3    4  NaN         B
4    5    4         A
5    6    3         B
6    7  NaN         B
7    8    5         A

Now what I would like to achieve is:

  1. Match rows where id2 in one row is equal to id1 in another row. So in this particular example, paired rows should be:

    (2, 1) - because df.iloc[2]['id2'] == df.iloc[1]['id1']

    and subsequently: (4, 3), (5, 2), (7, 4)

  2. From these pairs, choose only those where variable1 is constant within a pair, so from the pairs above we will have left only (5, 2) and (7, 4) because only in these cases we have:

    df.iloc[5]['variable1'] == df.iloc[2]['variable1']

    (example is valid for the first pair of course).

Now, if this were pure Python, I would probably iterate over two lists, containing id1 and id2 , checking equalities of ids and variable , but I suppose there should be a more array-oriented method...

Perhaps creating a duplicate dataframe and doing an intersection of id1 and id2 will do? Or should I consider a totally different idea?

maybe:

>>> pd.merge(df[['id2', 'variable']].reset_index(),
...          df.reset_index(), how='inner',
...          left_on=['id2', 'variable'],
...          right_on=['id1', 'variable'])[['index_x', 'index_y']]

   index_x  index_y
0        5        2
1        7        4

[2 rows x 2 columns]

How about this?

# Add index column:
df = df.reset_index()

df2 = pd.merge(
         df, df
         left_on="id1", right_on="id2"
        ).query("variable_x == variable_y")

list(df2[["index_x", "index_y"]].itertuples(index=False))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM