
Python equivalence of R's match() for indexing

I essentially want to implement the equivalent of R's match() function in Python, using pandas DataFrames, without using a for loop.

In R match() returns a vector of the positions of (first) matches of its first argument in its second.

Let's say that I have two data frames, A and B, both of which include the column C, where

A$C = c('a','b')
B$C = c('c','c','b','b','c','b','a','a')

In R we would get

match(A$C,B$C) = c(7,3)

What is an equivalent method in Python for columns in pandas data frames that doesn't require looping through the values?

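Here is a one-liner:

B.drop_duplicates('c').reset_index().set_index('c').loc[A.c, 'index'].values

drop_duplicates('c') keeps only the first occurrence of each value in B; without it, .loc on the repeated labels returns every matching row rather than just the first. This solution returns the results in the same order as the input A, as match does in R, so it is a better equivalent than @jezrael's answer.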

Full example:

import pandas as pd

A = pd.DataFrame({'c':['a','b']})
B = pd.DataFrame({'c':['c','c','b','b','c','b','a','a']})

B.drop_duplicates('c').reset_index().set_index('c').loc[A.c, 'index'].values
Output (0-based): array([6, 2])
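
R's match() returns NA for a value that has no match, whereas a label lookup with .loc raises a KeyError for a missing label in recent pandas versions. The following is a minimal sketch of one way to cover that case while keeping the order of A and reporting only the first match. The helper name r_match is hypothetical (it is not part of pandas), and the extra value 'z' is added only for illustration.

```python
import pandas as pd

A = pd.DataFrame({'c': ['a', 'b', 'z']})   # 'z' is illustrative: it does not occur in B
B = pd.DataFrame({'c': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

def r_match(x, table):
    """Hypothetical helper: 0-based position of the first match of each x in table."""
    table = pd.Series(list(table))
    # Keep only the first occurrence of each value; the surviving index
    # labels are the 0-based positions in the original table.
    first = table.drop_duplicates()
    lookup = pd.Series(first.index.to_numpy(), index=first.to_numpy())
    # reindex keeps the order of x and yields NaN where there is no match.
    return lookup.reindex(list(x)).to_numpy()

pos = r_match(A['c'], B['c'])
print(pos)       # 0-based positions; NaN marks "no match"
print(pos + 1)   # shifted by one to line up with R's 1-based positions
```

Because NaN is a float, the result is a float array as soon as any value is missing; casting back to integers is only safe when every value of A occurs in B.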

You can first use drop_duplicates and then either boolean indexing with isin, or merge.

Python counts from 0, so add 1 to get the same output as R.

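import pandas as pd

A = pd.DataFrame({'c':['a','b']})
B = pd.DataFrame({'c':['c','c','b','b','c','b','a','a']})

# keep only the first occurrence of each value; the original index is preserved
B = B.drop_duplicates('c')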
print (B)
   c
0  c
2  b
6  a

print (B[B.c.isin(A.c)])
   c
2  b
6  a

print (B[B.c.isin(A.c)].index)
Int64Index([2, 6], dtype='int64')

print (pd.merge(B.reset_index(), A))
   index  c
0      2  b
1      6  a

print (pd.merge(B.reset_index(), A)['index'])
0    2
1    6
Name: index, dtype: int64
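
Both variants above return the positions in the order in which the values appear in B. As a hedged sketch of one possible variation (not part of the original answer), putting A on the left of a left merge keeps the order of A instead, and adding 1 converts to R's 1-based positions. The variable names first, matched and positions are illustrative.

```python
import pandas as pd

A = pd.DataFrame({'c': ['a', 'b']})
B = pd.DataFrame({'c': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# First occurrence of each value, with its original position in the 'index' column.
first = B.drop_duplicates('c').reset_index()

# A left merge with A on the left preserves the row order of A.
matched = pd.merge(A, first, on='c', how='left')

# Add 1 to move from Python's 0-based to R's 1-based positions.
positions = (matched['index'] + 1).tolist()
print(positions)   # [7, 3]
```

With how='left', a value of A that is absent from B would appear as NaN in the 'index' column rather than being dropped.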

This gives the indices of all matches, not only the first one for each value (with Python's 0-based indexing):

import pandas as pd

df1 = pd.DataFrame({'C': ['a','b']})
print(df1)

   C
0  a
1  b

df2 = pd.DataFrame({'C': ['c','c','b','b','c','b','a','a']})
print(df2)

   C
0  c
1  c
2  b
3  b
4  c
5  b
6  a
7  a

# Boolean mask: True where a value of df2['C'] also occurs in df1['C']
match = df2['C'].isin(df1['C'])
print([i for i in range(match.shape[0]) if match[i]])

#[2, 3, 5, 6, 7]
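
Since the question asks to avoid explicit loops, the list comprehension can be swapped for NumPy's flatnonzero, which returns the positions of the True entries of a boolean array. This is a sketch of an alternative, not the answer's own method. The second half, which combines the mask with Series.duplicated() to keep only first occurrences, is likewise an added illustration; its result follows the order of df2, not df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'C': ['a', 'b']})
df2 = pd.DataFrame({'C': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# Positions of every value in df2['C'] that also occurs in df1['C'].
mask = df2['C'].isin(df1['C']).to_numpy()
all_positions = np.flatnonzero(mask)
print(all_positions.tolist())     # [2, 3, 5, 6, 7]

# Keep only the first occurrence of each matched value.
first_mask = mask & ~df2['C'].duplicated().to_numpy()
first_positions = np.flatnonzero(first_mask)
print(first_positions.tolist())   # [2, 6]
```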
