简体   繁体   中英

Compare two columns, one by float and other by strings to get matching values

I have two dataframes with two of its columns that really matter. One of the column consists of float64 values and the other is string. the dataframes are of different sizes.

I would like to match both the Number column and the Item column at the same time and then get only the ones that match.

df1 = pd.DataFrame({ 'Number':[1.0,3.0,4.0,5.0,8.0,12.0,32.0,58.0] , 'Item': ['Phone', 'Watch', 'Pen', 'Pencil', 'Pencil','toolkit','box','fork']})

df2 = pd.DataFrame({'Number':[3.0,4.0,8.0,12.0,15.0,32.0,54.0,58.0,72.0], 'Item':['Watch','Pen','Pencil','Eraser','bottle','box','toolkit','fork','Phone']})

df1
Number     Item
0     1.0    Phone
1     3.0    Watch
2     4.0      Pen
3     5.0   Pencil
4     8.0   Pencil
5    12.0  toolkit
6    32.0      box
7    58.0     fork

df2
   Number     Item
0     3.0    Watch
1     4.0      Pen
2     8.0   Pencil
3    12.0   Eraser
4    15.0   bottle
5    32.0      box
6    54.0  toolkit
7    58.0     fork
8    72.0    Phone

I am trying to use forloop and the loop goes very long. It seems to be very bad method of achieving this. I am trying to use mask operation but unsure how to achieve this. Appreciate help in performing this in the shortest way possible.

The desired result should look like:

      Item  Matching  Number
0    Phone  No Match     1.0
1    Watch   Matched     3.0
2      Pen   Matched     4.0
3   Pencil  No Match     5.0
4   Pencil   Matched     8.0
5  toolkit  No Match    12.0
6      box   Matched    32.0
7     fork   Matched    58.0

If problem merge float values, is possible multiple by 1000 and cast to integers and then merge with left join, because there should be problem with matching, baciuse float precision should be different in both columns:

df1['Number1'] = df1['Number'].mul(1000).astype(int)
df2['Number1'] = df2['Number'].mul(1000).astype(int)

df = pd.merge(df1, df2.drop('Number', 1), how='left', on=['Item','Number1'], indicator=True)
df['Matching'] = df['_merge'].map({'left_only':'No Match', 'both':'Match'})

df = df.drop(['Number1','_merge'], axis=1)
print (df)

   Number     Item  Matching
0     1.0    Phone  No Match
1     3.0    Watch     Match
2     4.0      Pen     Match
3     5.0   Pencil  No Match
4     8.0   Pencil     Match
5    12.0  toolkit  No Match
6    32.0      box     Match
7    58.0     fork     Match

You are looking for a left merge with indicator=True :

res = pd.merge(df1, df2, how='left', indicator=True)

print(res)

      Item  Number     _merge
0    Phone     1.0  left_only
1    Watch     3.0       both
2      Pen     4.0       both
3   Pencil     5.0  left_only
4   Pencil     8.0       both
5  toolkit    12.0  left_only
6      box    32.0       both
7     fork    58.0       both

In general, avoid explicit for loops when purpose-built methods are available, as these are generally optimized for performance. You can, if you wish, then replace strings via a dictionary mapping:

d = {'left_only': 'No Match', 'both': 'Matched'}
df['_merge'] = df['_merge'].map(d)

You can come to the dataframe you need by some simple loc and isin as shown below

df = df1.copy()

df['Matching'] = np.nan
df.loc[(df.Number.isin(df2.Number)) & (df.Item.isin(df2.Item)), 'Matching'] = 'Matched'
df.Matching.fillna('No Match', inplace=True)

Number    Item      Matching

1.0   Phone     No Match
3.0   Watch     Matched
4.0   Pen       Matched
5.0   Pencil    No Match
8.0   Pencil    Matched
12.0  toolkit   Matched
32.0  box       Matched
58.0  fork      Matched

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM