pandas inner join on data frames without merging data

I have 2 indexed data frames (index on column 0):

0       1       2          3
JoeK    Joe     Kavanagh   joe.kavanagh@nomail.com
BarryD  Barry   Dempsy     bdempsy@nomail.com
OrlaF   Orla    Farrel     ofjk@nomail.com
SethB   Seth    Black      sblack@nomail.com
KateW   Kate    White      kw12@nomail.com

and the second one:

0       1       2          3
JoeK    Joe     Kavanagh   jkavanagh@nomail.com
BarryD  Barry   Dempsy     barry.dempsy@nomail.com
JimmyS  Jimmy   Smith      j.Smith@nomail.com
AndyB   Andy    Brown      ABrwn@nomail.com
MaryP   Mary    Power      MaryPower@nomail.com

I would like to perform an inner join like in the following SQL:

SELECT df2.* FROM df2
INNER JOIN df1
ON df2.0 = df1.0

Where I only get the results from the 2nd data frame, and not both:

0       1       2          3
JoeK    Joe     Kavanagh   jkavanagh@nomail.com
BarryD  Barry   Dempsy     barry.dempsy@nomail.com

I tried the pandas merge, but it gives me the result from both data frames! Any help is much appreciated.

Rather than a merge, you can just filter the second data frame by testing membership of its key values against the first one using isin (here df1 is the first frame and df2 the second, as in the question):

In [16]:
df2[df2['0'].isin(df1['0'])]

Out[16]:
        0      1         2                        3
0    JoeK    Joe  Kavanagh     jkavanagh@nomail.com
1  BarryD  Barry    Dempsy  barry.dempsy@nomail.com
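
If column 0 really is the index, as the question states, the same membership test can be run on the index instead of on a column. The following is a minimal sketch under that assumption; the frames are rebuilt from the question's data, with df1 as the first frame and df2 as the second, and the string column names "1", "2", "3" are an assumption about how the data was loaded.

```python
import pandas as pd

# Frames from the question, with column 0 used as the index.
df1 = pd.DataFrame(
    {
        "1": ["Joe", "Barry", "Orla", "Seth", "Kate"],
        "2": ["Kavanagh", "Dempsy", "Farrel", "Black", "White"],
        "3": ["joe.kavanagh@nomail.com", "bdempsy@nomail.com",
              "ofjk@nomail.com", "sblack@nomail.com", "kw12@nomail.com"],
    },
    index=pd.Index(["JoeK", "BarryD", "OrlaF", "SethB", "KateW"], name="0"),
)
df2 = pd.DataFrame(
    {
        "1": ["Joe", "Barry", "Jimmy", "Andy", "Mary"],
        "2": ["Kavanagh", "Dempsy", "Smith", "Brown", "Power"],
        "3": ["jkavanagh@nomail.com", "barry.dempsy@nomail.com",
              "j.Smith@nomail.com", "ABrwn@nomail.com", "MaryPower@nomail.com"],
    },
    index=pd.Index(["JoeK", "BarryD", "JimmyS", "AndyB", "MaryP"], name="0"),
)

# Keep only the rows of df2 whose index label also appears in df1's index.
result = df2[df2.index.isin(df1.index)]
print(result)
```

Because this is a filter rather than a join, the result contains only df2's columns and keeps df2's row order.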

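An inner merge also works. Note that how="inner" is already the default for merge, so spelling it out is only for clarity. What you do need is to first restrict df1 to the merge column, so that none of its other columns are carried into the result: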

In [11]: df2.merge(df1[['0']], how="inner", on=['0'])  # equivalently df1[['0']].merge(df2, how="inner", on=['0'])
Out[11]:
        0      1         2                        3
0    JoeK    Joe  Kavanagh     jkavanagh@nomail.com
1  BarryD  Barry    Dempsy  barry.dempsy@nomail.com
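
One caveat with the merge approach: if a key appears more than once in df1, each matching row of df2 is repeated once per duplicate, which a plain SQL inner join would also do. The sketch below illustrates this, assuming column 0 is an ordinary column named "0"; the repeated JoeK row in df1 is a hypothetical addition for illustration and is not in the question's data.

```python
import pandas as pd

# df1 reduced to the key column; the second "JoeK" is a hypothetical duplicate.
df1 = pd.DataFrame({"0": ["JoeK", "BarryD", "OrlaF", "JoeK"]})

# Second frame from the question.
df2 = pd.DataFrame(
    {
        "0": ["JoeK", "BarryD", "JimmyS", "AndyB", "MaryP"],
        "1": ["Joe", "Barry", "Jimmy", "Andy", "Mary"],
        "2": ["Kavanagh", "Dempsy", "Smith", "Brown", "Power"],
        "3": ["jkavanagh@nomail.com", "barry.dempsy@nomail.com",
              "j.Smith@nomail.com", "ABrwn@nomail.com", "MaryPower@nomail.com"],
    }
)

# Merging on the raw key column repeats the JoeK row of df2.
naive = df2.merge(df1[["0"]], how="inner", on="0")

# Dropping duplicate keys first gives one output row per matching df2 row.
deduped = df2.merge(df1[["0"]].drop_duplicates(), how="inner", on="0")

print(len(naive), len(deduped))
print(deduped)
```

The isin filter does not have this behaviour, since it only tests membership and never multiplies rows.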

If you don't do the restriction ( df1[['0']] ) it'll suffix the overlapping columns:

In [12]: df2.merge(df1, how="inner", on=['0'])
Out[12]:
        0    1_x       2_x                      3_x    1_y       2_y                      3_y
0    JoeK    Joe  Kavanagh     jkavanagh@nomail.com    Joe  Kavanagh  joe.kavanagh@nomail.com
1  BarryD  Barry    Dempsy  barry.dempsy@nomail.com  Barry    Dempsy       bdempsy@nomail.com

The suffixes can be configured with the suffixes kwarg.
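
As a small illustration of that kwarg, the sketch below merges trimmed-down versions of the two frames and labels the overlapping columns by their source. The suffix strings "_df2" and "_df1" are arbitrary choices for this example, and the frames are cut to a few rows of the question's data.

```python
import pandas as pd

# Trimmed versions of the question's frames, with "0" as an ordinary column.
df1 = pd.DataFrame(
    {
        "0": ["JoeK", "BarryD", "OrlaF"],
        "1": ["Joe", "Barry", "Orla"],
        "2": ["Kavanagh", "Dempsy", "Farrel"],
        "3": ["joe.kavanagh@nomail.com", "bdempsy@nomail.com", "ofjk@nomail.com"],
    }
)
df2 = pd.DataFrame(
    {
        "0": ["JoeK", "BarryD", "JimmyS"],
        "1": ["Joe", "Barry", "Jimmy"],
        "2": ["Kavanagh", "Dempsy", "Smith"],
        "3": ["jkavanagh@nomail.com", "barry.dempsy@nomail.com", "j.Smith@nomail.com"],
    }
)

# The first suffix applies to the left frame (df2), the second to the right (df1).
merged = df2.merge(df1, how="inner", on="0", suffixes=("_df2", "_df1"))
print(merged.columns.tolist())
```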

See also the pandas docs for a "brief primer on merge methods".
