
where condition with inner join in pandas

I want to achieve the equivalent of the SQL below in pandas:

SELECT 
    A.col1,
    A.col2,
    A.col3,
    A.col4,
    A.col5
FROM Table1 A
INNER JOIN Table1 B
ON   A.col1  = B.col1 AND
     A.col2  = B.col2
WHERE 
    (A.col3 <> 0) AND   
    (A.col4 <> B.col4) 

Here is the part I have managed so far in pandas:

import pandas as pd

# DataFrame dfTable1All contains the columns col1, col2, col3, col4, col5
# DataFrame dfTable1 contains the columns col1, col2, col4

# WHERE A.col3 <> 0 (Python 3 uses != ; the <> operator was removed)
dfTable1All = dfTable1All[dfTable1All['col3'] != 0]

dfjoin = pd.merge(dfTable1All, dfTable1, on=['col1', 'col2'], how='inner')

How can I apply the remaining WHERE condition to the result of the inner join?

WHERE (A.col4 <> B.col4)

Thanks.

If you look at the merge docstring, there is a suffixes option for columns that have the same name in both frames. The defaults are '_x' and '_y'. So to keep only the rows where the two col4 values are not equal, you could do:

dfjoin = dfjoin[dfjoin['col4_x'] != dfjoin['col4_y']]
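Putting the pieces together, the whole query can be sketched end to end as follows. The sample data is invented purely for illustration, and the explicit suffixes ('_A', '_B') are a choice made here to mirror the SQL aliases; with the defaults the columns would be col4_x and col4_y instead.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfTable1All = pd.DataFrame({
    "col1": [1, 1, 2, 3],
    "col2": ["a", "b", "a", "c"],
    "col3": [5, 0, 7, 9],
    "col4": [10, 20, 30, 40],
    "col5": ["p", "q", "r", "s"],
})
dfTable1 = pd.DataFrame({
    "col1": [1, 1, 2, 3],
    "col2": ["a", "b", "a", "c"],
    "col4": [10, 99, 31, 40],
})

# WHERE A.col3 <> 0
left = dfTable1All[dfTable1All["col3"] != 0]

# INNER JOIN ... ON A.col1 = B.col1 AND A.col2 = B.col2
dfjoin = left.merge(dfTable1, on=["col1", "col2"], how="inner",
                    suffixes=("_A", "_B"))

# WHERE A.col4 <> B.col4
dfjoin = dfjoin[dfjoin["col4_A"] != dfjoin["col4_B"]]

# SELECT A.col1, ..., A.col5: keep only the A-side columns
result = dfjoin[["col1", "col2", "col3", "col4_A", "col5"]].rename(
    columns={"col4_A": "col4"}
)
print(result)
```

Filtering on col3 before the merge gives the same rows as filtering afterwards for an inner join, and it keeps the intermediate frame smaller.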

I would break it up into separate operations:

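# Assumes dfA and dfB both hold the full table, as in the SQL self-join
dfA.set_index(['col1', 'col2'], inplace=True)
dfB.set_index(['col1', 'col2'], inplace=True)

# lsuffix applies to the left frame (dfA), rsuffix to the right frame (dfB);
# suffixes are added only to column names that appear in both frames
dfAB = dfA.join(dfB, how='inner', lsuffix='_A', rsuffix='_B')

dfAB.query("col3_A != 0 and col4_A != col4_B")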
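One detail worth checking with the join approach is which columns actually receive a suffix. The sketch below uses invented sample data and assumes dfB holds only col1, col2 and col4, as dfTable1 does in the question. In that case only col4 overlaps, so col3 keeps its plain name and the query refers to col3 rather than a suffixed name.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dfA = pd.DataFrame({
    "col1": [1, 1, 2, 3],
    "col2": ["a", "b", "a", "c"],
    "col3": [5, 0, 7, 9],
    "col4": [10, 20, 30, 40],
    "col5": ["p", "q", "r", "s"],
})
dfB = pd.DataFrame({
    "col1": [1, 1, 2, 3],
    "col2": ["a", "b", "a", "c"],
    "col4": [10, 99, 31, 40],
})

# Join keys become the index on both sides.
dfA = dfA.set_index(["col1", "col2"])
dfB = dfB.set_index(["col1", "col2"])

# DataFrame.join defaults to a left join, so how="inner" is passed explicitly.
dfAB = dfA.join(dfB, how="inner", lsuffix="_A", rsuffix="_B")

# Only col4 exists in both frames, so only col4 is suffixed.
print(list(dfAB.columns))

# WHERE A.col3 <> 0 AND A.col4 <> B.col4
result = dfAB.query("col3 != 0 and col4_A != col4_B")
print(result)
```

query returns a new frame rather than modifying dfAB, so the result needs to be assigned to a name if it is used later.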
