
Is there a function to compare two DataFrames and output the different elements?

I am trying to compare two DataFrames to check whether each symbol in new_df exists in old_df. If a symbol does not exist in old_df, I want to add it to an output list.

The code looks like this:

import pandas as pd

old_df = pd.DataFrame({'symbol': ['A', 'B', 'C', 'D', 'E']})
new_df = pd.DataFrame({'symbol': ['C', 'A', 'B', 'F', 'H']})

I want the output to look like this:

['F','H']

Use isin() as a mask:

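import numpy as np
import pandas as pd

old_df = pd.DataFrame({'symbol': ['A', 'B', 'C', 'D', 'E']})
new_df = pd.DataFrame({'symbol': ['C', 'A', 'B', 'F', 'H']})

# Keep the rows of new_df whose values do not appear in old_df
new_df[~np.isin(new_df.values, old_df.values)].values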

Output:

array([['F'],
       ['H']], dtype=object)
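
Since the question asks for a plain list rather than a 2-D array, the same masking idea can also be sketched with pandas' own Series.isin followed by tolist(). This is only a sketch, assuming the column is named symbol as in the question; the names mask and missing are made up for illustration.

```python
import pandas as pd

old_df = pd.DataFrame({'symbol': ['A', 'B', 'C', 'D', 'E']})
new_df = pd.DataFrame({'symbol': ['C', 'A', 'B', 'F', 'H']})

# True for every row of new_df whose symbol also appears in old_df
mask = new_df['symbol'].isin(old_df['symbol'])

# Keep the rows that are NOT in old_df and convert the column to a list
missing = new_df.loc[~mask, 'symbol'].tolist()
print(missing)  # ['F', 'H']
```

Selecting the symbol column explicitly keeps the mask one-dimensional, so the result preserves the row order of new_df.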

What about a set difference?

>>> set(new_df.symbol.unique()) - set(old_df.symbol.unique())
{'F', 'H'}

This uses Python's built-in set type, so the result is unordered.
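
Because a set has no guaranteed order, getting the list the question asks for takes one more step. Below is a rough sketch of two options, assuming the same old_df and new_df as in the question; old_symbols, missing_sorted and missing_in_order are illustrative names only.

```python
import pandas as pd

old_df = pd.DataFrame({'symbol': ['A', 'B', 'C', 'D', 'E']})
new_df = pd.DataFrame({'symbol': ['C', 'A', 'B', 'F', 'H']})

old_symbols = set(old_df['symbol'])

# Option 1: set difference, sorted to get a stable list
missing_sorted = sorted(set(new_df['symbol']) - old_symbols)

# Option 2: keep the order in which symbols first appear in new_df
missing_in_order = [s for s in new_df['symbol'].unique() if s not in old_symbols]

print(missing_sorted)    # ['F', 'H']
print(missing_in_order)  # ['F', 'H']
```

The two options give the same list here only because 'F' happens to precede 'H' in new_df; in general they can differ in order.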
