Compare 2 Pandas DataFrames and return all rows that are different
I have 2 DataFrames with the same schema but different data. I want to compare them and get every row in which any column has a different value.
df1:
id Store is_open
1 'Walmart' true
2 'Best Buy' false
3 'Target' true
4 'Home Depot' true
df2:
id Store is_open
1 'Walmart' false
2 'Best Buy' true
3 'Target' true
4 'Home Depot' false
I was able to get the differences, but I don't get all the columns, only the ones that changed. So I get the following output:
result_df:
id is_open is_open
1 true false
2 false true
4 true false
This is the code that produces the output above:
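import numpy as np
import pandas as pd

# boolean mask of differing cells, stacked into a (row, column) index
ne_stacked = (from_aoi_df != to_aoi_df).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['id', 'col_changed']
# positions of the differing cells, then the old and new values at those positions
difference_locations = np.where(from_aoi_df != to_aoi_df)
changed_from = from_aoi_df.values[difference_locations]
changed_to = to_aoi_df.values[difference_locations]
df5 = pd.DataFrame({'from': changed_from, 'to': changed_to})
df5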
However, in addition to the result above, I also want the columns that did not change, such as Store, so my expected output is:
expected_result_df:
id Store is_open_df1 is_open_df2
1 Walmart true false
2 Best Buy false true
4 Home Depot true false
How can I do this?
How about this?
df1['is_open_df2'] = df2['is_open']
expected_result_df = df1[df1['is_open'] != df1['is_open_df2']]
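The two lines above rely on df1 and df2 having rows in the same order. A minimal sketch of the same idea that aligns on id instead, assuming id is unique in both frames (the frames are rebuilt from the question's sample data, and the merge step is a substitution, not the answerer's code):

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# Bring df2's is_open next to df1's, matching rows by id rather than by position.
# The overlapping column name is_open receives the two suffixes.
merged = df1.merge(df2[["id", "is_open"]], on="id", suffixes=("_df1", "_df2"))

# Keep only the rows where the two is_open values disagree.
expected_result_df = merged[merged["is_open_df1"] != merged["is_open_df2"]]
print(expected_result_df)
```

This only compares is_open; it does not detect changes in other columns.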
new_df = pd.concat([df1, df2]).reset_index(drop=True)
df = new_df.drop_duplicates(subset=['col1','col2'], keep=False)
This gives you a DataFrame named df that contains only the records that differ.
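The names 'col1' and 'col2' above look like placeholders. A rough sketch with the question's columns substituted in (that substitution is an assumption) shows what this approach returns: both versions of each changed row, stacked vertically rather than side by side.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# Stack both frames, then drop every row that appears more than once.
# keep=False removes all copies of a duplicated row, so only rows that
# exist in exactly one of the two frames survive.
stacked = pd.concat([df1, df2]).reset_index(drop=True)
diff = stacked.drop_duplicates(subset=["id", "Store", "is_open"], keep=False)
print(diff)
```

Each changed id shows up twice here, once with the df1 values and once with the df2 values, so this does not directly give the side-by-side layout the question asks for.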
df=np.where(df1==df2,'true','false')
Hope this helps! It works if df1 and df2 each contain unique rows; you can drop any duplicates in them before using it.
Use:
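Note that np.where returns a plain NumPy array of 'true'/'false' labels (here 'true' means the cells are equal), not a filtered DataFrame. A sketch of one way to turn those labels back into a row filter, assuming both frames share the same index and columns; the names flags and changed_rows are illustrative only:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

# np.where yields a 2-D array of strings; wrap it to restore the labels.
flags = pd.DataFrame(
    np.where(df1 == df2, "true", "false"),
    index=df1.index,
    columns=df1.columns,
)

# A row changed if any of its cells is labelled "false" (not equal).
changed_rows = df1[(flags == "false").any(axis=1)]
print(changed_rows)
```

Working with the boolean mask df1 != df2 directly avoids the detour through strings.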
# compare DataFrames cell by cell
m = (from_aoi_df != to_aoi_df)
# columns with at least one difference
m1 = m.any(axis=0)
# rows with at least one difference
m2 = m.any(axis=1)
# changed rows, changed columns only, from each DataFrame
df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')
# changed rows, unchanged columns only
df3 = from_aoi_df.loc[m2, ~m1]
# join together
df = pd.concat([df3, df1, df2], axis=1)
print(df)
id Store is_open_df1 is_open_df2
0 1 Walmart True False
1 2 Best Buy False True
3 4 Home Depot True False
Verifying the solution with multiple changed columns:
# first value of the id column changed
print (from_aoi_df)
id Store is_open
0 10 Walmart True
1 2 Best Buy False
2 3 Target True
3 4 Home Depot True
m = (from_aoi_df != to_aoi_df)
m1 = m.any(axis=0)
m2 = m.any(axis=1)
df1 = from_aoi_df.loc[m2, m1].add_suffix('_df1')
df2 = to_aoi_df.loc[m2, m1].add_suffix('_df2')
df3 = from_aoi_df.loc[m2, ~m1]
df = pd.concat([df3, df1, df2], axis=1)
print (df)
Store id_df1 is_open_df1 id_df2 is_open_df2
0 Walmart 10 True 1 False
1 Best Buy 2 False 2 True
3 Home Depot 4 True 4 False
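For reuse, the steps above can be wrapped in a function. This is a sketch under the same assumption the answer makes, namely that both frames have identical index and column labels (pandas raises a ValueError on != otherwise); the function name and the suffixes parameter are made up for illustration.

```python
import pandas as pd


def rows_that_differ(left, right, suffixes=("_df1", "_df2")):
    """Return rows where any column differs, changed columns side by side."""
    diff = left != right                # cell-by-cell mask of differences
    changed_cols = diff.any(axis=0)     # columns with at least one difference
    row_mask = diff.any(axis=1)         # rows with at least one difference

    unchanged = left.loc[row_mask, ~changed_cols]
    before = left.loc[row_mask, changed_cols].add_suffix(suffixes[0])
    after = right.loc[row_mask, changed_cols].add_suffix(suffixes[1])
    return pd.concat([unchanged, before, after], axis=1)


from_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [True, False, True, True],
})
to_aoi_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Store": ["Walmart", "Best Buy", "Target", "Home Depot"],
    "is_open": [False, True, True, False],
})

result = rows_that_differ(from_aoi_df, to_aoi_df)
print(result)
```

One caveat: NaN != NaN evaluates to True, so a cell that is missing in both frames counts as a difference with this mask.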