Pandas: compare two DataFrames and flag the rows that match
I have two DataFrames, df and df1.
df is shown below:
Facility Category ID Part Text
Centennial History 11111 A Drain
Centennial History 11111 B Read
Centennial History 11111 C EKG
Centennial History 11111 D Assistant
Centennial History 11111 E Primary
df1 is shown below (this is only a small sample of the problem; the real data has 50,000 rows):
Facility Category ID Part Text
Centennial History 11111 D Assistant
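Basically, I want to compare the rows of the two DataFrames. If a row matches between them, I want to record that in a new column in the first DataFrame, df, with the column header 'MatchingFlag'.
My final DataFrame should look like the one below, since the rows I care about are the ones that do not match.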
Facility Category ID Part Text MatchingFlag
Centennial History 11111 A Drain No
Centennial History 11111 B Read No
Centennial History 11111 C EKG No
Centennial History 11111 D Assistant Yes
Centennial History 11111 E Primary No
Any help? I tried merging with df = pd.merge(df1, df, how='left', on=['Facility', 'Category', 'ID', 'Part', 'Text'])
and then creating a flag based on blank or NaN values, but that did not give me what I expected.
It may make sense to set an index on the columns you want to match on, then use that index to line up the matching rows:
columns = ['Facility', 'Category', 'ID', 'Part', 'Text']
# It's always a good idea to sort after creating a MultiIndex like this
# (sort_index() replaces the deprecated sortlevel())
df = df.set_index(columns).sort_index()
df1 = df1.set_index(columns).sort_index()
# You don't have to use Yes here, anything will do
# The boolean True might be more appropriate
df['MatchingFlag'] = "Yes"
df1['MatchingFlag'] = "Yes"
# Add them together, matching rows will have the value "YesYes"
# Non-matches will be nan
result = df + df1
# If you'd rather not have NaN's
result.loc[:,'MatchingFlag'] = result.loc[:,'MatchingFlag'].replace('YesYes','Yes')
result.loc[:,'MatchingFlag'] = result['MatchingFlag'].fillna('No')
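The merge attempted in the question can also be made to work, as an alternative to the index approach above. This is a minimal sketch, not the answerer's method: it assumes df should be the left side of the merge (so every row of df is kept) and uses the `indicator=True` option of `DataFrame.merge`, which adds a `_merge` column. The sample frames are rebuilt from the question's data; the `drop_duplicates()` call is an added precaution, since a repeated row in df1 would otherwise multiply the matching rows of df.

```python
import pandas as pd

columns = ["Facility", "Category", "ID", "Part", "Text"]

# Sample data copied from the question
df = pd.DataFrame(
    [
        ["Centennial", "History", 11111, "A", "Drain"],
        ["Centennial", "History", 11111, "B", "Read"],
        ["Centennial", "History", 11111, "C", "EKG"],
        ["Centennial", "History", 11111, "D", "Assistant"],
        ["Centennial", "History", 11111, "E", "Primary"],
    ],
    columns=columns,
)
df1 = pd.DataFrame(
    [["Centennial", "History", 11111, "D", "Assistant"]],
    columns=columns,
)

# Left merge keeps every row of df. With indicator=True, "_merge" is
# "both" when the row also exists in df1 and "left_only" otherwise.
merged = df.merge(df1.drop_duplicates(), how="left", on=columns, indicator=True)

# Turn the indicator into the Yes/No flag, then drop the helper column
merged["MatchingFlag"] = (merged["_merge"] == "both").map({True: "Yes", False: "No"})
result = merged.drop(columns="_merge")

print(result)
```

Unlike the index-based version, this keeps the five key columns as ordinary columns and preserves the row order of df. If a boolean flag is preferred, `merged["_merge"] == "both"` can be stored directly.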