Comparing two dataframes based on one column, with the matching values in different index positions
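I've run into a problem and haven't been able to find a solution for it.
I'm trying to compare two very large dataframes, but for this first question I've cut them down to a smaller sample.
For now, I just want to print the names of the players that appear in both dataframes. Later I'll iterate over the columns to compare values and record the differences, but that's a problem for another day.
I've noticed that in the examples and solutions other people have shared, the two values being compared are usually at the same index position. In my case the same player can sit at a different position in each dataframe, and I don't have enough experience with pandas to adapt those solutions.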
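To illustrate why index-aligned solutions don't carry over, here is a small sketch on hypothetical toy data (not the asker's spreadsheets). The same names sit at different positions, so an element-by-element comparison finds no matches, while a membership test with the pandas Series.isin method does.

```python
import pandas as pd

# Hypothetical toy data: the same two players appear in both years,
# but at different row positions.
players_2019 = pd.Series(['Ann', 'Bob', 'Cid'])
players_2018 = pd.Series(['Bob', 'Cid', 'Dee'])

# Position-by-position comparison: row 0 vs row 0, row 1 vs row 1, ...
same_position = (players_2019 == players_2018).tolist()
print(same_position)  # [False, False, False]

# Membership test: is each 2019 name anywhere in the 2018 column?
in_both = players_2019[players_2019.isin(players_2018)].tolist()
print(in_both)  # ['Bob', 'Cid']
```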
import pandas as pd
df1=pd.read_excel('Example players 2019.xlsx')
df2=pd.read_excel('Example players 2018.xlsx')
header2019 = df1.iloc[0]
df1 = df1[1:]
df1.columns = header2019
header2018 = df2.iloc[0]
df2 = df2[1:]
df2.columns = header2018
print('df1')
print(df1)
print('df2')
print(df2)
columnLength2019=df1.shape[1]
columnLength2018=df2.shape[1]
rowLength2019=df1.shape[0]
rowLength2018=df2.shape[0]
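# Compare single values by position with .iloc; comparing whole columns
# (df1['Player'] == df2['Player']) inside an if statement raises a ValueError
for i in range(rowLength2019):
    for j in range(rowLength2018):
        if df1['Player'].iloc[i] == df2['Player'].iloc[j]:
            print(df1['Player'].iloc[i])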
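The nested loop compares every row with every row. A sketch of the same "names in both years" check without nested loops follows. It assumes the Player column from the question, and the two small frames are hypothetical stand-ins for the Excel sheets after the header fix. The 2018 names are put into a Python set once, and each 2019 name is checked against it.

```python
import pandas as pd

# Hypothetical stand-ins for the 2019 and 2018 sheets after the header fix
df1 = pd.DataFrame({'Player': ['Ann', 'Bob', 'Cid'], 'Goals': [3, 5, 1]})
df2 = pd.DataFrame({'Player': ['Dee', 'Cid', 'Bob'], 'Goals': [2, 4, 6]})

# Build the set of 2018 names once; set membership checks are fast,
# so there is no need to compare every row with every row.
names_2018 = set(df2['Player'])
common = [name for name in df1['Player'] if name in names_2018]

for name in common:
    print(name)
```

The names come out in the order they appear in df1 (here Bob, then Cid).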
You probably want to merge the two dataframes on the player column; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html .
Example:
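import pandas as pd

df_2018 = pd.DataFrame({'player': ['a', 'b', 'c'], 'team': ['x', 'y', 'z']})
df_2019 = pd.DataFrame({'player': ['b', 'c', 'd'], 'team': ['y', 'j', 'k']})

# Inner merge keeps only players present in both years;
# the suffixes label which year each team column came from
matched = df_2018.merge(df_2019,
                        on='player',
                        how='inner',
                        suffixes=('_2018', '_2019'))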
print(matched)
Output:
  player team_2018 team_2019
0      b         y         y
1      c         z         j
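If you also want the players that appear in only one of the two years, one option (not part of the original answer) is an outer merge with indicator=True. This adds a _merge column whose values are 'both', 'left_only' or 'right_only'. The sketch below reuses the answer's toy frames.

```python
import pandas as pd

df_2018 = pd.DataFrame({'player': ['a', 'b', 'c'], 'team': ['x', 'y', 'z']})
df_2019 = pd.DataFrame({'player': ['b', 'c', 'd'], 'team': ['y', 'j', 'k']})

# how='outer' keeps every player; indicator=True records where each row came from
all_players = df_2018.merge(df_2019, on='player', how='outer',
                            suffixes=('_2018', '_2019'), indicator=True)

# 'left_only' = only in df_2018, 'right_only' = only in df_2019
only_2018 = all_players.loc[all_players['_merge'] == 'left_only', 'player'].tolist()
only_2019 = all_players.loc[all_players['_merge'] == 'right_only', 'player'].tolist()
print(only_2018)  # ['a']
print(only_2019)  # ['d']
```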
To print out the matching players, you can do:
for player in matched['player']:
    print(player)
Having both years' data in the same DataFrame should also make it easier to compare them later.
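As a minimal sketch of that later comparison on the same toy data, once both years sit side by side the differences can be found by comparing the suffixed columns directly instead of looping. The choice of the team column is only for illustration.

```python
import pandas as pd

df_2018 = pd.DataFrame({'player': ['a', 'b', 'c'], 'team': ['x', 'y', 'z']})
df_2019 = pd.DataFrame({'player': ['b', 'c', 'd'], 'team': ['y', 'j', 'k']})

matched = df_2018.merge(df_2019, on='player', how='inner',
                        suffixes=('_2018', '_2019'))

# Keep only the rows where the team value differs between the two years
changed = matched[matched['team_2018'] != matched['team_2019']]
print(changed)

changed_players = changed['player'].tolist()
print(changed_players)  # ['c']
```

With real data that contains missing values, note that NaN != NaN evaluates to True, so rows missing a value in both years would also be reported as changed.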
You can use isin to check whether each value is contained in the other dataframe's column:
a = df1[df1.player.isin(df2.player)]
for player in a['player']:
    print(player)
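The same boolean mask can be inverted with ~ to list the players in df1 who do not appear in df2. Here is a sketch on hypothetical data, using the lowercase player column as in this answer.

```python
import pandas as pd

df1 = pd.DataFrame({'player': ['a', 'b', 'c']})
df2 = pd.DataFrame({'player': ['b', 'c', 'd']})

# True where the df1 player also appears somewhere in df2
in_df2 = df1.player.isin(df2.player)

present = df1[in_df2]['player'].tolist()   # rows where the mask is True
missing = df1[~in_df2]['player'].tolist()  # rows where the mask is False
print(present)  # ['b', 'c']
print(missing)  # ['a']
```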
Or you can combine np.where with isin to check and print the result in a single statement:
import numpy as np

print(np.where(df1.player.isin(df2.player),
               df1.player + " is present",
               df1.player + " is NOT present").tolist())
You can also use np.where to create a column in the dataframe:
df1['present'] = np.where((df1.player.isin(df2.player)), "Present", "NOT present")