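Compare two dataframes cell by cell with conditions on columns
I want to compare two dataframes and output a dataframe that shows the differences. However, I can tolerate dates that differ by up to 2 days and scores that differ by up to 5 points. When a difference is within the acceptable range, I keep the value from df1.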
df1
id group date score
10 A 2020-01-10 50
29 B 2020-01-01 80
39 C 2020-01-21 84
38 A 2020-02-02 29
df2
id group date score
10 B 2020-01-11 56
29 B 2020-01-01 81
39 C 2020-01-22 85
38 A 2020-02-12 29
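To reproduce the two tables above, one option is to parse them from text. This is only a sketch: the `load` helper and the `DF1_TEXT` / `DF2_TEXT` names are made up for illustration, and it assumes the `date` column should be parsed as datetimes so that differences in days can be computed.

```python
import io

import pandas as pd

DF1_TEXT = """id group date score
10 A 2020-01-10 50
29 B 2020-01-01 80
39 C 2020-01-21 84
38 A 2020-02-02 29
"""

DF2_TEXT = """id group date score
10 B 2020-01-11 56
29 B 2020-01-01 81
39 C 2020-01-22 85
38 A 2020-02-12 29
"""


def load(text):
    # Hypothetical helper: read whitespace-separated text and parse
    # 'date' as datetime64 so that date arithmetic works later on.
    return pd.read_csv(io.StringIO(text), sep=r"\s+", parse_dates=["date"])


df1 = load(DF1_TEXT)
df2 = load(DF2_TEXT)
```

With `date` stored as datetimes, `df1["date"] - df2["date"]` yields timedeltas that can be compared directly against a 2-day tolerance.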
My expected output:
id group date score
10 A -> B 2020-01-10 50 -> 56
29 B 2020-01-01 80
39 C 2020-01-21 84
38 A 2020-02-02 -> 2020-02-12 29
So I want to compare the dataframes cell by cell, with conditions on certain columns.
I started with this:
import pandas as pd

df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
result = []
for index, row in df1.iterrows():
    diff = []
    for col in df1.columns:
        compare_item = row[col]
        other_item = df2.loc[index, col]  # matching cell in df2
        if col == 'date':
            # acceptable if it's within 2 days difference
            same = abs(pd.to_datetime(compare_item) - pd.to_datetime(other_item)) <= pd.Timedelta(days=2)
        elif col == 'score':
            # acceptable if it's within 5 points difference
            same = abs(compare_item - other_item) <= 5
        else:
            same = compare_item == other_item
        if same:
            diff.append(compare_item)
        else:
            diff.append('{} --> {}'.format(compare_item, other_item))
    result.append(diff)
df = pd.DataFrame(result, index=df1.index, columns=df1.columns)
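Let's try:
import pandas as pd

# per-column tolerances; 'date' must be a datetime64 column for the subtraction to work
# (both frames are assumed to be indexed by 'id')
thresh = {'date': pd.to_timedelta('2D'),
          'score': 5}

def update(col):
    name = col.name
    # if there is a threshold, keep df1's value while the difference is within it,
    # otherwise take df2's value
    if name in thresh:
        return col.where(col.sub(df2[name]).abs() <= thresh[name], df2[name])
    # there is no threshold for the column:
    # return the corresponding column from df2
    return df2[name]

df1.apply(update)
Output: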
group date score
id
10 B 2020-01-10 56
29 B 2020-01-01 80
39 C 2020-01-21 84
38 A 2020-02-12 29
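The output above replaces out-of-tolerance cells with the df2 value, whereas the question asks for an "old -> new" rendering. A minimal sketch of one way to get that format follows. The helper names `render` and `annotate` are made up for illustration, and the sketch assumes both frames share the same `id` index in the same order and that `date` is a datetime column.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [10, 29, 39, 38],
    "group": ["A", "B", "C", "A"],
    "date": pd.to_datetime(["2020-01-10", "2020-01-01", "2020-01-21", "2020-02-02"]),
    "score": [50, 80, 84, 29],
}).set_index("id")

df2 = pd.DataFrame({
    "id": [10, 29, 39, 38],
    "group": ["B", "B", "C", "A"],
    "date": pd.to_datetime(["2020-01-11", "2020-01-01", "2020-01-22", "2020-02-12"]),
    "score": [56, 81, 85, 29],
}).set_index("id")

# Per-column tolerances; columns not listed here must match exactly.
thresh = {"date": pd.Timedelta(days=2), "score": 5}


def render(s):
    # Hypothetical helper: turn a column into display strings.
    if pd.api.types.is_datetime64_any_dtype(s):
        return s.dt.strftime("%Y-%m-%d")
    return s.astype(str)


def annotate(col):
    # Hypothetical helper: keep df1's value where the cell is within
    # tolerance, otherwise show "df1 value -> df2 value".
    other = df2[col.name]
    if col.name in thresh:
        same = (col - other).abs() <= thresh[col.name]
    else:
        same = col == other
    left, right = render(col), render(other)
    return left.where(same, left + " -> " + right)


report = pd.DataFrame({name: annotate(df1[name]) for name in df1.columns})
print(report)
```

Every cell in `report` is a string, so this frame is meant for display or export rather than further numeric work.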