How to merge two pandas DataFrames and sort by the number column?
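I currently have two pandas DataFrames.
The rules are:
When the "no" value is the same in both DataFrames,
keep only the "pass" result in df1.
When a "no" value appears in only one of them, merge the df2 rows into df1, ordered by "no".
Both DataFrames are shown below: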
====df1====
no result
0 12 pass
1 13 fail
2 15 pass
3 16 pass
4 17 fail
====df2====
no result
0 13 pass
1 14 fail
The expected answer is:
====df1-merge====
no result
0 12 pass
1 13 pass
2 14 fail
3 15 pass
4 16 pass
5 17 fail
However, after running my code the result is as follows. How can I fix my code so that it produces the expected answer above? Thanks.
====df1-merge====
no result
0 12 pass
1 13 pass
2 15 pass
3 16 pass
4 17 fail
My code is as follows:
import pandas as pd
import numpy as np
lst11 = [12,13,15,16,17]
lst12 = ["pass","fail","pass","pass","fail"]
df1 = pd.DataFrame(list(zip(lst11,lst12)), columns = ['no','result'])
lst21 = [13,14]
lst22 = ["pass","fail"]
df2 = pd.DataFrame(list(zip(lst21,lst22)), columns = ['no','result'])
print("====df1====")
print(df1)
print("====df2====")
print(df2)
for i in range(len(df1) - 1):
    no1 = df1.at[i, "no"]
    for x in range(len(df2)):
        no2 = df2.at[x, "no"]
        if no1 == no2:
            result_no1 = df1.at[i,'result']
            result_no2 = df2.at[x,'result']
            #==============================
            if result_no1 == "pass":
                result_no1_str = 1
            else:
                result_no1_str = 0
            if result_no2 == "pass":
                result_no2_str = 1
            else:
                result_no2_str = 0
            #==============================
            result_all = result_no1_str or result_no2_str
            #==============================
            if result_all == 1:
                result_all = "pass"
            else:
                result_all = "fail"
            df1.at[i, "result"] = result_all
        else:
            if no1 < no2:
                if i == len(df1) - 1:
                    no = df2.at[x,'no']
                    result = df2.at[x,'result']
                    df1.loc[len(df1.index)] = [no, result]
                else:
                    pass
            else:
                if i == len(df1) - 1:
                    no = df2.at[i,'no']
                    result = df2.at[i,'result']
                    df1.loc[i+1] = pd.Series({"no": no,"Result": result})
                else:
                    pass
print("\n====df1-merge====")
print(df1)
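[===== New edit after adding the "N/A" result type =====]
If the result can also be "N/A", not just "pass" and "fail", the rules are:
When the "no" value is the same in both DataFrames, keep only the "pass" result in df1.
But if one of the results is "N/A", prefer "pass" first, then "fail"; if both results are "N/A", keep "N/A".
When a "no" value appears in only one of them, merge the df2 rows into df1, ordered by "no".
Both DataFrames are shown below: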
====df1====
no result
0 12 pass
1 13 fail
2 15 pass
3 16 N/A
4 17 N/A
5 18 pass
====df2====
no result
0 13 pass
1 14 fail
2 15 N/A
3 16 N/A
4 17 fail
The expected answer is:
====df1-merge====
no result
0 12 pass
1 13 pass
2 14 fail
3 15 pass
4 16 N/A
5 17 fail
6 18 pass
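df = df1.copy()
# Any "no" that also appears in df2 is marked "pass" (this assumes the two results are never both "fail")
df.loc[df["no"].isin(df2["no"]), "result"] = "pass"
# DataFrame.append was removed in pandas 2.0, so use pd.concat; sorting by "no" puts the new rows in order
df = pd.concat([df, df2[~df2["no"].isin(df["no"])]]).sort_values("no", ignore_index=True)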
Note: here I reset the index of the resulting DataFrame to avoid duplicate index values.
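As an alternative, here is a minimal sketch that applies the "pass wins" rule per "no" value instead of marking every shared "no" as "pass". It assumes the only result values are "pass" and "fail"; the names combined, is_pass and merged are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"no": [12, 13, 15, 16, 17],
                    "result": ["pass", "fail", "pass", "pass", "fail"]})
df2 = pd.DataFrame({"no": [13, 14], "result": ["pass", "fail"]})

# Stack both frames, then collapse duplicate "no" values:
# a group is "pass" if any of its rows is "pass", otherwise "fail".
combined = pd.concat([df1, df2], ignore_index=True)
is_pass = combined["result"].eq("pass").groupby(combined["no"]).any()

# groupby sorts by key by default, so rows come out ordered by "no".
merged = (
    is_pass.map({True: "pass", False: "fail"})
    .rename("result")
    .reset_index()
)
print(merged)
```

Because the grouping key is sorted, no separate sort step is needed, and a "no" that is "fail" in both frames stays "fail".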
Edit: to handle NA values, it is much simpler to use the "no" column as the index, i.e. to create the data like this:
df1 = pd.DataFrame(data=lst12, index=lst11, columns=["result"])
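Then you can add an extra condition so that the result column of your first DataFrame is modified only when the values in the two DataFrames are not both NA.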
df = df1.copy()
df.loc[df.index.isin(df2.index) & (~(df2["result"].isna() & df["result"].isna())).reindex(df.index), "result"] = "pass"
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, df2[~df2.index.isin(df.index)]]).sort_index()
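The priority rule from the question ("pass" first, then "fail", then "N/A") can also be written as an explicit ranking. The sketch below is one possible approach, not the method of the answer above. It assumes "N/A" is stored as a literal string rather than a real NaN, and the rank mapping and the temporary _rank column are hypothetical helpers.

```python
import pandas as pd

df1 = pd.DataFrame({"no": [12, 13, 15, 16, 17, 18],
                    "result": ["pass", "fail", "pass", "N/A", "N/A", "pass"]})
df2 = pd.DataFrame({"no": [13, 14, 15, 16, 17],
                    "result": ["pass", "fail", "N/A", "N/A", "fail"]})

# Assumed priority: the lower rank wins when the same "no" appears twice.
rank = {"pass": 0, "fail": 1, "N/A": 2}

combined = pd.concat([df1, df2], ignore_index=True)
merged = (
    combined.assign(_rank=combined["result"].map(rank))
    .sort_values(["no", "_rank"])                # best result first within each "no"
    .drop_duplicates(subset="no", keep="first")  # keep only that best row
    .drop(columns="_rank")
    .reset_index(drop=True)
)
print(merged)
```

With the sample data from the question, this gives "fail" for no 17 (N/A in df1, fail in df2) and "N/A" for no 16, as in the expected answer.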