How do I create new boolean columns based on conditions between two or three columns in two DataFrames?

Question

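I have two DataFrames of different sizes, df1 and df2. I am trying to check whether values in df1 exist in a column of df2, and to return True or False in a new column of df1.

The first DataFrame is my reference. It was extracted from an xls file.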

df1.head(10)
Out[29]: 
    PO Number  Sales Document           SO           DO  Document Number
0  3620556930    9001724124.0 4001458660.0 8001721322.0       1500017748
1  3620556930    9001723883.0 4001458865.0 8001721037.0       1500017540
2  3620556930    9001723884.0 4001459374.0 8001721038.0       1500017541
3  3620556930    9001723885.0 4001458101.0 8001721043.0       1500017542
4  3620547728    9001721907.0 4001457180.0 8001719172.0       1500015786
5  3620556930    9001721908.0 4001457724.0 8001719173.0       1500015787
6    TT030720             nan          nan          nan        700001897
7  3620518726    9600008914.0 5600008655.0 5600008655.0       1500008725
8  3620518726    9600008912.0 5600008653.0 5600008653.0       1500008723
9  3620518726    9600008913.0 5600008654.0 5600008654.0       1500008724

The second DataFrame comes from a table I scraped from a website.

df2.head(10)
Out[32]: 
        PO No         Doc Type  SUS Doc No                    GR_GA   Inv_SO_DO  Doc Date
0  3620556930   Purchase Order  8001294233                      CSL              27.08.2020
1  3620556930    Goods Receipt  7903307400           Goods Received  4001457724  04.09.2020
2  3620556930    Goods Receipt  7903307457           Goods Accepted  4001457724  04.09.2020
3  3620556930  Payment Request  3102053949              CCM Invoice  9001721908  23.09.2020
4  3620556930    Goods Receipt  7903333326           Goods Received  4001458660  29.09.2020
5  3620556930    Goods Receipt  7903333325           Goods Received  4001458101  29.09.2020
6  3620556930    Goods Receipt  7903333322           Goods Received  4001458865  29.09.2020
7  3620556930    Goods Receipt  7903333327           Goods Accepted  4001458660  29.09.2020
8  3620556930    Goods Receipt  7903333324           Goods Received  4001458660  29.09.2020
9  3620556930    Goods Receipt  7903333329           Goods Accepted  4001458865  29.09.2020

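My idea for getting the output is as follows:

I will create three more columns in df1, named 'GR', 'GA', and 'Inv'.
I will use the values in df1['SO'] and df1['DO'] to check whether they exist in df2['Inv_SO_DO'].
If the values exist, I will check whether df2['GR_GA'] is Goods Received, Goods Accepted, or an invoice. Based on that check, I will then return True or False in the 'GR', 'GA', and 'Inv' columns of df1.

I tried a for loop, shown below, to build the list of values to add for 'GA', but it only gave me a list of Falses.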

ga = []
t1 = x.iloc[:,2].values
t2 = y.iloc[:,4].values
t3 = y.iloc[:,3].values
for i in t1:
    for j in t2:
        for k in t3:
            if i == j and k == 'Goods Receipt':
                ga.append('True') 
                
            else:
                ga.append('False')

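The closest I have come to a solution is another question here. I tried that code and modified it, but the result was also incorrect. Either that, or I am applying the code from the link incorrectly.

Any suggestions would be welcome!

Required output: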

df1.head(4)
Out[43]: 
    PO Number  Sales Document           SO           DO  Document Number     GR     GA    Inv
0  3620556930    9001724124.0 4001458660.0 8001721322.0       1500017748   True   True   True
1  3620556930    9001723883.0 4001458865.0 8001721037.0       1500017540   True  False  False
2  3620556930    9001723884.0 4001459374.0 8001721038.0       1500017541  False  False  False
3  3620556930    9001723885.0 4001458101.0 8001721043.0       1500017542   True   True  False

Answer 1

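One way you can do this is as follows:

Merge df1 and df2 on DO or SO (from the left) against Inv_SO_DO (from the right). Note that in your case each SO value corresponds to multiple rows in df2, so you may need to modify the merge logic slightly (e.g. keep only the most recent matching row in df2?).
"Dummify" the GR_GA column with pd.get_dummies(), then, after converting the dummies to boolean type, concatenate them with the desired columns from the merged df.

For example: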

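import pandas as pd

# Assumes column names and GR_GA values had their spaces removed or replaced
# (e.g. "PO_Number", "GoodsReceived"), as in the result shown further down.
# Match df1 against df2 on SO, then on DO, and stack the two results.
m = pd.concat([df1.merge(df2, left_on='SO', right_on='Inv_SO_DO', how='inner'),
               df1.merge(df2, left_on='DO', right_on='Inv_SO_DO', how='inner')
              ])

# One boolean column per distinct GR_GA value, placed next to the merged columns.
desired_cols = ["PO_Number", "Sales_Document", "SO", "DO", "Document_Number", "CSL", "GoodsAccepted", "GoodsReceived", "CCMInvoice"]
pd.concat([m, pd.get_dummies(m['GR_GA']).astype(bool)], axis=1)[desired_cols]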

The result looks like this:

    PO_Number  Sales_Document          SO          DO  Document_Number    CSL  GoodsAccepted  GoodsReceived  CCMInvoice
0  3620556930      9001724124  4001458660  8001721322       1500017748  False          False           True       False
1  3620556930      9001724124  4001458660  8001721322       1500017748  False           True          False       False
2  3620556930      9001724124  4001458660  8001721322       1500017748  False          False           True       False
3  3620556930      9001723883  4001458865  8001721037       1500017540  False          False           True       False
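
The repeated rows in this result appear because a single df1 row can match several df2 rows. If one row per df1 entry is wanted, one possible way to collapse the matches is to group the dummy columns by the originating df1 row and take "any". This is only a sketch: the small frames below are stand-ins built from values in the question, the helper column name "row" is hypothetical, and the original spaced GR_GA labels are kept.

```python
import pandas as pd

# Stand-in frames using a few values from the question's samples.
df1 = pd.DataFrame({
    "SO": [4001458660, 4001458865, 4001459374],
    "DO": [8001721322, 8001721037, 8001721038],
})
df2 = pd.DataFrame({
    "GR_GA": ["Goods Received", "Goods Accepted", "Goods Received"],
    "Inv_SO_DO": [4001458660, 4001458660, 4001458865],
})

# Carry the df1 row label through the merge ("row" is a hypothetical helper name).
left = df1.reset_index().rename(columns={"index": "row"})
m = pd.concat(
    [
        left.merge(df2, left_on="SO", right_on="Inv_SO_DO"),
        left.merge(df2, left_on="DO", right_on="Inv_SO_DO"),
    ],
    ignore_index=True,
)

# One boolean column per GR_GA value, then "was there any match" per df1 row.
flags = pd.get_dummies(m["GR_GA"]).astype(bool).groupby(m["row"]).any()

# df1 rows that matched nothing are filled with False.
out = df1.join(flags.reindex(df1.index, fill_value=False))
print(out)
```

Whether "any match" is the right rule, as opposed to keeping only the latest df2 row per key, depends on what the flags are meant to express.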

Note again that because each SO and DO in the sample df1 you provided can match more than one row in df2, you may need to add some custom logic for how to merge.
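
An alternative that avoids the multi-match problem is to skip the merge and test membership directly with Series.isin(), once per document type. The sketch below is not from the original answer and rests on assumptions: that the scraped Inv_SO_DO column holds strings (so both sides are coerced to numbers before comparing, since df1 stores SO and DO as floats), and that GR, GA, and Inv correspond to the labels "Goods Received", "Goods Accepted", and "CCM Invoice".

```python
import pandas as pd

# Stand-in frames using values from the question's samples.
df1 = pd.DataFrame({
    "SO": [4001458660.0, 4001458865.0, 4001459374.0, 4001458101.0, float("nan")],
    "DO": [8001721322.0, 8001721037.0, 8001721038.0, 8001721043.0, float("nan")],
})
df2 = pd.DataFrame({
    "GR_GA": [
        "CSL", "Goods Received", "Goods Accepted", "CCM Invoice", "Goods Received",
        "Goods Received", "Goods Received", "Goods Accepted", "Goods Received",
        "Goods Accepted",
    ],
    "Inv_SO_DO": [
        "", "4001457724", "4001457724", "9001721908", "4001458660",
        "4001458101", "4001458865", "4001458660", "4001458660", "4001458865",
    ],
})

# Assumption: scraped values are strings; blanks become NaN after coercion.
keys = pd.to_numeric(df2["Inv_SO_DO"], errors="coerce")

# Assumed mapping from the new df1 columns to GR_GA labels.
labels = {"GR": "Goods Received", "GA": "Goods Accepted", "Inv": "CCM Invoice"}

for col, label in labels.items():
    # Drop NaN so that missing SO/DO values in df1 never count as a match.
    matched = keys[df2["GR_GA"] == label].dropna()
    df1[col] = df1["SO"].isin(matched) | df1["DO"].isin(matched)

print(df1)
```

In the sample rows shown in the question, the CCM Invoice number appears under Sales Document rather than SO or DO, so that column may need to be included in the check for Inv.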

1 2020-10-26 10:31:02
