Python Pandas DataFrame: take the next smaller value based on a separate column

Question

I have a transactions table. On a single date, one Transaction ID can have more than one record.

I want to add a new column to the same df that, for each Transaction ID, pulls the date that precedes the date of the row in question.

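If there is no transaction on an earlier date, the new column should be blank. If the earlier date has more than one record, the values from the first record on that date should be filled in.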
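The rule above can be sketched with drop_duplicates plus a per-group shift. This is a minimal sketch, not the method used in the answer below: it assumes Transaction_Date is parsed with pd.to_datetime, and it uses a subset of the sample data (IDs 1001, 1002 and 1004). Because only the first record per (ID, date) is kept before shifting, it does not depend on how many records share a date.

```python
import pandas as pd

# Sample: the Transaction_ID 1001, 1002 and 1004 rows from the question,
# sorted by Transaction_ID and Transaction_Date (descending) as in the question.
df = pd.DataFrame({
    "Transaction_ID": [1001] * 3 + [1002] * 2 + [1004] * 7,
    "Transaction_Date": ["3/27/2020", "3/27/2020", "3/27/2020",
                         "1/23/2020", "10/30/2019",
                         "2/17/2020", "2/14/2020", "1/14/2020", "1/14/2020",
                         "12/14/2019", "11/4/2019", "11/4/2019"],
    "Invoice": [10000, 10000, 10000, 127000, 117000,
                1261500, 1262000, 1552000, 1452000, 1000000, 2392000, 2792000],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")

keys = ["Transaction_ID", "Transaction_Date"]
# 1) One row per (ID, date): the first record, which supplies the "previous" values.
firsts = df.drop_duplicates(keys, keep="first")

# 2) Dates are descending within each ID, so the earlier date is the next row
#    of `firsts` for the same ID: shift(-1) per group (NaN/NaT at the end).
by_id = firsts.groupby("Transaction_ID")
prev = firsts[keys].assign(
    Previous_Transaction_Date=by_id["Transaction_Date"].shift(-1),
    Previous_Invoice_amount=by_id["Invoice"].shift(-1),
)

# 3) Spread the result back to every row, including repeated dates.
out = df.merge(prev, on=keys, how="left")
print(out)
```

Rows with no earlier date for their ID come out as NaT/NaN, which plays the role of "blank" here.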

In Excel I filled in the values using a combination of MATCH and INDEX, but I am new to Python and am struggling to do this the right way.

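I was wondering whether I could use a self-join (with Transaction ID as the key and the condition Transaction_Date of t2 < Transaction_Date of t1), or whether there is a more efficient way to do this in Python.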
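The self-join idea can be expressed directly with DataFrame.merge on Transaction_ID followed by a filter on the date condition. A sketch under these assumptions: dates are parsed, the sample is the 1004 rows from the question, and a hypothetical helper column named row records the original position so that "first record" means the lowest row. Note that the join materializes every pair of rows within an ID, so memory grows quadratically with the number of rows per ID.

```python
import pandas as pd

# Sample: the Transaction_ID 1004 rows from the question, sorted by date descending.
df = pd.DataFrame({
    "Transaction_ID": [1004] * 7,
    "Transaction_Date": ["2/17/2020", "2/14/2020", "1/14/2020", "1/14/2020",
                         "12/14/2019", "11/4/2019", "11/4/2019"],
    "Invoice": [1261500, 1262000, 1552000, 1452000, 1000000, 2392000, 2792000],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")
df["row"] = range(len(df))  # hypothetical helper: original position ("first record" = lowest row)

# Self-join: pair every row (t1) with every row (t2) of the same Transaction_ID.
pairs = df.merge(df, on="Transaction_ID", suffixes=("", "_t2"))
# Keep only the question's condition: Transaction_Date of t2 < Transaction_Date of t1.
pairs = pairs[pairs["Transaction_Date_t2"] < pairs["Transaction_Date"]]
# For each t1, prefer the latest earlier date, then the first record on that date.
pairs = pairs.sort_values(["row", "Transaction_Date_t2", "row_t2"],
                          ascending=[True, False, True])
best = pairs.drop_duplicates("row", keep="first")

# Attach the chosen t2 values; rows with no earlier date get NaT/NaN (blank).
out = df.merge(
    best[["row", "Transaction_Date_t2", "Invoice_t2"]].rename(columns={
        "Transaction_Date_t2": "Previous_Transaction_Date",
        "Invoice_t2": "Previous_Invoice_amount",
    }),
    on="row",
    how="left",
).drop(columns="row")
print(out)
```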

The data is sorted by Transaction ID and Transaction Date (descending).
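"The nearest strictly earlier date within the same ID" is also what pd.merge_asof computes with by= and allow_exact_matches=False; this is a swapped-in technique, not something from the question or the answer. A sketch under these assumptions: dates are parsed, a hypothetical helper column named row restores the original order, and both sides are re-sorted ascending because merge_asof requires ascending keys (the question's descending order does not satisfy this). The sample is the 1002 and 1004 rows.

```python
import pandas as pd

# Sample: the Transaction_ID 1002 and 1004 rows from the question, in the question's order.
df = pd.DataFrame({
    "Transaction_ID": [1002, 1002] + [1004] * 7,
    "Transaction_Date": ["1/23/2020", "10/30/2019",
                         "2/17/2020", "2/14/2020", "1/14/2020", "1/14/2020",
                         "12/14/2019", "11/4/2019", "11/4/2019"],
    "Invoice": [127000, 117000,
                1261500, 1262000, 1552000, 1452000, 1000000, 2392000, 2792000],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")
df["row"] = range(len(df))  # hypothetical helper used to restore the original order

# Lookup table: the first record of each (ID, date), under the output column names.
right = (
    df.drop_duplicates(["Transaction_ID", "Transaction_Date"], keep="first")
      .assign(Previous_Transaction_Date=lambda d: d["Transaction_Date"],
              Previous_Invoice_amount=lambda d: d["Invoice"])
      [["Transaction_ID", "Transaction_Date",
        "Previous_Transaction_Date", "Previous_Invoice_amount"]]
      .sort_values("Transaction_Date")
)

# merge_asof requires both sides sorted ascending on the "on" key.
left = df.sort_values("Transaction_Date")
# Within each Transaction_ID, match the closest strictly earlier date.
out = pd.merge_asof(left, right, on="Transaction_Date", by="Transaction_ID",
                    direction="backward", allow_exact_matches=False)
out = out.sort_values("row").drop(columns="row").reset_index(drop=True)
print(out)
```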

Input data:

Transaction_ID  Transaction_Date  Invoice
1001            3/27/2020         10,000 
1001            3/27/2020         10,000 
1001            3/27/2020         10,000 
1002            1/23/2020         127,000 
1002            10/30/2019        117,000 
1003            3/26/2020         291,000 
1003            3/24/2020         292,000 
1003            1/15/2020         290,000 
1003            12/30/2019        292,000 
1003            10/21/2019        189,000 
1003            10/21/2019        189,000 
1004            2/17/2020         1,261,500 
1004            2/14/2020         1,262,000 
1004            1/14/2020         1,552,000 
1004            1/14/2020         1,452,000 
1004            12/14/2019        1,000,000 
1004            11/4/2019         2,392,000 
1004            11/4/2019         2,792,000

Expected output:

Transaction_ID  Transaction_Date  Invoice    Previous_Transaction_Date  Previous_Invoice_amount
1001            3/27/2020         10,000           
1001            3/27/2020         10,000           
1001            3/27/2020         10,000           
1002            1/23/2020         127,000    10/30/2019                 117,000 
1002            10/30/2019        117,000          
1003            3/26/2020         291,000    3/24/2020                  292,000 
1003            3/24/2020         292,000    1/15/2020                  290,000 
1003            1/15/2020         290,000    12/30/2019                 292,000 
1003            12/30/2019        292,000    10/21/2019                 189,000 
1003            10/21/2019        189,000          
1003            10/21/2019        189,000          
1004            2/17/2020         1,261,500  2/14/2020                  1,262,000 
1004            2/14/2020         1,262,000  1/14/2020                  1,552,000 
1004            1/14/2020         1,552,000  12/14/2019                 1,000,000 
1004            1/14/2020         1,452,000  12/14/2019                 1,000,000 
1004            12/14/2019        1,000,000  11/4/2019                  2,392,000 
1004            11/4/2019         2,392,000  9/10/2020                  900,050 
1004            11/4/2019         2,792,000  9/10/2020                  900,050

Answer 1

Using a lot of .shift logic together with np.where, .ne and .eq will get the job done.
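To see what the shift/ne building blocks do, here is a toy illustration on one ID's dates (hypothetical values in the question's format). shift(1) compares each row with the row above and shift(-1) with the row below, so ne(shift(1)) | ne(shift(-1)) flags the first and last row of each run of equal dates. In a run of three, the middle row is flagged by neither, which is one reason the approach below is limited to two records per date.

```python
import pandas as pd

# Hypothetical dates for one Transaction_ID, descending, with one date repeated 3 times.
dates = pd.Series(["1/14/2020", "1/14/2020", "1/14/2020", "12/14/2019"])

starts_run = dates.ne(dates.shift(1))   # differs from the row above: first row of a run
ends_run = dates.ne(dates.shift(-1))    # differs from the row below: last row of a run
flagged = starts_run | ends_run         # the date part of the answer's np.where condition

print(pd.DataFrame({"date": dates, "prev_row": dates.shift(1), "next_row": dates.shift(-1),
                    "starts_run": starts_run, "ends_run": ends_run, "flagged": flagged}))
```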

import numpy as np
import pandas as pd

# df: the input frame from the question, sorted by Transaction_ID and
# Transaction_Date (descending); Transaction_Date and Invoice are kept as strings.
df1 = df.copy()

# Main logic: a row takes the next row's date and invoice when it is the first
# or last row of a run of equal dates and the next row has the same Transaction_ID.
date = df1['Transaction_Date']
same_id_as_next = df1['Transaction_ID'].eq(df1['Transaction_ID'].shift(-1))
mask = (date.ne(date.shift(1)) | date.ne(date.shift(-1))) & same_id_as_next
df1['Previous_Transaction_Date'] = np.where(mask, date.shift(-1), '')
df1['Previous_Invoice_amount'] = np.where(mask, df1['Invoice'].shift(-1), '')

# Supplementary logic to get the rest of the cells.
# The first row of a repeated date picked up its own date; use the next row's value instead.
df1['Previous_Transaction_Date'] = np.where(df1['Previous_Transaction_Date'] == df1['Transaction_Date'],
                                            df1['Previous_Transaction_Date'].shift(-1),
                                            df1['Previous_Transaction_Date'])
# No previous date means no previous invoice.
df1['Previous_Invoice_amount'] = np.where(df1['Previous_Transaction_Date'] == '',
                                          '', df1['Previous_Invoice_amount'])
# A row followed by a row with the same ID and date copies that row's invoice.
df1['Previous_Invoice_amount'] = np.where(df1['Transaction_ID'].eq(df1['Transaction_ID'].shift(-1)) &
                                          df1['Transaction_Date'].eq(df1['Transaction_Date'].shift(-1)),
                                          df1['Previous_Invoice_amount'].shift(-1),
                                          df1['Previous_Invoice_amount'])
print(df1)

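Keep in mind that, apart from the first 3 rows, the highest count of any Transaction_Date is 2. If a Transaction_Date occurs 3 or more times in your larger dataset, you may not get the expected result.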
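Before relying on the answer, one quick check is to count records per (Transaction_ID, Transaction_Date) and look for counts of 3 or more. A sketch on a subset of the sample data; in the question's full sample, the only such group is 1001 on 3/27/2020, whose rows have no earlier date and are expected to stay blank anyway.

```python
import pandas as pd

# Subset of the question's input (only the columns needed for the check).
df = pd.DataFrame({
    "Transaction_ID": [1001, 1001, 1001, 1003, 1003, 1003, 1004, 1004],
    "Transaction_Date": ["3/27/2020", "3/27/2020", "3/27/2020", "12/30/2019",
                         "10/21/2019", "10/21/2019", "11/4/2019", "11/4/2019"],
})

# Number of records for each (ID, date) pair.
counts = df.groupby(["Transaction_ID", "Transaction_Date"]).size()
# Pairs with 3 or more records are the ones the shift-based answer may mishandle.
risky = counts[counts >= 3]
print(risky)
```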

Answer 1
1 2020-05-28 12:09:53