
Python Pandas Dataframe: Take next smaller value based on separate column

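I have a table of transactions. For a single date, there can be any number of records for one transaction ID.

I want to add new columns to the same df so that, for each transaction ID, each row pulls the date that comes immediately before that row's date.

If there is no transaction on an earlier date, the new columns should be blank. If the previous date has more than one record, the value of the first record for that date should be used.

In Excel I populated the values using a combination of MATCH and INDEX, but I am new to Python and struggling to do this the right way.

I was wondering whether a self-join could be used (Transaction_ID as the key, with the condition Transaction_Date of t2 < Transaction_Date of t1), or whether there is a more efficient way in Python.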
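The self-join idea can be sketched roughly as follows. This is only an illustration under some assumptions: the dates are strings in M/D/YYYY form, the sample rows are a subset of the input data, and the helper names (`t1`, `t2`, `pairs`, `best`, `_date`, `_row`, `_prev_date`) are made up for the example. Note that the join builds every pair of rows per Transaction_ID before filtering, so it may get expensive for IDs with many records.

```python
import pandas as pd

# Small sample in the question's layout (rows taken from the input data).
df = pd.DataFrame({
    "Transaction_ID": [1002, 1002, 1004, 1004, 1004],
    "Transaction_Date": ["1/23/2020", "10/30/2019", "12/14/2019",
                         "11/4/2019", "11/4/2019"],
    "Invoice": ["127,000", "117,000", "1,000,000", "2,392,000", "2,792,000"],
})

t1 = df.reset_index(drop=True)
t1["_date"] = pd.to_datetime(t1["Transaction_Date"], format="%m/%d/%Y")
t1["_row"] = t1.index  # remember each original row

# t2: one candidate per (ID, date), keeping the first record of each date.
t2 = (t1.drop_duplicates(["Transaction_ID", "_date"])
        [["Transaction_ID", "_date", "Transaction_Date", "Invoice"]]
        .rename(columns={"_date": "_prev_date",
                         "Transaction_Date": "Previous_Transaction_Date",
                         "Invoice": "Previous_Invoice_amount"}))

# Self-join on Transaction_ID, then keep only pairs where t2 is earlier than t1.
pairs = t1[["_row", "Transaction_ID", "_date"]].merge(t2, on="Transaction_ID")
pairs = pairs[pairs["_prev_date"] < pairs["_date"]]

# For each original row, keep the latest of the earlier dates.
best = pairs.sort_values("_prev_date").groupby("_row").tail(1).set_index("_row")

# Attach the result; rows without an earlier date end up blank.
result = (t1.join(best[["Previous_Transaction_Date", "Previous_Invoice_amount"]],
                  on="_row")
            .drop(columns=["_date", "_row"])
            .fillna(""))
print(result)
```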

The data is sorted by Transaction ID and by Transaction Date (descending).

Input data:

Transaction_ID  Transaction_Date  Invoice
1001            3/27/2020         10,000 
1001            3/27/2020         10,000 
1001            3/27/2020         10,000 
1002            1/23/2020         127,000 
1002            10/30/2019        117,000 
1003            3/26/2020         291,000 
1003            3/24/2020         292,000 
1003            1/15/2020         290,000 
1003            12/30/2019        292,000 
1003            10/21/2019        189,000 
1003            10/21/2019        189,000 
1004            2/17/2020         1,261,500 
1004            2/14/2020         1,262,000 
1004            1/14/2020         1,552,000 
1004            1/14/2020         1,452,000 
1004            12/14/2019        1,000,000 
1004            11/4/2019         2,392,000 
1004            11/4/2019         2,792,000

Expected output:

Transaction_ID  Transaction_Date  Invoice    Previous_Transaction_Date  Previous_Invoice_amount
1001            3/27/2020         10,000           
1001            3/27/2020         10,000           
1001            3/27/2020         10,000           
1002            1/23/2020         127,000    10/30/2019                 117,000 
1002            10/30/2019        117,000          
1003            3/26/2020         291,000    3/24/2020                  292,000 
1003            3/24/2020         292,000    1/15/2020                  290,000 
1003            1/15/2020         290,000    12/30/2019                 292,000 
1003            12/30/2019        292,000    10/21/2019                 189,000 
1003            10/21/2019        189,000          
1003            10/21/2019        189,000          
1004            2/17/2020         1,261,500  2/14/2020                  1,262,000 
1004            2/14/2020         1,262,000  1/14/2020                  1,552,000 
1004            1/14/2020         1,552,000  12/14/2019                 1,000,000 
1004            1/14/2020         1,452,000  12/14/2019                 1,000,000 
1004            12/14/2019        1,000,000  11/4/2019                  2,392,000 
1004            11/4/2019         2,392,000  9/10/2020                  900,050 
1004            11/4/2019         2,792,000  9/10/2020                  900,050

A lot of .shift logic combined with np.where, .ne and .eq will get the job done.

import numpy as np

df1 = df.copy()

# Main logic: a row takes its values from the next row when that row has the
# same Transaction_ID and the row's date differs from at least one neighbour.
same_id_as_next = df1['Transaction_ID'].eq(df1['Transaction_ID'].shift(-1))
date_differs = (df1['Transaction_Date'].ne(df1['Transaction_Date'].shift(1)) |
                df1['Transaction_Date'].ne(df1['Transaction_Date'].shift(-1)))
mask = date_differs & same_id_as_next
df1['Previous_Transaction_Date'] = np.where(mask, df1['Transaction_Date'].shift(-1), '')
df1['Previous_Invoice_amount'] = np.where(mask, df1['Invoice'].shift(-1), '')

# Supplementary logic to get the rest of the cells.
# If the picked date equals the row's own date (the next row is a same-date
# duplicate), use the next row's previous date instead.
df1['Previous_Transaction_Date'] = np.where(df1['Previous_Transaction_Date'] == df1['Transaction_Date'],
                                            df1['Previous_Transaction_Date'].shift(-1), df1['Previous_Transaction_Date'])
# No previous date means no previous invoice amount either.
df1['Previous_Invoice_amount'] = np.where(df1['Previous_Transaction_Date'] == '',
                                          '', df1['Previous_Invoice_amount'])
# A row sharing ID and date with the next row copies that row's previous amount.
df1['Previous_Invoice_amount'] = np.where(((df1['Transaction_ID'].eq(df1['Transaction_ID'].shift(-1))) &
                                          (df1['Transaction_Date'].eq(df1['Transaction_Date'].shift(-1)))),
                                          df1['Previous_Invoice_amount'].shift(-1), df1['Previous_Invoice_amount'])
df1

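Keep in mind that, apart from the first 3 rows, the highest count for any Transaction_Date is 2. If a Transaction_Date occurs 3 or more times in your larger dataset, you may not get the expected result.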
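If dates can repeat three or more times, one possible alternative (a sketch, not the method used above) is to reduce the data to one row per Transaction_ID and date, shift within each ID, and merge the result back. It assumes the dates are M/D/YYYY strings; the function name `add_previous_transaction` and the helper column `_date` are made up for the example. `drop_duplicates` keeps the first record of each date in the existing row order, which seems to match the "first record" rule in the question. On the input data shown, rows with no earlier date stay blank, so the last two rows will not reproduce the 9/10/2020 entries in the expected output, since that date does not appear in the input.

```python
import pandas as pd

# The input data from the question.
rows = [
    (1001, "3/27/2020", "10,000"), (1001, "3/27/2020", "10,000"),
    (1001, "3/27/2020", "10,000"),
    (1002, "1/23/2020", "127,000"), (1002, "10/30/2019", "117,000"),
    (1003, "3/26/2020", "291,000"), (1003, "3/24/2020", "292,000"),
    (1003, "1/15/2020", "290,000"), (1003, "12/30/2019", "292,000"),
    (1003, "10/21/2019", "189,000"), (1003, "10/21/2019", "189,000"),
    (1004, "2/17/2020", "1,261,500"), (1004, "2/14/2020", "1,262,000"),
    (1004, "1/14/2020", "1,552,000"), (1004, "1/14/2020", "1,452,000"),
    (1004, "12/14/2019", "1,000,000"),
    (1004, "11/4/2019", "2,392,000"), (1004, "11/4/2019", "2,792,000"),
]
df = pd.DataFrame(rows, columns=["Transaction_ID", "Transaction_Date", "Invoice"])


def add_previous_transaction(df):
    """Add the previous date and its first invoice for each Transaction_ID."""
    out = df.reset_index(drop=True)
    out["_date"] = pd.to_datetime(out["Transaction_Date"], format="%m/%d/%Y")
    keys = ["Transaction_ID", "_date"]

    # One row per (ID, date): the first record of each date, oldest date first.
    firsts = out.drop_duplicates(keys).sort_values(keys)

    # Within each ID, the previous row now holds the previous date.
    by_id = firsts.groupby("Transaction_ID")
    firsts = firsts.assign(
        Previous_Transaction_Date=by_id["Transaction_Date"].shift(1),
        Previous_Invoice_amount=by_id["Invoice"].shift(1),
    )

    # Merge back so every duplicate of a date gets the same values.
    lookup = firsts[keys + ["Previous_Transaction_Date", "Previous_Invoice_amount"]]
    return (out.merge(lookup, on=keys, how="left")
               .drop(columns="_date")
               .fillna(""))


result = add_previous_transaction(df)
print(result)
```

Because the shift runs on de-duplicated dates, the number of records per date does not matter, and the left merge keeps the original row order.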
