Python Pandas Dataframe: Take next smaller value based on separate column
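I have a table of transactions. A single transaction ID can have any number of records on the same date.
I want to add new columns to the same df that, for each transaction ID, pull the date immediately before the date of the row in question.
If there is no transaction on an earlier date, the new columns should be blank. If the previous date has more than one record, the value from the first record on that date should be used.
In Excel I filled in the values with a combination of MATCH and INDEX, but I am new to Python and struggling to do this the right way.
I was wondering whether I could use a self-join (with the transaction ID as the key and the condition Transaction_Date of t2 < Transaction_Date of t1), or whether there is a more efficient way in Python.
The data is sorted by transaction ID and transaction date (descending).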
Input data:
Transaction_ID Transaction_Date Invoice
1001 3/27/2020 10,000
1001 3/27/2020 10,000
1001 3/27/2020 10,000
1002 1/23/2020 127,000
1002 10/30/2019 117,000
1003 3/26/2020 291,000
1003 3/24/2020 292,000
1003 1/15/2020 290,000
1003 12/30/2019 292,000
1003 10/21/2019 189,000
1003 10/21/2019 189,000
1004 2/17/2020 1,261,500
1004 2/14/2020 1,262,000
1004 1/14/2020 1,552,000
1004 1/14/2020 1,452,000
1004 12/14/2019 1,000,000
1004 11/4/2019 2,392,000
1004 11/4/2019 2,792,000
Expected output:
Transaction_ID Transaction_Date Invoice Previous_Transaction_Date Previous_Invoice_amount
1001 3/27/2020 10,000
1001 3/27/2020 10,000
1001 3/27/2020 10,000
1002 1/23/2020 127,000 10/30/2019 117,000
1002 10/30/2019 117,000
1003 3/26/2020 291,000 3/24/2020 292,000
1003 3/24/2020 292,000 1/15/2020 290,000
1003 1/15/2020 290,000 12/30/2019 292,000
1003 12/30/2019 292,000 10/21/2019 189,000
1003 10/21/2019 189,000
1003 10/21/2019 189,000
1004 2/17/2020 1,261,500 2/14/2020 1,262,000
1004 2/14/2020 1,262,000 1/14/2020 1,552,000
1004 1/14/2020 1,552,000 12/14/2019 1,000,000
1004 1/14/2020 1,452,000 12/14/2019 1,000,000
1004 12/14/2019 1,000,000 11/4/2019 2,392,000
1004 11/4/2019 2,392,000 9/10/2020 900,050
1004 11/4/2019 2,792,000 9/10/2020 900,050
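The self-join idea raised in the question could look roughly like the sketch below. It is not part of the original post: the sample rows are those of ID 1004 from the input table, and the helper columns row and prev_row are names introduced here purely for illustration. Invoice amounts are written as plain integers, and dates are assumed to be in month/day/year format.

```python
import pandas as pd

# Sample rows for ID 1004, copied from the input table (commas dropped).
df = pd.DataFrame({
    "Transaction_ID": [1004] * 7,
    "Transaction_Date": ["2/17/2020", "2/14/2020", "1/14/2020", "1/14/2020",
                         "12/14/2019", "11/4/2019", "11/4/2019"],
    "Invoice": [1261500, 1262000, 1552000, 1452000, 1000000, 2392000, 2792000],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")

# Keep each row's original position so matches can be joined back later.
# "row" and "prev_row" are hypothetical helper names.
left = df.reset_index().rename(columns={"index": "row"})
right = left.rename(columns={
    "row": "prev_row",
    "Transaction_Date": "Previous_Transaction_Date",
    "Invoice": "Previous_Invoice_amount",
})

# Self-join on Transaction_ID, then keep only strictly earlier dates.
pairs = left.merge(right, on="Transaction_ID")
pairs = pairs[pairs["Previous_Transaction_Date"] < pairs["Transaction_Date"]]

# Per original row: the latest earlier date, and the first record on that date.
pairs = pairs.sort_values(["row", "Previous_Transaction_Date", "prev_row"],
                          ascending=[True, False, True])
best = pairs.drop_duplicates("row").set_index("row")

# Rows with no earlier date get NaT / NaN in the new columns.
result = df.join(best[["Previous_Transaction_Date", "Previous_Invoice_amount"]])
print(result)
```

The join produces every pair of rows within an ID before filtering, so the intermediate frame grows quadratically with the number of records per ID. That is fine for small tables but is one reason to look for something cheaper on large ones.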
With np.where, .ne, and .eq, plus a lot of .shift logic, the following gets the job done.
import numpy as np

# df is assumed to hold the input data, sorted by Transaction_ID and
# Transaction_Date (descending), so shift(-1) refers to the next-older row.
df1 = df.copy()

# Main logic: use the row below when it has the same Transaction_ID and this
# row's date differs from the row above or the row below.
date_changes = (df1['Transaction_Date'].ne(df1['Transaction_Date'].shift(1)) |
                df1['Transaction_Date'].ne(df1['Transaction_Date'].shift(-1)))
same_id_below = df1['Transaction_ID'].eq(df1['Transaction_ID'].shift(-1))
df1['Previous_Transaction_Date'] = np.where(date_changes & same_id_below,
                                            df1['Transaction_Date'].shift(-1), '')
df1['Previous_Invoice_amount'] = np.where(date_changes & same_id_below,
                                          df1['Invoice'].shift(-1), '')

# Supplementary logic to get the rest of the cells.
# If the candidate equals the row's own date (the row below is a same-date
# duplicate), take the value from the row below instead.
df1['Previous_Transaction_Date'] = np.where(
    df1['Previous_Transaction_Date'] == df1['Transaction_Date'],
    df1['Previous_Transaction_Date'].shift(-1), df1['Previous_Transaction_Date'])
# Blank the amount wherever no previous date was found.
df1['Previous_Invoice_amount'] = np.where(
    df1['Previous_Transaction_Date'] == '', '', df1['Previous_Invoice_amount'])
# If the row below has the same ID and date, copy its amount.
same_date_below = df1['Transaction_Date'].eq(df1['Transaction_Date'].shift(-1))
df1['Previous_Invoice_amount'] = np.where(
    same_id_below & same_date_below,
    df1['Previous_Invoice_amount'].shift(-1), df1['Previous_Invoice_amount'])
df1
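Keep in mind that, apart from the first 3 rows, no Transaction_Date appears more than twice for the same ID in this sample. If a Transaction_Date appears 3 or more times in your larger dataset, you may not get the expected result.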
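If dates can repeat three or more times, one possible alternative is to reduce the table to one row per (Transaction_ID, Transaction_Date), shift within each ID, and merge the result back. This is a minimal sketch of a different technique, not the method in the answer above. The rows for IDs 1001 and 1002 come from the input table; ID 9999 is a made-up example added only to show a date with three records. Dates are assumed to be month/day/year and invoice amounts plain integers.

```python
import pandas as pd

df = pd.DataFrame({
    # 1001 and 1002 are from the input table; 9999 is hypothetical.
    "Transaction_ID": [1001, 1001, 1001, 1002, 1002, 9999, 9999, 9999, 9999],
    "Transaction_Date": ["3/27/2020", "3/27/2020", "3/27/2020",
                         "1/23/2020", "10/30/2019",
                         "2/1/2020", "1/1/2020", "1/1/2020", "1/1/2020"],
    "Invoice": [10000, 10000, 10000, 127000, 117000, 500, 100, 200, 300],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")

keys = ["Transaction_ID", "Transaction_Date"]

# One row per (ID, date): the first record in the existing row order.
firsts = df.drop_duplicates(keys).sort_values(keys, ascending=[True, False])

# Dates descend within each ID, so the next row down is the previous date.
grouped = firsts.groupby("Transaction_ID")
firsts = firsts.assign(
    Previous_Transaction_Date=grouped["Transaction_Date"].shift(-1),
    Previous_Invoice_amount=grouped["Invoice"].shift(-1),
)

# Attach the lookup to every original row, duplicates included.
# Rows with no earlier date end up with NaT / NaN.
result = df.merge(firsts.drop(columns="Invoice"), on=keys, how="left")
print(result)
```

Because the shift runs on de-duplicated dates, the number of records per date no longer matters: the single 2/1/2020 row for the hypothetical ID 9999 picks up 1/1/2020 and the first amount on that date, while all three 1/1/2020 rows stay empty.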