Add summary columns to a pandas dataframe based on matching values in a different dataframe
Matching rows in pandas based on values in different columns
Input
Suppose I have a dataframe with the following structure:
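+---+------------------+---------------------+--------+---------------------------+
|   | transaction_code | transaction_time    | amount | reversed_transaction_code |
+---+------------------+---------------------+--------+---------------------------+
| 0 | TX051            | 2019-01-01 13:00:00 | 150    |                           |
| 1 | TX002            | 2019-01-01 14:00:00 | 250    | TX004                     |
| 2 | TX113            | 2019-01-01 15:00:00 | 100    |                           |
| 3 | TX004            | 2019-01-01 16:00:00 | 80     | TX002                     |
| 4 | TX805            | 2019-01-01 17:00:00 | 30     |                           |
+---+------------------+---------------------+--------+---------------------------+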
It can be reproduced with the following code:
import pandas as pd

eg = {'transaction_code': ['TX051','TX002','TX113','TX004','TX805'],
      'transaction_time': pd.to_datetime(['1 Jan 2019 1pm','1 Jan 2019 2pm','1 Jan 2019 3pm','1 Jan 2019 4pm','1 Jan 2019 5pm']),
      'amount': [150,250,100,80,30],
      'reversed_transaction_code': ['','TX004','','TX002','']}
df = pd.DataFrame(eg)
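In df, each row corresponds to a transaction made at my store. When an item is returned, a new transaction is added, and the link between the two transactions is recorded in the reversed_transaction_code column.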
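Problem
For example, the $80 item from TX002 was returned in TX004. How can I match these transactions, record the time and amount of the return, and then delete the rows of the reversing transactions?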
Expected output
The new columns should look like this:
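+---+-----------------+---------------------------+
|   | reversed_amount | reversed_transaction_time |
+---+-----------------+---------------------------+
| 0 | NaN             | NaT                       |
| 1 | 80              | 2019-01-01 16:00:00       |
| 2 | NaN             | NaT                       |
| 4 | NaN             | NaT                       |
+---+-----------------+---------------------------+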
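It can be reproduced with the following code:
da = df[df.index != 3].copy()  # .copy() avoids a SettingWithCopyWarning on the assignments below
da['reversed_amount'] = [None, 80, None, None]
da['reversed_transaction_time'] = pd.to_datetime([None, '1 Jan 2019 4pm', None, None])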
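For the original (unmodified) data, the expected output above can also be produced without explicit loops. The following is a minimal sketch, not taken from the question or the answer, that looks up each row's partner with `Series.map`. It assumes that transaction codes are unique and that, within each linked pair, the later transaction is the return; the variable names `by_code`, `partner_time`, `is_return` and so on are illustrative only.

```python
import pandas as pd

# Original data from the question (ISO timestamps used for unambiguous parsing)
df = pd.DataFrame({
    'transaction_code': ['TX051', 'TX002', 'TX113', 'TX004', 'TX805'],
    'transaction_time': pd.to_datetime(['2019-01-01 13:00', '2019-01-01 14:00', '2019-01-01 15:00',
                                        '2019-01-01 16:00', '2019-01-01 17:00']),
    'amount': [150, 250, 100, 80, 30],
    'reversed_transaction_code': ['', 'TX004', '', 'TX002', ''],
})

# For every row, look up the time and amount of the transaction it points to
# (empty codes are not in the index, so they map to NaT/NaN)
by_code = df.set_index('transaction_code')
partner_time = df['reversed_transaction_code'].map(by_code['transaction_time'])
partner_amount = df['reversed_transaction_code'].map(by_code['amount'])

# Assumption: within a linked pair, the later transaction is the return
is_return = partner_time < df['transaction_time']  # comparisons with NaT are False
has_return = partner_time.notna() & ~is_return

# Drop the return rows; keep partner values only on rows that were reversed
result = df[~is_return].copy()
result['reversed_amount'] = partner_amount.where(has_return)
result['reversed_transaction_time'] = partner_time.where(has_return)
print(result[['reversed_amount', 'reversed_transaction_time']])
```

Column assignment aligns on the index, so the full-length `partner_amount` and `partner_time` series line up with the surviving rows 0, 1, 2 and 4 without further filtering.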
I have modified your original data to make it a bit more complex.
import numpy as np
import pandas as pd

eg = {'transaction_code': ['TX051','TX002','TX113','TX004','TX805'],
      'transaction_time': pd.to_datetime(['1 Jan 2019 1pm','1 Jan 2019 2pm','1 Jan 2019 3pm','1 Jan 2019 4pm','1 Jan 2019 5pm']),
      'amount': [150,250,100,80,30],
      'reversed_transaction_code': ['','TX004','TX805','TX002','TX113']}
df = pd.DataFrame(eg)
df
+---+--------+---------------------------+------------------+---------------------+
|   | amount | reversed_transaction_code | transaction_code | transaction_time    |
+---+--------+---------------------------+------------------+---------------------+
| 0 | 150    |                           | TX051            | 2019-01-01 13:00:00 |
| 1 | 250    | TX004                     | TX002            | 2019-01-01 14:00:00 |
| 2 | 100    | TX805                     | TX113            | 2019-01-01 15:00:00 |
| 3 | 80     | TX002                     | TX004            | 2019-01-01 16:00:00 |
| 4 | 30     | TX113                     | TX805            | 2019-01-01 17:00:00 |
+---+--------+---------------------------+------------------+---------------------+
# Fetching the indices of rows that have an entry in reversed_transaction_code
idx_ = df[df.reversed_transaction_code.str.startswith('T')].index
idx_
# Int64Index([1, 2, 3, 4], dtype='int64')
# Creating blank columns (np.nan; the np.NaN alias was removed in NumPy 2.0)
df['reversed_amount'] = np.nan
df['reversed_transaction_time'] = None
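# Reverse transaction index: a row counts as the reversal when the number in its
# reversed_transaction_code is lower than the number in its own transaction_code
ref_num = df.loc[idx_, 'reversed_transaction_code'].str.split('TX', expand=True).iloc[:, 1]
own_num = df.loc[idx_, 'transaction_code'].str.split('TX', expand=True).iloc[:, 1]
idxR_ = df.loc[idx_][ref_num < own_num].index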
idxR_
# Int64Index([3, 4], dtype='int64')
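The comparison above is between strings ('002' < '004'), so it orders the codes correctly only because they are zero-padded to the same width; with codes such as TX99 and TX100 it would not, since '99' > '100' as strings. A minimal sketch of the same heuristic with the digits converted to integers, assuming every code is "TX" followed by digits (it also selects the referencing rows with `ne('')` instead of `startswith('T')`):

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_code': ['TX051', 'TX002', 'TX113', 'TX004', 'TX805'],
    'reversed_transaction_code': ['', 'TX004', 'TX805', 'TX002', 'TX113'],
})

# Rows that reference another transaction at all
has_ref = df['reversed_transaction_code'].ne('')

# Pull the digits out of each code and compare them as integers
# (empty codes are filtered out first, otherwise astype(int) would fail on NaN)
own_num = df['transaction_code'].str.extract(r'(\d+)', expand=False).astype(int)
ref_num = df.loc[has_ref, 'reversed_transaction_code'].str.extract(r'(\d+)', expand=False).astype(int)

# Same heuristic as the answer: the row pointing to a lower-numbered code is the reversal
idxR_ = ref_num[ref_num < own_num[has_ref]].index
print(idxR_.tolist())  # [3, 4]
```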
# Fetching valid reversed transaction code from reversed_transaction_code column
val = df.loc[idxR_, 'reversed_transaction_code']
val
# 3 TX002
# 4 TX113
# Name: reversed_transaction_code, dtype: object
# Fetching the rows whose transaction_code is one of those reversed codes (the original transactions)
code_idx_ = df[df.transaction_code.isin(val)].index
code_idx_
# Int64Index([1, 2], dtype='int64')
# For each reversal, find the original transaction and copy the reversal's time and amount into the new columns
# (this nested loop could be made shorter and more efficient, e.g. with merge/join)
for i in range(len(val)):
for j in range(len(code_idx_)):
if val.iloc[i] == df.loc[code_idx_[j], 'transaction_code']:
df.loc[code_idx_[j], 'reversed_transaction_time'] = df.loc[val.index[i], 'transaction_time']
df.loc[code_idx_[j], 'reversed_amount'] = df.loc[val.index[i], 'amount']
# Removing the rows with reversed transactions
df.drop(val.index, inplace=True)
df
+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
|   | amount | reversed_transaction_code | transaction_code | transaction_time    | reversed_amount | reversed_transaction_time |
+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
| 0 | 150    |                           | TX051            | 2019-01-01 13:00:00 | NaN             | None                      |
| 1 | 250    | TX004                     | TX002            | 2019-01-01 14:00:00 | 80.0            | 2019-01-01 16:00:00       |
| 2 | 100    | TX805                     | TX113            | 2019-01-01 15:00:00 | 30.0            | 2019-01-01 17:00:00       |
+---+--------+---------------------------+------------------+---------------------+-----------------+---------------------------+
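The code comment about merge/join suggests replacing the nested loop. A minimal sketch of that idea on the modified data, assuming `idxR_` has already been computed as in the answer (hard-coded here as `[3, 4]`) and that each original transaction is reversed at most once: the reversal rows are renamed so that their `reversed_transaction_code` lines up with `transaction_code`, then left-joined onto the remaining rows. Unlike the loop version, the unmatched time comes out as `NaT` rather than `None`.

```python
import pandas as pd

# Modified data from the answer (ISO timestamps used for unambiguous parsing)
df = pd.DataFrame({
    'transaction_code': ['TX051', 'TX002', 'TX113', 'TX004', 'TX805'],
    'transaction_time': pd.to_datetime(['2019-01-01 13:00', '2019-01-01 14:00', '2019-01-01 15:00',
                                        '2019-01-01 16:00', '2019-01-01 17:00']),
    'amount': [150, 250, 100, 80, 30],
    'reversed_transaction_code': ['', 'TX004', 'TX805', 'TX002', 'TX113'],
})
idxR_ = pd.Index([3, 4])  # reversal rows; hard-coded here, computed earlier in the answer

# A reversal's reversed_transaction_code is the code of the original transaction,
# so rename it to transaction_code and rename amount/time to the target column names
reversals = df.loc[idxR_, ['reversed_transaction_code', 'amount', 'transaction_time']].rename(
    columns={'reversed_transaction_code': 'transaction_code',
             'amount': 'reversed_amount',
             'transaction_time': 'reversed_transaction_time'})

# Left-join onto the non-reversal rows; merge discards the index, so save and restore it.
# validate='many_to_one' raises if an original transaction is reversed more than once.
result = (df.drop(idxR_)
            .reset_index()
            .merge(reversals, on='transaction_code', how='left', validate='many_to_one')
            .set_index('index')
            .rename_axis(None))
print(result)
```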