How to select and combine different columns based on specific condition in pandas python?
df = pd.DataFrame(data={
"id": ['a', 'a', 'b', 'b', 'a', 'c', 'c', 'b'],
"transaction_amount": [110, 0, 10, 30, 40.4, 62.2, 20, 20],
"principal_amount": [100, 0, 0, 0, 40, 60, 0, 0],
"interest_amount": [10, 0, 10, 0, 0.4, 0.6, 10, 0],
"overpayment_amount": [0, 0, 0, 0, 0, 1.6, 10, 20],
})
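I have the dataframe above. I want a column amount, populated as follows:
- If the values of principal_amount, interest_amount and overpayment_amount are not 0, create a row for each of them and assign principal, interest and overpayment respectively to a new column transaction_type.
- If all three are 0 and transaction_amount is not 0, amount takes the value of transaction_amount.
The output should look like this: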
   amount transaction_type id
3    30.0              NaN  b
0   100.0        principal  a
4    40.0        principal  a
5    60.0        principal  c
0    10.0         interest  a
2    10.0         interest  b
4     0.4         interest  a
5     0.6         interest  c
6    10.0         interest  c
5     1.6      overpayment  c
6    10.0      overpayment  c
7    20.0      overpayment  b
My current solution:
import pandas as pd
df = pd.DataFrame(data={
"id": ['a', 'a', 'b', 'b', 'a', 'c', 'c', 'b'],
"transaction_amount": [110, 0, 10, 30, 40.4, 62.2, 20, 20],
"principal_amount": [100, 0, 0, 0, 40, 60, 0, 0],
"interest_amount": [10, 0, 10, 0, 0.4, 0.6, 10, 0],
"overpayment_amount": [0, 0, 0, 0, 0, 1.6, 10, 20],
})
columns = ["amount", "transaction_type"]
output_df = pd.DataFrame(columns=columns)
# Add transaction amount: rows where all three components are 0
# but transaction_amount is not (transaction_type stays NaN)
condition = (df["principal_amount"] == 0) & (df["interest_amount"] == 0) & (df["overpayment_amount"] == 0) & (df["transaction_amount"] != 0)
subdf = df.loc[condition, ['id', 'transaction_amount']]
subdf = subdf.rename(columns={'transaction_amount': "amount"})
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it here
output_df = pd.concat([output_df, subdf])
# Add principal, interest and overpayment
for field in ["principal_amount", "interest_amount", "overpayment_amount"]:
    # .copy() avoids a SettingWithCopyWarning when adding the new column
    subdf = df.loc[df[field] != 0, ['id', field]].copy()
    subdf["transaction_type"] = field.split("_")[0]
    subdf = subdf.rename(columns={field: "amount"})
    output_df = pd.concat([output_df, subdf])
Is there any pandas functionality that would help me do this more concisely and efficiently?
One approach could be as follows.
import pandas as pd
import numpy as np
out = df.reset_index(drop=False).melt(
    id_vars=['index'],
    value_vars=list(df.columns)[1:],
    var_name='transaction_type',
    value_name='amount'
).set_index('index')
out = out[out['amount'].gt(0)]
out['v'] = out.index.value_counts()
out = out[out.v.eq(1) |
          out.transaction_type.ne('transaction_amount')].drop('v', axis=1)
out['transaction_type'] = out['transaction_type']\
    .str.replace('_amount','').replace({'transaction':np.nan})
out = out.iloc[:,::-1]
out.index.name=None
out['id'] = df['id']
print(out)
   amount transaction_type id
3    30.0              NaN  b
0   100.0        principal  a
4    40.0        principal  a
5    60.0        principal  c
0    10.0         interest  a
2    10.0         interest  b
4     0.4         interest  a
5     0.6         interest  c
6    10.0         interest  c
5     1.6      overpayment  c
6    10.0      overpayment  c
7    20.0      overpayment  b
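Explanation:
- Use df.melt to get all column names (from the second column onwards) and the amounts in two separate columns, while keeping the original index values (reset the index first, then set it again from the 'index' column).
- Use Series.gt on the amount column to keep only the rows where amount > 0.
- Add a temporary column v using value_counts on the index. Any index value with a count of 1 only has a value associated with transaction_amount.
- Select only the rows where out['v'].eq(1) or where transaction_type is not 'transaction_amount'. After that, the temporary column can be dropped again.
- Remove '_amount' from the transaction_type column, and replace 'transaction' with NaN.
- The final cosmetic steps are to put the columns in the requested order, remove the index name, and add id as an extra column.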
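To make the counting step more concrete, here is a minimal sketch that stops after the melt and the amount > 0 filter and inspects the index counts directly. It rebuilds the question's df so it can run on its own; the names long, counts and only_transaction are chosen for this illustration and do not appear in the answer.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(data={
    "id": ['a', 'a', 'b', 'b', 'a', 'c', 'c', 'b'],
    "transaction_amount": [110, 0, 10, 30, 40.4, 62.2, 20, 20],
    "principal_amount": [100, 0, 0, 0, 40, 60, 0, 0],
    "interest_amount": [10, 0, 10, 0, 0.4, 0.6, 10, 0],
    "overpayment_amount": [0, 0, 0, 0, 0, 1.6, 10, 20],
})

# Melt the four amount columns into long format, keeping the original row labels
long = df.reset_index().melt(
    id_vars=['index'],
    value_vars=list(df.columns)[1:],
    var_name='transaction_type',
    value_name='amount',
).set_index('index')

# Keep only positive amounts
long = long[long['amount'].gt(0)]

# How many positive amounts does each original row contribute?
counts = long.index.value_counts().sort_index()
print(counts)

# Rows with a single positive amount: only transaction_amount is set there
only_transaction = counts[counts.eq(1)].index.tolist()
print(only_transaction)
```

With this data, row 1 drops out entirely because all of its amounts are 0, and row 3 is the only one whose count is 1, which is why it is the only row that keeps its transaction_amount value in the final output.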