How do I select and combine different columns based on specific conditions in pandas (Python)?

Question

df = pd.DataFrame(data={
    "id": ['a', 'a', 'b', 'b', 'a', 'c', 'c', 'b'],
    "transaction_amount": [110, 0, 10, 30, 40.4, 62.2, 20, 20],
    "principal_amount":   [100, 0, 0,  0,  40,   60,   0,  0],
    "interest_amount":    [10,  0, 10, 0,  0.4,  0.6,  10, 0],
    "overpayment_amount": [0,   0, 0,  0,  0,    1.6,  10, 20],
})

I have the DataFrame above. I want a single amount column, filled in as follows:

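For each of principal_amount, interest_amount and overpayment_amount that is non-zero, create a row and set the new transaction_type column to principal, interest or overpayment respectively.
If all three of those columns are 0 for a row, take the value from transaction_amount instead.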
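
To make the two rules concrete, here is a minimal plain-Python sketch of how a single row would expand. The helper name expand_row is hypothetical and not part of the question. Skipping rows whose transaction_amount is also 0 is an assumption inferred from the expected output, which has no entry for index 1.

```python
def expand_row(row):
    """Return (amount, transaction_type) pairs for one row given as a dict."""
    parts = ["principal", "interest", "overpayment"]
    # Rule 1: one output row per non-zero component.
    pairs = [(row[f"{p}_amount"], p) for p in parts if row[f"{p}_amount"] != 0]
    if pairs:
        return pairs
    # Rule 2: no breakdown at all, so fall back to transaction_amount.
    # Assumption: a row that is zero everywhere produces no output.
    if row["transaction_amount"] != 0:
        return [(row["transaction_amount"], None)]
    return []


row5 = {"transaction_amount": 62.2, "principal_amount": 60,
        "interest_amount": 0.6, "overpayment_amount": 1.6}
row3 = {"transaction_amount": 30, "principal_amount": 0,
        "interest_amount": 0, "overpayment_amount": 0}
row1 = {"transaction_amount": 0, "principal_amount": 0,
        "interest_amount": 0, "overpayment_amount": 0}

print(expand_row(row5))
print(expand_row(row3))
print(expand_row(row1))
```

This works row by row, whereas the expected output is grouped by transaction type, so it only illustrates the rule and not the final ordering.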

The output should look like this:

   amount transaction_type id
3    30.0              NaN  b
0   100.0        principal  a
4    40.0        principal  a
5    60.0        principal  c
0    10.0         interest  a
2    10.0         interest  b
4     0.4         interest  a
5     0.6         interest  c
6    10.0         interest  c
5     1.6      overpayment  c
6    10.0      overpayment  c
7    20.0      overpayment  b

My current solution:

import pandas as pd

df = pd.DataFrame(data={
    "id": ['a', 'a', 'b', 'b', 'a', 'c', 'c', 'b'],
    "transaction_amount": [110, 0, 10, 30, 40.4, 62.2, 20, 20],
    "principal_amount":   [100, 0, 0,  0,  40,   60,   0,  0],
    "interest_amount":    [10,  0, 10, 0,  0.4,  0.6,  10, 0],
    "overpayment_amount": [0,   0, 0,  0,  0,    1.6,  10, 20],
})

frames = []

# Rows with no breakdown at all: use transaction_amount as the amount
condition = (df["principal_amount"] == 0) & (df["interest_amount"] == 0) & (df["overpayment_amount"] == 0) & (df["transaction_amount"] != 0)
subdf = df.loc[condition, ['id', 'transaction_amount']]
subdf = subdf.rename(columns={'transaction_amount': "amount"})
frames.append(subdf)

# Add principal, interest and overpayment
for field in ["principal_amount", "interest_amount", "overpayment_amount"]:
    # .copy() avoids assigning a new column on a slice of df
    subdf = df.loc[df[field] != 0, ['id', field]].copy()
    subdf["transaction_type"] = field.split("_")[0]
    subdf = subdf.rename(columns={field: "amount"})
    frames.append(subdf)

# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate once
output_df = pd.concat(frames)[["amount", "transaction_type", "id"]]

Is there any pandas functionality that would let me do this more concisely and efficiently?

Answer 1

Here is one approach.

import pandas as pd
import numpy as np

out = df.reset_index(drop=False).melt(
    id_vars=['index'], 
    value_vars=list(df.columns)[1:], 
    var_name='transaction_type', 
    value_name='amount'
    ).set_index('index')

out = out[out['amount'].gt(0)]
out['v'] = out.index.value_counts()

out = out[out.v.eq(1) | 
          out.transaction_type.ne('transaction_amount')].drop('v', axis=1)

out['transaction_type'] = out['transaction_type']\
    .str.replace('_amount','').replace({'transaction':np.nan})

out = out.iloc[:,::-1]
out.index.name=None
out['id'] = df['id']

print(out)

   amount transaction_type id
3    30.0              NaN  b
0   100.0        principal  a
4    40.0        principal  a
5    60.0        principal  c
0    10.0         interest  a
2    10.0         interest  b
4     0.4         interest  a
5     0.6         interest  c
6    10.0         interest  c
5     1.6      overpayment  c
6    10.0      overpayment  c
7    20.0      overpayment  b

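Explanation:

We use df.melt to put all column names (from the second column onward) and their amounts into two separate columns, while keeping the original index values (reset the index first, then set the "index" column as the index again).
We keep only the rows with amount > 0 by applying Series.gt to amount.
We create a temporary column v holding the result of value_counts applied to the index. Any index value with a count of 1 has a positive value in transaction_amount only.
We use this for a second filter: keep only the rows where out['v'].eq(1) or where transaction_type is not 'transaction_amount'. After that, the temporary column can be dropped.
Finally, we strip "_amount" from the transaction_type column and replace "transaction" with NaN. The last cosmetic steps are to put the columns in the requested order, remove the index name, and add id as an extra column.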
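
The counting step is the least obvious part, so here is a minimal sketch of it on a two-row frame. The data and the names long, counts and keep are made up for illustration; only the technique follows the answer above.

```python
import pandas as pd

# Row 0 has a breakdown; row 1 only has a transaction amount.
df = pd.DataFrame({
    "id": ["a", "b"],
    "transaction_amount": [110, 30],
    "principal_amount": [100, 0],
    "interest_amount": [10, 0],
})

# Long format, keeping the original row label as the index.
long = df.reset_index().melt(
    id_vars=["index"],
    value_vars=list(df.columns)[1:],
    var_name="transaction_type",
    value_name="amount",
).set_index("index")

long = long[long["amount"].gt(0)]

# Number of positive entries contributed by each original row.
counts = long.index.value_counts()

# Assignment aligns on the index, so each melted row receives its row's count.
long["v"] = counts

# Keep transaction_amount only where it is the sole positive entry.
keep = long["v"].eq(1) | long["transaction_type"].ne("transaction_amount")
result = long[keep].drop(columns="v")
print(result)
```

Row 0 contributes three positive entries, so its transaction_amount line is dropped in favour of the principal and interest lines. Row 1 contributes exactly one, so its transaction_amount line survives.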
