Create calculated fields in a dataset using Python

Question

I have a dataset, df, in which I would like to create columns that hold the result of a subtraction:

Data

count   power   id  p_q122  p_q222      c_q122  c_q222  
100     1000    aa  200     300         10      20      
100     2000    bb  400     500         5       10

Desired

count   power   id  p_q122  avail1  p_q222  avail2  c_q122  count1  c_q222  count2
100     1000    aa  200     800     300     700     10      90      20      80
100     2000    bb  400     1600    500     1500    5       95      10      90

What I have tried

   df['avail1'] = df['power'] - df['p_q122']

   df['avail2'] = df['power'] - df['p_q222']

I am looking for a more elegant way to produce the desired output. Any suggestions are appreciated.

Answer 1

Try:

df['avail1'] = df['power'].sub(df['p_q122'])
df['avail2'] = df['power'].sub(df['p_q222'])
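The same `.sub` pattern also covers the count columns. Here is a minimal sketch that loops over a hand-written mapping of new column to (base column, column to subtract). The `pairs` dict is a hypothetical name introduced here, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})

# Hypothetical mapping: new column -> (base column, column to subtract from it)
pairs = {
    'avail1': ('power', 'p_q122'),
    'avail2': ('power', 'p_q222'),
    'count1': ('count', 'c_q122'),
    'count2': ('count', 'c_q222'),
}

for new_col, (base, sub_col) in pairs.items():
    # Series.sub is element-wise subtraction, aligned on the index
    df[new_col] = df[base].sub(df[sub_col])

print(df[list(pairs)])
```

Listing the pairs explicitly is verbose, but it also works when the column names do not follow a clean prefix pattern.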

Answer 2

We can perform a 2D subtraction with NumPy broadcasting:

pd.DataFrame(
    df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
).rename(columns=lambda i: f'avail{i + 1}')

   avail1  avail2
0     800     700
1    1600    1500

The benefit here is that no matter how many p_ columns there are, all of them will be subtracted from the power column.
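The `[:, None]` step is what makes this work. It reshapes the power column into an (n, 1) array, and NumPy broadcasts that array across the (n, k) block of p_ columns. One caveat: `filter(like='p_')` matches 'p_' anywhere in a column name, not only at the start. The sketch below shows both points and uses `filter(regex='^p_')` as a stricter alternative. The `temp_x` column is a hypothetical addition for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})

power = df['power'].to_numpy()[:, None]      # shape (2, 1): a single column vector
p_vals = df.filter(regex='^p_').to_numpy()   # shape (2, 2): one column per p_ column
diff = power - p_vals                        # broadcast: (2, 1) - (2, 2) -> (2, 2)
print(diff.shape, diff.tolist())             # (2, 2) [[800, 700], [1600, 1500]]

# filter(like=...) is a substring match, so an unrelated column can slip in;
# a regex anchored with ^ keeps only true prefixes.
tmp = df.assign(temp_x=0)                    # hypothetical extra column
like_cols = tmp.filter(like='p_').columns.tolist()
regex_cols = tmp.filter(regex='^p_').columns.tolist()
print(like_cols)    # ['p_q122', 'p_q222', 'temp_x']
print(regex_cols)   # ['p_q122', 'p_q222']
```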

We can then concat all of the computations onto df:

df = pd.concat([
    df,
    # power calculations
    pd.DataFrame(
        df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
    ).rename(columns=lambda i: f'avail{i + 1}'),
    # Count calculations
    pd.DataFrame(
        df['count'].to_numpy()[:, None] - df.filter(like='c_').to_numpy()
    ).rename(columns=lambda i: f'count{i + 1}'),
], axis=1)

which gives df:

   count  power  id  p_q122  p_q222  ...  c_q222  avail1  avail2  count1  count2
0    100   1000  aa     200     300  ...      20     800     700      90      80
1    100   2000  bb     400     500  ...      10    1600    1500      95      90

[2 rows x 11 columns]
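A DataFrame built directly from a NumPy array gets a default RangeIndex (0, 1, ...). `pd.concat(..., axis=1)` aligns rows on the index, so the approach above relies on df also having a default index. If df is indexed by something else, such as `id`, the rows would not line up and the result would be padded with NaN. The following sketch assumes you pass `index=df.index` to keep the rows aligned:

```python
import pandas as pd

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
}).set_index('id')  # non-default index: 'aa', 'bb'

avail = pd.DataFrame(
    df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy(),
    index=df.index,  # reuse df's index so concat matches rows correctly
).rename(columns=lambda i: f'avail{i + 1}')

out = pd.concat([df, avail], axis=1)
print(out[['avail1', 'avail2']])
```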

If we have many column groups to process, we can also build the list of DataFrames programmatically:

df = pd.concat([df, *(
    pd.DataFrame(
        df[col].to_numpy()[:, None] - df.filter(like=filter_prefix).to_numpy()
    ).rename(columns=lambda i: f'{new_prefix}{i + 1}')
    for col, filter_prefix, new_prefix in [
        ('power', 'p_', 'avail'),
        ('count', 'c_', 'count')
    ]
)], axis=1)
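The Desired layout in the question places each result immediately after its source column, while concat appends the new columns at the end. The sketch below reproduces that interleaved order with `DataFrame.insert`. It assumes columns are matched with `str.startswith`, which is stricter than `filter(like=...)`, and its `groups` list mirrors the (column, prefix, new prefix) tuples above:

```python
import pandas as pd

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})

# (base column, prefix of columns to subtract, prefix for the new columns)
groups = [('power', 'p_', 'avail'), ('count', 'c_', 'count')]

for base, prefix, new_prefix in groups:
    # Collect the source columns before inserting, so new columns are not picked up
    src_cols = [c for c in df.columns if c.startswith(prefix)]
    for n, src in enumerate(src_cols, start=1):
        # Look up the position each time, because earlier inserts shift columns right
        loc = df.columns.get_loc(src) + 1
        df.insert(loc, f'{new_prefix}{n}', df[base] - df[src])

print(df.columns.tolist())
```

`DataFrame.insert` modifies df in place, which is why the position is recomputed inside the loop.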

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})


2 answers

Answer 1
Score 1, posted 2021-10-13 04:59:46

Answer 2
Score 1, accepted, posted 2021-10-13 05:36:28
