Create calculated field within dataset using Python

I have a dataset, df, where I would like to create columns that display the output of a subtraction calculation:

Data

count   power   id  p_q122  p_q222      c_q122  c_q222  
100     1000    aa  200     300         10      20      
100     2000    bb  400     500         5       10      

Desired

cnt pwr    id   p_q122  avail1  p_q222  avail2  c_q122  count1  c_q222  count2  
100 1000   aa   200     800     300     700     10      90      20      80
100 2000   bb   400     1600    500     1500    5       95      10      90

Doing

   a =    df['avail1']  = + df['pwr'] - df['p_q122']
   
   b =    df['avail2']  = + df['pwr'] - df['p_q222']

I am looking for a more elegant way that provides the desired output. Any suggestion is appreciated.

Try:

df['avail1'] = df['power'].sub(df['p_q122'])
df['avail2'] = df['power'].sub(df['p_q222'])
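
The same pattern extends to the count columns. A minimal sketch, assuming the sample data from the question and using `DataFrame.assign` so all four columns are created in one call (the choice of `assign` is mine, not part of the original answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})

# Each keyword becomes a new column; Series.sub is element-wise subtraction.
df = df.assign(
    avail1=df['power'].sub(df['p_q122']),
    avail2=df['power'].sub(df['p_q222']),
    count1=df['count'].sub(df['c_q122']),
    count2=df['count'].sub(df['c_q222']),
)
```

This stays readable for a handful of columns, but every new quarter needs another line.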

We can perform 2D subtraction with NumPy:

pd.DataFrame(
    df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
).rename(columns=lambda i: f'avail{i + 1}')

   avail1  avail2
0     800     700
1    1600    1500

The benefit here is that, no matter how many p_ columns there are, all of them are subtracted from the power column.
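
To see why the `[:, None]` is needed, here is a small sketch of the broadcasting step with plain NumPy arrays (the array names are illustrative only). Reshaping power from shape (2,) to (2, 1) makes each row's power value line up against every p_ column in that row:

```python
import numpy as np

power = np.array([1000, 2000])                  # shape (2,)
p_cols = np.array([[200, 300], [400, 500]])     # shape (2, 2), one row per record

power_col = power[:, None]                      # shape (2, 1): a column vector
avail = power_col - p_cols                      # broadcasts across columns -> (2, 2)

# Without the reshape, power is matched against columns instead of rows,
# which silently gives the wrong answer for this data.
wrong = power - p_cols
```

Here `avail` is `[[800, 700], [1600, 1500]]`, while `wrong` is `[[800, 1700], [600, 1500]]`.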


We can concat all of the computations with df like this:

df = pd.concat([
    df,
    # power calculations
    pd.DataFrame(
        df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy()
    ).rename(columns=lambda i: f'avail{i + 1}'),
    # Count calculations
    pd.DataFrame(
        df['count'].to_numpy()[:, None] - df.filter(like='c_').to_numpy()
    ).rename(columns=lambda i: f'count{i + 1}'),
], axis=1)

which gives df:

   count  power  id  p_q122  p_q222  ...  c_q222  avail1  avail2  count1  count2
0    100   1000  aa     200     300  ...      20     800     700      90      80
1    100   2000  bb     400     500  ...      10    1600    1500      95      90

[2 rows x 11 columns]
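
One caveat: `pd.concat(..., axis=1)` aligns on the index, and a DataFrame built from a bare NumPy array gets a fresh 0..n-1 index. That matches the sample data, but if df carries any other index the new columns would not line up. A hedged sketch that passes `index=df.index` to keep them aligned (the `['x', 'y']` index is a made-up example, not from the question):

```python
import pandas as pd

# Same sample data, but with a hypothetical non-default index.
df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
}, index=['x', 'y'])

# Reuse df's index so concat aligns row for row.
avail = pd.DataFrame(
    df['power'].to_numpy()[:, None] - df.filter(like='p_').to_numpy(),
    index=df.index,
).rename(columns=lambda i: f'avail{i + 1}')

out = pd.concat([df, avail], axis=1)
```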

If we have many column groups to do, we can build the list of DataFrames programmatically as well:

df = pd.concat([df, *(
    pd.DataFrame(
        df[col].to_numpy()[:, None] - df.filter(like=filter_prefix).to_numpy()
    ).rename(columns=lambda i: f'{new_prefix}{i + 1}')
    for col, filter_prefix, new_prefix in [
        ('power', 'p_', 'avail'),
        ('count', 'c_', 'count')
    ]
)], axis=1)
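
The concat approach appends the new columns at the end, whereas the desired output in the question places each result right after its source column. A possible sketch of that layout, using a hypothetical helper `add_differences` (my own name, not a pandas function) built on `DataFrame.insert`. It keeps the original count and power column names and matches prefixes with `str.startswith` rather than `filter(like=...)`:

```python
import pandas as pd

def add_differences(df, specs):
    """Return a copy of df with each difference column placed after its source column.

    specs is a list of (base_column, source_prefix, new_prefix) tuples.
    """
    out = df.copy()
    for base, prefix, new_prefix in specs:
        # Match against the original columns so new columns are never re-matched.
        matched = [c for c in df.columns if c.startswith(prefix)]
        for n, col in enumerate(matched, start=1):
            position = out.columns.get_loc(col) + 1
            out.insert(position, f'{new_prefix}{n}', df[base] - df[col])
    return out

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})

result = add_differences(df, [('power', 'p_', 'avail'), ('count', 'c_', 'count')])
```

The resulting column order is count, power, id, p_q122, avail1, p_q222, avail2, c_q122, count1, c_q222, count2. This assumes column names are unique, since `get_loc` only returns a single position in that case.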

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'count': [100, 100], 'power': [1000, 2000], 'id': ['aa', 'bb'],
    'p_q122': [200, 400], 'p_q222': [300, 500], 'c_q122': [10, 5],
    'c_q222': [20, 10]
})
