
Apply different function to pandas dataframe based on another column

I have a dataframe with two columns. I am trying to create a third column based on the numbers inside the dataframe. If the number in column b is positive, I want column c to equal a * b.

If the number in column b is negative, I want column c to equal a * b * 0.95.

An example of what I am trying to get:

col_a  col_b  col_c
100.   1.     100
100.   -1.    -95
100.   10.    1000
100.   -10.   -950


I have currently tried this:


def profit_calculation(value):

    if value<0:
        return(a * b * 0.95)
    else:
        return(a * b) 

df['col_c']=df['col_b'].apply(profit_calculation)

But this seems to be incorrect.

import pandas as pd

df = pd.DataFrame({"a": [100, 100, 100, 100],
                   "b": [1, -1, 10, -10]})

# Scale by 0.95 only where b is negative; (df.b < 0) acts as 0/1
df.a * df.b * (1 - 0.05 * (df.b < 0))

# out:
0     100.0
1     -95.0
2    1000.0
3    -950.0

Explanation: When multiplied by the float 0.05, the boolean Series (df.b < 0) is cast to numbers (True=1, False=0). We therefore subtract 0.05 from 1 wherever b is negative, which gives the factor 0.95 exactly where we need it and 1 everywhere else.
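To make the cast easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The names is_negative, penalty and factor, and the result column "c", are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [100, 100, 100, 100],
                   "b": [1, -1, 10, -10]})

# Boolean Series: True where b is negative
is_negative = df.b < 0

# Multiplying by a float treats True as 1 and False as 0
penalty = 0.05 * is_negative

# 1.0 for non-negative b, 0.95 for negative b
factor = 1 - penalty

# Illustrative column name; the answer above only evaluates the expression
df["c"] = df.a * df.b * factor
print(df)
```

Splitting the expression this way does not change the result; it only makes each intermediate Series available for inspection.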

You can use np.where and check whether column b is greater than 0 using gt:

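import numpy as np
import pandas as pd

df = pd.DataFrame({"col_a": [100, 100, 100, 100],
                   "col_b": [1, -1, 10, -10]})

# Compute the product once, then scale it by 0.95 where col_b is not positive
a_b = df.col_a.mul(df.col_b)
df['col_c'] = np.where(df['col_b'].gt(0), a_b, a_b.mul(0.95))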

which prints:

>>> df

   col_a  col_b   col_c
0    100      1   100.0
1    100     -1   -95.0
2    100     10  1000.0
3    100    -10  -950.0
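As a small sanity check on the np.where approach, the condition can also be written from the negative side with lt. The sketch below assumes the sample data from the question plus one hypothetical row with col_b equal to 0, which is not in the original post, to show that the two forms agree even there (the product is 0 in either branch). The names positive_form and negative_form are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical row with col_b == 0
df = pd.DataFrame({"col_a": [100, 100, 100, 100, 100],
                   "col_b": [1, -1, 10, -10, 0]})

a_b = df["col_a"].mul(df["col_b"])

# np.where(condition, x, y) takes x where condition is True, otherwise y
positive_form = np.where(df["col_b"].gt(0), a_b, a_b.mul(0.95))
negative_form = np.where(df["col_b"].lt(0), a_b.mul(0.95), a_b)

# Compare the two results element by element
print(np.allclose(positive_form, negative_form))
```

Either condition can be used; which one reads better depends on whether you think of the 0.95 factor as the exception or the default.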

You can use a lambda function to create new data based on the data in the dataframe (df). See the documentation for apply here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html. Calling apply on df['col_b'] alone passes only the values of that column, so the function never sees col_a. Calling apply on the whole dataframe with axis=1 instead passes each row as the parameter, and the function returns the value for that row. So for each row we call profit_calculation and give it the data of that row as its parameter. So you have to replace your code with:

def profit_calculation(value):
  return value["col_b"]*value["col_a"] if value["col_b"] > 0 else value["col_b"]*value["col_a"]*.95  

df['col_c']=df.apply(lambda value: profit_calculation(value), axis=1)
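For reference, here is a self-contained sketch of the row-wise approach using the sample data from the question. It assumes the function can be passed to apply directly, which is equivalent to wrapping it in a lambda; the parameter name row and the local variable product are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col_a": [100, 100, 100, 100],
                   "col_b": [1, -1, 10, -10]})

def profit_calculation(row):
    # row is a single row of df (a Series) because axis=1 is used below
    product = row["col_a"] * row["col_b"]
    return product if row["col_b"] > 0 else product * 0.95

# axis=1 calls the function once per row
df["col_c"] = df.apply(profit_calculation, axis=1)
print(df)
```

Row-wise apply calls a Python function for every row, so on large dataframes the vectorized answers above are likely to be faster.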
