
Apply function to dataframe column element based on value in other column for same row?

I have a dataframe:

import pandas as pd

df = pd.DataFrame(
    {'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

df = 
    number    condition
0    10         A
1    20         B
2    30         A
3    40         B

I want to apply a function to each element within the number column, as follows:

 df['number'] = df['number'].apply(lambda x: func(x))

BUT, even though I apply the function to the number column, I want the function to also refer to the condition column, i.e. in pseudocode:

func(n):
    #if the value in corresponding condition column is equal to some set of values:
        # do some stuff to n using the value in condition
        # return new value for n

For a single number and an example function, I would write:

number = 10
condition = 'A'

def func(num, condition):
    # pick the multiplier based on the condition value
    if condition == 'A':
        return num * 3
    if condition == 'B':
        return num * 4

func(number, condition)  # returns 30

How can I incorporate the same function into the apply statement written above, i.e. refer to the value in the condition column while acting on the value in the number column?

Note: I have read through the docs on np.where(), DataFrame.loc and DataFrame.index, but I cannot figure out how to put them into practice.

I am struggling with the syntax for referencing the other column from within the function, since I need access to the values in both the number and condition columns.

As such, my expected output is:

df = 
    number    condition
0    30         A
1    80         B
2    90         A
3    160         B

UPDATE: The above was far too vague. Please see the following:

df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})


    Entries    Conflict
0    "man"    "Yes"
1    "guy"    "Yes"
2    "boy"    "Yes"
3    "girl"   "No"

def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d

df1['Entries'] = np.where(df1['Conflict'] == 'Yes', funcA, funcB)

Output:
{'Conflict': ['Yes', 'Yes', 'Yes', 'No'],
 'Entries': array(<function funcB at 0x7f4acbc5a500>, dtype=object)}

How can I change the np.where statement above so that it takes a pandas Series, as mentioned in the comments, and produces the desired output shown below?

Desired Output:

    Entries    Conflict
0    "manaaa"    "Yes"
1    "guyaaa"    "Yes"
2    "boyaaa"    "Yes"
3    "girlbbb"   "No"

Since the question is about applying a function to a dataframe column using other values from the same row, it seems more accurate to use the pandas apply function in combination with a lambda:

import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})

def func(number,condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]

df['new_number'] = df.apply(lambda x: func(x['number'], x['condition']), axis=1)

In this example, apply with axis=1 passes each row of df to the lambda, which reads that row's 'number' and 'condition' values and hands them to func.

This returns the following result:

df
Out[10]: 
  condition  number  new_number
0         A      10          20
1         B      20          80
2         A      30          60
3         B      40         160
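
To make explicit what the lambda receives, here is a small sketch of the same call with a named helper in place of the lambda. The name apply_to_row is hypothetical (it is not part of the answer above); the point is only that, with axis=1, apply passes each row as a Series indexed by column name.

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

def func(number, condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]

def apply_to_row(row):
    # hypothetical helper: `row` is a pandas Series holding one row of df,
    # so its values can be looked up by column name
    return func(row['number'], row['condition'])

# axis=1 tells apply to call the helper once per row rather than once per column
new_number = df.apply(apply_to_row, axis=1)
print(new_number.tolist())
```

This should behave the same as the lambda version; the named function is just easier to extend or debug.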

For the UPDATE case it's also possible to use the pandas apply function:

df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})

def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d

df1['Entries'] = df1.apply(lambda x: funcA(x['Entries']) if x['Conflict'] == 'Yes' else funcB(x['Entries']), axis=1)

In this example, the lambda reads the 'Entries' and 'Conflict' values of each row of df1 and passes the 'Entries' value to either funcA or funcB. The if-else expression inside the lambda decides which of the two functions is applied.

This returns the following result:

df1
Out[12]: 
  Conflict  Entries
0      Yes   manaaa
1      Yes   guyaaa
2      Yes   boyaaa
3       No  girlbbb
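
Since the question asks about np.where specifically, here is a sketch of how that statement could be made to work. It assumes funcA and funcB can operate on a whole Series at once, which holds for the string concatenation used here but not for arbitrary functions. The idea is to call both functions on the column first and let np.where choose between the two results, rather than passing the function objects themselves.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Both functions run on the full column; np.where then takes the funcA
# result where Conflict is 'Yes' and the funcB result everywhere else.
df1['Entries'] = np.where(df1['Conflict'] == 'Yes',
                          funcA(df1['Entries']),
                          funcB(df1['Entries']))
print(df1)
```

Note that both branches are computed for every row, so this only suits functions that are cheap and safe to evaluate on all values.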

I don't know about using pandas.DataFrame.apply, but you could define a condition:multiplier key-value mapping (see multiplier below) and pass it into your function. Then you can use a list comprehension to calculate the new number output based on those conditions:

import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})

multiplier = {'A': 3, 'B': 4}

def func(num, condition, multiplier):
    return num * multiplier[condition]

df['new_number'] = [func(df.loc[idx, 'number'], df.loc[idx, 'condition'], 
                     multiplier) for idx in range(len(df))]

Here's the result:

df
Out[24]: 
  condition  number  new_number
0         A      10          30
1         B      20          80
2         A      30          90
3         B      40         160

There is likely a vectorized, pure-pandas solution that is more "ideal", but this works in a pinch, too.
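
One possible vectorized version, as a sketch: translate each condition into its multiplier with Series.map and multiply the two columns. It assumes the multipliers 3 and 4 from the question, and that every value in condition has an entry in the mapping (unmapped values would come back as NaN).

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})

# assumed mapping, taken from the question's example function
multiplier = {'A': 3, 'B': 4}

# map() turns the condition column into a column of multipliers,
# and the multiplication is then done element-wise on whole columns
df['new_number'] = df['number'] * df['condition'].map(multiplier)
print(df)
```

This avoids the Python-level loop entirely, which tends to matter on large frames.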
