Apply function to dataframe column element based on value in other column for same row?
I have a dataframe:
df = pd.DataFrame(
    {'number': [10, 20, 30, 40], 'condition': ['A', 'B', 'A', 'B']})
df =
number condition
0 10 A
1 20 B
2 30 A
3 40 B
I want to apply a function to each element within the number column, as follows:
df['number'] = df['number'].apply(lambda x: func(x))
BUT, even though I apply the function to the number column, I want the function to also make reference to the condition column, i.e. in pseudocode:
func(n):
#if the value in corresponding condition column is equal to some set of values:
# do some stuff to n using the value in condition
# return new value for n
For a single number, and an example function, I would write:
number = 10
condition = 'A'
def func(num, condition):
    if condition == 'A':
        return num * 3
    if condition == 'B':
        return num * 4
func(number, condition)  # returns 30
How can I incorporate the same function into my apply statement written above, i.e. making reference to the value within the condition column while acting on the value within the number column?
Note: I have read through the docs on np.where(), pandas.loc() and pandas.index() but I just cannot figure out how to put it into practice.
I am struggling with the syntax for referencing the other column from within the function, as I need access to both the values in the number and condition columns.
As such, my expected output is:
df =
number condition
0 30 A
1 80 B
2 90 A
3 160 B
UPDATE: The above was far too vague. Please see the following:
df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})
Entries Conflict
0 "man" "Yes"
1 "guy" "Yes"
2 "boy" "Yes"
3 "girl" "No"
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = np.where(df1['Conflict'] == 'Yes', funcA, funcB)
Output:
{'Conflict': ['Yes', 'Yes', 'Yes', 'No'],
'Entries': array(<function funcB at 0x7f4acbc5a500>, dtype=object)}
How can I change the above np.where statement so that it takes a pandas Series, as mentioned in the comments, and produces the desired output shown below?
Desired Output:
Entries Conflict
0 "manaaa" "Yes"
1 "guyaaa" "Yes"
2 "boyaaa" "Yes"
3 "girlbbb" "No"
Since the question is about applying a function to a dataframe column using another column of the same row, it seems more accurate to use the pandas apply function in combination with a lambda:
import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})
def func(number, condition):
    multiplier = {'A': 2, 'B': 4}
    return number * multiplier[condition]
df['new_number'] = df.apply(lambda x: func(x['number'], x['condition']), axis=1)
In this example, the lambda receives each row of the dataframe df (because of axis=1), reads that row's 'number' and 'condition' values, and passes them to the function func.
This returns the following result:
df
Out[10]:
condition number new_number
0 A 10 20
1 B 20 80
2 A 30 60
3 B 40 160
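To see why x['number'] and x['condition'] work inside the lambda, it may help to inspect what apply hands to it when axis=1 is set. The following is a minimal sketch, not part of the original answer; the names row_types and row_labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20], 'condition': ['A', 'B']})

# With axis=1, apply calls the function once per row and passes that row
# as a pandas Series whose index labels are the column names.
row_types = df.apply(lambda row: type(row).__name__, axis=1)
row_labels = df.apply(lambda row: ', '.join(row.index), axis=1)

print(row_types.tolist())   # the type name of each object the lambda received
print(row_labels.tolist())  # the labels available on each row
```

Because each row arrives as a Series labelled by column name, any column of the same row can be read inside the function by its label.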
For the UPDATE case it's also possible to use the pandas apply function:
df1 = pd.DataFrame({'Entries':['man','guy','boy','girl'],'Conflict':['Yes','Yes','Yes','No']})
def funcA(d):
    d = d + 'aaa'
    return d
def funcB(d):
    d = d + 'bbb'
    return d
df1['Entries'] = df1.apply(lambda x: funcA(x['Entries']) if x['Conflict'] == 'Yes' else funcB(x['Entries']), axis=1)
In this example, the lambda receives each row of the dataframe df1 and passes its 'Entries' value to either funcA or funcB. The if-else expression inside the lambda checks the row's 'Conflict' value to decide which of the two functions is applied.
This returns the following result:
df1
Out[12]:
Conflict Entries
0 Yes manaaa
1 Yes guyaaa
2 Yes boyaaa
3 No girlbbb
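The np.where attempt in the question fails because it passes the function objects themselves rather than their results. One possible fix, sketched here under the assumption that funcA and funcB also work on a whole Series (they do for simple string concatenation), is to call both functions on the column and let np.where pick per row:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Entries': ['man', 'guy', 'boy', 'girl'],
                    'Conflict': ['Yes', 'Yes', 'Yes', 'No']})

def funcA(d):
    return d + 'aaa'

def funcB(d):
    return d + 'bbb'

# Both functions are evaluated on the full column; np.where then takes the
# funcA result where Conflict is 'Yes' and the funcB result otherwise.
df1['Entries'] = np.where(df1['Conflict'] == 'Yes',
                          funcA(df1['Entries']),
                          funcB(df1['Entries']))
print(df1)
```

Note that this computes both branches for every row, so it only suits functions that are cheap and have no side effects.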
I don't know about using pandas.DataFrame.apply, but you could define a condition:multiplier key-value mapping (see multiplier below) and pass that into your function. Then you can use a list comprehension to calculate the new number output based on those conditions:
import pandas as pd
df = pd.DataFrame({'number': [10, 20 , 30, 40], 'condition': ['A', 'B', 'A', 'B']})
multiplier = {'A': 3, 'B': 4}
def func(num, condition, multiplier):
    return num * multiplier[condition]
df['new_number'] = [func(df.loc[idx, 'number'], df.loc[idx, 'condition'], multiplier)
                    for idx in range(len(df))]
Here's the result:
df
Out[24]:
condition number new_number
0 A 10 30
1 B 20 80
2 A 30 90
3 B 40 160
There is likely a vectorized, pure-pandas solution that's more "ideal." But this works, too, in a pinch.
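One candidate for such a vectorized approach is Series.map, which looks up each condition in the dictionary and returns a column of multipliers. This is a sketch rather than the answerer's own method, and it assumes the multipliers 3 and 4 from the question:

```python
import pandas as pd

df = pd.DataFrame({'number': [10, 20, 30, 40],
                   'condition': ['A', 'B', 'A', 'B']})
multiplier = {'A': 3, 'B': 4}

# map() translates every condition into its multiplier; the multiplication
# then happens element-wise across the two aligned columns.
df['new_number'] = df['number'] * df['condition'].map(multiplier)
print(df)
```

A condition that is missing from the dictionary maps to NaN instead of raising a KeyError, so the result should be checked for missing values if the mapping might be incomplete.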