Use lambda with pandas to calculate a new column conditional on an existing column
I need to create a new column in a pandas DataFrame, calculated as the ratio of 2 existing columns in the DataFrame. However, the denominator in the ratio calculation changes based on the value of a string found in another column of the DataFrame.
Example. Sample dataset:
import pandas as pd
df = pd.DataFrame(data={'hand' : ['left','left','both','both'],
                        'exp_force' : [25,28,82,84],
                        'left_max' : [38,38,38,38],
                        'both_max' : [90,90,90,90]})
I need to create a new DataFrame column df['ratio'] based on the condition of df['hand'].
If df['hand']=='left', then df['ratio'] = df['exp_force'] / df['left_max']
If df['hand']=='both', then df['ratio'] = df['exp_force'] / df['both_max']
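Because only the denominator changes between the two cases, the rule can also be written as a single division. A minimal sketch (a pandas-only alternative, not taken from the answers) that uses Series.where to keep left_max on 'left' rows and substitute both_max everywhere else. Note that it treats every non-'left' value like 'both':

```python
import pandas as pd

df = pd.DataFrame(data={'hand': ['left', 'left', 'both', 'both'],
                        'exp_force': [25, 28, 82, 84],
                        'left_max': [38, 38, 38, 38],
                        'both_max': [90, 90, 90, 90]})

# Denominator: left_max where hand == 'left', otherwise both_max
denominator = df['left_max'].where(df['hand'] == 'left', df['both_max'])
df['ratio'] = df['exp_force'] / denominator
print(df)
```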
You can use np.where():
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'hand' : ['left','left','both','both'],
                        'exp_force' : [25,28,82,84],
                        'left_max' : [38,38,38,38],
                        'both_max' : [90,90,90,90]})
# Use left_max when hand is 'left', otherwise both_max
df['ratio'] = np.where((df['hand']=='left'), df['exp_force'] / df['left_max'], df['exp_force'] / df['both_max'])
df
Out[42]:
hand exp_force left_max both_max ratio
0 left 25 38 90 0.657895
1 left 28 38 90 0.736842
2 both 82 38 90 0.911111
3 both 84 38 90 0.933333
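Note that np.where only tests for 'left': every other value, not just 'both', takes the second branch. The sketch below adds a hypothetical 'right' row (not in the original data) to make that fall-through visible; it is silently divided by both_max.

```python
import numpy as np
import pandas as pd

# 'right' is a hypothetical value, added only to show the fall-through
df = pd.DataFrame(data={'hand': ['left', 'both', 'right'],
                        'exp_force': [25, 82, 30],
                        'left_max': [38, 38, 38],
                        'both_max': [90, 90, 90]})

# Anything that is not 'left' (including 'right') uses both_max
df['ratio'] = np.where(df['hand'] == 'left',
                       df['exp_force'] / df['left_max'],
                       df['exp_force'] / df['both_max'])
print(df)
```

If unexpected values should not fall through unnoticed, an explicit list of conditions with a default is the safer choice.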
Alternatively, in a real-life scenario with many conditions and results, you can use np.select() so that you don't have to keep repeating np.where() statements, as I did a lot in my older code. np.select is the better fit in these situations:
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'hand' : ['left','left','both','both'],
                        'exp_force' : [25,28,82,84],
                        'left_max' : [38,38,38,38],
                        'both_max' : [90,90,90,90]})
c1 = (df['hand']=='left')
c2 = (df['hand']=='both')
r1 = df['exp_force'] / df['left_max']
r2 = df['exp_force'] / df['both_max']
conditions = [c1,c2]
results = [r1,r2]
df['ratio'] = np.select(conditions,results)
df
Out[430]:
hand exp_force left_max both_max ratio
0 left 25 38 90 0.657895
1 left 28 38 90 0.736842
2 both 82 38 90 0.911111
3 both 84 38 90 0.933333
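np.select fills rows that match none of the conditions with its default argument, which is 0 unless you set it. A minimal sketch, again with a hypothetical 'right' row that matches neither condition, comparing the implicit default with default=np.nan:

```python
import numpy as np
import pandas as pd

# 'right' is a hypothetical value that matches neither condition
df = pd.DataFrame(data={'hand': ['left', 'both', 'right'],
                        'exp_force': [25, 82, 30],
                        'left_max': [38, 38, 38],
                        'both_max': [90, 90, 90]})

conditions = [df['hand'] == 'left', df['hand'] == 'both']
results = [df['exp_force'] / df['left_max'], df['exp_force'] / df['both_max']]

# Implicit default: the unmatched 'right' row becomes 0.0
df['ratio_default0'] = np.select(conditions, results)
# Explicit default: the unmatched row is marked as missing instead
df['ratio'] = np.select(conditions, results, default=np.nan)
print(df)
```

A NaN is easier to spot (and to filter with isna()) than a 0 that looks like a legitimate ratio.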
Using enumerate:
# Assumes df has the default RangeIndex, so position i is also the row label
for i, e in enumerate(df['hand']):
    if e == 'left':
        df.at[i, 'ratio'] = df.at[i, 'exp_force'] / df.at[i, 'left_max']
    elif e == 'both':
        df.at[i, 'ratio'] = df.at[i, 'exp_force'] / df.at[i, 'both_max']
df
Output:
hand exp_force left_max both_max ratio
0 left 25 38 90 0.657895
1 left 28 38 90 0.736842
2 both 82 38 90 0.911111
3 both 84 38 90 0.933333
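The enumerate loop relies on the row position i also being the row label, which holds for the default RangeIndex. With any other index (the labels 10 to 13 below are hypothetical), df.at[i, 'exp_force'] would look up labels 0, 1, ... and raise a KeyError. Iterating over Series.items(), which yields (label, value) pairs, avoids that assumption. A sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical non-default index: position 0 is label 10, and so on
df = pd.DataFrame(data={'hand': ['left', 'left', 'both', 'both'],
                        'exp_force': [25, 28, 82, 84],
                        'left_max': [38, 38, 38, 38],
                        'both_max': [90, 90, 90, 90]},
                  index=[10, 11, 12, 13])

df['ratio'] = np.nan  # create the column up front
# Series.items() yields (label, value), so df.at receives real labels
for label, hand in df['hand'].items():
    if hand == 'left':
        df.at[label, 'ratio'] = df.at[label, 'exp_force'] / df.at[label, 'left_max']
    elif hand == 'both':
        df.at[label, 'ratio'] = df.at[label, 'exp_force'] / df.at[label, 'both_max']
print(df)
```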
You can use the apply() method of your DataFrame:
df['ratio'] = df.apply(
    lambda x: x['exp_force'] / x['left_max'] if x['hand']=='left' else x['exp_force'] / x['both_max'],
    axis=1
)
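Once the condition grows beyond a single if/else, a named function is easier to read than a lambda. A minimal sketch (hand_ratio is a hypothetical helper name) that returns NaN for any hand value other than 'left' or 'both', and cross-checks the result against the vectorised np.where version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'hand': ['left', 'left', 'both', 'both'],
                        'exp_force': [25, 28, 82, 84],
                        'left_max': [38, 38, 38, 38],
                        'both_max': [90, 90, 90, 90]})

def hand_ratio(row):
    """Ratio of exp_force to the max column matching row['hand']."""
    if row['hand'] == 'left':
        return row['exp_force'] / row['left_max']
    if row['hand'] == 'both':
        return row['exp_force'] / row['both_max']
    return np.nan  # any other value is left as missing

# axis=1 passes each row to hand_ratio as a Series
df['ratio'] = df.apply(hand_ratio, axis=1)

# Cross-check against the vectorised np.where version
expected = np.where(df['hand'] == 'left',
                    df['exp_force'] / df['left_max'],
                    df['exp_force'] / df['both_max'])
print(np.allclose(df['ratio'], expected))
```

apply with axis=1 calls the function once per row in Python, so on large frames the vectorised np.where or np.select versions are usually much faster.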