[英]Pandas modify column values based on another DataFrame
I am trying to add values to a column based on a couple of conditions.我正在尝试根据几个条件向列添加值。 Here is the code example:
下面是代码示例:
Import pandas as pd
df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'], 'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'], 'Cat':['p', 'n', 'p', 'n','p', 'n'], 'Val': [30, -40, 20, -30, 10, -20]})
for index, _ in df1.iterrows():
if df1.loc[index,'Val'] >=0:
df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'p'), 'Val'])
else:
df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'n'), 'Val'])
For each value in the 'Val' column of df1, I want to add values from df2, based on the type and whether the original value was positive or negative.对于 df1 的“Val”列中的每个值,我想根据类型以及原始值是正值还是负值添加来自 df2 的值。
The expected output for this example would be alternate 50 and -50 in df1.此示例的预期输出将是 df1 中的交替 50 和 -50。 The above code does the job, but is too slow to be usable for a large data set.
上面的代码完成了这项工作,但速度太慢,无法用于大型数据集。 Is there a better way to do this?
有一个更好的方法吗?
import numpy as np
df1['sign'] = np.sign(df1.Val)
df2['sign'] = np.sign(df2.Val)
df = pd.merge(df1, df2, on=['Type', 'sign'], suffixes=('_df1', '_df2'))
df['Val'] = df.Val_df1 + df.Val_df2
df = df.drop(columns=['Val_df1', 'sign', 'Val_df2'])
df
Try adding a Cat
column to df1
merge
then sum
val
columns across axis 1 then drop
the extra columns:尝试将
Cat
列添加到df1
merge
然后对轴 1 上的val
列sum
然后drop
额外的列:
df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], 1)
Type Val
0 A 50
1 A 50
2 A -50
3 A -50
4 B 50
5 B -50
6 C 50
7 C -50
Add new column with np.where
使用
np.where
添加新列
df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
Type Val Cat
0 A 20 p
1 A -10 n
2 A 20 p
3 A -10 n
4 B 30 p
5 B -20 n
6 C 40 p
7 C -30 n
merge
on Type
and Cat
merge
Type
和Cat
df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
Type Val_x Cat Val_y
0 A 20 p 30
1 A -10 n -40
2 A 20 p 30
3 A -10 n -40
4 B 30 p 20
5 B -20 n -30
6 C 40 p 10
7 C -30 n -20
df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
Type Val_x Cat Val_y Val
0 A 20 p 30 50
1 A -10 n -40 -50
2 A 20 p 30 50
3 A -10 n -40 -50
4 B 30 p 20 50
5 B -20 n -30 -50
6 C 40 p 10 50
7 C -30 n -20 -50
df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], 1)
Type Val
0 A 50
1 A -50
2 A 50
3 A -50
4 B 50
5 B -50
6 C 50
7 C -50
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.