Pandas: modify column values based on another DataFrame

Question

I am trying to add values to a column based on a couple of conditions. Here is a code example:

import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'], 'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'], 'Cat':['p', 'n', 'p', 'n','p', 'n'], 'Val': [30, -40, 20, -30, 10, -20]})

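# Python-level loop: df2 is filtered again for every row of df1
for index, _ in df1.iterrows():
    if df1.loc[index, 'Val'] >= 0:
        # non-negative value: add the 'p' entry of df2 for this Type
        df1.loc[index, 'Val'] = df1.loc[index, 'Val'] + df2.loc[(df2['Type'] == df1.loc[index, 'Type']) & (df2['Cat'] == 'p'), 'Val'].iloc[0]
    else:
        # negative value: add the 'n' entry of df2 for this Type
        df1.loc[index, 'Val'] = df1.loc[index, 'Val'] + df2.loc[(df2['Type'] == df1.loc[index, 'Type']) & (df2['Cat'] == 'n'), 'Val'].iloc[0]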

For each value in the 'Val' column of df1, I want to add a value from df2, chosen by the row's Type and by whether the original value is non-negative (Cat 'p') or negative (Cat 'n').

The expected output for this example is alternating 50 and -50 in the Val column of df1. The code above does the job, but it is too slow to be usable on a large data set. Is there a better way to do this?

Answer 1

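import numpy as np

# Label each row by the sign of Val (+1 / -1; np.sign gives 0 for a zero Val)
df1['sign'] = np.sign(df1.Val)
df2['sign'] = np.sign(df2.Val)
# Pair each df1 row with the df2 row of the same Type and sign (inner merge)
df = pd.merge(df1, df2, on=['Type', 'sign'], suffixes=('_df1', '_df2'))
df['Val'] = df.Val_df1 + df.Val_df2
df = df.drop(columns=['Val_df1', 'sign', 'Val_df2'])
df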
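Two details of this approach are worth checking. np.sign(0) is 0 and df2 has no row with sign 0, so the default inner merge silently drops any df1 row whose Val is exactly 0, while the question's loop treats 0 as positive (>= 0). Also, pandas versions before 2.2 did not always keep the left frame's row order in an inner merge. The sketch below keeps the answer's idea of matching on a sign column but maps 0 to +1 and uses how='left'; the function name add_by_sign is hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd


def add_by_sign(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return df1's Type and Val, with the df2 Val of the same Type and sign added.

    Unlike np.sign, a Val of exactly 0 is counted as positive, matching the
    question's `>= 0` test. The input frames are not modified.
    """
    # +1 for Val >= 0 and -1 otherwise, added as a helper column on copies
    left = df1.assign(sign=np.where(df1['Val'] >= 0, 1, -1))
    right = df2.assign(sign=np.where(df2['Val'] >= 0, 1, -1))
    # how='left' keeps every df1 row, in df1's original order
    out = left.merge(right[['Type', 'sign', 'Val']], on=['Type', 'sign'],
                     how='left', suffixes=('_df1', '_df2'))
    out['Val'] = out['Val_df1'] + out['Val_df2']
    return out[['Type', 'Val']]


df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Cat': ['p', 'n', 'p', 'n', 'p', 'n'],
                    'Val': [30, -40, 20, -30, 10, -20]})

print(add_by_sign(df1, df2)['Val'].tolist())  # [50, -50, 50, -50, 50, -50, 50, -50]
# A zero Val is matched with the 'p' row of its Type instead of being dropped
print(add_by_sign(pd.DataFrame({'Type': ['A'], 'Val': [0]}), df2)['Val'].tolist())  # [30]
```

Using assign on copies also avoids leaving a sign column behind in the caller's df1 and df2, which the answer's code does.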

Answer 2

Try adding a Cat column to df1, merging with df2 on Type and Cat, summing the Val columns across axis 1, and then dropping the extra columns:

import numpy as np

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
df1 = df1.drop(columns=['Cat', 'Val_x', 'Val_y'])

  Type  Val
0    A   50
1    A  -50
2    A   50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50
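The same steps (np.where, merge with how='left', row-wise sum, drop) can also be written as one method chain. This is a sketch rather than the answerer's code; the difference is that df1 is not overwritten, and the name result is illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Cat': ['p', 'n', 'p', 'n', 'p', 'n'],
                    'Val': [30, -40, 20, -30, 10, -20]})

result = (
    df1.assign(Cat=np.where(df1['Val'].lt(0), 'n', 'p'))      # label each row p/n
       .merge(df2, on=['Type', 'Cat'], how='left')             # Val_x from df1, Val_y from df2
       .assign(Val=lambda d: d[['Val_x', 'Val_y']].sum(axis=1))
       .drop(columns=['Cat', 'Val_x', 'Val_y'])
)
print(result['Val'].tolist())  # [50, -50, 50, -50, 50, -50, 50, -50]
```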

Add a new Cat column with np.where (negative values become 'n', everything else 'p'):

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')

  Type  Val Cat
0    A   20   p
1    A  -10   n
2    A   20   p
3    A  -10   n
4    B   30   p
5    B  -20   n
6    C   40   p
7    C  -30   n

merge on Type and Cat:

df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')

  Type  Val_x Cat  Val_y
0    A     20   p     30
1    A    -10   n    -40
2    A     20   p     30
3    A    -10   n    -40
4    B     30   p     20
5    B    -20   n    -30
6    C     40   p     10
7    C    -30   n    -20
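This merge relies on df2 having exactly one row per (Type, Cat) pair: a duplicated pair would silently duplicate df1 rows, and a missing pair would leave NaN in Val_y. DataFrame.merge accepts validate='many_to_one', which raises a MergeError when the right-hand keys are not unique, and indicator=True, which adds a _merge column showing whether each row found a match. A sketch of both checks on the question's data (the duplicated frame dup is made up for the demonstration):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Cat': ['p', 'n', 'p', 'n', 'p', 'n'],
                    'Val': [30, -40, 20, -30, 10, -20]})
df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')

# validate='many_to_one' raises pandas.errors.MergeError if df2 repeats a (Type, Cat) key
merged = df1.merge(df2, on=['Type', 'Cat'], how='left',
                   validate='many_to_one', indicator=True)

# '_merge' is 'both' for matched rows and 'left_only' where df2 had no matching key
unmatched = merged[merged['_merge'] == 'left_only']
print(len(unmatched))  # 0

# A deliberately duplicated key in df2 makes the validation fail
raised = False
dup = pd.concat([df2, df2.iloc[[0]]], ignore_index=True)
try:
    df1.merge(dup, on=['Type', 'Cat'], how='left', validate='many_to_one')
except pd.errors.MergeError as exc:
    raised = True
    print('MergeError:', exc)
```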

sum the Val columns:

df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)

  Type  Val_x Cat  Val_y  Val
0    A     20   p     30   50
1    A    -10   n    -40  -50
2    A     20   p     30   50
3    A    -10   n    -40  -50
4    B     30   p     20   50
5    B    -20   n    -30  -50
6    C     40   p     10   50
7    C    -30   n    -20  -50
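One property of sum(axis=1) is worth knowing: it skips missing values by default (skipna=True), so a df1 row with no matching (Type, Cat) in df2 keeps its original Val, whereas Val_x + Val_y would give NaN. Which behaviour is wanted depends on the data. A small sketch with a hypothetical df2 that lacks the ('C', 'n') row:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Type': ['C', 'C'], 'Val': [40, -30]})
# Hypothetical lookup table with the ('C', 'n') row deliberately missing
df2 = pd.DataFrame({'Type': ['C'], 'Cat': ['p'], 'Val': [10]})

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
merged = df1.merge(df2, on=['Type', 'Cat'], how='left')

# sum(axis=1) skips NaN: the unmatched row keeps its original value
row_sum = merged[['Val_x', 'Val_y']].sum(axis=1).tolist()
print(row_sum)  # [50.0, -30.0]

# Plain addition propagates NaN instead
plus = (merged['Val_x'] + merged['Val_y']).tolist()
print(plus)  # [50.0, nan]
```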

drop the extra columns:

df1 = df1.drop(columns=['Cat', 'Val_x', 'Val_y'])

  Type  Val
0    A   50
1    A  -50
2    A   50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50
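A merge is not the only vectorized way to do this lookup. Because df2 is effectively a table keyed on (Type, Cat), it can be indexed by those two columns and then reindexed with each df1 row's key, which avoids creating and dropping helper columns. A minimal sketch, assuming each (Type, Cat) pair occurs exactly once in df2 (Series.reindex raises a ValueError on a duplicated index) and treating 0 as positive as the question's loop does; the names lookup and keys are illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'],
                    'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'Cat': ['p', 'n', 'p', 'n', 'p', 'n'],
                    'Val': [30, -40, 20, -30, 10, -20]})

# Series of df2 values indexed by (Type, Cat)
lookup = df2.set_index(['Type', 'Cat'])['Val']

# One (Type, Cat) key per df1 row; Val >= 0 counts as 'p'
keys = pd.MultiIndex.from_arrays(
    [df1['Type'], np.where(df1['Val'] >= 0, 'p', 'n')], names=['Type', 'Cat'])

# reindex returns one value per key, in df1's row order (NaN for keys missing
# from df2); .to_numpy() drops the MultiIndex so the addition is positional
df1['Val'] = df1['Val'] + lookup.reindex(keys).to_numpy()
print(df1['Val'].tolist())  # [50, -50, 50, -50, 50, -50, 50, -50]
```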

2 answers

Answer 1: 1 vote, posted 2021-06-21 15:41:00

Answer 2: 1 vote, accepted, posted 2021-06-21 15:42:29
