简体   繁体   English

Pandas 基于另一个 DataFrame 修改列值

[英]Pandas modify column values based on another DataFrame

I am trying to add values to a column based on a couple of conditions.我正在尝试根据几个条件向列添加值。 Here is the code example:下面是代码示例:

Import pandas as pd

df1 = pd.DataFrame({'Type': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C'], 'Val': [20, -10, 20, -10, 30, -20, 40, -30]})
df2 = pd.DataFrame({'Type': ['A', 'A', 'B', 'B', 'C', 'C'], 'Cat':['p', 'n', 'p', 'n','p', 'n'], 'Val': [30, -40, 20, -30, 10, -20]})

for index, _ in df1.iterrows():  
    
    if df1.loc[index,'Val'] >=0:
        df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'p'), 'Val'])
    else:
        df1.loc[index,'Val'] = df1.loc[index,'Val'] + float(df2.loc[(df2['Type'] == df1.loc[index,'Type']) & (df2['Cat'] == 'n'), 'Val'])

For each value in the 'Val' column of df1, I want to add values from df2, based on the type and whether the original value was positive or negative.对于 df1 的“Val”列中的每个值,我想根据类型以及原始值是正值还是负值添加来自 df2 的值。

The expected output for this example would be alternate 50 and -50 in df1.此示例的预期输出将是 df1 中的交替 50 和 -50。 The above code does the job, but is too slow to be usable for a large data set.上面的代码完成了这项工作,但速度太慢,无法用于大型数据集。 Is there a better way to do this?有一个更好的方法吗?

import numpy as np

df1['sign'] = np.sign(df1.Val)
df2['sign'] = np.sign(df2.Val)
df = pd.merge(df1, df2, on=['Type', 'sign'], suffixes=('_df1', '_df2'))
df['Val'] = df.Val_df1 + df.Val_df2
df = df.drop(columns=['Val_df1', 'sign', 'Val_df2'])
df

Try adding a Cat column to df1 merge then sum val columns across axis 1 then drop the extra columns:尝试将Cat列添加到df1 merge然后对轴 1 上的valsum然后drop额外的列:

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], 1)
  Type  Val
0    A   50
1    A   50
2    A  -50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50

Add new column with np.where使用np.where添加新列

df1['Cat'] = np.where(df1['Val'].lt(0), 'n', 'p')
  Type  Val Cat
0    A   20   p
1    A  -10   n
2    A   20   p
3    A  -10   n
4    B   30   p
5    B  -20   n
6    C   40   p
7    C  -30   n

merge on Type and Cat merge TypeCat

df1 = df1.merge(df2, on=['Type', 'Cat'], how='left')
  Type  Val_x Cat  Val_y
0    A     20   p     30
1    A    -10   n    -40
2    A     20   p     30
3    A    -10   n    -40
4    B     30   p     20
5    B    -20   n    -30
6    C     40   p     10
7    C    -30   n    -20

sum Val columns: sum Val列:

df1['Val'] = df1[['Val_x', 'Val_y']].sum(axis=1)
  Type  Val_x Cat  Val_y  Val
0    A     20   p     30   50
1    A    -10   n    -40  -50
2    A     20   p     30   50
3    A    -10   n    -40  -50
4    B     30   p     20   50
5    B    -20   n    -30  -50
6    C     40   p     10   50
7    C    -30   n    -20  -50

drop extra columns: drop额外的列:

df1 = df1.drop(['Cat', 'Val_x', 'Val_y'], 1)
  Type  Val
0    A   50
1    A  -50
2    A   50
3    A  -50
4    B   50
5    B  -50
6    C   50
7    C  -50

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM