
How to create a new column in a dataframe based on conditions in another dataframe?

df1:

Variables     left      right
0  AUM           -0.001    28.20
1  AUM           28.20     40.28
2  AUM           40.28     58.27
3  AUM           58.27     80.72
4  AUM           80.72     100.00
0  ABS           -88.01    200.72
1  ABS           200.72    480.72
2  ABS           480.72    800.20
0  LS            10000     200000
1  LS            200000    400000

df2:

    Pan_no     ABS      AUM     LS      
0   AAA        28        30    10001      
2   CCC        500       98    390000     
1   BBB        250       50    150000     
3   DDD        100       60    380000     
4   EEE         88       10    378347   
  

Conditions:

Based on the left and right values in df1, a new column should be created in df2 for each variable, and the values in that column should be the df1 index of the range that the variable's value falls into.

Example: in df2, if the AUM value falls in the range (-0.001 - 28.20), the new column AUM_BIN takes the df1 index of that range as its value, i.e. 0.

Similarly,

in df2, if the ABS value falls in the range (200.72 - 480.72), the new column ABS_BIN takes the df1 index of that range as its value, i.e. 1.

What I have tried is:

binning_vars = ['ABS','AUM','LS']
def f(row):
  for i in binning_vars:
      for j in df1[df1['Variable'] == i].index:
            if df1[i] >= df1['left'] & df1[i] >= df1['right']:
                value = j
            else:
                pass
            return value
df2[i,'_bin'] = df1.apply(f, axis=1)

But it throws an error: TypeError: unsupported operand type(s) for &: 'float' and 'float'.

Any help will be much appreciated.

Expected Output:
with new columns in df2:
    
    Pan_no     ABS      AUM     LS      ABS_BIN    AUM_BIN     LS_BIN
0   AAA        28        30    10001      0          1           0
1   BBB        250       50    150000     1          2           0
2   CCC        500       98    390000     2          4           1
3   DDD        100       60    380000     0          3           1
4   EEE         88       10    378347     0          0           1
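
A note on the error itself: in Python, `&` binds more tightly than comparison operators such as `>=`, so an unparenthesized condition like `a >= b & a >= c` is evaluated as `a >= (b & a) >= c`, and `&` is not defined for floats. The attempt also tests `>=` against both bounds, whereas a range check needs one lower-bound and one upper-bound comparison. The minimal sketch below reproduces the message with plain scalars; the names `value`, `left`, and `right` and their values are illustrative assumptions, not taken from the code above.

```python
# Illustrative scalars (assumed values, one AUM range from df1)
value, left, right = 30.0, 28.20, 40.28

try:
    # Parsed as: value >= (left & value) <= right, so `&` is applied to two floats
    value >= left & value <= right
except TypeError as exc:
    message = str(exc)

# Parenthesize each comparison (or use `and` for scalars) and test both bounds
in_bin = (value > left) and (value <= right)

print(message)
print(in_bin)
```

For whole columns, the same rule applies: wrap each comparison in parentheses before combining them with `&`.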

You can use pd.cut and avoid loops inside the binning function:

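import pandas as pd

def binning(sr):
    # Rows of df1 that define the bins for this column (sr.name is the column name)
    df = df1.loc[df1['Variables'] == sr.name, ['left', 'right']]
    # Unique, sorted bin edges built from all left/right bounds
    bins = sorted(set(df.to_numpy().ravel()))
    # Label each interval with the df1 index of its row
    return pd.cut(sr, bins=bins, labels=df.index)

# binning_vars = ['ABS', 'AUM', 'LS'], as defined in the question
out = df2[binning_vars].apply(binning).add_suffix('_BIN')
df2 = pd.concat([df2, out], axis=1)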

Output:

>>> df2
  Pan_no  ABS  AUM      LS ABS_BIN AUM_BIN LS_BIN
0    AAA   28   30   10001       0       1      0
2    CCC  500   98  390000       2       4      1
1    BBB  250   50  150000       1       2      0
3    DDD  100   60  380000       0       3      1
4    EEE   88   10  378347       0       0      1
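
To try the pd.cut approach end to end, here is a self-contained sketch that rebuilds both frames from the question's tables and sorts the result by index so the rows line up with the expected output. The frame construction is an assumption about how the data is stored (numeric columns, repeated index values in df1), and `edges` and `result` are names introduced only for this example.

```python
import pandas as pd

# df1 and df2 rebuilt from the tables in the question (assumed dtypes)
df1 = pd.DataFrame(
    {
        "Variables": ["AUM"] * 5 + ["ABS"] * 3 + ["LS"] * 2,
        "left": [-0.001, 28.20, 40.28, 58.27, 80.72,
                 -88.01, 200.72, 480.72, 10000, 200000],
        "right": [28.20, 40.28, 58.27, 80.72, 100.00,
                  200.72, 480.72, 800.20, 200000, 400000],
    },
    index=[0, 1, 2, 3, 4, 0, 1, 2, 0, 1],
)
df2 = pd.DataFrame(
    {
        "Pan_no": ["AAA", "CCC", "BBB", "DDD", "EEE"],
        "ABS": [28, 500, 250, 100, 88],
        "AUM": [30, 98, 50, 60, 10],
        "LS": [10001, 390000, 150000, 380000, 378347],
    },
    index=[0, 2, 1, 3, 4],
)
binning_vars = ["ABS", "AUM", "LS"]


def binning(sr):
    # Bin definitions for the column currently being processed
    edges = df1.loc[df1["Variables"] == sr.name, ["left", "right"]]
    # Sorted unique edges; pd.cut treats them as (left, right] intervals by default
    bins = sorted(set(edges.to_numpy().ravel()))
    return pd.cut(sr, bins=bins, labels=edges.index.tolist())


out = df2[binning_vars].apply(binning).add_suffix("_BIN")
# sort_index() puts the rows in the 0..4 order used by the expected output
result = pd.concat([df2, out], axis=1).sort_index()
print(result)
```

Values outside every range for a variable would come back as NaN, since pd.cut has no interval to assign them to.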

You can use merge_asof to avoid using apply:

out = df2.merge(
    pd.merge_asof(
        df2.melt(id_vars='Pan_no')
           .astype({'value': float})
           .sort_values(by='value'),
        df1.reset_index().sort_values(by='left'),
        left_by='variable', right_by='Variables',
        left_on='value', right_on='left', direction='backward',
    )
    .pivot(index='Pan_no', columns='variable', values='index')
    .add_suffix('_BIN'),
    left_on='Pan_no', right_index=True,
)

Output:

  Pan_no  ABS  AUM      LS  ABS_BIN  AUM_BIN  LS_BIN
0    AAA   28   30   10001        0        1       0
2    CCC  500   98  390000        2        4       1
1    BBB  250   50  150000        1        2       0
3    DDD  100   60  380000        0        3       1
4    EEE   88   10  378347        0        0       1
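
The two approaches agree on the sample data, but they appear to treat boundaries differently. pd.cut uses right-closed intervals by default, while a backward merge_asof matches the last `left` that is less than or equal to the value and never looks at `right`. The sketch below illustrates this on a small made-up frame; `edges`, `values`, and the test values 28.20 and 99.0 are assumptions chosen to hit a boundary and an out-of-range case.

```python
import pandas as pd

# Three AUM-style ranges and three probe values (illustrative data)
edges = pd.DataFrame({"left": [-0.001, 28.20, 40.28],
                      "right": [28.20, 40.28, 58.27]})
values = pd.DataFrame({"value": [28.20, 30.0, 99.0]})

# pd.cut: intervals are (left, right], values outside all bins become NaN
bins = sorted(set(edges.to_numpy().ravel()))
cut_bins = pd.cut(values["value"], bins=bins, labels=edges.index.tolist())

# merge_asof (backward): picks the last row whose `left` is <= value
asof = pd.merge_asof(
    values.sort_values("value"),
    edges.reset_index().sort_values("left"),
    left_on="value", right_on="left", direction="backward",
)

print(cut_bins.tolist())
print(asof["index"].tolist())
```

A value sitting exactly on a shared edge (28.20 here) lands in the lower bin with pd.cut and in the upper bin with merge_asof, and a value above the last `right` (99.0 here) is NaN with pd.cut but still gets the last bin from merge_asof. Which behavior is wanted depends on how the ranges in df1 are meant to be read.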

