Replacing values in a specific DataFrame column using Python

This is the code that calculates the weight of evidence (WoE):

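import numpy as np
import pandas as pd

# good is zero, bad is one

# Weight of evidence function for discrete unordered variables
def Weight_of_evidance(df, the_categroical_name, My_target):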

    df = pd.concat([df[the_categroical_name], My_target], axis = 1)
    df = pd.concat([df.groupby(df.columns.values[0], as_index = False)[df.columns.values[1]].count(),
                    df.groupby(df.columns.values[0], as_index = False)[df.columns.values[1]].mean()], axis = 1)
    df = df.iloc[:, [0, 1, 3]]
    df.columns = [df.columns.values[0], 'Number_of_observation', 'Probation_good_taxPayer']
    df['prop_Number_of_observation'] = df['Number_of_observation'] / df['Number_of_observation'].sum()
    df['N_good'] = df['Probation_good_taxPayer'] * df['Number_of_observation']
    df['n_bad'] = (1 - df['Probation_good_taxPayer']) * df['Number_of_observation']
    df['prop_n_good'] = df['N_good'] / df['N_good'].sum()
    df['prop_of_bad'] = df['n_bad'] / df['n_bad'].sum()
    df['WoE'] = np.log(df['prop_n_good'] / df['prop_of_bad'])
    df['PD']= ((df['N_good'])/(df['n_bad'] + df['N_good']))
    df = df.sort_values(['WoE'])
    df = df.reset_index(drop = True)
    #df['diff_Probation_good_taxPayer'] = df['Probation_good_taxPayer'].diff().abs()
    #df['diff_WoE'] = df['WoE'].diff().abs()
    df['IV'] = (df['prop_n_good'] - df['prop_of_bad']) * df['WoE']
    df['IV'] = df['IV'].sum()
    return df 
df_BUSINESS_CATEGORY = Weight_of_evidance(df_input, 'BUSINESS_CATEGORY', df_Label)
# We execute the function we defined with the necessary arguments: a dataframe, a string, and a dataframe.
# We store the result in a dataframe.
df_BUSINESS_CATEGORY
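
For reference, the quantities the function computes can be checked by hand on a toy example. The sketch below is not the function above: it recomputes the per-category WoE, ln(share of goods / share of bads), and the total information value on a small made-up dataset. The column names `category` and `target` and the data are hypothetical, and it assumes, as the function's own arithmetic does, that the mean of the target is the share of "good" observations.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical column and a binary target.
toy = pd.DataFrame({
    "category": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "target":   [1,   0,   0,   0,   1,   1,   1,   0],
})

# Count observations and "good" observations (target == 1) per category.
stats = toy.groupby("category")["target"].agg(n_obs="count", n_good="sum")
stats["n_bad"] = stats["n_obs"] - stats["n_good"]

# Share of all goods / all bads that fall in each category.
stats["prop_good"] = stats["n_good"] / stats["n_good"].sum()
stats["prop_bad"] = stats["n_bad"] / stats["n_bad"].sum()

# Weight of evidence per category and the total information value.
stats["WoE"] = np.log(stats["prop_good"] / stats["prop_bad"])
iv = ((stats["prop_good"] - stats["prop_bad"]) * stats["WoE"]).sum()

print(stats[["n_obs", "n_good", "n_bad", "WoE"]])
print("IV:", iv)
```

Note that a category with no goods or no bads gives a WoE of plus or minus infinity, both here and in the function above, so such categories may need to be merged or smoothed first.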

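Now I want to replace every value in BUSINESS_CATEGORY with its value from the WoE column, e.g. A becomes -0.978021, and so on. At the moment I am doing this with a for loop, as in the code below: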


def flag_df_ISIC_4_ARAB(df_input):
    if (df_input['BUSINESS_CATEGORY'] == 'A'):
        return '-0.978021'
    elif (df_input['BUSINESS_CATEGORY'] == 'اB'):
        return '-0.977854'
    elif (df_input['BUSINESS_CATEGORY'] == 'C'):
        return '0.082918'
    elif (df_input['BUSINESS_CATEGORY'] == 'D'):
        return '0.772306'
    elif (df_input['BUSINESS_CATEGORY'] == 'H'):
        return '-0.176700'
    elif (df_input['BUSINESS_CATEGORY'] == 'أخرى'):
        return '0.955446'
    else:
        return '0'
df_input['BUSINESS_CATEGORY'] = df_input.apply(flag_df_ISIC_4_ARAB, axis = 1).astype(str)





Is there another way to replace the values with their WoE without using a for loop?

First create a dictionary, pass it to Series.map, and replace the unmatched values with '0':

d = {'A':'-0.978021','اB':'-0.977854', 'C':'0.082918', 
     'D':'0.772306', 'H': '-0.176700', 'أخرى': '0.955446'}

df_input['BUSINESS_CATEGORY'] = df_input['BUSINESS_CATEGORY'].map(d).fillna('0')
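
The dictionary does not have to be typed by hand. As a minimal sketch, assuming the WoE table returned by the function keeps the category labels in a column named `BUSINESS_CATEGORY` next to the `WoE` column, the lookup can be built from that table and passed to Series.map. The two small frames below are made-up stand-ins for `df_BUSINESS_CATEGORY` and `df_input`, the name `BUSINESS_CATEGORY_WoE` is hypothetical, and the result is kept numeric rather than converted to strings.

```python
import pandas as pd

# Hypothetical stand-in for the WoE table returned by the function.
woe_table = pd.DataFrame({
    "BUSINESS_CATEGORY": ["A", "C", "D"],
    "WoE": [-0.978021, 0.082918, 0.772306],
})

# Hypothetical stand-in for the input data; "Z" has no WoE entry.
data = pd.DataFrame({"BUSINESS_CATEGORY": ["A", "D", "Z", "C"]})

# Series indexed by category with WoE as values, usable with Series.map.
woe_lookup = woe_table.set_index("BUSINESS_CATEGORY")["WoE"]

# Map each category to its WoE; categories without a match become 0.0.
data["BUSINESS_CATEGORY_WoE"] = (
    data["BUSINESS_CATEGORY"].map(woe_lookup).fillna(0.0)
)

print(data)
```

Writing to a new column keeps the original labels available; assigning the result back to `BUSINESS_CATEGORY` overwrites them, as the line above does.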
