How to replace all columns of a pandas DataFrame with their WOE values

Question

I am trying to replace the values in every column of a pandas DataFrame with their respective WOE (weight of evidence) values.

I calculated the WOE values in a separate function.

One DataFrame holds variable, bin, binedges and WOE. The main DataFrame holds customer_id and the independent variables.

I have to replace the independent variable values with their respective WOE values.

Can anyone please help?

Answer 1

You can use the xverse package in Python for this.

First, install the xverse package from the Anaconda Prompt:

pip install xverse

Note: I'm also showing how to make bins.

Then import MonotonicBinning from the xverse package in your notebook and make the bins.

from xverse.transformer import MonotonicBinning

clf = MonotonicBinning()
clf.fit(X, y)
output_bins = clf.bins

Here X is the set of features (the ones you want to replace with WOE values) as a pandas DataFrame, and y is the target variable as an array.

Now store the binned features in a separate DataFrame with the same column names:

X1 = clf.transform(X)

Now import WOE from the xverse package and fit it on the binned data:

from xverse.transformer import WOE
clf1 = WOE()
clf1.fit(X1, y)

X2 = clf1.transform(X1)

X2 is the required DataFrame, with the features replaced by their respective WOE values.
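If the WOE table described in the question already exists, the replacement can also be sketched with pandas alone. The example below is only an illustration under assumptions: the lookup columns `variable`, `bin_left`, `bin_right` and `woe` are hypothetical names (the question does not show the exact layout), the bins of each variable are taken to be contiguous and right-closed, and the data values are made up.

```python
import numpy as np
import pandas as pd

# Main data: customer_id plus independent variables (toy values).
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [22, 35, 47, 63],
    "income": [1500, 3200, 800, 5100],
})

# Hypothetical WOE table: one row per (variable, bin).
woe_table = pd.DataFrame({
    "variable": ["age", "age", "age", "income", "income"],
    "bin_left": [-np.inf, 30, 50, -np.inf, 2000],
    "bin_right": [30, 50, np.inf, 2000, np.inf],
    "woe": [-0.40, 0.10, 0.55, -0.25, 0.30],
})


def replace_with_woe(data, table, id_col="customer_id"):
    """Return a copy of `data` with each listed variable replaced by its WOE."""
    out = data.copy()
    for var, grp in table.groupby("variable"):
        if var == id_col or var not in out.columns:
            continue
        grp = grp.sort_values("bin_left")
        # Contiguous bins: the first left edge followed by every right edge.
        edges = [grp["bin_left"].iloc[0]] + grp["bin_right"].tolist()
        # labels=False yields the 0-based bin position of each value.
        positions = pd.cut(out[var], bins=edges, labels=False, right=True)
        # Missing values or values outside every bin become NaN.
        out[var] = positions.map(dict(enumerate(grp["woe"].tolist())))
    return out


result = replace_with_woe(df, woe_table)
print(result)
```

The identifier column is left untouched, and any column without rows in the WOE table keeps its original values.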

Answer 2

You can use xverse.

Step 1: Feature subset. Select a subset of features from the dataset. A list of features must be provided to the subset transformer.

from xverse.feature_subset import FeatureSubset
numerical_features = list(df._get_numeric_data().columns)
categorical_features = list(df.columns.difference(numerical_features))
print(numerical_features)

clf = FeatureSubset(numerical_features) #select only numeric features
df = clf.fit_transform(df) #returns the dataframe with selected features

Step 2: Split X and y

from xverse.feature_subset import SplitXY
clf = SplitXY(['target']) #Split the dataset into X and y
X, y = clf.fit_transform(df) #returns features (X) dataset and target(Y) as a numpy array
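For comparison, steps 1 and 2 can be sketched with plain pandas. This is not how xverse implements them; it only shows the equivalent result on made-up data, using `select_dtypes` as the public counterpart of `_get_numeric_data` and assuming the target column is named `target` as above.

```python
import pandas as pd

# Toy data with one numeric feature, one categorical feature and a target.
df = pd.DataFrame({
    "age": [25, 40, 58],
    "city": ["A", "B", "A"],
    "target": [0, 1, 0],
})

# Step 1 equivalent: keep only the numeric columns.
numeric = df.select_dtypes(include="number")

# Step 2 equivalent: features as a DataFrame, target as a NumPy array.
X = numeric.drop(columns=["target"])
y = numeric["target"].to_numpy()

print(list(X.columns), y)
```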

Step 3: Weight of evidence

from xverse.transformer import WOE
clf = WOE()
clf.fit(X, y)

Have a look at the information value of each feature with clf.iv_df.
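To make the two quantities concrete, here is a small hand computation of WOE and information value for one already-binned feature. It is a sketch on made-up data, not xverse's code. The sign convention assumed here is WOE = ln(non-event share / event share); some references use the inverse ratio, which flips the sign of WOE but leaves the information value unchanged.

```python
import numpy as np
import pandas as pd

# Toy data: one binned feature and a binary target (1 = event).
data = pd.DataFrame({
    "age_bin": ["young"] * 4 + ["old"] * 4,
    "target": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Count events and non-events per bin.
grouped = data.groupby("age_bin")["target"].agg(events="sum", total="count")
grouped["non_events"] = grouped["total"] - grouped["events"]

# Share of all events / non-events that falls into each bin.
event_dist = grouped["events"] / grouped["events"].sum()
non_event_dist = grouped["non_events"] / grouped["non_events"].sum()

# Assumed convention: WOE = ln(non-event share / event share).
grouped["woe"] = np.log(non_event_dist / event_dist)
grouped["iv"] = (non_event_dist - event_dist) * grouped["woe"]
information_value = grouped["iv"].sum()

print(grouped[["events", "non_events", "woe", "iv"]])
print(information_value)
```

A bin with zero events or zero non-events produces an infinite WOE in this sketch, so real data usually needs some smoothing or bin merging first.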

output_woe_bins = clf.woe_bins #future transformation 
output_mono_bins = clf.mono_custom_binning  #future transformation

You can also pass the saved bins back in later to score new data with WOE:

clf = WOE(woe_bins=output_woe_bins, mono_custom_binning=output_mono_bins) #output_woe_bins and output_mono_bins were created earlier
out_X = clf.transform(X)

Take some time to fully understand the parameters of WOE:

feature_names: 'all' or list (default='all')
    list of features to perform WOE transformation. 
    - 'all' (default): All categorical features in the dataset will be used
    - list of features: ['age', 'income',......]

exclude_features: list (default=None)
    list of features to be excluded from WOE transformation.
    - Example - ['age', 'income', .......]

woe_prefix: string (default=None)
    Variable prefix to be used for the column created by the WOE transformer. The default value is None.

treat_missing: {'separate', 'mode', 'least_frequent'} (default='separate')
    This parameter setting is used to handle missing values in the dataset.
    'separate' - Missing values are treated as their own group (category)
    'mode' - Missing values are combined with the most frequent item in the dataset
    'least_frequent' - Missing values are combined with the least frequent item in the dataset

woe_bins: dict of dicts(default=None)
    This feature is added as part of future WOE transformations or scoring. If this value is set, then WOE values provided for each of the features here will be used for transformation. Applicable only in the transform method. 
    Dictionary structure - {'feature_name': {'level': WOE value}}
    Example - {'education': {'primary': 0.1, 'tertiary': 0.5, 'secondary': 0.7}}

monotonic_binning: bool (default=True)
    This parameter is used to perform monotonic binning on numeric variables. If set to False, numeric variables would be ignored.

mono_feature_names: 'all' or list (default='all')
    list of features to perform monotonic binning operation. 
    - 'all' (default): All features in the dataset will be used
    - list of features: ['age', 'income',......]

mono_max_bins: int (default=20)
    Maximum number of bins that can be created for any given variable. The final number of bins created will be less than or equal to this number.

mono_force_bins: int (default=3)
    It forces the module to create bins for a variable when it cannot find a monotonic relationship using the "max_bins" option. The final number of bins created will be equal to the number specified.

mono_cardinality_cutoff: int (default=5)
    Cutoff to determine whether a variable is eligible for the monotonic binning operation. Any variable with fewer unique levels than this number is treated as a character variable. No binning is performed on such a variable, and its unique levels are returned as its bins.

mono_prefix: string (default=None)
    Variable prefix to be used for the column created by monotonic binning.

mono_custom_binning: dict (default=None)
    Using this parameter, the user can perform custom binning on variables. This parameter is also used to apply previously computed bins for each feature (Score new data).
    Dictionary structure - {'feature_name': float list}
    Example - {'age': [0., 1., 2., 3.]}
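The two dictionary structures above (woe_bins and mono_custom_binning) can be pictured with a short pandas sketch. This is not how xverse applies them internally; the dictionaries, the `_bin` and `_woe` column suffixes and the data are all made up for illustration.

```python
import pandas as pd

# Hypothetical saved outputs, shaped like the structures described above.
mono_custom_binning = {"age": [0.0, 30.0, 50.0, 120.0]}
woe_bins = {"education": {"primary": 0.1, "tertiary": 0.5, "secondary": 0.7}}

new_data = pd.DataFrame({
    "age": [25, 45, 70],
    "education": ["primary", "secondary", "unknown"],
})

scored = new_data.copy()

# Numeric feature: a list of edges defines the bins.
for feature, edges in mono_custom_binning.items():
    scored[feature + "_bin"] = pd.cut(
        scored[feature], bins=edges, include_lowest=True
    )

# Categorical feature: each level maps directly to a WOE value.
for feature, mapping in woe_bins.items():
    scored[feature + "_woe"] = scored[feature].map(mapping)

print(scored)
```

Levels that were not seen when the mapping was built, such as "unknown" here, come out as NaN and need an explicit decision before scoring.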


Answer 1
1 2020-08-23 06:02:25

Answer 2
0 2022-06-06 05:56:15
