
How to replace the values of all columns in a pandas data frame with their WOE values

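I am trying to replace all the columns in a pandas DataFrame with their respective WOE (weight of evidence) values.

I have calculated the WOE values in a separate function.

I have the variable, bin, bin edges, and WOE in one data frame, and the main data frame separately.

The main data frame has customer_id and the rest are independent variables; I need to replace the independent variable values with their respective WOE values.

Can anyone help?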

You can use the xverse package in Python for this.

First install the xverse package from the Anaconda prompt:

pip install xverse

Note: I am also showing how to make the bins.

Then import MonotonicBinning from the xverse package in your notebook and make the bins.

from xverse.transformer import MonotonicBinning

clf = MonotonicBinning()
clf.fit(X, y)
output_bins = clf.bins

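where X is the set of features (the ones you want to replace with WOE values) as a pandas DataFrame, and y is the target variable as an array.

Now store the binned values in a separate dataset with the same column names: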
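To give a feel for what monotonic binning produces, here is a minimal sketch in plain pandas rather than xverse itself. It assumes one common approach: start from quantile bins and reduce their number until the mean target per bin is monotonic. `monotonic_bins` and its `max_bins` argument are hypothetical names, and the algorithm xverse uses internally may differ.

```python
import numpy as np
import pandas as pd

def monotonic_bins(x, y, max_bins=20):
    """Return bin edges for which the mean of y per bin is monotonic in x.

    Hypothetical helper for illustration; not part of xverse.
    """
    x = pd.Series(x).reset_index(drop=True)
    y = pd.Series(y).reset_index(drop=True)
    edges = None
    for n_bins in range(max_bins, 1, -1):
        # Quantile bins; duplicates="drop" merges repeated edges.
        _, edges = pd.qcut(x, q=n_bins, retbins=True, duplicates="drop")
        binned = pd.cut(x, bins=edges, include_lowest=True)
        rates = y.groupby(binned, observed=True).mean()
        if rates.is_monotonic_increasing or rates.is_monotonic_decreasing:
            break  # keep the largest bin count that is monotonic
    return edges

x = np.arange(100)
y = (x > 60).astype(int)  # toy target that rises with x
edges = monotonic_bins(x, y)
```

The returned edges can be passed to `pd.cut` to assign each row to its bin.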

X1 = clf.transform(X)

Now import WOE from the xverse package:

from xverse.transformer import WOE
clf1 = WOE()
clf1.fit(X1, y)

X2 = clf1.transform(X1)

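X2 is the required DataFrame, with the features replaced by their respective WOE values.

You can use XVERSE.

Step 1: Feature subset. Select a subset of features from the dataset. A list of features must be provided to the subset.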
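If you would rather reuse the WOE table you already computed instead of refitting with xverse, the replacement can be sketched with plain pandas. The layout below is an assumption, since the question does not show the table: one row per bin with columns `variable`, `bin_low`, `bin_high`, and `WOE` (hypothetical names standing in for your bin edges), with contiguous, right-closed intervals.

```python
import numpy as np
import pandas as pd

def replace_with_woe(df, woe_table, id_col="customer_id"):
    """Return a copy of df with each listed variable replaced by its WOE value."""
    out = df.copy()
    for var, grp in woe_table.groupby("variable"):
        if var == id_col or var not in out.columns:
            continue
        grp = grp.sort_values("bin_low")
        # Bin edges: the lowest lower bound followed by every upper bound.
        edges = [grp["bin_low"].iloc[0]] + grp["bin_high"].tolist()
        # labels=False returns the positional bin index (NaN if out of range).
        idx = pd.cut(out[var], bins=edges, labels=False, include_lowest=True)
        woe = grp["WOE"].to_numpy()
        out[var] = [woe[int(i)] if pd.notna(i) else np.nan for i in idx]
    return out

# Made-up data for illustration.
df = pd.DataFrame({"customer_id": [1, 2, 3], "age": [22, 35, 61]})
woe_table = pd.DataFrame({
    "variable": ["age", "age", "age"],
    "bin_low": [18, 30, 50],
    "bin_high": [30, 50, 90],
    "WOE": [-0.4, 0.1, 0.6],
})
result = replace_with_woe(df, woe_table)
```

Values that fall outside every bin come back as NaN, so it is worth checking `result.isna().sum()` before modeling.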

from xverse.feature_subset import FeatureSubset
numerical_features = list(df._get_numeric_data().columns)
categorical_features = list(df.columns.difference(numerical_features))
print(numerical_features)

clf = FeatureSubset(numerical_features) #select only numeric features
df = clf.fit_transform(df) #returns the dataframe with selected features 
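For comparison, the same numeric/categorical split can be sketched with pandas' public `select_dtypes` method instead of the private `_get_numeric_data`. The sample DataFrame here is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 40, 31],
    "income": [30000.0, 52000.0, 41000.0],
    "education": ["primary", "tertiary", "secondary"],
    "target": [0, 1, 0],
})

# Columns with a numeric dtype, in their original order.
numerical_features = list(df.select_dtypes(include="number").columns)
# Everything else is treated as categorical.
categorical_features = list(df.columns.difference(numerical_features))

numeric_df = df[numerical_features]  # keep only the numeric columns
```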

Step 2: Split X and y

from xverse.feature_subset import SplitXY
clf = SplitXY(['target']) #Split the dataset into X and y
X, y = clf.fit_transform(df) #returns features (X) dataset and target(Y) as a numpy array
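A rough plain-pandas equivalent of this split, assuming the target column is named `target` as in the snippet above (the sample data is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 40, 31],
    "income": [30.0, 52.0, 41.0],
    "target": [0, 1, 0],
})

X = df.drop(columns=["target"])  # feature DataFrame
y = df["target"].to_numpy()      # target as a NumPy array
```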

Step 3: Weight of Evidence

from xverse.transformer import WOE
clf = WOE()
clf.fit(X, y)

View the information value of each feature with clf.iv_df

output_woe_bins = clf.woe_bins #future transformation 
output_mono_bins = clf.mono_custom_binning  #future transformation 
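To make the numbers behind WOE and information value less opaque, here is a minimal pandas sketch of the usual definitions for one categorical feature. It assumes a binary target where 1 marks the event and uses WOE = ln(share of non-events / share of events); some references flip the ratio, so check which sign convention your tooling follows. `woe_iv` is a hypothetical helper, not an xverse function.

```python
import numpy as np
import pandas as pd

def woe_iv(feature, target):
    """Return (per-level WOE table, information value) for one feature."""
    df = pd.DataFrame({"x": feature, "y": target})
    grouped = df.groupby("x")["y"].agg(total="count", events="sum")
    grouped["non_events"] = grouped["total"] - grouped["events"]
    # Share of all events / non-events that fall into each level.
    event_dist = grouped["events"] / grouped["events"].sum()
    non_event_dist = grouped["non_events"] / grouped["non_events"].sum()
    # Levels with zero events or non-events give infinite WOE here;
    # practical implementations usually smooth or merge such levels.
    grouped["woe"] = np.log(non_event_dist / event_dist)
    # IV sums the distribution gap weighted by WOE over all levels.
    iv = float(((non_event_dist - event_dist) * grouped["woe"]).sum())
    return grouped, iv

feature = ["a"] * 4 + ["b"] * 4
target = [1, 0, 0, 0, 1, 1, 1, 0]
table, iv = woe_iv(feature, target)
```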

You can also score new data in the future using the custom binning options of WOE:

clf = WOE(woe_bins=output_woe_bins, mono_custom_binning=output_mono_bins) #bins saved from the earlier fit
out_X = clf.transform(X)
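The idea of scoring new data from saved bins can be illustrated without xverse: once the per-level WOE values are stored as a dict of dicts, applying them is a lookup. This sketch is an assumption about how you might do it by hand; `apply_woe_map` and its `default` fallback for unseen levels are hypothetical.

```python
import pandas as pd

def apply_woe_map(df, woe_map, default=0.0):
    """Replace each mapped column with its stored WOE values."""
    out = df.copy()
    for col, mapping in woe_map.items():
        if col in out.columns:
            # Levels missing from the mapping become NaN, then the default.
            out[col] = out[col].map(mapping).fillna(default)
    return out

woe_map = {"education": {"primary": 0.1, "tertiary": 0.5, "secondary": 0.7}}
new_data = pd.DataFrame({"education": ["primary", "secondary", "unknown"]})
scored = apply_woe_map(new_data, woe_map)
```

Falling back to 0.0 treats an unseen level as carrying no evidence either way; raising an error instead may be safer if unseen levels signal a data problem.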

Take a moment to go through the parameters of WOE in full:

feature_names: 'all' or list (default='all')
    list of features to perform WOE transformation. 
    - 'all' (default): All categorical features in the dataset will be used
    - list of features: ['age', 'income',......]
exclude_features: list (default=None)
    list of features to be excluded from WOE transformation.
    - Example - ['age', 'income', .......]
woe_prefix: string (default=None)
    Variable prefix to be used for the column created by WOE transformer. The default value is set 'None'.  
treat_missing: {'separate', 'mode', 'least_frequent'} (default='separate')
    This parameter setting is used to handle missing values in the dataset.
    'separate' - Missing values are treated as their own group (category)
    'mode' - Missing values are combined with the highest frequent item in the dataset
    'least_frequent' - Missing values are combined with the least frequent item in the dataset
woe_bins: dict of dicts(default=None)
    This feature is added as part of future WOE transformations or scoring. If this value is set, then WOE values provided for each of the features here will be used for transformation. Applicable only in the transform method. 
    Dictionary structure - {'feature_name': {'level': WOE value}}
    Example - {'education': {'primary' : 0.1, 'tertiary' : 0.5, 'secondary' : 0.7}}
monotonic_binning: bool (default=True)
    This parameter is used to perform monotonic binning on numeric variables. If set to False, numeric variables would be ignored.
mono_feature_names: 'all' or list (default='all')
    list of features to perform monotonic binning operation. 
    - 'all' (default): All features in the dataset will be used
    - list of features: ['age', 'income',......]
mono_max_bins: int (default=20)
    Maximum number of bins that can be created for any given variable. The final number of bins created will be less than or equal to this number.
mono_force_bins: int (default=3)
    It forces the module to create bins for a variable, when it cannot find monotonic relationship using "max_bins" option. The final number of bins created will be equal to the number specified.
mono_cardinality_cutoff: int (default=5)
    Cutoff to determine if a variable is eligible for monotonic binning operation. Any variable which has unique levels less than this number will be treated as character variables. At this point no binning operation will be performed on the variable and it will return the unique levels as bins for these variable.
mono_prefix: string (default=None)
    Variable prefix to be used for the column created by monotonic binning. 
mono_custom_binning: dict (default=None)
    Using this parameter, the user can perform custom binning on variables. This parameter is also used to apply previously computed bins for each feature (Score new data).
    Dictionary structure - {'feature_name': float list}
    Example - {'age': [0., 1., 2., 3.]}
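The three `treat_missing` options can be pictured with a small pandas sketch. This illustrates the behavior described in the parameter list and is not xverse's code; the `fill_missing` helper and the `"NA"` placeholder label are hypothetical.

```python
import pandas as pd

def fill_missing(s, strategy="separate"):
    """Fill missing values in a categorical Series using one of three strategies."""
    counts = s.value_counts()  # NaN is excluded by default
    if strategy == "separate":
        return s.fillna("NA")             # missing becomes its own level
    if strategy == "mode":
        return s.fillna(counts.idxmax())  # merge with the most frequent level
    if strategy == "least_frequent":
        return s.fillna(counts.idxmin())  # merge with the rarest level
    raise ValueError(f"unknown strategy: {strategy}")

s = pd.Series(["a", "a", "a", "b", None])
as_separate = fill_missing(s, "separate")
as_mode = fill_missing(s, "mode")
as_rare = fill_missing(s, "least_frequent")
```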
