Imbalanced-Learn's FunctionSampler throws ValueError

I want to use the FunctionSampler class from imblearn to build my own custom class for resampling my dataset.

I have a one-dimensional feature Series containing a path for each subject and a label Series containing the label for each subject. Both come from a pd.DataFrame. I know that I have to reshape the feature array first since it is one-dimensional.

When I use the RandomUnderSampler class directly, everything works fine. However, if I pass the features and labels to the fit_resample method of a FunctionSampler whose function creates a RandomUnderSampler instance and calls fit_resample on it, I get the following error:

ValueError: could not convert string to float: 'path_1'

Here's a minimal example producing the error:

import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from imblearn import FunctionSampler

# create one dimensional feature and label arrays X and y
# X has to be converted to numpy array and then reshaped. 
X = pd.Series(['path_1','path_2','path_3'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])

FIRST METHOD (works)

rus = RandomUnderSampler()
X_res, y_res = rus.fit_resample(X,y)

SECOND METHOD (doesn't work)

def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)

sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)

Does anyone know what goes wrong here? It seems that the fit_resample method of FunctionSampler does not behave the same as the fit_resample method of RandomUnderSampler ...

Your implementation of FunctionSampler is correct. The problem is with your dataset.

RandomUnderSampler seems to work for text data as well; it does not appear to force X to a numeric dtype the way a default check_X_y call does.

But FunctionSampler() does run this check on its input:

import pandas as pd
from sklearn.utils import check_X_y

X = pd.Series(['path_1','path_2','path_2'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])

check_X_y(X, y)

This will throw an error:

ValueError: could not convert string to float: 'path_1'
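
To see why the message names 'path_1', here is a minimal sketch in plain Python. It uses only the built-in float() as a stand-in for the numeric conversion that the check attempts. The helper name first_non_numeric is hypothetical and is not part of imblearn or scikit-learn.

```python
def first_non_numeric(values):
    """Return the first string that float() rejects, or None if all convert."""
    for value in values:
        try:
            float(value)
        except ValueError:
            # float("path_1") raises:
            # ValueError: could not convert string to float: 'path_1'
            return value
    return None


paths = ["path_1", "path_2", "path_2"]
digits = ["1", "2", "2"]

bad_path = first_non_numeric(paths)    # the first path string cannot be parsed
bad_digit = first_non_numeric(digits)  # digit strings all parse as floats

print(bad_path)   # path_1
print(bad_digit)  # None
```

Strings such as '1' and '2' pass this conversion, while path-like strings fail on the first element.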

The following example works, because the strings '1' and '2' can be converted to floats:

X = pd.Series(['1','2','2'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])

def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)

sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)

X_res, y_res 
# (array([[2.],
#        [1.]]), array([0, 1], dtype=int64))
