imbalanced-learn's FunctionSampler raises ValueError

Question

I want to use the FunctionSampler class from imblearn to create my own custom class for resampling my dataset.

I have a one-dimensional feature Series containing a path for each subject and a label Series containing the label for each subject. Both come from a pd.DataFrame. I know that I have to reshape the feature array first, since it is one-dimensional.

When I use the RandomUnderSampler class directly, everything works fine. However, if I pass the features and labels to the fit_resample method of FunctionSampler, which in turn creates an instance of RandomUnderSampler and calls fit_resample on it, I get the following error:

ValueError: could not convert string to float: 'path_1'

Here's a minimal example that reproduces the error:

import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from imblearn import FunctionSampler

# create one dimensional feature and label arrays X and y
# X has to be converted to numpy array and then reshaped. 
X = pd.Series(['path_1','path_2','path_3'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])

FIRST METHOD (works)

rus = RandomUnderSampler()
X_res, y_res = rus.fit_resample(X,y)

SECOND METHOD (doesn't work)

def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)

sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)

Does anyone know what goes wrong here? It seems as if the fit_resample method of FunctionSampler does not behave the same way as the fit_resample method of RandomUnderSampler...

Answer 1

Your implementation of FunctionSampler is correct. The problem is with your dataset.

RandomUnderSampler seems to work with text data as well, because it does not validate X with the default check_X_y check, which would try to convert it to float.

But FunctionSampler() does run this check (see its source code). You can reproduce the error directly:

import pandas as pd
from sklearn.utils import check_X_y

X = pd.Series(['path_1','path_2','path_2'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])

check_X_y(X, y)

This raises the same error:

ValueError: could not convert string to float: 'path_1'
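The root cause is that check_X_y (through check_array) defaults to dtype="numeric", which casts object arrays to float. The sketch below assumes scikit-learn and NumPy are installed and builds an object-dtype array to mimic what pd.Series(...).values returns for strings; passing dtype=None keeps the strings as they are.

```python
import numpy as np
from sklearn.utils import check_X_y

# Object-dtype column of path strings, mimicking pd.Series(...).values.reshape(-1, 1).
X = np.array(['path_1', 'path_2', 'path_2'], dtype=object).reshape(-1, 1)
y = np.array([1, 0, 0])

# Default dtype="numeric": the object array is cast to float, which fails on 'path_1'.
try:
    check_X_y(X, y)
except ValueError as exc:
    print("default check failed:", exc)

# dtype=None keeps the original dtype, so the strings pass validation unchanged.
X_checked, y_checked = check_X_y(X, y, dtype=None)
print(X_checked.dtype, X_checked.ravel().tolist())
```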

The following example works, because check_X_y can convert numeric strings such as '1' to floats:

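import pandas as pd
from imblearn import FunctionSampler
from imblearn.under_sampling import RandomUnderSampler

X = pd.Series(['1','2','2'])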
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])

def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)

sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)

X_res, y_res 
# (array([[2.],
#        [1.]]), array([0, 1], dtype=int64))

Answer 1: 2 votes, accepted, 2019-07-01 07:16:57
