Imbalanced-Learn's FunctionSampler throws ValueError
I want to use the FunctionSampler class from imblearn to create my own custom class for resampling my dataset.
I have a one-dimensional feature Series containing a path for each subject and a label Series containing the label for each subject. Both come from a pd.DataFrame. I know that I have to reshape the feature array first, since it is one-dimensional.
When I use the RandomUnderSampler class directly, everything works fine. However, if I pass the features and labels to the fit_resample method of a FunctionSampler whose function creates an instance of RandomUnderSampler and calls fit_resample on it, I get the following error:
ValueError: could not convert string to float: 'path_1'
Here's a minimal example producing the error:
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from imblearn import FunctionSampler
# create one dimensional feature and label arrays X and y
# X has to be converted to numpy array and then reshaped.
X = pd.Series(['path_1','path_2','path_3'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])
rus = RandomUnderSampler()
X_res, y_res = rus.fit_resample(X,y)
def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)
sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)
Does anyone know what goes wrong here? It seems that the fit_resample method of FunctionSampler does not behave the same way as the fit_resample method of RandomUnderSampler.
Your implementation of FunctionSampler is correct. The problem is with your dataset.
RandomUnderSampler seems to work for text data as well: it does not run the default numeric check_X_y validation on X.
But FunctionSampler() has this check, which you can reproduce directly:
from sklearn.utils import check_X_y
X = pd.Series(['path_1','path_2','path_2'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])
check_X_y(X, y)
This will throw an error:
ValueError: could not convert string to float: 'path_1'
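The difference comes down to the dtype argument of the validation. As a minimal sketch, assuming that a sampler which accepts strings validates its input with dtype=None (an assumption about imblearn's internals, not something shown in this thread), the same input passes once the numeric conversion is switched off:

```python
import numpy as np
from sklearn.utils import check_X_y

# Object-dtype column of strings, similar to what Series.values.reshape(-1, 1) yields
X = np.array(["path_1", "path_2", "path_3"], dtype=object).reshape(-1, 1)
y = np.array([1, 0, 0])

# Default dtype="numeric": object arrays are cast to float, which fails for paths
try:
    check_X_y(X, y)
    numeric_ok = True
except ValueError as exc:
    numeric_ok = False
    print(f"numeric validation failed: {exc}")

# dtype=None keeps the original dtype, so the strings pass through unchanged
X_checked, y_checked = check_X_y(X, y, dtype=None)
print(X_checked.ravel().tolist())
```

So the error is raised by the validation step rather than by the resampling itself.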
The following example would work, because these strings can be converted to floats:
X = pd.Series(['1','2','2'])
X = X.values.reshape(-1,1)
y = pd.Series([1,0,0])
def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)
sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)
X_res, y_res
# (array([[2.],
# [1.]]), array([0, 1], dtype=int64))