I want to use the FunctionSampler class from imblearn to create my own custom class for resampling my dataset.
I have a one-dimensional feature Series containing the paths for each subject and a label Series containing the labels for each subject. Both come from a pd.DataFrame. I know that I have to reshape the feature array first, since it is one-dimensional.
When I use RandomUnderSampler directly, everything works fine. However, if I pass the features and labels to the fit_resample method of FunctionSampler, which then creates an instance of RandomUnderSampler and calls fit_resample on it, I get the following error:
ValueError: could not convert string to float: 'path_1'
Here's a minimal example producing the error:
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from imblearn import FunctionSampler

# Create one-dimensional feature and label arrays X and y.
# X has to be converted to a numpy array and then reshaped to 2-D.
X = pd.Series(['path_1', 'path_2', 'path_3'])
X = X.values.reshape(-1, 1)
y = pd.Series([1, 0, 0])

# Calling RandomUnderSampler directly works.
rus = RandomUnderSampler()
X_res, y_res = rus.fit_resample(X, y)

# Wrapping the same call in FunctionSampler raises the ValueError.
def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)

sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)
Does anyone know what goes wrong here? It seems that the fit_resample method of FunctionSampler does not behave the same way as the fit_resample method of RandomUnderSampler...
Your implementation of FunctionSampler is correct. The problem is with your dataset.
RandomUnderSampler seems to work for text data as well. It does not validate X with check_X_y.
FunctionSampler, however, does perform this check, which you can reproduce directly:
import pandas as pd
from sklearn.utils import check_X_y

X = pd.Series(['path_1', 'path_2', 'path_2'])
X = X.values.reshape(-1, 1)
y = pd.Series([1, 0, 0])

# check_X_y tries to convert X to a numeric dtype by default.
check_X_y(X, y)
This throws the following error:
ValueError: could not convert string to float: 'path_1'
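The following example works, because the strings '1' and '2' can be converted to floats:
import pandas as pd
from imblearn import FunctionSampler
from imblearn.under_sampling import RandomUnderSampler

X = pd.Series(['1', '2', '2'])
X = X.values.reshape(-1, 1)
y = pd.Series([1, 0, 0])

def resample(X, y):
    return RandomUnderSampler().fit_resample(X, y)

sampler = FunctionSampler(func=resample)
X_res, y_res = sampler.fit_resample(X, y)
X_res, y_res
# (array([[2.],
#         [1.]]), array([0, 1], dtype=int64))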