Scipy hstack results in "TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))"
Sklearn FeatureUnion returns TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))
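I am trying to union two pipelines, and when I do, I get this error:
TypeError: no supported conversion for types: (dtype('int64'), dtype('O'))
My goal is to find a generic way to keep the original columns of the DataFrame in the pipeline so that the classifier can use them later.
Code: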
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline, FeatureUnion


class ColumnSelector(BaseEstimator, TransformerMixin):
    def __init__(self, key, transform_function=None):
        self.key = key
        self.transform_function = transform_function

    def fit(self, X, y=None, *parg, **kwarg):
        return self

    def transform(self, X):
        result = X[self.key]
        if self.transform_function:
            result = self.transform_function(result)
        return result


data = [
    {'col1': 'hello my friend', 'col2': 'somestring_'},
    {'col1': 'my friend', 'col2': 'somestring__'},
    {'col1': 'hello friend', 'col2': 'somestring___'}
]
df = pd.DataFrame(data)

pipeline_1 = Pipeline([
    ('selector', ColumnSelector(key='col1')),
    ('vectorizer', CountVectorizer())
])
pipeline_2 = Pipeline([
    ('test', ColumnSelector(key='col2'))  #, transform_function=lambda col: col.to_frame())),
])

feats = FeatureUnion([('count_vectorize', pipeline_1), ('original_column', pipeline_2)])
feats.fit_transform(df)
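FeatureUnion joins the outputs of its transformers using numpy or scipy sparse operations, so no step inside a FeatureUnion may return non-numeric values. Here CountVectorizer returns a sparse matrix, so the stacking goes through scipy.sparse, which has no conversion for the object-dtype string column coming out of pipeline_2.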
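The failure can be reproduced without scikit-learn at all. The sketch below is only an illustration, not FeatureUnion's actual code: the arrays `counts` and `strings` are made-up stand-ins for the two pipeline outputs, and both TypeError and ValueError are caught because the exact exception type may differ between SciPy versions.

```python
import numpy as np
from scipy import sparse

# Stand-in for the CountVectorizer output: a sparse matrix of int64 counts.
counts = sparse.csr_matrix(
    np.array([[1, 1, 1], [1, 1, 0], [1, 0, 1]], dtype=np.int64)
)

# Stand-in for the raw string column: a dense object-dtype array.
strings = np.array(
    [['somestring_'], ['somestring__'], ['somestring___']], dtype=object
)

error = None
try:
    # Sparse stacking needs a common numeric dtype for all blocks.
    sparse.hstack([counts, strings])
except (TypeError, ValueError) as exc:
    error = exc

print(type(error).__name__, error)
```

Sparse stacking has to find one numeric dtype for every block, and there is none that can hold both int64 counts and Python strings.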
If I change your pipeline_2 to return the number of characters in each string, it starts working.
Note: you could also use ColumnTransformer from sklearn.compose.
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline, FeatureUnion


class ColumnSelector(BaseEstimator, TransformerMixin):
    def __init__(self, key, transform_function=None):
        self.key = key
        self.transform_function = transform_function

    def fit(self, X, y=None, *parg, **kwarg):
        return self

    def transform(self, X):
        result = X[self.key]
        if self.transform_function:
            result = self.transform_function(result)
        return result


data = [
    {'col1': 'hello my friend', 'col2': 'somestring_'},
    {'col1': 'my friend', 'col2': 'somestring__'},
    {'col1': 'hello friend', 'col2': 'somestring___'}
]
df = pd.DataFrame(data)

pipeline_1 = Pipeline([
    ('selector', ColumnSelector(key='col1')),
    ('vectorizer', CountVectorizer())
])
pipeline_2 = Pipeline([
    # Return a numeric 2-D feature (one row per sample): the length of each string.
    ('test', ColumnSelector(key='col2', transform_function=lambda x: [[len(i)] for i in x]))
])

feats = FeatureUnion([('count_vectorize', pipeline_1), ('original_column', pipeline_2)])
feats.fit_transform(df)
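For the ColumnTransformer route mentioned in the note, a minimal sketch could look like the following. It assumes that one-hot encoding col2 is an acceptable numeric stand-in for the raw strings. That encoding is my choice for illustration, not something the question asked for, and the step names are arbitrary.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    'col1': ['hello my friend', 'my friend', 'hello friend'],
    'col2': ['somestring_', 'somestring__', 'somestring___'],
})

ct = ColumnTransformer([
    # A plain string selects the column as 1-D, which CountVectorizer expects.
    ('count_vectorize', CountVectorizer(), 'col1'),
    # A list selects a 2-D frame, which OneHotEncoder expects.
    ('col2_onehot', OneHotEncoder(handle_unknown='ignore'), ['col2']),
])

features = ct.fit_transform(df)
print(features.shape)  # (3, 6): 3 vocabulary terms + 3 one-hot categories
```

With ColumnTransformer the column selection is done by the transformer itself, so the custom ColumnSelector class is not needed for this case.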