I have this code working fine:
df_amazon = pd.read_csv("datasets/amazon_alexa.tsv", sep="\t")
X = df_amazon['variation'] # the features we want to analyze
ylabels = df_amazon['feedback'] # the labels, or answers, we want to test against
X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)
# Create pipeline using Bag of Words
pipe = Pipeline([('cleaner', predictors()),
                 ('vectorizer', bow_vector),
                 ('classifier', classifier)])
pipe.fit(X_train, y_train)
But if I try to add one more feature to the model, replacing
X = df_amazon['variation']
with
X = df_amazon[['variation','verified_reviews']]
I get this error message from sklearn when I call fit:
ValueError: Found input variables with inconsistent numbers of samples: [2, 2205]
So fit works when X_train and y_train have the shapes (2205,) and (2205,), but not when the shapes are changed to (2205, 2) and (2205,).
What's the best way to deal with that?
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
df = pd.DataFrame(data = [['Heather Gray Fabric','I received the echo as a gift.',1],['Sandstone Fabric','Without having a cellphone, I cannot use many of her features',0]], columns = ['variation','review','feedback'])
vect = CountVectorizer()
vect.fit_transform(df[['variation','review']])
# Now look at the vocabulary that was created
print(vect.vocabulary_)
# Output: features were generated only from the column names, not from the contents of the columns
{'variation': 1, 'review': 0}
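# So you need to make a single column that contains both variation and review, and pass that column to your model.
# Note: joining without a separator fuses the last word of 'variation' with the first word of 'review'
# (see 'fabrici' and 'fabricwithout' below); use df['variation'] + ' ' + df['review'] to keep the words apart.
df['variation_review'] = df['variation'] + df['review']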
vect.fit_transform(df['variation_review'])
print(vect.vocabulary_)
{'heather': 8,
'gray': 6,
'fabrici': 3,
'received': 9,
'the': 11,
'echo': 2,
'as': 0,
'gift': 5,
'sandstone': 10,
'fabricwithout': 4,
'having': 7,
'cellphone': 1}
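If you would rather keep the two columns separate instead of concatenating them, one possible alternative is to give each text column its own vectorizer through ColumnTransformer. This is only a minimal sketch under assumptions: it reuses the two-row toy frame from above, and LogisticRegression is a stand-in because the original classifier is not shown in the question. The step names ('variation_bow', 'review_bow', 'features') are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data copied from the example above (not the real Alexa dataset)
df = pd.DataFrame(
    data=[
        ['Heather Gray Fabric', 'I received the echo as a gift.', 1],
        ['Sandstone Fabric', 'Without having a cellphone, I cannot use many of her features', 0],
    ],
    columns=['variation', 'review', 'feedback'],
)

# Passing each column name as a plain string (not a list) hands the
# vectorizer a 1-D Series of texts, which is what CountVectorizer expects.
features = ColumnTransformer([
    ('variation_bow', CountVectorizer(), 'variation'),
    ('review_bow', CountVectorizer(), 'review'),
])

# LogisticRegression is an assumption standing in for the unknown classifier
pipe = Pipeline([
    ('features', features),
    ('classifier', LogisticRegression()),
])

X = df[['variation', 'review']]   # shape (n_samples, 2)
y = df['feedback']                # shape (n_samples,)
pipe.fit(X, y)

# Each column contributes its own block of bag-of-words features
matrix = pipe.named_steps['features'].transform(X)
predictions = pipe.predict(X)
print(matrix.shape, predictions)
```

With this layout X can stay a two-column DataFrame, and the word counts of the two columns are stacked side by side rather than merged into one vocabulary.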
The data must have the shape (n_samples, n_features). Try transposing X (X.T).
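To check which layout X actually has, and where the 2 in the error message may come from, here is a small sketch. The three-row frame is made up for illustration; the point is only that a DataFrame with one row per sample already has the shape (n_samples, n_features), while anything that simply iterates over it (as a text vectorizer does) sees the column labels.

```python
import pandas as pd

# Hypothetical data: 3 samples, 2 text features
X = pd.DataFrame({
    'variation': ['Heather Gray Fabric', 'Sandstone Fabric', 'Charcoal Fabric'],
    'review': ['I received the echo as a gift.', 'Works well.', 'Love it.'],
})
y = pd.Series([1, 0, 1])

# One row per sample, one column per feature
print(X.shape)

# Iterating over a DataFrame yields its column labels, not its rows,
# so a consumer that loops over X counts 2 "samples" here instead of 3.
iterated = list(X)
print(iterated, len(iterated), len(y))
```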