How to fix the error "ValueError: could not convert string to float" in an NLP project in Python?

I am writing Python code in a Jupyter notebook that trains and tests a model on a dataset in order to return the correct sentiment.

The problem is that when I try to predict the sentiment of a phrase, the code crashes and displays the error below:

ValueError: could not convert string to float: 'this book was so interstening it made me not happy'

Note that I have an imbalanced dataset, so I use SMOTE to oversample it.

Code:

import pandas as pd
import numpy as np
from imblearn.over_sampling import SMOTE  # for the imbalanced dataset
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfTransformer,TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,classification_report,confusion_matrix
from sklearn.pipeline import Pipeline

df = pd.read_csv("data/Apple-Twitter-Sentiment-DFE.csv",encoding="ISO-8859-1")

df
# data is cleaned using preprocessing functions

# Handle the imbalanced dataset using SMOTE

vectorizer = TfidfVectorizer()
vect_df =vectorizer.fit_transform(df["clean_text"])
oversample = SMOTE(random_state = 42)
x_smote,y_smote = oversample.fit_resample(vect_df, df["sentiment"])
print("shape x before SMOTE: {}".format(vect_df.shape))
print("shape x after SMOTE: {}".format(x_smote.shape))
print("balance of target field %")
y_smote.value_counts(normalize = True)*100


# split the dataset into train and test 
x_train,x_test,y_train,y_test = train_test_split(x_smote,y_smote,test_size = 0.2,random_state =42)


logreg = Pipeline([
                ('tfidf', TfidfTransformer()),
                ('clf', LogisticRegression(n_jobs=1, C=1e5)),
               ])
logreg.fit(x_train, y_train)

y_pred = logreg.predict(x_test)

print('accuracy %s' % accuracy_score(y_pred, y_test))
print(classification_report(y_test, y_pred))

# Make prediction 
exl = "this book was so interstening it made me not happy"

logreg.predict(exl)

You should define your variable exl as follows:

exl = vectorizer.transform(["this book was so interstening it made me not happy"])

and then run the prediction.

First, put the text you want to classify in a list, then use vectorizer to transform it into the same features that were extracted from your training data, and pass the result to the model for prediction.
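To make the idea concrete, here is a minimal sketch of the failure and the fix. It is not the original notebook: the tiny corpus, the labels, and the names clf, x_new, and raw_string_failed are made up for illustration, and SMOTE is left out to keep the example short. The exact wording of the error may differ between scikit-learn versions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny made-up corpus, only for illustration (not the Apple Twitter dataset).
texts = [
    "i love this phone",
    "great battery and great screen",
    "this update is terrible",
    "i hate the new design",
]
labels = ["positive", "positive", "negative", "negative"]

# The vectorizer learns its vocabulary from the training text.
vectorizer = TfidfVectorizer()
x_train = vectorizer.fit_transform(texts)

# The classifier is trained on numeric TF-IDF features, not on strings.
clf = LogisticRegression()
clf.fit(x_train, labels)

phrase = "this book was so interstening it made me not happy"

# Passing the raw string fails: the model expects a numeric feature matrix.
try:
    clf.predict(phrase)
    raw_string_failed = False
except ValueError as err:
    raw_string_failed = True
    print("raw string rejected:", err)

# Fix: wrap the phrase in a list and reuse the already fitted vectorizer.
# Use transform, not fit_transform, so the training vocabulary is kept.
x_new = vectorizer.transform([phrase])
prediction = clf.predict(x_new)
print(x_new.shape, prediction)
```

The important detail is calling transform rather than fit_transform on the new text: refitting would build a different vocabulary, and the resulting matrix would no longer line up with the columns the model was trained on.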


