How can I fix a ValueError when training a model for sentiment analysis?
I am trying to train a logistic regression model for sentiment analysis. I get the following error both when trying to standardize the features and when trying to train the model.
The full traceback is below:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18368/1468496602.py in <module>
----> 1 model = logistic_regression.fit(features, target)
~\anaconda3\anacondadownload\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
1342 _dtype = [np.float64, np.float32]
1343
-> 1344 X, y = self._validate_data(X, y, accept_sparse='csr', dtype=_dtype,
1345 order="C",
1346 accept_large_sparse=solver != 'liblinear')
~\anaconda3\anacondadownload\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
431 y = check_array(y, **check_y_params)
432 else:
--> 433 X, y = check_X_y(X, y, **check_params)
434 out = X, y
435
~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
869 raise ValueError("y cannot be None")
870
--> 871 X = check_array(X, accept_sparse=accept_sparse,
872 accept_large_sparse=accept_large_sparse,
873 dtype=dtype, order=order, copy=copy,
~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
671 array = array.astype(dtype, casting="unsafe", copy=False)
672 else:
--> 673 array = np.asarray(array, order=order, dtype=dtype)
674 except ComplexWarning as complex_warning:
675 raise ValueError("Complex data not supported\n"
~\anaconda3\anacondadownload\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
100 return _asarray_with_like(a, dtype=dtype, order=order, like=like)
101
--> 102 return array(a, dtype, copy=False, order=order)
103
104
~\anaconda3\anacondadownload\lib\site-packages\pandas\core\series.py in __array__(self, dtype)
855 dtype='datetime64[ns]')
856 """
--> 857 return np.asarray(self._values, dtype)
858
859 # ----------------------------------------------------------------------
~\anaconda3\anacondadownload\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
100 return _asarray_with_like(a, dtype=dtype, order=order, like=like)
101
--> 102 return array(a, dtype, copy=False, order=order)
103
104
ValueError: could not convert string to float: 'clint eastwood return dirti harri calahan movi dirti harri seri clint older he still got harri told vacat troubl happen robberi memor make day catchphras come citi took vacat wors woman turn vigilant rape attack funfair start get punk one one last movi see sandra lock clint eastwood movi improv enforc bit comedi less seriou clint eastwood sunglass gargoyl best known sunglass worn arnold shwartzeneg termin worth watch like clint eastwood dirti harri film like action crime thriller'
I'm not sure how to fix this, or whether this row needs to be deleted from the data. I have already done some text preprocessing on it, such as removing stop words, lowercasing, and removing punctuation.
I have not converted any of the values to floats.
May I ask why the string is being converted to a float? You can refer to the documentation for the usage of float().
As far as I know, sentiment analysis usually relies on something like word2vec to turn sentences into numeric sequences rather than float(). It would be nice if you could provide more information.
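For what it's worth, the traceback shows `LogisticRegression.fit` calling `np.asarray(..., dtype=...)` on a pandas Series whose values are still raw review strings, which would explain the conversion error even though no explicit `float()` call was written. Below is a minimal sketch of one way to turn the text into numbers first. It uses scikit-learn's `TfidfVectorizer` rather than the word2vec approach mentioned above, simply because it needs no extra library. The sample reviews and labels are made up for illustration; `features` and `target` are assumed to be a Series of strings and a Series of 0/1 labels, as the traceback suggests.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the preprocessed reviews and their labels.
features = pd.Series([
    "clint eastwood dirti harri movi worth watch",
    "best action crime thriller like film",
    "bore slow plot bad act wast time",
    "terribl movi wors sequel bad",
])
target = pd.Series([1, 1, 0, 0])

# TfidfVectorizer converts each string into a row of numeric TF-IDF weights,
# so LogisticRegression receives a numeric (sparse) matrix instead of text.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(features, target)

# New text goes through the same fitted vectorizer before prediction.
predictions = model.predict(pd.Series(["worth watch action thriller"]))
```

With this setup the row in the error message does not need to be deleted, since it is ordinary text that simply has not been vectorized yet. TF-IDF rows are L2-normalized by default, so a separate standardization step is probably unnecessary here; if one is still wanted on the sparse output, `StandardScaler(with_mean=False)` would be the variant to look at.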