
How can I fix a ValueError when training a model for sentiment analysis?

I am trying to train a logistic regression model for sentiment analysis. I get the following error both when I try to standardize the features and when I try to fit the model:

I have posted the full traceback here

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18368/1468496602.py in <module>
----> 1 model = logistic_regression.fit(features, target)

~\anaconda3\anacondadownload\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
   1342             _dtype = [np.float64, np.float32]
   1343 
-> 1344         X, y = self._validate_data(X, y, accept_sparse='csr', dtype=_dtype,
   1345                                    order="C",
   1346                                    accept_large_sparse=solver != 'liblinear')

~\anaconda3\anacondadownload\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    431                 y = check_array(y, **check_y_params)
    432             else:
--> 433                 X, y = check_X_y(X, y, **check_params)
    434             out = X, y
    435 

~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    869         raise ValueError("y cannot be None")
    870 
--> 871     X = check_array(X, accept_sparse=accept_sparse,
    872                     accept_large_sparse=accept_large_sparse,
    873                     dtype=dtype, order=order, copy=copy,

~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~\anaconda3\anacondadownload\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    671                     array = array.astype(dtype, casting="unsafe", copy=False)
    672                 else:
--> 673                     array = np.asarray(array, order=order, dtype=dtype)
    674             except ComplexWarning as complex_warning:
    675                 raise ValueError("Complex data not supported\n"

~\anaconda3\anacondadownload\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

~\anaconda3\anacondadownload\lib\site-packages\pandas\core\series.py in __array__(self, dtype)
    855               dtype='datetime64[ns]')
    856         """
--> 857         return np.asarray(self._values, dtype)
    858 
    859     # ----------------------------------------------------------------------

~\anaconda3\anacondadownload\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

ValueError: could not convert string to float: 'clint eastwood return dirti harri calahan movi dirti harri seri clint older he still got harri told vacat troubl happen robberi memor make day catchphras come citi took vacat wors woman turn vigilant rape attack funfair start get punk one one last movi see sandra lock clint eastwood movi improv enforc bit comedi less seriou clint eastwood sunglass gargoyl best known sunglass worn arnold shwartzeneg termin worth watch like clint eastwood dirti harri film like action crime thriller'


I'm not sure how to fix this. Does this row need to be deleted from the data? I have already done some text preprocessing on it: removing stop words, lowercasing, and removing punctuation.

I have not converted any of the values to floats

May I ask why you are converting the string to a float? The documentation for float() describes its usage.
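Judging by the traceback, the conversion is probably not an explicit float() call: check_array passes the features to np.asarray with a float dtype, and that is where the raw review text fails to convert. A minimal sketch that reproduces the same kind of error, assuming the feature column still holds text (the `reviews` list is made-up sample data, not the asker's dataset):

```python
import numpy as np

# Hypothetical stand-in for a feature column that still contains raw text.
reviews = ["clint eastwood return dirti harri", "worth watch like action crime thriller"]

try:
    # scikit-learn's input validation does roughly this before fitting.
    np.asarray(reviews, dtype=np.float64)
    message = ""
except ValueError as exc:
    message = str(exc)

print(message)
```

So the row does not need to be deleted; the text has to be turned into numeric features before it reaches fit().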

As far as I know, sentiment analysis typically uses something like word2vec to turn sentences into numeric sequences, rather than calling float(). It would help if you could provide more information.
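To illustrate the idea of vectorizing the text first, here is a minimal sketch. It swaps in scikit-learn's TfidfVectorizer instead of word2vec, purely because it needs no extra library; it is not necessarily the right choice for the asker's data. The `texts` and `labels` values are made-up examples, and the pipeline layout is an assumption about how the model could be wired up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for the preprocessed reviews and labels.
texts = [
    "worth watch like clint eastwood film",
    "best action crime thriller",
    "bore movi wast time",
    "terribl plot bad act",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# The vectorizer turns each string into a numeric TF-IDF vector,
# so LogisticRegression never sees raw text.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

predictions = model.predict(["like action thriller", "bad bore movi"])
print(predictions)
```

TF-IDF vectors are L2-normalized by default, so a separate standardization step may not be needed here. If one is still wanted, note that the vectorizer returns a sparse matrix, which StandardScaler only accepts with with_mean=False.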
