
Error after re-installing sklearn

I get the following error since I updated sklearn to a newer version, and I don't know why:

    Traceback (most recent call last):
      File "/Users/X/Courses/Project/SupportVectorMachine/main.py", line 95, in <module>
        y, x = dmatrices(formula, data=finalDataFrame, return_type='matrix')
      File "/Library/Python/2.7/site-packages/patsy/highlevel.py", line 297, in dmatrices
        NA_action, return_type)
      File "/Library/Python/2.7/site-packages/patsy/highlevel.py", line 156, in _do_highlevel_design
        return_type=return_type)
      File "/Library/Python/2.7/site-packages/patsy/build.py", line 947, in build_design_matrices
        value, is_NA = evaluator.eval(data, NA_action)
      File "/Library/Python/2.7/site-packages/patsy/build.py", line 85, in eval
        return result, NA_action.is_numerical_NA(result)
      File "/Library/Python/2.7/site-packages/patsy/missing.py", line 135, in is_numerical_NA
        mask |= np.isnan(arr)
    TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

This is the code that produces it. I have reinstalled everything from NumPy to SciPy, patsy, etc., but nothing works.

    import numpy
    import pandas
    from patsy import dmatrices

    # twitterDataFrame and twitterUserDataFrame are built elsewhere in main.py (not shown)

    # Merge the two dataframes: users and their tweets
    finalDataFrame = pandas.merge(twitterDataFrame.reset_index(), twitterUserDataFrame.reset_index(), on=['UserID'], how='inner')
    finalDataFrame = finalDataFrame.drop_duplicates()
    finalDataFrame['FrequencyOfTweets'] = numpy.all(numpy.isfinite(finalDataFrame['FrequencyOfTweets']))

    # Model formula: ~ separates the response (left) from the features (right),
    # and C() tells patsy that a column holds categorical data.
    formula = 'Classifier ~ InReplyToStatusID + InReplyToUserID + RetweetCount + FavouriteCount + Hashtags + UserMentionID + URL + MediaURL + C(MediaType) + UserMentionID + C(PossiblySensitive) + C(Language) + TweetLength + Location + Description + UserAccountURL + Protected + FollowersCount + FriendsCount + ListedCount + UserAccountCreatedAt + FavouritesCount + GeoEnabled + StatusesCount + ProfileBackgroundImageURL + ProfileUseBackgroundImage + DefaultProfile + FrequencyOfTweets'

    # Build regression-friendly matrices: y holds the classifier labels, x the features,
    # with separate columns generated for each categorical variable.
    y, x = dmatrices(formula, data=finalDataFrame, return_type='matrix')

    # Select which features we would like to analyze
    X = numpy.asarray(x)

I've found that error to crop up sometimes when calling np.isnan on an array that contains strings or other non-float values. Try casting your arrays with arr.astype(float) before passing them to dmatrices.
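A minimal sketch of that failure mode, assuming the offending column ends up as an object-dtype array (the values below are made up for illustration). np.isnan has no loop for object arrays, so it raises the same TypeError as in the traceback; after astype(float) it works. Note that astype(float) only succeeds if every element is a number or a numeric string; something like "abc" would raise ValueError instead.

```python
import numpy as np

# Hypothetical column values: a float, a numeric string and a missing value,
# stored with dtype=object the way a mixed pandas column would be
mixed = np.array([1.0, "2.5", np.nan], dtype=object)


def isnan_or_error(arr):
    """Return np.isnan(arr), or the TypeError message if numpy rejects the dtype."""
    try:
        return np.isnan(arr)
    except TypeError as exc:
        return str(exc)


print(isnan_or_error(mixed))      # the "ufunc 'isnan' not supported ..." message
as_float = mixed.astype(float)    # converts each element with float()
print(isnan_or_error(as_float))   # boolean mask, True only where the value is NaN
```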

Also, your FrequencyOfTweets column is being set to all False or all True, since np.all returns a scalar.
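A small sketch of that point on a made-up frame standing in for finalDataFrame. np.isfinite gives one boolean per row, np.all collapses it to a single value, and assigning that scalar overwrites every row. Filtering with the elementwise mask is shown only as one plausible alternative; what the original line was meant to do is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for finalDataFrame
df = pd.DataFrame({"FrequencyOfTweets": [0.5, np.inf, 2.0]})

# Elementwise check: one bool per row
mask = np.isfinite(df["FrequencyOfTweets"])

# What the question's line does: np.all reduces the mask to one bool,
# and assigning that scalar replaces the whole column with it
collapsed = df.copy()
collapsed["FrequencyOfTweets"] = np.all(mask)

# One possible intent (assumption): keep the data, drop non-finite rows
cleaned = df[mask]
```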

After a lot of looking through the code, I found that the problem was the formula: it told the program to use all of the features listed below, and the 'UserAccountCreatedAt' column was of type datetime64[ns]. I have taken that column out of the formula for now and get no errors. However, I would like to know how best to convert it to numeric data so that I can actually pass it through. This matters because categorical data is handled by wrapping some of the columns in C(), as seen below, and datetime is considered numeric in patsy.

    formula = 'Classifier ~ UserAccountCreatedAt + InReplyToStatusID + InReplyToUserID + RetweetCount + FavouriteCount + Hashtags + UserMentionID + URL + MediaURL + C(MediaType) + UserMentionID + C(PossiblySensitive) + C(Language) + TweetLength + Location + Description + UserAccountURL + Protected + FollowersCount + FriendsCount + ListedCount + FavouritesCount + GeoEnabled + StatusesCount + ProfileBackgroundImageURL + ProfileUseBackgroundImage + DefaultProfile + FrequencyOfTweets'
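One common way to turn a datetime64[ns] column into plain numbers is sketched below, using pandas only. The sample dates, the new column names CreatedAtEpochSeconds and AccountAgeDays, and the 2015-01-01 reference date are all hypothetical. The first column counts seconds since the Unix epoch; the second counts an account's age in days relative to a fixed date.

```python
import pandas as pd

# Hypothetical sample of the UserAccountCreatedAt column (dtype datetime64[ns])
df = pd.DataFrame({
    "UserAccountCreatedAt": pd.to_datetime(["2009-03-01", "2012-07-15", "2014-01-01"])
})

# Option 1: whole seconds since 1970-01-01 (integer-valued)
df["CreatedAtEpochSeconds"] = (
    df["UserAccountCreatedAt"] - pd.Timestamp("1970-01-01")
) // pd.Timedelta(seconds=1)

# Option 2: account age in days relative to a fixed, hypothetical reference date
reference = pd.Timestamp("2015-01-01")
df["AccountAgeDays"] = (reference - df["UserAccountCreatedAt"]).dt.days
```

Either numeric column could then replace UserAccountCreatedAt in the formula string. Which encoding helps the classifier more depends on the data. An age in days is often easier to interpret, while epoch seconds keeps the absolute creation time.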

