TypeError: Empty 'DataFrame': no numeric data to plot
I'm new to coding and still learning. That said, I've been following a tutorial on doing data analysis with the Twitter API: http://adilmoujahid.com/posts/2014/07/twitter-analytics/
I believe the author is using Python 2.7 while I'm on Python 3.6.1, so I converted the code to my version. It worked until I reached the top 5 countries graph. That code ran successfully once, two days ago; now it only produces the following error message:
"---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-47-601663476327> in <module>()
7 ax.set_ylabel('Number of tweets' , fontsize=15)
8 ax.set_title('Top 5 countries', fontsize=15, fontweight='bold')
----> 9 tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue')
10 plt.show()
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in __call__(self, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
2441 colormap=colormap, table=table, yerr=yerr,
2442 xerr=xerr, label=label, secondary_y=secondary_y,
-> 2443 **kwds)
2444 __call__.__doc__ = plot_series.__doc__
2445
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
1882 yerr=yerr, xerr=xerr,
1883 label=label, secondary_y=secondary_y,
-> 1884 **kwds)
1885
1886
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _plot(data, x, y, subplots, ax, kind, **kwds)
1682 plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
1683
-> 1684 plot_obj.generate()
1685 plot_obj.draw()
1686 return plot_obj.result
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in generate(self)
236 def generate(self):
237 self._args_adjust()
--> 238 self._compute_plot_data()
239 self._setup_subplots()
240 self._make_plot()
~/Environments/Environments/my_env/lib/python3.6/site-packages/pandas/plotting/_core.py in _compute_plot_data(self)
345 if is_empty:
346 raise TypeError('Empty {0!r}: no numeric data to '
--> 347 'plot'.format(numeric_data.__class__.__name__))
348
349 self.data = numeric_data
TypeError: Empty 'DataFrame': no numeric data to plot"
Has anyone else encountered this, and what's the best solution? I can't figure out how to fix it. Thank you!
Entire code (to date):
import json
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
tweets_data_path = '...twitter_data.txt'
tweets_data = []
with open(tweets_data_path, "r") as tweets_file:
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except ValueError:
            # Skip blank or malformed lines
            continue
print(len(tweets_data))
tweets = pd.DataFrame()
tweets['text'] = list(map(lambda tweet: tweet['text'], tweets_data))
tweets['lang'] = list(map(lambda tweet: tweet['lang'], tweets_data))
tweets['country'] = list(map(lambda tweet: tweet['place']['country'] if tweet['place'] is not None else None, tweets_data))
tweets_by_lang = tweets['lang'].value_counts()
fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Languages', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 languages', fontsize=15, fontweight='bold')
tweets_by_lang[:5].plot(ax=ax, kind='bar', color='red')
plt.show()
tweets_by_country = tweets['country'].value_counts()
fig, ax = plt.subplots()
ax.tick_params(axis='x', labelsize=15)
ax.tick_params(axis='y', labelsize=10)
ax.set_xlabel('Countries', fontsize=15)
ax.set_ylabel('Number of tweets' , fontsize=15)
ax.set_title('Top 5 countries', fontsize=15, fontweight='bold')
tweets_by_country[:5].plot(ax=ax, kind='bar', color='blue')
plt.show()
Is your data actually numeric? You can check with, for example:
print(type(tweets['country'][0]))
Given that you're using json.loads (deserializing from a string), it's very likely NOT numeric, which may be what the error is referring to. Try converting the data type to float (or whatever is appropriate):
tweets = tweets.astype('float')
and see if that solves the problem. You can also apply this conversion to specific columns only, if you want. Good luck!
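The checks suggested above could be sketched roughly as follows. This is only an illustration on made-up data: the `retweets` column is hypothetical and not part of the question's DataFrame, and the sketch assumes pandas is installed. It also shows that `value_counts()` already returns integer counts, and that a column containing only `None` yields an empty result, which is the condition the `is_empty` check in the traceback tests for. Calling `astype('float')` on a text column such as `country` would likely raise an error rather than help.

```python
import pandas as pd

# Toy stand-in for the question's `tweets` frame (all values are made up).
tweets = pd.DataFrame({
    "country": ["France", None, "Brazil", "France"],
    "retweets": ["3", "10", "7", "1"],  # hypothetical numeric column stored as text
})

# Inspect the dtype of a whole column rather than a single element.
print(tweets["country"].dtype)

# value_counts() returns integer counts indexed by the distinct labels,
# so the counts themselves are numeric even when the labels are text.
counts = tweets["country"].value_counts()
print(counts.dtype)

# Apply the conversion to one specific column only.
tweets["retweets"] = tweets["retweets"].astype("float")
print(tweets["retweets"].dtype)

# A column holding only missing values produces an empty value_counts().
all_missing = pd.Series([None, None], dtype="object").value_counts()
print(all_missing.empty)
```

If `tweets_by_country.empty` turns out to be true in the original notebook, the data contains no tweets with a country, and no dtype conversion will change that.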
I think your file isn't present or there is a path issue. The first two steps of the tutorial (http://adilmoujahid.com/posts/2014/07/twitter-analytics/) retrieve the data and save it to a local file. Is the file present at the specified path?
tweets_data_path = '...twitter_data.txt'
What does the following return?
print(len(tweets_data))
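A minimal sketch of these checks, using only the standard library, might look like this. The helper names `load_tweets` and `count_with_country` are hypothetical, and the demo records are made up; the real file path from the question is left out because it is elided there.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Return the tweets parsed from a file holding one JSON object per line."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No tweet file at {path!r}")
    tweets = []
    with open(path, "r") as handle:
        for line in handle:
            try:
                tweets.append(json.loads(line))
            except ValueError:  # blank or malformed line
                continue
    return tweets


def count_with_country(tweets):
    """Count tweets whose 'place' is set and carries a 'country'."""
    return sum(1 for t in tweets if (t.get("place") or {}).get("country"))


# Demo on a throwaway file with made-up records.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('{"text": "a", "lang": "en", "place": {"country": "France"}}\n')
    tmp.write("\n")
    tmp.write('{"text": "b", "lang": "en", "place": null}\n')
    demo_path = tmp.name

demo_tweets = load_tweets(demo_path)
print(len(demo_tweets), count_with_country(demo_tweets))
os.remove(demo_path)
```

If the first number is 0, the file is empty or unreadable; if only the second is 0, the tweets loaded but none of them carries a country, so the country plot has nothing to draw.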