Pandas read_html() missing columns
I am using the following read_html() call to read a table (behind a paywall):
import pandas as pd
df = pd.read_html('http://markets.ft.com/data/equities/tearsheet/' +
                  'financials?s=BAG:LSE&subView=BalanceSheet&periodType=a')[0]
It parses fine, except that the last two columns are missing. I am using a recent version of Anaconda (Python 3.5, pandas 0.18.1, html5lib, BeautifulSoup4).
The start of the output looks like this:
Fiscal data as of Jan 30 2016     2016  2015  2014
ASSETS                             NaN   NaN   NaN
Cash And Short Term Investments   6.80    25    13
Total Receivables, Net              50    49    45
Total Inventory                     16    17    16
(The full output is too large to show here.)
The start of the HTML looks like this:
<table class="mod-ui-table">
<thead>
<tr>
<th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th>
<th>2016</th>
<th class="mod-ui-hide-xsmall">2015</th>
<th class="mod-ui-hide-xsmall">2014</th>
<th class="mod-ui-hide-xsmall">2013</th>
<th class="mod-ui-hide-xsmall">2012</th>
</tr>
</thead>
<tr class="mod-ui-table__row--section-header">
<th colspan="6">ASSETS</th>
</tr>
<tr class="mod-ui-table__row--striped">
<th class="mod-ui-table__header--row-label">Cash And Short Term Investments</th>
<td>6.80</td>
<td class="mod-ui-hide-xsmall">25</td>
<td class="mod-ui-hide-xsmall">13</td>
<td class="mod-ui-hide-xsmall">0.91</td>
<td class="mod-ui-hide-xsmall">8.29</td>
</tr>
<tr>
<th class="mod-ui-table__header--row-label">Total Receivables, Net</th>
<td>50</td>
<td class="mod-ui-hide-xsmall">49</td>
<td class="mod-ui-hide-xsmall">45</td>
<td class="mod-ui-hide-xsmall">42</td>
<td class="mod-ui-hide-xsmall">37</td>
</tr>
The end of the HTML looks like this:
<tr class="mod-ui-table__row--highlight">
<th class="mod-ui-table__header--row-label">Total liabilities & shareholders' equity</th>
<td>269</td>
<td class="mod-ui-hide-xsmall">255</td>
<td class="mod-ui-hide-xsmall">227</td>
<td class="mod-ui-hide-xsmall">215</td>
<td class="mod-ui-hide-xsmall">196</td>
</tr>
<tr class="mod-ui-table__row--striped">
<th class="mod-ui-table__header--row-label">Total common shares outstanding</th>
<td>117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
</tr>
<tr>
<th class="mod-ui-table__header--row-label">Treasury shares - common primary issue</th>
<td>0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">--</td>
</tr>
</table>
If it's not immediately obvious what might be wrong, I'd be grateful for some hints on how to start stepping through the read_html() code to find the source of the problem. I am fairly new to Python and pdb.
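Before stepping through read_html() itself, one possible first check is to count the cells in each row of the raw HTML. That shows whether the last two columns are already missing from the page the script downloads or are lost later during parsing. The sketch below uses only the standard library; the `CellCounter` class and the `sample` string (abridged from the HTML above) are illustrative assumptions, not part of pandas.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Hypothetical helper: count <th>/<td> cells in every <tr>."""

    def __init__(self):
        super().__init__()
        self.row_counts = []   # one entry per <tr> encountered
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row = True
            self.row_counts.append(0)
        elif tag in ("th", "td") and self._in_row:
            # A cell with colspan="N" occupies N columns.
            span = int(dict(attrs).get("colspan") or 1)
            self.row_counts[-1] += span

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False


# Abridged sample based on the table shown in the question.
sample = """
<table class="mod-ui-table">
<thead><tr>
<th>Fiscal data as of Jan 30 2016</th><th>2016</th>
<th class="mod-ui-hide-xsmall">2015</th><th class="mod-ui-hide-xsmall">2014</th>
<th class="mod-ui-hide-xsmall">2013</th><th class="mod-ui-hide-xsmall">2012</th>
</tr></thead>
<tr><th colspan="6">ASSETS</th></tr>
<tr><th>Total Receivables, Net</th><td>50</td>
<td class="mod-ui-hide-xsmall">49</td><td class="mod-ui-hide-xsmall">45</td>
<td class="mod-ui-hide-xsmall">42</td><td class="mod-ui-hide-xsmall">37</td></tr>
</table>
"""

counter = CellCounter()
counter.feed(sample)
print(counter.row_counts)  # column count seen in each row
```

Feeding the text that the script actually receives (rather than the browser's view of the page) into the same counter would show whether every row really carries six columns. If it does not, the server is probably returning different HTML to the script, possibly because of the paywall, and the problem lies upstream of pandas.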