Pandas read_html() missing columns
I am using the following read_html() call to read a table (behind a paywall):
import pandas as pd

df = pd.read_html('http://markets.ft.com/data/equities/tearsheet/' +
                  'financials?s=BAG:LSE&subView=BalanceSheet&periodType=a')[0]
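It parses fine, except that the last two columns are missing. I am using the latest version of Anaconda (Python 3.5, pandas 0.18.1, html5lib, BeautifulSoup4).
The beginning of the output looks like this: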
Fiscal data as of Jan 30 2016     2016  2015  2014
ASSETS                             NaN   NaN   NaN
Cash And Short Term Investments   6.80    25    13
Total Receivables, Net              50    49    45
Total Inventory                     16    17    16
(The full output is too large to show here.)
The beginning of the HTML looks like this:
<table class="mod-ui-table">
<thead>
<tr>
<th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th>
<th>2016</th>
<th class="mod-ui-hide-xsmall">2015</th>
<th class="mod-ui-hide-xsmall">2014</th>
<th class="mod-ui-hide-xsmall">2013</th>
<th class="mod-ui-hide-xsmall">2012</th>
</tr>
</thead>
<tr class="mod-ui-table__row--section-header">
<th colspan="6">ASSETS</th>
</tr>
<tr class="mod-ui-table__row--striped">
<th class="mod-ui-table__header--row-label">Cash And Short Term Investments</th>
<td>6.80</td>
<td class="mod-ui-hide-xsmall">25</td>
<td class="mod-ui-hide-xsmall">13</td>
<td class="mod-ui-hide-xsmall">0.91</td>
<td class="mod-ui-hide-xsmall">8.29</td>
</tr>
<tr>
<th class="mod-ui-table__header--row-label">Total Receivables, Net</th>
<td>50</td>
<td class="mod-ui-hide-xsmall">49</td>
<td class="mod-ui-hide-xsmall">45</td>
<td class="mod-ui-hide-xsmall">42</td>
<td class="mod-ui-hide-xsmall">37</td>
</tr>
The end of the HTML looks like this:
<tr class="mod-ui-table__row--highlight">
<th class="mod-ui-table__header--row-label">Total liabilities & shareholders' equity</th>
<td>269</td>
<td class="mod-ui-hide-xsmall">255</td>
<td class="mod-ui-hide-xsmall">227</td>
<td class="mod-ui-hide-xsmall">215</td>
<td class="mod-ui-hide-xsmall">196</td>
</tr>
<tr class="mod-ui-table__row--striped">
<th class="mod-ui-table__header--row-label">Total common shares outstanding</th>
<td>117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
<td class="mod-ui-hide-xsmall">117</td>
</tr>
<tr>
<th class="mod-ui-table__header--row-label">Treasury shares - common primary issue</th>
<td>0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">0</td>
<td class="mod-ui-hide-xsmall">--</td>
</tr>
</table>
If it is not obvious what is going wrong, I would appreciate some hints on how to start stepping through the read_html() code to find the root cause. I am new to Python and pdb.
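One possible first step, before stepping into read_html() with pdb, is to check how many columns each row has in the HTML that Python actually receives, and compare that with the HTML seen in the browser. Because the page is behind a paywall, the two might differ, though that is only an assumption here. The sketch below uses just the standard library; RowCellCounter and count_cells_per_row are hypothetical helper names, and the sample string is copied from the HTML excerpt above.

```python
from html.parser import HTMLParser


class RowCellCounter(HTMLParser):
    """Record the column width of every <tr> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.row_widths = []   # one entry per <tr>, colspan included
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._in_row = True
            self.row_widths.append(0)
        elif tag in ("th", "td") and self._in_row:
            # A cell with colspan="6" counts as six columns.
            span = dict(attrs).get("colspan") or "1"
            self.row_widths[-1] += int(span)

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False


def count_cells_per_row(html):
    """Return a list with the number of columns found in each table row."""
    parser = RowCellCounter()
    parser.feed(html)
    return parser.row_widths


# Sample taken from the HTML shown in the browser (first rows only).
sample = """
<table class="mod-ui-table">
<thead>
<tr>
<th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th>
<th>2016</th>
<th class="mod-ui-hide-xsmall">2015</th>
<th class="mod-ui-hide-xsmall">2014</th>
<th class="mod-ui-hide-xsmall">2013</th>
<th class="mod-ui-hide-xsmall">2012</th>
</tr>
</thead>
<tr class="mod-ui-table__row--section-header">
<th colspan="6">ASSETS</th>
</tr>
<tr class="mod-ui-table__row--striped">
<th class="mod-ui-table__header--row-label">Cash And Short Term Investments</th>
<td>6.80</td>
<td class="mod-ui-hide-xsmall">25</td>
<td class="mod-ui-hide-xsmall">13</td>
<td class="mod-ui-hide-xsmall">0.91</td>
<td class="mod-ui-hide-xsmall">8.29</td>
</tr>
</table>
"""

print(count_cells_per_row(sample))  # [6, 6, 6]
```

Running the same function on the page text downloaded from Python (rather than copied from the browser) would show whether the missing columns are already absent from the HTML before pandas parses it. If the widths match the browser's, the next place to look would be the parsing step inside read_html().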