
Pandas read_html() missing columns

I am using the following read_html() call to read a table (behind a paywall):

import pandas as pd

df = pd.read_html('http://markets.ft.com/data/equities/tearsheet/' +
                  'financials?s=BAG:LSE&subView=BalanceSheet&periodType=a')[0]

It parses fine, except that the last two columns are missing. I am using the latest version of Anaconda (Python 3.5, pandas 0.18.1, html5lib, BeautifulSoup4).

The start of the output looks like this:

                Fiscal data as of Jan 30 2016  2016    2015    2014
                                      ASSETS   NaN     NaN     NaN
             Cash And Short Term Investments  6.80      25      13
                      Total Receivables, Net    50      49      45
                             Total Inventory    16      17      16

(Too large to show in full)

The start of the HTML looks like this:

<table class="mod-ui-table">
            <thead>
                <tr>
                    <th class="mod-ui-table__header--text">Fiscal data as of Jan 30 2016</th>
                    <th>2016</th>
                    <th class="mod-ui-hide-xsmall">2015</th>
                    <th class="mod-ui-hide-xsmall">2014</th>
                    <th class="mod-ui-hide-xsmall">2013</th>
                    <th class="mod-ui-hide-xsmall">2012</th>
                </tr>
            </thead>
            <tr class="mod-ui-table__row--section-header">
                <th colspan="6">ASSETS</th>
            </tr>
            <tr class="mod-ui-table__row--striped">
                <th class="mod-ui-table__header--row-label">Cash And Short Term Investments</th>
                <td>6.80</td>
                <td class="mod-ui-hide-xsmall">25</td>
                <td class="mod-ui-hide-xsmall">13</td>
                <td class="mod-ui-hide-xsmall">0.91</td>
                <td class="mod-ui-hide-xsmall">8.29</td>
            </tr>
            <tr>
                <th class="mod-ui-table__header--row-label">Total Receivables, Net</th>
                <td>50</td>
                <td class="mod-ui-hide-xsmall">49</td>
                <td class="mod-ui-hide-xsmall">45</td>
                <td class="mod-ui-hide-xsmall">42</td>
                <td class="mod-ui-hide-xsmall">37</td>
            </tr>

The end of the HTML looks like this:

<tr class="mod-ui-table__row--highlight">
                    <th class="mod-ui-table__header--row-label">Total liabilities &amp; shareholders&#39; equity</th>
                    <td>269</td>
                    <td class="mod-ui-hide-xsmall">255</td>
                    <td class="mod-ui-hide-xsmall">227</td>
                    <td class="mod-ui-hide-xsmall">215</td>
                    <td class="mod-ui-hide-xsmall">196</td>
                </tr>
                <tr class="mod-ui-table__row--striped">
                    <th class="mod-ui-table__header--row-label">Total common shares outstanding</th>
                    <td>117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                    <td class="mod-ui-hide-xsmall">117</td>
                </tr>
                <tr>
                    <th class="mod-ui-table__header--row-label">Treasury shares - common primary issue</th>
                    <td>0</td>
                    <td class="mod-ui-hide-xsmall">0</td>
                    <td class="mod-ui-hide-xsmall">0</td>
                    <td class="mod-ui-hide-xsmall">0</td>
                    <td class="mod-ui-hide-xsmall">--</td>
                </tr>
            </table>

If it is not obvious what is going wrong, I would appreciate some tips on how to start stepping through the read_html() code to find the root cause. I am new to Python and pdb.

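It turns out that if you are not logged in to the FT website, you only get three years of data. The HTML quoted above, with five years, is what a logged-in session receives; the anonymous request made by read_html() is served a narrower table, so nothing is being dropped during parsing.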
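One way to check this kind of mismatch before stepping through read_html() with pdb is to count the cells per row in the HTML that the script actually received. The following is a minimal sketch using only the standard library; the class name RowCellCounter, the helper cells_per_row, and the sample markup are illustrative assumptions, not part of pandas or of the FT page.

```python
from html.parser import HTMLParser


class RowCellCounter(HTMLParser):
    """Collect the number of <th>/<td> cells in every <tr> of a document."""

    def __init__(self):
        super().__init__()
        self.counts = []      # one entry per finished row
        self._current = None  # cell count of the row being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = 0
        elif tag in ("th", "td") and self._current is not None:
            # Honour colspan so that spanning section rows are counted fairly.
            span = dict(attrs).get("colspan") or "1"
            self._current += int(span) if span.isdigit() else 1

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.counts.append(self._current)
            self._current = None


def cells_per_row(html):
    """Return a list with the cell count of each table row in html."""
    parser = RowCellCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Made-up sample imitating a logged-out response: a label plus three years.
sample = """
<table>
  <thead><tr><th>Fiscal data</th><th>2016</th><th>2015</th><th>2014</th></tr></thead>
  <tr><th colspan="4">ASSETS</th></tr>
  <tr><th>Total Inventory</th><td>16</td><td>17</td><td>16</td></tr>
</table>
"""
print(cells_per_row(sample))
```

If the counts for the downloaded page are 4 while the page source seen in the browser has 6 cells per row, the difference comes from the server response rather than from the parser.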

So I am now looking into how to log in to the FT website from a script (perhaps using Twill).
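As an alternative to Twill, a cookie-aware session can be sketched with the standard library alone. This is only an outline under stated assumptions: the login URL and the form field names below are placeholders (hypothetical, not the FT site's real endpoint or fields), and a real login form may also require hidden tokens or JavaScript that this sketch does not handle.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_login_request(login_url, fields):
    """Build a form-encoded POST request for a login form."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(login_url, data=body, method="POST")


def fetch_logged_in(opener, login_request, page_url):
    """Log in, then fetch page_url with the same cookies and return its HTML."""
    opener.open(login_request).close()
    with opener.open(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical values: the real login URL and field names have to be read
# from the site's own login form; these are placeholders only.
LOGIN_URL = "https://example.com/login"
FIELDS = {"username": "me@example.com", "password": "secret"}

request = build_login_request(LOGIN_URL, FIELDS)
print(request.get_method(), request.full_url)
```

The string returned by fetch_logged_in(make_session(), request, page_url) could then be handed to pd.read_html() in place of the URL; newer pandas versions prefer such a string wrapped in io.StringIO.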

There is also a related question here.
