Bug in the pandas read_html method?

Question

I think the read_html method of pandas mishandles tables that use rowspan and/or colspan.

Example:

import io
import pandas as pd

html_table = io.StringIO(u'''<table>
    <thead>
        <tr>
            <th rowspan="2">Time</th>
            <th rowspan="2">Temp</th>
            <th colspan="3">Cloud Cover</th>
        </tr>
        <tr>
            <th>Low</th>
            <th>Middle</th>
            <th>High</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>22:00</td>
            <td>12C</td>
            <td>Lorem</td>
            <td>Ipsum</td>
            <td>Dolor</td>
        </tr>
    </tbody>
</table>''')

The output of pd.read_html(html_table) is:

[                 Time Temp Cloud Cover    Low Middle  High
 0 2014-05-16 22:00:00  12C       Lorem  Ipsum  Dolor   NaN

 [1 rows x 6 columns]]

Is it a bug or am I doing something wrong?

Answer 1

pandas >= 0.24.0 understands colspan and rowspan attributes. As per the release notes:

result = pd.read_html("""
    <table>
      <thead>
        <tr>
          <th>A</th><th>B</th><th>C</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td colspan="2">1</td><td>2</td>
        </tr>
      </tbody>
    </table>""")

result

Out:

[   A  B  C
 0  1  1  2]

Previously this would return the following: 以前，这将返回以下内容：

[   A  B   C
 0  1  2 NaN]
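
For the table in the question, here is a minimal sketch of what one might try on pandas >= 0.24.0. It assumes an HTML parser backend is installed (lxml, or bs4 with html5lib). The helper name read_first_table and the flattening step are illustrative only and not part of pandas. Because the thead has two rows, I would expect the columns to come back as a two-level MultiIndex, with the rowspan cells repeated on both levels.

```python
import io

import pandas as pd

HTML = """<table>
    <thead>
        <tr>
            <th rowspan="2">Time</th>
            <th rowspan="2">Temp</th>
            <th colspan="3">Cloud Cover</th>
        </tr>
        <tr>
            <th>Low</th>
            <th>Middle</th>
            <th>High</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>22:00</td>
            <td>12C</td>
            <td>Lorem</td>
            <td>Ipsum</td>
            <td>Dolor</td>
        </tr>
    </tbody>
</table>"""


def read_first_table(html: str) -> pd.DataFrame:
    """Hypothetical helper: parse the HTML and return the first table found."""
    # Wrapping in StringIO works on old and new pandas alike; recent
    # versions warn when a literal HTML string is passed directly.
    return pd.read_html(io.StringIO(html))[0]


df = read_first_table(HTML)

# Optional: collapse the two header levels into single labels.
# A rowspan header such as "Time" appears on both levels, so keep it once.
flat = df.copy()
flat.columns = [
    top if top == bottom else f"{top} {bottom}"
    for top, bottom in df.columns
]

print(df.columns.tolist())
print(flat)
```

If the flattened labels are not wanted, df can be used as is and the cloud-cover cells selected with df["Cloud Cover"].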

Answer 1: 2019-03-05 18:55:17
