Bug in pandas read_html method?
I think pandas' read_html method is buggy when a table uses rowspan and/or colspan.
Example:
import io
import pandas as pd

html_table = io.StringIO(u'''<table>
<thead>
<tr>
<th rowspan="2">Time</th>
<th rowspan="2">Temp</th>
<th colspan="3">Cloud Cover</th>
</tr>
<tr>
<th>Low</th>
<th>Middle</th>
<th>High</th>
</tr>
</thead>
<tbody>
<tr>
<td>22:00</td>
<td>12C</td>
<td>Lorem</td>
<td>Ipsum</td>
<td>Dolor</td>
</tr>
</tbody>
</table>''')
The output of pd.read_html(html_table) is
[                  Time  Temp  Cloud Cover    Low  Middle  High
 0  2014-05-16 22:00:00   12C        Lorem  Ipsum   Dolor   NaN
 [1 rows x 6 columns]]
Is it a bug or am I doing something wrong?
pandas >= 0.24.0 understands colspan and rowspan attributes. As per the release notes:
result = pd.read_html("""
<table>
<thead>
<tr>
<th>A</th><th>B</th><th>C</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="2">1</td><td>2</td>
</tr>
</tbody>
</table>""")
result
Out:
[   A  B  C
 0  1  1  2]
Previously this would return the following:
[   A  B    C
 0  1  2  NaN]
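To make the difference concrete, here is a minimal sketch of what "understanding" colspan and rowspan amounts to: each spanned cell is copied into every grid position it covers, so header and body rows end up with the same number of columns. This is not pandas' actual implementation; it uses only the standard library's html.parser, the names SpanGridParser and expand_spans are made up for illustration, and it assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class SpanGridParser(HTMLParser):
    """Collect each <tr> as a list of (text, rowspan, colspan) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cells per <tr>
        self._cell = None   # cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            a = dict(attrs)
            # Missing or empty span attributes default to 1.
            self._cell = ["", int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text, rowspan, colspan = self._cell
            self.rows[-1].append((text.strip(), rowspan, colspan))
            self._cell = None


def expand_spans(html):
    """Return a rectangular grid in which spanned cell text is repeated."""
    parser = SpanGridParser()
    parser.feed(html)
    grid = []
    pending = {}  # column index -> (text, number of further rows to fill)
    for cells in parser.rows:
        row, col, queue = [], 0, list(cells)
        while queue or col in pending:
            if col in pending:
                # A rowspan from an earlier row occupies this column.
                text, left = pending.pop(col)
                row.append(text)
                if left > 1:
                    pending[col] = (text, left - 1)
                col += 1
                continue
            text, rowspan, colspan = queue.pop(0)
            for _ in range(colspan):
                row.append(text)
                if rowspan > 1:
                    pending[col] = (text, rowspan - 1)
                col += 1
        grid.append(row)
    return grid


# The table from the question, condensed onto fewer lines.
html = """<table>
<tr><th rowspan="2">Time</th><th rowspan="2">Temp</th><th colspan="3">Cloud Cover</th></tr>
<tr><th>Low</th><th>Middle</th><th>High</th></tr>
<tr><td>22:00</td><td>12C</td><td>Lorem</td><td>Ipsum</td><td>Dolor</td></tr>
</table>"""
for row in expand_spans(html):
    print(row)
```

With the spans expanded, all three rows have five entries: the first header row repeats "Cloud Cover" three times, and the second header row starts with the carried-over "Time" and "Temp". Without this step the two header rows are simply concatenated into six column names, which is why the single data row in the question was padded with NaN.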