Linearizing a multi-row multi-column table
I have a multi-row multi-column table as follows:
<!doctype html> <html> <head><style> table { border-collapse: collapse; } th, td { word-wrap: break-word; max-width: 100%; font-family: "Trebuchet MS", Arial, Helvetica, sans-serif; border-bottom: 1px solid #ddd; padding: 5px; text-align: left; } tr:hover {background: #f4f4f4;} tr:hover .highlighted {background: repeating-linear-gradient( 45deg, #ffff99, #ffff99 10px, #f4f4f4 10px, #f4f4f4 20px );} .highlighted { background-color: #ffff99; } </style></head><body><table> <tr> <th colspan=1 rowspan=1 > Col1 </th><th colspan=1 rowspan=1 > Col2 </th><th colspan=1 rowspan=1 > Col3 </th><th colspan=1 rowspan=1 > Col4 </th></tr> <tr> <td colspan=1 rowspan=3 > Year / Ending Year </td><td colspan=1 rowspan=2 > Show </td><td colspan=1 rowspan=1 > BB </td><td colspan=1 rowspan=1 > 2013 </td></tr> <tr> <td colspan=1 rowspan=1 > GOT </td><td colspan=1 rowspan=2 > 2019 </td></tr> <tr> <td colspan=2 rowspan=1 style="text-align:center;"> Joker </td></tr> </table> </body></html>
The table data is a list of lists containing the cell values from top to bottom and left to right, with the row and column spans indicated. For a cell that spans multiple rows, the value appears only in the list for the first row it covers. The data looks like this:
table = [
    [
        {'value': 'Col1', 'column_span': 1, 'row_span': 1, 'is_header': True},
        {'value': 'Col2', 'column_span': 1, 'row_span': 1, 'is_header': True},
        {'value': 'Col3', 'column_span': 1, 'row_span': 1, 'is_header': True},
        {'value': 'Col4', 'column_span': 1, 'row_span': 1, 'is_header': True}
    ],
    [
        {'value': 'Year / Ending Year', 'column_span': 1, 'row_span': 3, 'is_header': False},
        {'value': 'Show', 'column_span': 1, 'row_span': 2, 'is_header': False},
        {'value': 'BB', 'column_span': 1, 'row_span': 1, 'is_header': False},
        {'value': '2013', 'column_span': 1, 'row_span': 1, 'is_header': False}
    ],
    [
        {'value': 'GOT', 'column_span': 1, 'row_span': 1, 'is_header': False},
        {'value': '2019', 'column_span': 1, 'row_span': 2, 'is_header': False},
    ],
    [
        {'value': 'Joker', 'column_span': 2, 'row_span': 1, 'is_header': False}
    ]
]
How do I convert this to a pandas DataFrame in which every cell spans only a single row and column, like this:
<!doctype html> <html> <head><style> table { border-collapse: collapse; } th, td { word-wrap: break-word; max-width: 100%; font-family: "Trebuchet MS", Arial, Helvetica, sans-serif; border-bottom: 1px solid #ddd; padding: 5px; text-align: left; } tr:hover {background: #f4f4f4;} tr:hover .highlighted {background: repeating-linear-gradient( 45deg, #ffff99, #ffff99 10px, #f4f4f4 10px, #f4f4f4 20px );} .highlighted { background-color: #ffff99; } </style></head><body><table> <tr> <th colspan=1 rowspan=1 > Col1 </th><th colspan=1 rowspan=1 > Col2 </th><th colspan=1 rowspan=1 > Col3 </th><th colspan=1 rowspan=1 > Col4 </th></tr> <tr> <td colspan=1 rowspan=1 > Year / Ending Year </td><td colspan=1 rowspan=1 > Show </td><td colspan=1 rowspan=1 > BB </td><td colspan=1 rowspan=1 > 2013 </td></tr> <tr> <td colspan=1 rowspan=1 > Year / Ending Year </td><td colspan=1 rowspan=1 > Show </td><td colspan=1 rowspan=1 > GOT </td><td colspan=1 rowspan=1 > 2019 </td></tr> <tr> <td colspan=1 rowspan=1 > Year / Ending Year </td><td colspan=1 rowspan=1 > Joker </td><td colspan=1 rowspan=1 > Joker </td><td colspan=1 rowspan=1 > 2019 </td></tr> </table> </body></html>
Edit: I don't have the HTML of the tables. I was not able to attach an image here, so I showed the tables as HTML.
Just use read_html from pandas. I put your HTML between ''' ''' (a triple-quoted string) and it worked.
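from io import StringIO
import pandas as pd

# your_html is the table's HTML as a string; recent pandas versions
# expect literal HTML to be wrapped in StringIO
pd.read_html(StringIO(your_html))

Output (a list with one DataFrame per table found):
[                 Col1   Col2   Col3  Col4
0  Year / Ending Year   Show     BB  2013
1  Year / Ending Year   Show    GOT  2019
2  Year / Ending Year  Joker  Joker  2019]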
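
Since the edit to the question says the HTML is not actually available, the same result can probably be reached directly from the `table` list. Here is a minimal sketch, not part of the original answer: it walks each row left to right, repeats a cell's value across its `column_span`, and remembers cells whose `row_span` reaches into later rows so they can be emitted again there. The names `expand_spans`, `carry`, and the `cell` helper are made up for illustration, and the sketch assumes that the first row is the header and that the spans are consistent (no overlapping cells).

```python
import pandas as pd


def cell(value, column_span=1, row_span=1, is_header=False):
    # Illustrative helper (not in the question) to keep the sample data short.
    return {'value': value, 'column_span': column_span,
            'row_span': row_span, 'is_header': is_header}


# Same data as in the question.
table = [
    [cell('Col1', is_header=True), cell('Col2', is_header=True),
     cell('Col3', is_header=True), cell('Col4', is_header=True)],
    [cell('Year / Ending Year', row_span=3), cell('Show', row_span=2),
     cell('BB'), cell('2013')],
    [cell('GOT'), cell('2019', row_span=2)],
    [cell('Joker', column_span=2)],
]


def expand_spans(table):
    """Return a list of rows in which every spanned cell is repeated."""
    grid = []
    carry = {}  # column index -> (value, rows still to fill below)
    for row in table:
        out = []
        col = 0  # column currently being filled
        nxt = 0  # index of the next unread cell in this row
        while col in carry or nxt < len(row):
            if col in carry:
                # Column is covered by a row span started in an earlier row.
                value, remaining = carry.pop(col)
                out.append(value)
                if remaining > 1:
                    carry[col] = (value, remaining - 1)
                col += 1
            else:
                c = row[nxt]
                nxt += 1
                # Repeat the value across every column the cell spans.
                for _ in range(c['column_span']):
                    out.append(c['value'])
                    if c['row_span'] > 1:
                        carry[col] = (c['value'], c['row_span'] - 1)
                    col += 1
        grid.append(out)
    return grid


grid = expand_spans(table)
# Assumption: the first row holds the column headers.
df = pd.DataFrame(grid[1:], columns=grid[0])
print(df)
```

For the sample data this should give the three-row, four-column frame shown in the question. Tables with inconsistent spans are not handled and would need extra validation.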