Parsing a large number of tables in an HTML file with pandas

Question

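I am trying to search through multiple tables (more than 100) in an HTML file. If the value in cell [1,0] of a table is "YYY", I want to take the value of cell [0,0] and write it to cell D"i" of an Excel file, where "i" is a number that increments with each entry found.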

import pandas as pd
import xlwt
from xlwt import Workbook

file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
table = pd.read_html(file)

wb = Workbook()

sheet1 = wb.add_sheet("Sheet 1")

i=0

filtered_table = [df for df in table if len(df) > 2]

for df in table:
    comp = df.iat[1,0]

    if comp == 'YYY' :
        name = df.iat[0,0]
        print (name)
        sheet1.write(4,i, name)
        i=i+1

wb.save('MarSur.xlsx')

This is the error log I get:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe "C:/Users/Ahmed_Abdelmuniem/PycharmProjects/Pandas Parser/main.py"
Traceback (most recent call last):
  File "C:\Users\Ahmed_Abdelmuniem\PycharmProjects\Pandas Parser\main.py", line 19, in <module>
    comp = df.iat[1,0]
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexing.py", line 2103, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py", line 3127, in _get_value
    return series._values[index]
IndexError: index 1 is out of bounds for axis 0 with size 1

Process finished with exit code 1

Answer 1

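Thanks to Mustafa Aydin for the help.

The corrected code is below. The key change is that the loop now iterates over filtered_table instead of table, so tables too short to have a cell [1,0] are skipped: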

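import pandas as pd
from xlwt import Workbook

file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
table = pd.read_html(file)  # one DataFrame per <table> in the file

wb = Workbook()
sheet1 = wb.add_sheet("Sheet 1")

i = 0
# Keep only tables with more than two rows, so that df.iat[1,0] exists
filtered_table = [df for df in table if len(df) > 2]

for df in filtered_table:
    comp = df.iat[1,0]
    if comp == 'YYY':
        name = df.iat[0,0]
        print(name)
        # xlwt's write() takes (row, column, value)
        sheet1.write(4, i, name)
        i += 1

# Note: xlwt writes the legacy .xls format regardless of the file extension
wb.save('MarSur.xlsx')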
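
For anyone adapting this, here is a minimal sketch of a slightly more defensive variant. It is not the code from the answer above. The function names collect_names and write_names_to_column_d are hypothetical, and it assumes the goal is one name per row in column D, which is how I read the question. It checks each table's shape directly, so a table with exactly two rows is still considered, whereas the len(df) > 2 filter skips those.

```python
import pandas as pd


def collect_names(tables, marker="YYY"):
    """Return cell [0, 0] of every table whose cell [1, 0] equals marker."""
    names = []
    for df in tables:
        # Skip tables too small to have a cell [1, 0]; indexing them
        # raises the IndexError shown in the question.
        if df.shape[0] < 2 or df.shape[1] < 1:
            continue
        if df.iat[1, 0] == marker:
            names.append(df.iat[0, 0])
    return names


def write_names_to_column_d(names, path):
    """Write names down column D, one per row.

    Assumption: path ends in .xlsx and openpyxl is installed, since
    pandas relies on it to write that format.
    """
    pd.DataFrame({"name": names}).to_excel(
        path, startcol=3, header=False, index=False
    )
```

It could be used along the lines of names = collect_names(pd.read_html(file)) followed by write_names_to_column_d(names, 'MarSur.xlsx'). Keeping the filtering separate from the Excel output makes the matching logic easy to check on small hand-built DataFrames before running it against the real file.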

Solution 1
0 2021-04-12 10:55:40
