
Mass table parsing of an HTML file with pandas

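I am trying to search through multiple tables (more than 100) in an HTML file. If the value in cell [1,0] of a table is "YYY", I want to take the value of cell [0,0] and write it to cell D"i" of an Excel file, where "i" is a number that increments with each entry found.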

import pandas as pd
import xlwt
from xlwt import Workbook

file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
table = pd.read_html(file)

wb = Workbook()

sheet1 = wb.add_sheet("Sheet 1")

i=0

filtered_table = [df for df in table if len(df) > 2]

for df in table:
    comp = df.iat[1,0]

    if comp == 'YYY' :
        name = df.iat[0,0]
        print (name)
        sheet1.write(4,i, name)
        i=i+1

wb.save('MarSur.xlsx')



This is the error log I get:

C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe "C:/Users/Ahmed_Abdelmuniem/PycharmProjects/Pandas Parser/main.py"
Traceback (most recent call last):
  File "C:\Users\Ahmed_Abdelmuniem\PycharmProjects\Pandas Parser\main.py", line 19, in <module>
    comp = df.iat[1,0]
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexing.py", line 2103, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py", line 3127, in _get_value
    return series._values[index]
IndexError: index 1 is out of bounds for axis 0 with size 1

Process finished with exit code 1
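
The last line of the traceback reports that axis 0 has size 1, which suggests at least one parsed table has only a single row, so `df.iat[1, 0]` has nothing to read. A minimal sketch of that situation, using hand-built DataFrames as hypothetical stand-ins for what `pd.read_html` might return (the contents are made up, not taken from XXX.html, and `second_row_first_col` is a name introduced here for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for the list that pd.read_html returns
tables = [
    pd.DataFrame([["Alpha"], ["YYY"], ["extra"]]),  # 3 rows, 1 column
    pd.DataFrame([["Lonely"]]),                      # 1 row only
]

def second_row_first_col(df):
    """Return df.iat[1, 0], or None when the table has fewer than 2 rows."""
    if df.shape[0] < 2:
        return None
    return df.iat[1, 0]

results = [second_row_first_col(df) for df in tables]
print(results)  # ['YYY', None]

# Unguarded access on the one-row table raises IndexError, as in the traceback
try:
    tables[1].iat[1, 0]
except IndexError as exc:
    print("IndexError:", exc)
```

Checking `df.shape[0]` before indexing is one way to avoid the exception; filtering the list up front has the same effect.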

Thanks to Mustafa Aydin for the help.

The corrected code is as follows:

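import pandas as pd
from xlwt import Workbook

file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
# read_html returns a list with one DataFrame per table found in the file
table = pd.read_html(file)

wb = Workbook()

sheet1 = wb.add_sheet("Sheet 1")

i = 0
# keep only tables with more than 2 rows, so that cell [1,0] always exists
filtered_table = [df for df in table if len(df) > 2]

# iterate over the filtered list, not the original one
for df in filtered_table:
    comp = df.iat[1, 0]
    if comp == 'YYY':
        name = df.iat[0, 0]
        print(name)
        # xlwt's write() takes (row, column, value), both zero-based
        sheet1.write(4, i, name)
        i = i + 1

# note: xlwt writes the legacy .xls format regardless of the extension used here
wb.save('MarSur.xlsx')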
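
For reference, the same filtering can be wrapped in a small function that is easy to try without an HTML file. This is only a sketch under stated assumptions: `collect_names` and the sample tables are hypothetical, introduced here for illustration. It checks for at least two rows, which is the minimum needed for cell [1,0] to exist; the `len(df) > 2` filter above also skips tables with exactly two rows, which may or may not be intended.

```python
import pandas as pd

def collect_names(tables, marker="YYY"):
    """Return cell [0,0] of every table whose cell [1,0] equals marker."""
    names = []
    for df in tables:
        # Cell [1,0] needs at least two rows and one column.
        if df.shape[0] < 2 or df.shape[1] < 1:
            continue
        if df.iat[1, 0] == marker:
            names.append(df.iat[0, 0])
    return names

# Made-up tables standing in for the output of pd.read_html
sample = [
    pd.DataFrame([["Alpha", 1], ["YYY", 2], ["x", 3]]),
    pd.DataFrame([["Beta", 1]]),               # too short, skipped
    pd.DataFrame([["Gamma", 1], ["ZZZ", 2]]),  # marker does not match
    pd.DataFrame([["Delta", 1], ["YYY", 2]]),  # two rows, still matches
]

names = collect_names(sample)
print(names)  # ['Alpha', 'Delta']
```

Each collected name can then be written to the sheet. Since xlwt's `write` takes the row index first, placing one name per row in column D would correspond to `sheet1.write(i, 3, name)`, whereas `sheet1.write(4, i, name)` fills the fifth row from left to right.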
