Mass table parsing in pandas html file
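I am trying to search through multiple tables (more than 100) in an HTML file. If the value in cell [1,0] of a table is "YYY", I want to take the value of cell [0,0] and write it to cell D"i" of an Excel file, where "i" is a number that increments with each matching entry.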
import pandas as pd
import xlwt
from xlwt import Workbook

file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
table = pd.read_html(file)

wb = Workbook()
sheet1 = wb.add_sheet("Sheet 1")
i=0

filtered_table = [df for df in table if len(df) > 2]

for df in table:
    comp = df.iat[1,0]
    if comp == 'YYY' :
        name = df.iat[0,0]
        print (name)
        sheet1.write(4,i, name)
        i=i+1

wb.save('MarSur.xlsx')
This is the error log I get:
C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\python.exe "C:/Users/Ahmed_Abdelmuniem/PycharmProjects/Pandas Parser/main.py"
Traceback (most recent call last):
  File "C:\Users\Ahmed_Abdelmuniem\PycharmProjects\Pandas Parser\main.py", line 19, in <module>
    comp = df.iat[1,0]
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexing.py", line 2103, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "C:\Users\Ahmed_Abdelmuniem\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\frame.py", line 3127, in _get_value
    return series._values[index]
IndexError: index 1 is out of bounds for axis 0 with size 1
Process finished with exit code 1
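The last line of the traceback says that at least one parsed table has only one row, so row index 1 does not exist in it. The sketch below reproduces that situation without the HTML file. The two DataFrames and the helper `second_row_value` are made up for illustration and are not part of the original script.

```python
import pandas as pd

# Hypothetical stand-ins for two of the tables returned by pd.read_html
one_row = pd.DataFrame([["Header only"]])
two_rows = pd.DataFrame([["Name A"], ["YYY"]])


def second_row_value(df):
    """Return cell [1, 0], or None when the table has fewer than two rows."""
    if df.shape[0] < 2:
        return None
    return df.iat[1, 0]


# Accessing row 1 of a one-row table raises the same IndexError as in the log
try:
    one_row.iat[1, 0]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

print(second_row_value(one_row))   # guarded access on the short table
print(second_row_value(two_rows))  # guarded access on the two-row table
```

Checking `df.shape[0]` before indexing is one way to avoid the exception; filtering the list of tables up front is another.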
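Thanks to Mustafa Aydin for the help.
The correct code is as follows. The loop now iterates over filtered_table instead of table, so tables that are too short to have a row 1 are skipped: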
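import pandas as pd
import xlwt
from xlwt import Workbook

file = r'C:\Users\Ahmed_Abdelmuniem\Desktop\XXX.html'
# read_html returns a list with one DataFrame per table found in the file
table = pd.read_html(file)

wb = Workbook()
sheet1 = wb.add_sheet("Sheet 1")
i = 0

# Keep only tables with more than two rows, so that row index 1 always exists
filtered_table = [df for df in table if len(df) > 2]

for df in filtered_table:
    comp = df.iat[1, 0]
    # print(comp)
    if comp == 'YYY':
        name = df.iat[0, 0]
        print(name)
        # xlwt's write takes (row, column, value), both indices zero-based
        sheet1.write(4, i, name)
        i = i + 1

wb.save('MarSur.xlsx')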
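Two details of the code above may be worth a second look. The filter `len(df) > 2` also drops tables with exactly two rows, although those do have a cell [1,0]. And `sheet1.write(4, i, name)` fills row 5 from left to right, whereas the question text talks about cell D"i". The sketch below shows one possible variant under those assumptions. The names `collect_names` and `write_names` and the sample tables are hypothetical. `write_names` only assumes a sheet object with a `write(row, column, value)` method, which is how xlwt worksheets behave.

```python
import pandas as pd


def collect_names(tables, marker="YYY"):
    """Return cell [0, 0] of every table whose cell [1, 0] equals marker."""
    names = []
    for df in tables:
        # Skip tables that have no row index 1 or no columns at all
        if df.shape[0] < 2 or df.shape[1] < 1:
            continue
        if df.iat[1, 0] == marker:
            names.append(df.iat[0, 0])
    return names


def write_names(sheet, names, column=3):
    """Write names down one column (index 3 is column D), one row per name."""
    for row, name in enumerate(names):
        sheet.write(row, column, name)


# Made-up tables standing in for the result of pd.read_html(file)
tables = [
    pd.DataFrame([["only row"]]),
    pd.DataFrame([["Alpha"], ["YYY"]]),
    pd.DataFrame([["Beta"], ["ZZZ"], ["x"]]),
    pd.DataFrame([["Gamma"], ["YYY"], ["x"]]),
]
names = collect_names(tables)
print(names)
```

With the original script, this would be used as `write_names(sheet1, collect_names(table))` before `wb.save(...)`. Note also that xlwt writes the legacy .xls format, so a file name ending in .xls is probably the safer choice than .xlsx.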