Python: parse table from HTML using BeautifulSoup
I am trying to extract the tables from multiple HTML files. Ideally, I would end up with the rows and columns in a list so I can process them further. I am new to BeautifulSoup and cannot get it working. I think the main problem occurs when a lookup returns None, so the result cannot be processed further. I tried guarding with if statements, but that did not help.

My code as it is right now:
```
from bs4 import BeautifulSoup
table_dict = {}
for filename, text in tqdm(lowercase_dict.items()):
    soup = BeautifulSoup(text, "lxml")
    table = soup.find('table')
    table_body = table.find('tbody')
    if table_body is not None:
        tables = table_body

    rows = tables.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])
    table_dict[filename] = cols
```

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-304-14ade2e7b2ac> in <module>()
      7 tables = table_body
      8
----> 9 rows = tables.find_all('tr')
     10 for row in rows:
     11 cols = row.find_all('td')

AttributeError: 'str' object has no attribute 'find_all'
```
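According to your error message, the problem is that the variable `tables` is a string. Most likely `table_body` was None for that file, so the if branch was skipped and `tables` kept a stale value from an earlier run. The lxml parser does not add a `<tbody>` element when the source HTML has none, so `table.find('tbody')` can legitimately return None. Try it without using 'tbody':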
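```
for filename, text in tqdm(lowercase_dict.items()):
    soup = BeautifulSoup(text, "lxml")
    table = soup.find('table')
    if table is None:  # no <table> in this file, skip it
        continue
    # search the table directly instead of going through <tbody>
    rows = table.find_all('tr')
```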
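
Putting the pieces together, here is a minimal sketch of the whole loop, not a definitive implementation. The helper name `parse_first_table` and the `sample_files` dictionary are hypothetical stand-ins (the latter for the question's `lowercase_dict`). It uses the built-in "html.parser" so it runs without lxml installed; passing "lxml" as in the question should work the same way for this input.

```python
from bs4 import BeautifulSoup


def parse_first_table(html, parser="html.parser"):
    """Return the first <table> in `html` as a list of rows (lists of cell strings).

    Returns an empty list when the document contains no <table>.
    """
    soup = BeautifulSoup(html, parser)
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    # Search the table itself, so it does not matter whether a <tbody> exists.
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip rows that contain no cells at all
            rows.append(cells)
    return rows


# Hypothetical stand-in for the question's lowercase_dict (filename -> HTML text).
sample_files = {
    "a.html": "<table><tr><th>name</th><th>qty</th></tr>"
              "<tr><td>apple</td><td>3</td></tr></table>",
    "b.html": "<p>no table here</p>",
}

table_dict = {name: parse_first_table(text) for name, text in sample_files.items()}
```

Unlike the code in the question, this keeps empty cells so that columns stay aligned across rows, and it stores every row per file rather than only the last one.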