How to extract multiple tables from HTML in Python
I want to extract all the data from the security bulletin tables at https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html. With my current code I can only extract the tables one at a time; it cannot extract all of the table data at once.
This is my code:
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Assumed fetch step: the original post does not show how html_content was obtained.
url = "https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html"
html_content = requests.get(url).text

soup = BeautifulSoup(html_content, "lxml")
print(soup.prettify())

gdp = soup.find_all("table")
table = gdp[0]  # only the first table on the page is processed
body = table.find_all("tr")
head = body[0]
body_rows = body[1:]

headings = []
for item in head.find_all("td"):
    item = (item.text).rstrip("\n")
    headings.append(item)

all_rows = []  # will be a list of lists, one per row
for row_num in range(len(body_rows)):  # a row at a time
    row = []  # this will hold the entries for one row
    for row_item in body_rows[row_num].find_all("td"):
        # strip non-breaking spaces, newlines and commas from each cell
        aa = re.sub("(\xa0)|(\n)|,", "", row_item.text)
        row.append(aa)
    all_rows.append(row)

df = pd.DataFrame(data=all_rows, columns=headings)
df.to_csv('C:/Users//AdobeAir-APSB16-23 Security Update Available for Adobe AIR.csv')
df.head()
The output of the code is:
   Bulletin ID    Date Published  Priority
0    APSB21-13  February 09 2021         3
For this code, I imported libraries such as BeautifulSoup, requests, pandas and re. I hope someone can help me extract the data from all the tables at once and convert it to CSV format. Thank you.
You can make pandas do the heavy lifting for you with read_html:
import pandas as pd

url = 'https://helpx.adobe.com/security/products/dreamweaver/apsb21-13.html'
dfs = pd.read_html(url, header=0)  # one DataFrame per <table> on the page
dfs[1]
Output:
             Product  Affected Versions           Platform
0  Adobe Dreamweaver               20.2  Windows and macOS
1  Adobe Dreamweaver               21.0  Windows and macOS
PS: read_html returns a list of all tables found in the HTML, one DataFrame per table. For example, dfs[0] is the first table:
   Bulletin ID     Date Published  Priority
0    APSB21-13  February 09, 2021         3
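Since the question also asks for CSV output, here is a minimal sketch of one way to write every table from that list to its own file. It is only an illustration, not something from the original answer: the helper name save_tables, the file prefix, and the inline SAMPLE_HTML (a small stand-in for the bulletin page so the snippet runs offline) are all assumptions. It also assumes an HTML parser that read_html can use, such as lxml, is installed.

```python
from io import StringIO
from pathlib import Path
import tempfile

import pandas as pd

# Hypothetical stand-in for the bulletin page, so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><td>Bulletin ID</td><td>Date Published</td><td>Priority</td></tr>
  <tr><td>APSB21-13</td><td>February 09, 2021</td><td>3</td></tr>
</table>
<table>
  <tr><td>Product</td><td>Affected Versions</td><td>Platform</td></tr>
  <tr><td>Adobe Dreamweaver</td><td>20.2</td><td>Windows and macOS</td></tr>
  <tr><td>Adobe Dreamweaver</td><td>21.0</td><td>Windows and macOS</td></tr>
</table>
"""


def save_tables(html, out_dir, prefix="table"):
    """Parse every <table> in `html` and write each one to its own CSV file.

    `save_tables` and `prefix` are illustrative names, not part of pandas.
    """
    # header=0 promotes the first row of each table to column names.
    # StringIO is used because newer pandas versions deprecate raw HTML strings.
    dfs = pd.read_html(StringIO(html), header=0)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, df in enumerate(dfs):
        path = out_dir / f"{prefix}_{i}.csv"
        df.to_csv(path, index=False)  # index=False drops the 0, 1, ... row labels
        paths.append(path)
    return dfs, paths


dfs, paths = save_tables(SAMPLE_HTML, tempfile.mkdtemp())
print(len(dfs), [p.name for p in paths])
```

For the live page, the same helper could presumably be given the text of the downloaded page (for example requests.get(url).text) instead of SAMPLE_HTML, with out_dir pointing at wherever the CSV files should go.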