Directly scraping an HTML table using BeautifulSoup?

Is there a direct way to scrape an HTML table? It would be great if I could pass in the class of the HTML table and get the results back.

For example, I need to get the table from this URL.

I can use this procedure, but I need a cleaner, more direct solution.

Well, then try this:

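from io import StringIO

import pandas as pd
import requests

url = "https://buchholz-stadtwerke.de/wasseranalyse.html"

# read_html returns a list with one DataFrame per <table> on the page.
# Newer pandas versions expect a file-like object, hence the StringIO wrapper.
tables = pd.read_html(StringIO(requests.get(url).text), flavor="bs4")
# Stack all tables into a single DataFrame
df = pd.concat(tables)
df.to_csv("data.csv", index=False)
print(df)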

Output:

[                    Parameter  Einheit    Grenzwert Messwert, Februar 2020
0            Wassertemperatur       °C          NaN                     98
1         Leitfähigkeit (25°)    µS/cm         2790                    302
2   Sauerstoff (elektrochem.)     mg/l          NaN                    109
3                     pH-Wert      NaN  6,5 bis 9,5                    806
4             Sättigungsindex      NaN          NaN                    001
5         Karbonathärte (dH°)      °dH          NaN                    454
6           Gesamthärte (dH°)      °dH          NaN                    645
7                Härtebereich      NaN          NaN                  weich
8         Calcitlösekapazität     mg/l            5                    -01
and so on...
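
A note on the numbers: the values in the last column look as if their decimal commas were dropped (98 for a water temperature, 806 for a pH value). That would match read_html's defaults, which treat "," as a thousands separator. A minimal sketch of the effect and a possible fix, assuming the page uses German number formatting; the HTML snippet below is made up for illustration and is not the real page:

```python
from io import StringIO

import pandas as pd

# Made-up sample markup (an assumption, not taken from the real page)
html = """
<table>
  <tr><th>Parameter</th><th>Messwert</th></tr>
  <tr><td>Wassertemperatur</td><td>9,8</td></tr>
  <tr><td>pH-Wert</td><td>8,06</td></tr>
</table>
"""

# Defaults: "," is treated as a thousands separator and stripped
default = pd.read_html(StringIO(html))[0]

# German formatting: "," is the decimal mark, "." the thousands separator
german = pd.read_html(StringIO(html), decimal=",", thousands=".")[0]

print(default)
print(german)
```

Passing the same decimal and thousands arguments to the read_html call above should keep the decimals intact. A column that also contains text, such as "weich", would still be read as strings.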

This also writes a .csv file with the data from the table.

EDIT:

This feels a bit like a hack, but it works. Based on the comment and the URL, you can loop over the tables in df (read_html returns a list of DataFrames) and write each one to a separate file.

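from io import StringIO

import pandas as pd
import requests

url = "https://www.swd-ag.de/energie-wasser/wasser/trinkwasseranalyse/"

# One DataFrame per <table> on the page
df = pd.read_html(io=StringIO(requests.get(url).text), flavor="bs4")
# Write each table to its own file: table_1.csv, table_2.csv, ...
for index, table in enumerate(df, start=1):
    table.to_csv(f"table_{index}.csv", index=False)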
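
If only one of the tables is wanted, read_html can also pick it out by its HTML attributes, which is close to what the question asks for (give the class, get the table). A minimal sketch, assuming the target table carries a class attribute; the markup and the class names "prices" and "analysis" are hypothetical:

```python
from io import StringIO

import pandas as pd

# Made-up markup with two tables; the class names are hypothetical
html = """
<table class="prices">
  <tr><th>Tarif</th><th>Preis</th></tr>
  <tr><td>Basis</td><td>10</td></tr>
</table>
<table class="analysis">
  <tr><th>Parameter</th><th>Einheit</th></tr>
  <tr><td>pH-Wert</td><td>ohne</td></tr>
</table>
"""

# attrs restricts the search to <table> elements with matching attributes
tables = pd.read_html(StringIO(html), attrs={"class": "analysis"})
analysis = tables[0]
print(analysis)
```

read_html still returns a list here, so the single match is taken with tables[0]. If no table matches, pandas raises a ValueError.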
