Reading a table from the web with Python

Question

I'm new to Python, and I'm trying to extract data from one specific table (the last one, Shareholding Pattern) on https://www.screener.in/company/ABB/consolidated/.

I'm using the BeautifulSoup library for this, but I don't know how to go about it.

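Below is my code so far. The page has several tables, and they all share the same classes and IDs, so I can't select the right one. That makes it hard to filter out the single table I want.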

import requests
import urllib.request
from bs4 import BeautifulSoup
    
url = "https://www.screener.in/company/ABB/consolidated/"

r = requests.get(url)
print(r.status_code)
html_content = r.text
soup = BeautifulSoup(html_content,"html.parser")
# print(soup)
#data_table = soup.find('table', class_ = "data-table")
# print(data_table)
# Returns None: find() takes a tag name such as "h2", not an HTML snippet.
table_needed = soup.find("<h2>ShareholdingPattern</h2>")
# sub = table_needed.contents[0]
print(table_needed)
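Because the tables share classes and IDs, one way around the problem is to anchor on the section heading's text instead of the table's attributes. The sketch below assumes the table follows an `<h2>` whose text contains "Shareholding Pattern" (the snippet above also looks for an `<h2>`, but the live page structure is not verified here). `table_after_heading` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def table_after_heading(html: str, heading_text: str) -> list[list[str]]:
    """Return the rows of the first <table> after an <h2> containing heading_text.

    Hypothetical helper; assumes the target table follows an <h2> heading.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Match the heading by its visible text rather than by class/id,
    # since the tables on the page share the same attributes.
    heading = next(
        (h for h in soup.find_all("h2") if heading_text in h.get_text()),
        None,
    )
    if heading is None:
        raise ValueError(f"no <h2> containing {heading_text!r}")
    # find_next() walks forward in document order to the next <table>.
    table = heading.find_next("table")
    if table is None:
        raise ValueError(f"no <table> after the {heading_text!r} heading")
    # Collect the text of every header and data cell, row by row.
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows
```

On the live page it could be called as `table_after_heading(requests.get(url).text, "Shareholding Pattern")`, with `url` set as in the question. Using `get_text()` on the whole heading, rather than `find("h2", string=...)`, also tolerates headings that contain nested tags.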

Answer 1

Just use requests and pandas. Grab the last table and dump it into a .csv file.

Here's how:

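from io import StringIO

import pandas as pd
import requests

url = "https://www.screener.in/company/ABB/consolidated/"
html = requests.get(url).text

# read_html returns a list of DataFrames, one per <table> on the page.
# flavor="bs4" parses with BeautifulSoup + html5lib; StringIO avoids the
# FutureWarning pandas 2.1+ emits when given a literal HTML string.
tables = pd.read_html(StringIO(html), flavor="bs4")

# Shareholding Pattern is the last table on the page; save it as CSV.
tables[-1].to_csv("last_table.csv", index=False)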
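Indexing with `[-1]` relies on the Shareholding Pattern table staying last on the page. `pd.read_html` also accepts a `match` argument (a string or regex) that keeps only tables whose text matches it, so the table can be selected by content instead of position. A minimal sketch, assuming the target table contains a word such as "Promoters" that no other table on the page contains (not verified against the live page); `read_table_matching` is a hypothetical helper name.

```python
from io import StringIO

import pandas as pd


def read_table_matching(html: str, pattern: str) -> pd.DataFrame:
    """Return the first <table> in html whose text matches pattern.

    Hypothetical helper around pd.read_html; read_html itself raises
    ValueError when no table matches.
    """
    # match= filters the parsed tables by their text content.
    tables = pd.read_html(StringIO(html), match=pattern)
    return tables[0]
```

Usage would mirror the answer: `read_table_matching(requests.get(url).text, "Promoters").to_csv("shareholding.csv", index=False)`. The trade-off is that a content match breaks if the wording changes, while `[-1]` breaks if a table is added after it.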

Output from the .csv file:

Answer 1 (score 3, accepted, 2021-02-09 11:44:21)