
Scrape table data from website using Python

I am trying to scrape the table data shown below from a website using BeautifulSoup4 and Python (screenshot of the table: https://i.stack.imgur.com/PfPOQ.png).

My code so far is:

import requests
from bs4 import BeautifulSoup

url = "https://www.boerse-frankfurt.de/bond/xs0216072230"
content = requests.get(url)
soup = BeautifulSoup(content.text, 'html.parser')
tbody_data = soup.find_all("table", attrs={"class": "table widget-table"})
table1 = tbody_data[2]
table_body = table1.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    print(cols)

With this code, I get the following result (screenshot: https://i.stack.imgur.com/C190u.png): [Issuer, ] [Industry, ]

I see Issuer and Industry, but their values do not show up in my result. Any help would be appreciated. TIA

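You are not getting the entire output because the data in the second td of each row of the sixth table on the page is loaded dynamically via JavaScript, so it is not present in the HTML that requests receives. You can render the page in a real browser with Selenium and then read the table with pandas.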

from io import StringIO
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://www.boerse-frankfurt.de/bond/xs0216072230-fuerstenberg-capital-erste-gmbh-2-522'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # Give the page's JavaScript time to fill in the table values
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # Close the browser once the rendered HTML has been captured
# Wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(str(soup)))[5]  # Sixth table on the page (index 5)
print(df)

Output:

0                            Issuer  Fürstenberg Capital Erste GmbH
1                          Industry       Industrial and bank bonds
2                            Market                     Open Market
3                        Subsegment                             NaN
4         Minimum investment amount                            1000
5                      Listing unit                         Percent
6                        Issue date                      04/04/2005
7                      Issue volume                        61203000
8                Circulating volume                        61203000
9                    Issue currency                             EUR
10               Portfolio currency                             EUR
11                First trading day                      27/06/2012
12                         Maturity                             NaN
13  Extraordinary cancellation type                     Call option
14  Extraordinary cancellation date                             NaN
15                     Subordinated                             Yes
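
To illustrate why the requests-based attempt in the question prints labels without values, here is a minimal sketch. It parses a hand-written HTML fragment with the standard-library HTMLParser, so it runs without extra dependencies. The fragment is an assumption that only imitates what the server might send before JavaScript runs, not the site's actual markup, and the class name RowCollector is hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical fragment imitating the response before JavaScript runs:
# the label cells are filled, the value cells are still empty.
STATIC_HTML = """
<table class="table widget-table">
  <tbody>
    <tr><td>Issuer</td><td></td></tr>
    <tr><td>Industry</td><td></td></tr>
  </tbody>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a <td> belongs to a cell.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


collector = RowCollector()
collector.feed(STATIC_HTML)
for label, value in collector.rows:
    print(label, repr(value))
```

Every value comes back as an empty string, which matches the [Issuer, ] [Industry, ] result in the question: the cells exist in the static HTML, but their content is only filled in later by the browser.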

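Another solution, using only requests. Note that to get a result from the server you must set the required headers (they can be seen in the browser's developer tools -> Network tab).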

import requests

url = (
    "https://api.boerse-frankfurt.de/v1/data/master_data_bond?isin=XS0216072230"
)

headers = {
    "X-Client-TraceId": "d87b41992f6161c09e875c525c70ffcf",
    "X-Security": "d361b3c92e9c50a248e85a12849f8eee",
    "Client-Date": "2022-08-25T09:07:36.196Z",
}

data = requests.get(url, headers=headers).json()
print(data)

Prints:

{
    "isin": "XS0216072230",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "subSegment": None,
    "cupon": 2.522,
    "interestPaymentPeriod": None,
    "firstAnnualPayDate": "2006-06-30",
    "minimumInvestmentAmount": 1000.0,
    "issuer": "Fürstenberg Capital Erste GmbH",
    "issueDate": "2005-04-04",
    "issueVolume": 61203000.0,
    "circulatingVolume": 61203000.0,
    "issueCurrency": "EUR",
    "firstTradingDay": "2012-06-27",
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "extraordinaryCancellation": None,
    "portfolioCurrency": "EUR",
    "subordinated": True,
    "flatNotation": {"originalValue": "01", "translations": {"others": "flat"}},
    "quotationType": {
        "originalValue": "2",
        "translations": {"de": "Prozentnotiert", "en": "Percent"},
    },
}
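
Several fields in this response are wrapped in an originalValue/translations structure rather than being plain values. As a rough sketch of how the response could be turned into simple label/value pairs like the table on the page, the helpers below resolve those wrappers. The function names translated and flatten_master_data are hypothetical, and sample is just a subset of the response printed above.

```python
def translated(value, lang="en"):
    """Return a plain value, resolving the API's translation wrappers."""
    if isinstance(value, dict) and "translations" in value:
        translations = value["translations"]
        # Some fields only carry an "others" translation (e.g. noticeType).
        fallback = translations.get("others", value.get("originalValue"))
        return translations.get(lang, fallback)
    return value


def flatten_master_data(data, lang="en"):
    """Map every key of the response to a plain, human-readable value."""
    return {key: translated(value, lang) for key, value in data.items()}


# Subset of the response shown above.
sample = {
    "issuer": "Fürstenberg Capital Erste GmbH",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "subordinated": True,
}

flat = flatten_master_data(sample)
for key, value in flat.items():
    print(f"{key}: {value}")
```

Passing lang="de" selects the German labels where the API provides them, and fields without a translation wrapper are passed through unchanged.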

