Scraping table data from a website using Python

Question

I am trying to scrape the table shown below from a website using BeautifulSoup4 and Python. Screenshot of the table: https://i.stack.imgur.com/PfPOQ.png

So far my code is:

import requests
from bs4 import BeautifulSoup

url = "https://www.boerse-frankfurt.de/bond/xs0216072230"
content = requests.get(url)
soup = BeautifulSoup(content.text, 'html.parser')
# All tables that carry the widget-table class
tbody_data = soup.find_all("table", attrs={"class": "table widget-table"})
table1 = tbody_data[2]
table_body = table1.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    print(cols)

With this code, I get the following result (screenshot: https://i.stack.imgur.com/C190u.png):

[Issuer, ] [Industry, ]

I see Issuer and Industry, but their values do not show up in my result. Any help would be appreciated. TIA

Answer 1

You are not getting the complete output because the data in the second td of each row (table number 6 on the page) is loaded dynamically via JavaScript, so it is not in the HTML that requests receives. You can get the rendered page by using selenium, then read the table with pandas.

import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://www.boerse-frankfurt.de/bond/xs0216072230-fuerstenberg-capital-erste-gmbh-2-522'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # Give the JavaScript time to fill in the table values
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()
# StringIO avoids the deprecation warning newer pandas raises for literal HTML strings
df = pd.read_html(StringIO(str(soup)))[5]  # Sixth table on the page
print(df)

Output:

0                            Issuer  Fürstenberg Capital Erste GmbH
1                          Industry       Industrial and bank bonds
2                            Market                     Open Market
3                        Subsegment                             NaN
4         Minimum investment amount                            1000
5                      Listing unit                         Percent
6                        Issue date                      04/04/2005
7                      Issue volume                        61203000
8                Circulating volume                        61203000
9                    Issue currency                             EUR
10               Portfolio currency                             EUR
11                First trading day                      27/06/2012
12                         Maturity                             NaN
13  Extraordinary cancellation type                     Call option
14  Extraordinary cancellation date                             NaN
15                     Subordinated                             Yes
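
As a quick check on the explanation above, the following minimal sketch shows what a parser sees when the value cells have not been filled in yet. The STATIC_HTML snippet is a hypothetical stand-in for the server response, not a copy of the real page, and the CellCollector and empty_value_rows names are made up for illustration. It uses only the standard library's html.parser so it runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for the HTML that requests.get() returns:
# the label cells are present, the value cells are still empty.
STATIC_HTML = """
<table class="table widget-table">
  <tbody>
    <tr><td>Issuer</td><td></td></tr>
    <tr><td>Industry</td><td></td></tr>
  </tbody>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a <td> is recorded.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def empty_value_rows(html):
    """Return the labels of rows whose second cell has no text."""
    parser = CellCollector()
    parser.feed(html)
    return [row[0] for row in parser.rows if len(row) > 1 and not row[1]]


print(empty_value_rows(STATIC_HTML))
```

If a check like this reports every label on the real response, the values are most likely being filled in client-side, and a browser-driven approach or the underlying API is needed.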

Answer 2

Another solution, using just requests. Note that to obtain the result from the server you have to set the required headers (they can be seen in the browser's Developer Tools -> Network tab).

import requests

url = (
    "https://api.boerse-frankfurt.de/v1/data/master_data_bond?isin=XS0216072230"
)

headers = {
    "X-Client-TraceId": "d87b41992f6161c09e875c525c70ffcf",
    "X-Security": "d361b3c92e9c50a248e85a12849f8eee",
    "Client-Date": "2022-08-25T09:07:36.196Z",
}

data = requests.get(url, headers=headers).json()
print(data)

Prints:

{
    "isin": "XS0216072230",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "subSegment": None,
    "cupon": 2.522,
    "interestPaymentPeriod": None,
    "firstAnnualPayDate": "2006-06-30",
    "minimumInvestmentAmount": 1000.0,
    "issuer": "Fürstenberg Capital Erste GmbH",
    "issueDate": "2005-04-04",
    "issueVolume": 61203000.0,
    "circulatingVolume": 61203000.0,
    "issueCurrency": "EUR",
    "firstTradingDay": "2012-06-27",
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "extraordinaryCancellation": None,
    "portfolioCurrency": "EUR",
    "subordinated": True,
    "flatNotation": {"originalValue": "01", "translations": {"others": "flat"}},
    "quotationType": {
        "originalValue": "2",
        "translations": {"de": "Prozentnotiert", "en": "Percent"},
    },
}
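
Several fields in this response are nested objects with an originalValue and a translations map, so a small amount of post-processing may help if you want label/value pairs similar to the table on the page. The sketch below is one possible way to do that; the display_value and flatten helpers are hypothetical names, the fallback order (requested language, then "others", then the raw code) is an assumption rather than documented API behavior, and data here is an abbreviated copy of the output above.

```python
# Abbreviated copy of the response printed above.
data = {
    "issuer": "Fürstenberg Capital Erste GmbH",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "subSegment": None,
    "issueVolume": 61203000.0,
    "subordinated": True,
}


def display_value(value, lang="en"):
    """Reduce one field to a printable value, preferring the `lang` translation."""
    if isinstance(value, dict) and "translations" in value:
        translations = value["translations"]
        # Assumed fallback order: requested language, "others", raw code.
        fallback = translations.get("others", value.get("originalValue"))
        return translations.get(lang, fallback)
    return value


def flatten(record, lang="en"):
    """Map every top-level key to its display value."""
    return {key: display_value(value, lang) for key, value in record.items()}


flat = flatten(data)
for key, value in flat.items():
    print(f"{key:<12} {value}")
```

Calling flatten(data, lang="de") would pick the German labels instead where they exist.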
