Scrape table data from website using Python
I am trying to scrape the table shown below from a website using BeautifulSoup4 and Python. Screenshot of the table: https://i.stack.imgur.com/PfPOQ.png
So far my code is:
import requests
from bs4 import BeautifulSoup

url = "https://www.boerse-frankfurt.de/bond/xs0216072230"
content = requests.get(url)
soup = BeautifulSoup(content.text, 'html.parser')
tbody_data = soup.find_all("table", attrs={"class": "table widget-table"})
table1 = tbody_data[2]
table_body = table1.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    print(cols)
With this code, I get the following result (screenshot: https://i.stack.imgur.com/C190u.png): [Issuer, ] [Industry, ]
I see the labels Issuer and Industry, but their values do not show up in my result. Any help would be appreciated.
TIA
You are not getting the entire output because the data in the second td of table number 6 on this page is loaded dynamically via JavaScript. You can work around that by rendering the page with selenium and reading the table with pandas.
import time
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

webdriver_service = Service("./chromedriver")  # Your chromedriver path
driver = webdriver.Chrome(service=webdriver_service)
url = 'https://www.boerse-frankfurt.de/bond/xs0216072230-fuerstenberg-capital-erste-gmbh-2-522'
driver.get(url)
driver.maximize_window()
time.sleep(3)  # give the JavaScript time to fill in the table cells
table = BeautifulSoup(driver.page_source, 'lxml')
# read_html returns one DataFrame per <table>; index 5 is table number 6.
# Wrapping the HTML in StringIO avoids the literal-string deprecation warning in recent pandas.
df = pd.read_html(StringIO(str(table)))[5]
driver.quit()
print(df)
Output:
0 Issuer Fürstenberg Capital Erste GmbH
1 Industry Industrial and bank bonds
2 Market Open Market
3 Subsegment NaN
4 Minimum investment amount 1000
5 Listing unit Percent
6 Issue date 04/04/2005
7 Issue volume 61203000
8 Circulating volume 61203000
9 Issue currency EUR
10 Portfolio currency EUR
11 First trading day 27/06/2012
12 Maturity NaN
13 Extraordinary cancellation type Call option
14 Extraordinary cancellation date NaN
15 Subordinated Yes
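One way to confirm that the values are filled in later by JavaScript is to look at the static HTML for rows whose label cell has text but whose value cell is empty. The snippet below is only a sketch of that check using the standard library. STATIC_HTML is a made-up sample modelled on the output shown in the question, not the real page source, and CellCollector and empty_value_rows are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical static HTML, modelled on what the question's code printed:
# the label cell is filled, the value cell stays empty until JavaScript runs.
STATIC_HTML = """
<table class="table widget-table">
  <tbody>
    <tr><td>Issuer</td><td></td></tr>
    <tr><td>Industry</td><td></td></tr>
  </tbody>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            if not self.rows:  # tolerate a <td> that appears outside any <tr>
                self.rows.append([])
            self._in_td = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows[-1][-1] += data.strip()


def empty_value_rows(html):
    """Return the labels of rows whose second cell has no text."""
    parser = CellCollector()
    parser.feed(html)
    return [row[0] for row in parser.rows if len(row) > 1 and not row[1]]


missing = empty_value_rows(STATIC_HTML)
print(missing)
```

If the same check on requests.get(url).text lists every label, the values are most likely not in the static HTML at all, and you need either a rendered page or the underlying API.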
Another solution, using just requests. Note that to obtain the result from the server you have to set the required headers (they can be seen in the browser's Developer tools -> Network tab).
import requests
url = (
"https://api.boerse-frankfurt.de/v1/data/master_data_bond?isin=XS0216072230"
)
headers = {
"X-Client-TraceId": "d87b41992f6161c09e875c525c70ffcf",
"X-Security": "d361b3c92e9c50a248e85a12849f8eee",
"Client-Date": "2022-08-25T09:07:36.196Z",
}
data = requests.get(url, headers=headers).json()
print(data)
Prints:
{
"isin": "XS0216072230",
"type": {
"originalValue": "25",
"translations": {
"de": "(Industrie-) und Bankschuldverschreibungen",
"en": "Industrial and bank bonds",
},
},
"market": {
"originalValue": "OPEN",
"translations": {"de": "Freiverkehr", "en": "Open Market"},
},
"subSegment": None,
"cupon": 2.522,
"interestPaymentPeriod": None,
"firstAnnualPayDate": "2006-06-30",
"minimumInvestmentAmount": 1000.0,
"issuer": "Fürstenberg Capital Erste GmbH",
"issueDate": "2005-04-04",
"issueVolume": 61203000.0,
"circulatingVolume": 61203000.0,
"issueCurrency": "EUR",
"firstTradingDay": "2012-06-27",
"maturity": None,
"noticeType": {
"originalValue": "CALL_OPTION",
"translations": {"others": "Call option"},
},
"extraordinaryCancellation": None,
"portfolioCurrency": "EUR",
"subordinated": True,
"flatNotation": {"originalValue": "01", "translations": {"others": "flat"}},
"quotationType": {
"originalValue": "2",
"translations": {"de": "Prozentnotiert", "en": "Percent"},
},
}
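To get something closer to the label/value table on the page, the nested entries can be flattened. The following is a minimal sketch, assuming that every translated field has the {"originalValue": ..., "translations": {...}} shape seen in the response above. flatten_master_data is a hypothetical helper name, and sample is a hand-copied subset of the printed response rather than a live API call.

```python
def flatten_master_data(data, lang="en"):
    """Replace each translated entry with a single display string.

    Assumes translated fields look like
    {"originalValue": ..., "translations": {...}}, as in the response above.
    """
    flat = {}
    for key, value in data.items():
        if isinstance(value, dict) and "translations" in value:
            translations = value["translations"]
            # Prefer the requested language, then the "others" entry,
            # and fall back to the raw code if neither is present.
            fallback = translations.get("others", value.get("originalValue"))
            flat[key] = translations.get(lang, fallback)
        else:
            flat[key] = value
    return flat


# Subset of the response printed above, copied by hand for illustration.
sample = {
    "isin": "XS0216072230",
    "type": {
        "originalValue": "25",
        "translations": {
            "de": "(Industrie-) und Bankschuldverschreibungen",
            "en": "Industrial and bank bonds",
        },
    },
    "market": {
        "originalValue": "OPEN",
        "translations": {"de": "Freiverkehr", "en": "Open Market"},
    },
    "issuer": "Fürstenberg Capital Erste GmbH",
    "maturity": None,
    "noticeType": {
        "originalValue": "CALL_OPTION",
        "translations": {"others": "Call option"},
    },
    "subordinated": True,
}

flat = flatten_master_data(sample)
for key, value in flat.items():
    print(f"{key}: {value}")
```

With the real response you would pass the data dictionary from the request instead of sample. Passing lang="de" selects the German labels where they exist.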