Web scraping with Beautiful Soup: entering all links and getting information
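I am trying to open each company listed on the Stack Overflow companies page and get specific information from it (for example, the full description). Is there a simple way to do this with Beautiful Soup? Right now I am only getting the links to the companies on the first page.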
import requests
from bs4 import BeautifulSoup

r = requests.get('https://stackoverflow.com/jobs/companies')
src = r.content
soup = BeautifulSoup(src, 'lxml')

urls = []
for h2_tag in soup.find_all("h2"):
    a_tag = h2_tag.find('a')
    # Skip headings that do not wrap a link
    if a_tag is not None and a_tag.has_attr('href'):
        urls.append(a_tag['href'])
print(urls)
import requests
from bs4 import BeautifulSoup as bsoup

base_url = "https://stackoverflow.com"

# Walk listing pages 1 through 5
for i in range(1, 6):
    site_source = requests.get(
        f"{base_url}/jobs/companies?pg={i}"
    ).content
    soup = bsoup(site_source, "html.parser")
    company_list = soup.find("div", class_="company-list")
    if company_list is None:
        continue  # no company list on this page, nothing to follow
    company_block = company_list.find_all("div", class_="grid--cell fl1 text")
    for company in company_block:
        link = company.find("a")
        if link:
            # Follow the relative link to the company's own page
            company_source = requests.get(base_url + link.attrs["href"]).content
            company_soup = bsoup(company_source, "html.parser")
            company_info = company_soup.find("div", id="company-name-tagline")
            print("Name: ", company_info.find("h1").text)
            print("Info: ", company_info.find("p").text)
            print()
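I'm basically looping over pages 1 through 5, collecting the link to each company, then visiting each company's page and printing its name and description.
My output: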
Name: BigCommerce
Info: Think BIG
Name: Facebook
Info: Our mission is to give people the power to build community and bring the world closer together.
Name: trivago N.V.
Info: A diverse team of talents that make a blazing fast accommodation search powered by cutting-edge tech and entrepreneurial innovation.
Name: General Dynamics UK
Info: General Dynamics UK is one of the UK’s leading defence companies, and an important supplier to the UK Ministry of Defence (MoD).
Name: EDF
Info: EDF is leading the transition to a cleaner, low emission electric future, tackling climate change and helping Britain reach net zero.
Name: Radix DLT
Info: Delivering Scalable Trust.
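The loop in the answer above mixes downloading and parsing. As a rough sketch, the parsing part could be pulled into two small helpers that take HTML strings, which makes them easy to try without hitting the site. The function names `extract_company_links` and `extract_company_info` are hypothetical; the selectors (`company-list`, `grid--cell fl1 text`, `company-name-tagline`) are taken from the answer and assume the page layout has not changed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://stackoverflow.com"


def extract_company_links(listing_html):
    """Return absolute company URLs found in one listing page (hypothetical helper)."""
    soup = BeautifulSoup(listing_html, "html.parser")
    company_list = soup.find("div", class_="company-list")
    if company_list is None:
        return []
    links = []
    # The class string must match the attribute exactly, as in the answer above.
    for block in company_list.find_all("div", class_="grid--cell fl1 text"):
        a_tag = block.find("a", href=True)
        if a_tag is not None:
            links.append(urljoin(BASE_URL, a_tag["href"]))
    return links


def extract_company_info(company_html):
    """Return the name and tagline from a company page, or None if the header is missing."""
    soup = BeautifulSoup(company_html, "html.parser")
    header = soup.find("div", id="company-name-tagline")
    if header is None:
        return None
    name = header.find("h1")
    tagline = header.find("p")
    return {
        "name": name.get_text(strip=True) if name else None,
        "info": tagline.get_text(strip=True) if tagline else None,
    }
```

With these in place, the download loop only has to call `requests.get(url).content` and hand the result to the helpers, and a missing element yields `None` or an empty list rather than an `AttributeError`.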
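Yes. You can scrape the first page, then use Selenium to click the button for the second page and scrape that as well, passing the page source to Beautiful Soup each time. I think this should work.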
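A minimal sketch of that Selenium idea, under several assumptions: the pager is assumed to expose a link whose text is "next" (not confirmed by the question), Chrome with a matching driver is assumed to be available, and `collect_page_sources` and `main` are hypothetical names. The paging helper only needs an object with a `page_source` attribute, so it does not import Selenium itself.

```python
def collect_page_sources(driver, find_next_button, max_pages=5):
    """Collect page_source for up to max_pages pages, clicking 'next' in between."""
    sources = []
    for page in range(max_pages):
        sources.append(driver.page_source)
        if page == max_pages - 1:
            break  # do not click past the last page we want
        button = find_next_button(driver)
        if button is None:
            break  # no further pages
        button.click()
    return sources


def main():
    # Third-party imports are kept here so the helper above stays usable on its own.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def find_next_button(driver):
        try:
            # Assumption: the pager link is labelled "next".
            return driver.find_element(By.LINK_TEXT, "next")
        except NoSuchElementException:
            return None

    driver = webdriver.Chrome()
    try:
        driver.get("https://stackoverflow.com/jobs/companies")
        for source in collect_page_sources(driver, find_next_button):
            soup = BeautifulSoup(source, "html.parser")
            # Same idea as the question: company links sit inside <h2> tags.
            print([a["href"] for a in soup.select("h2 a[href]")])
    finally:
        driver.quit()
```

Calling `main()` runs the whole flow. If the pager updates the page with JavaScript instead of loading a new URL, an explicit wait after `click()` would probably be needed before reading `page_source`.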