![](/img/trans.png)
[英]How can I find out correct div, class, span when scraping a html page
[英]How to Nest If Statement Within For Loop When Scraping Div Class HTML
以下是使用“美麗的湯”從該網頁上刮除醫生信息的刮板。 從下面的html代碼中可以看到,每個醫生在網頁上都有一個個人資料,其中顯示了醫生的姓名,診所,專業,分類和城市。
<div class="views-field views-field-title practitioner__name" ><a href="/practitioners/41824">Marilyn Adams</a></div>
<div class="views-field views-field-field-pract-clinic practitioner__clinic" ><a href="/clinic/fortius-sport-health">Fortius Sport & Health</a></div>
<div class="views-field views-field-field-pract-profession practitioner__profession" >Physiotherapist</div>
<div class="views-field views-field-taxonomy-vocabulary-5 practitioner__region" >Fraser River Delta</div>
<div class="views-field views-field-city practitioner__city" ></div>
從示例html代碼中可以看到,醫師個人資料偶爾會丟失信息。 如果發生這種情況,我希望刮板打印“ N / A”。 我需要刮板打印“ N / A”,因為我最終希望將每個div類類別(名稱,診所,專業等)放入一個數組,其中每列的長度完全相同,以便我可以正確導出將數據保存到CSV文件。 這是我希望輸出與實際顯示相比的示例。
Actual Expected
[Names] [Names]
Greg Greg
Bob Bob
[Clinic] [Clinic]
Sport/Health Sport/Health
N/A
[Profession] [Profession]
Physical Therapist Physical Therapist
Physical Therapist Physical Therapist
[Taxonomy] [Taxonomy]
Fraser River Fraser River
N/A
[City] [City]
Vancouver Vancouver
Vancouver Vancouver
我曾嘗試編寫一個嵌套在每個for循環內的if語句,但是由於“ N / A”對於每個div類節僅顯示一次,因此代碼似乎無法正確循環。 有誰知道如何正確地將for語句嵌套在for循環中,以便在每列中獲取適當數量的“ N / As”? 提前致謝!
import requests
import re
from bs4 import BeautifulSoup
page=requests.get('https://sportmedbc.com/practitioners')
soup=BeautifulSoup(page.text, 'html.parser')
#Find Doctor Info
for doctor in soup.find_all('div',attrs={'class':'views-field views-field-title practitioner__name'}):
for a in doctor.find_all('a'):
print(a.text)
for clinic_name in soup.find_all('div',attrs={'class':'views-field views-field-field-pract-clinic practitioner__clinic'}):
for b in clinic_name.find_all('a'):
if b==(''):
print('N/A')
profession_links=soup.findAll('div',attrs={'class':'views-field views-field-field-pract-profession practitioner__profession'})
for profession in profession_links:
if profession.text==(''):
print('N/A')
print(profession.text)
taxonomy_links=soup.findAll('div',attrs={'class':'views-field views-field-taxonomy-vocabulary-5 practitioner__region'})
for taxonomy in taxonomy_links:
if taxonomy.text==(''):
print('N/A')
print(taxonomy.text)
city_links=soup.findAll('div',attrs={'class':'views-field views-field-taxonomy-vocabulary-5 practitioner__region'})
for city in city_links:
if city.text==(''):
print('N/A')
print(city.text)
對於此問題,您可以使用來自collections
模塊的ChainMap
( 此處的文檔 )。 這樣,您可以定義默認值,在這種情況下為'n/a'
並且僅獲取每個醫生存在的信息:
from bs4 import BeautifulSoup
import requests
from collections import ChainMap
url = 'https://sportmedbc.com/practitioners'
soup = BeautifulSoup(requests.get(url).text, 'lxml')
def get_data(soup):
default_data = {'name': 'n/a', 'clinic': 'n/a', 'profession': 'n/a', 'region': 'n/a', 'city': 'n/a'}
for doctor in soup.select('.view-practitioners .practitioner'):
doctor_data = {}
if doctor.select_one('.practitioner__name').text.strip():
doctor_data['name'] = doctor.select_one('.practitioner__name').text
if doctor.select_one('.practitioner__clinic').text.strip():
doctor_data['clinic'] = doctor.select_one('.practitioner__clinic').text
if doctor.select_one('.practitioner__profession').text.strip():
doctor_data['profession'] = doctor.select_one('.practitioner__profession').text
if doctor.select_one('.practitioner__region').text.strip():
doctor_data['region'] = doctor.select_one('.practitioner__region').text
if doctor.select_one('.practitioner__city').text.strip():
doctor_data['city'] = doctor.select_one('.practitioner__city').text
yield ChainMap(doctor_data, default_data)
for doctor in get_data(soup):
print('name:\t\t', doctor['name'])
print('clinic:\t\t',doctor['clinic'])
print('profession:\t',doctor['profession'])
print('city:\t\t',doctor['city'])
print('region:\t\t',doctor['region'])
print('-' * 80)
印刷品:
name: Jaimie Ackerman
clinic: n/a
profession: n/a
city: n/a
region: n/a
--------------------------------------------------------------------------------
name: Marilyn Adams
clinic: Fortius Sport & Health
profession: Physiotherapist
city: n/a
region: Fraser River Delta
--------------------------------------------------------------------------------
name: Mahsa Ahmadi
clinic: Wellpoint Acupuncture (Sports Medicine)
profession: Acupuncturist
city: Vancouver
region: Vancouver & Sea to Sky
--------------------------------------------------------------------------------
name: Tracie Albisser
clinic: Pacific Sport Northern BC, Tracie Albisser
profession: Strength and Conditioning Specialist, Exercise Physiologist
city: n/a
region: Cariboo - North East
--------------------------------------------------------------------------------
name: Christine Alder
clinic: n/a
profession: n/a
city: Vancouver
region: Vancouver & Sea to Sky
--------------------------------------------------------------------------------
name: Steacy Alexander
clinic: Go! Physiotherapy Sports and Wellness Centre
profession: Physiotherapist
city: Vancouver
region: Vancouver & Sea to Sky
--------------------------------------------------------------------------------
name: Page Allison
clinic: AET Clinic, .
profession: Athletic Therapist
city: Victoria
region: Vancouver Island - Central Coast
--------------------------------------------------------------------------------
name: Dana Alumbaugh
clinic: n/a
profession: Podiatrist
city: Squamish
region: Vancouver & Sea to Sky
--------------------------------------------------------------------------------
name: Manouch Amel
clinic: Mountainview Kinesiology Ltd.
profession: Strength and Conditioning Specialist
city: Anmore
region: Vancouver & Sea to Sky
--------------------------------------------------------------------------------
name: Janet Ames
clinic: Dr. Janet Ames
profession: Physician
city: Prince George
region: Cariboo - North East
--------------------------------------------------------------------------------
name: Sandi Anderson
clinic: n/a
profession: n/a
city: Coquitlam
region: Fraser Valley
--------------------------------------------------------------------------------
name: Greg Anderson
clinic: University of the Fraser Valley
profession: Exercise Physiologist
city: Mission
region: Fraser Valley
--------------------------------------------------------------------------------
編輯:
為了獲得列中的輸出,可以使用以下示例:
def print_data(header_text, data, key):
print(header_text)
for d in data:
print(d[key])
print()
data = list(get_data(soup))
print_data('[Names]', data, 'name')
print_data('[Clinic]', data, 'clinic')
print_data('[Profession]', data, 'profession')
print_data('[Taxonomy]', data, 'region')
print_data('[City]', data, 'city')
打印:
[Names]
Jaimie Ackerman
Marilyn Adams
Mahsa Ahmadi
Tracie Albisser
Christine Alder
Steacy Alexander
Page Allison
Dana Alumbaugh
Manouch Amel
Janet Ames
Sandi Anderson
Greg Anderson
[Clinic]
n/a
Fortius Sport & Health
Wellpoint Acupuncture (Sports Medicine)
Pacific Sport Northern BC, Tracie Albisser
n/a
Go! Physiotherapy Sports and Wellness Centre
AET Clinic, .
n/a
Mountainview Kinesiology Ltd.
Dr. Janet Ames
n/a
University of the Fraser Valley
[Profession]
n/a
Physiotherapist
Acupuncturist
Strength and Conditioning Specialist, Exercise Physiologist
n/a
Physiotherapist
Athletic Therapist
Podiatrist
Strength and Conditioning Specialist
Physician
n/a
Exercise Physiologist
[Taxonomy]
n/a
Fraser River Delta
Vancouver & Sea to Sky
Cariboo - North East
Vancouver & Sea to Sky
Vancouver & Sea to Sky
Vancouver Island - Central Coast
Vancouver & Sea to Sky
Vancouver & Sea to Sky
Cariboo - North East
Fraser Valley
Fraser Valley
[City]
n/a
n/a
Vancouver
n/a
Vancouver
Vancouver
Victoria
Squamish
Anmore
Prince George
Coquitlam
Mission
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.