I want to extract data from a website that lists the names of many doctors and hospitals, so that I can run some evaluations on it. I decided to use the site's search, but unfortunately I cannot get the result I want.
How can I do that?
    from bs4 import BeautifulSoup
    import requests
    import urllib.request

    types_of_doctor = ['dermatologist', 'gynecologist', 'paediatric-surgeon', 'cardiologist', 'diabetologists', 'eye-specialist']

    def search():
        for query in types_of_doctor:
            # Constructing http query
            url = 'http://health.hamariweb.com/doctors/' + query
            r = requests.get(url)
            soup = BeautifulSoup(r.content, 'html.parser')
            Doctors_name = soup.findAll('a', {"class" : "NormalText"})
            for doctors in Doctors_name:
                print(doctors.text)
            links = soup.select('a')
            header = types_of_doctor
            filename = 'AllNames.csv'
            f = open(filename, 'w')
            for head in header:
                f.write(head+'\t')
            for doctors in Doctors_name:
                print(doctors.text)
                f.write(doctors.text)

    search()
You need to move your
filename = 'AllNames.csv'
f = open(filename, 'w')
outside of the loop. Otherwise you are initializing and overwriting the file for each query.
    def search():
        filename = 'AllNames.csv'
        f = open(filename, 'w')
        for query in types_of_doctor:
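For reference, here is a fuller sketch of that idea, not a tested drop-in replacement. It opens the file once with a `with` block, before the loop, and writes one row per doctor through the standard `csv` module. The `fetch_names` helper, the `fetch` parameter, the `specialty`/`doctor` column names and the 10-second timeout are my own assumptions, and the `NormalText` selector is simply copied from the question, so it may not match the live page.

```python
import csv

BASE_URL = 'http://health.hamariweb.com/doctors/'
types_of_doctor = ['dermatologist', 'gynecologist', 'paediatric-surgeon',
                   'cardiologist', 'diabetologists', 'eye-specialist']


def fetch_names(query):
    """Download one listing page and return the doctor names found on it."""
    # Imported here so the CSV-writing part can be tried without these packages.
    import requests
    from bs4 import BeautifulSoup

    r = requests.get(BASE_URL + query, timeout=10)
    r.raise_for_status()
    soup = BeautifulSoup(r.content, 'html.parser')
    # Selector taken from the question; not verified against the live site.
    return [a.get_text(strip=True)
            for a in soup.find_all('a', class_='NormalText')]


def search(queries, filename='AllNames.csv', fetch=fetch_names):
    """Write one (specialty, doctor) row per name into a single CSV file."""
    # The file is opened once, before the loop, so earlier results survive.
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['specialty', 'doctor'])  # hypothetical header
        for query in queries:
            for name in fetch(query):
                writer.writerow([query, name])
```

Calling `search(types_of_doctor)` would then fetch each specialty in turn and append its rows to the same file. The `with` block also closes the file for you, which the original code never did.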
The technique for extracting information from websites is called web scraping. It mostly focuses on transforming unstructured data on the web (HTML) into structured data (a database or spreadsheet).
You can perform web scraping in various ways. One of them is to use Python with BeautifulSoup, a library that helps with this task.
Please read the articles below and adapt the approach to your needs:
https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/
https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe
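As a small illustration of turning HTML into structured data with BeautifulSoup, here is a minimal sketch. The HTML snippet is made up for the example (only the `NormalText` class name comes from the question), and `extract_doctors` is a hypothetical helper, so treat it as a pattern to adapt and not as a description of the real page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a listing page; the real markup may differ.
SAMPLE_HTML = """
<ul>
  <li><a class="NormalText" href="/doctors/1">Dr. A. Example</a></li>
  <li><a class="NormalText" href="/doctors/2">Dr. B. Sample</a></li>
  <li><a class="Other" href="/about">About us</a></li>
</ul>
"""


def extract_doctors(html):
    """Return one dict per link that carries the 'NormalText' class."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        {'name': a.get_text(strip=True), 'link': a.get('href')}
        for a in soup.find_all('a', class_='NormalText')
    ]


records = extract_doctors(SAMPLE_HTML)
```

Each record is a plain dictionary, so the list can be handed to `csv.DictWriter` or loaded into a spreadsheet tool without further parsing.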