
How to extract data from a website's search bar using python?

I want to extract data from a website that lists the names of many doctors and hospitals so that I can run some evaluations. I decided to query it through its search bar, but I cannot get the result I want.

How can I do that?

from bs4 import BeautifulSoup
import requests
import urllib.request


types_of_doctor = ['dermatologist', 'gynecologist', 'paediatric-surgeon', 'cardiologist', 'diabetologists', 'eye-specialist']
def search():
    for query in types_of_doctor:
        # Construct the request URL for this query
        url = 'http://health.hamariweb.com/doctors/' + query
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        Doctors_name = soup.findAll('a', {"class" : "NormalText"})
        for doctors in Doctors_name:
            print(doctors.text)
        links = soup.select('a')
        header = types_of_doctor
        filename = 'AllNames.csv'
        f = open(filename, 'w')
        for head in header:
            f.write(head+'\t')
        for doctors in Doctors_name:
            print(doctors.text)
            f.write(doctors.text)
search()

You need to move your

    filename = 'AllNames.csv'
    f = open(filename, 'w')

outside of the loop. Otherwise you are initializing and overwriting the file for each query.

    def search():
        filename = 'AllNames.csv'
        f = open(filename, 'w')
        for query in types_of_doctor:
            # ... rest of the loop body, without the open() call
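
To see the effect without touching the network, here is a minimal sketch of the same mistake in isolation. The `results` dictionary and the two file names are hypothetical stand-ins for the scraped data and for `AllNames.csv`; only the placement of `open(..., 'w')` matters.

```python
import os
import tempfile

# Hypothetical stand-in for scraped results, keyed by query.
results = {
    "dermatologist": ["Dr. A"],
    "cardiologist": ["Dr. B", "Dr. C"],
}

tmp_dir = tempfile.mkdtemp()
inside_path = os.path.join(tmp_dir, "inside.csv")
outside_path = os.path.join(tmp_dir, "outside.csv")

# Opening in 'w' mode inside the loop truncates the file on every pass,
# so only the rows from the last query survive.
for query, names in results.items():
    with open(inside_path, "w") as f:
        for name in names:
            f.write(f"{query}\t{name}\n")

# Opening once before the loop keeps the rows from every query.
with open(outside_path, "w") as f:
    for query, names in results.items():
        for name in names:
            f.write(f"{query}\t{name}\n")

with open(inside_path) as f:
    inside_rows = f.read().splitlines()
with open(outside_path) as f:
    outside_rows = f.read().splitlines()

print(len(inside_rows), len(outside_rows))
```

Using `with` also closes the file when the block ends, which the original code never does.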

The technique for extracting information from websites is called web scraping. It mostly focuses on transforming unstructured data on the web (HTML) into structured data (a database or spreadsheet).

You can perform web scraping in various ways. One of them is to use Python with BeautifulSoup, a library that parses HTML and helps with this task.
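
As a rough illustration of that HTML-to-structured-data step, the sketch below parses a small inline HTML string instead of a live page. The markup in `SAMPLE_HTML` is invented for the example; it only assumes, as the question's code does, that doctor names are `<a>` tags with the class `NormalText`. The helpers `extract_names` and `to_csv` are hypothetical names, not part of BeautifulSoup.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a results page whose doctor names
# are links carrying the "NormalText" class used in the question.
SAMPLE_HTML = """
<html><body>
  <a class="NormalText" href="/doctors/1">Dr. A. Example</a>
  <a class="Other" href="/about">About us</a>
  <a class="NormalText" href="/doctors/2"> Dr. B. Example </a>
</body></html>
"""


def extract_names(html, css_class="NormalText"):
    """Return the stripped text of every <a> tag with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a", class_=css_class)]


def to_csv(specialty, names):
    """Render one (specialty, name) row per doctor as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["specialty", "name"])
    for name in names:
        writer.writerow([specialty, name])
    return buffer.getvalue()


names = extract_names(SAMPLE_HTML)
csv_text = to_csv("cardiologist", names)
print(csv_text)
```

For a real page, the HTML passed to `extract_names` would come from something like `requests.get(url).content`, as in the question; whether the class name still matches depends on the site's current markup.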

Please read the articles below:

https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/

https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe

Then adapt them to your needs.


 