
Python scraper: How to open each profile and extract data

I am writing a Python scraper for a project and need to scrape some data from a doctor review site.

I have code working that gets each doctor's name, specialties and number of reviews, but I need to open each doctor's profile to get the phone number and address. I do not know how to do that. Do I need a separate function, or can I do it inside this loop?

Any help would be much appreciated.

import requests
from bs4 import BeautifulSoup


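# The page number is filled in per request; a fixed "?page=1" would fetch the same page every time.
base_url = "https://www.ratemds.com/best-doctors/?page={}"
for page in range(1, 5):
    r = requests.get(base_url.format(page))
    c = r.content
    soup = BeautifulSoup(c, 'html.parser')
    all = soup.find_all("div", {"class": "search-item doctor-profile"})

    for item in all:
        try:
            # Link holding the doctor's name
            print(item.find("a", {"class": "search-item-doctor-link"}).text)
        except AttributeError:
            # find() returned None: this item has no such link
            pass
        try:
            # First link in the item that has no class attribute
            print(item.find("a", {"class": None}).text)
        except AttributeError:
            pass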

As @cpander suggested, store every item.find("a", {"class": "search-item-doctor-link"})['href'] and call requests.get() again on each stored URL. A short example for getting the phone number from a profile page:

item.find("div", attrs={"doctordetail": ".1.0.0.0.2.2.0.1.1.0.0.1:1.0"})
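To illustrate how a lookup by attribute like the one above behaves, here is a minimal sketch run against made-up markup. The `data-field` attribute, the sample values, and the helper name `find_by_attribute` are assumptions for illustration only; the real profile page will use different attributes, so inspect its HTML first.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real profile page will differ.
SAMPLE_PROFILE_HTML = """
<div class="profile">
  <div data-field="phone">555-0100</div>
  <div data-field="address">1 Example Street</div>
</div>
"""


def find_by_attribute(html, name, value):
    """Return the text of the first <div> whose attribute `name` equals `value`.

    Returns None when no such <div> exists, so the caller does not need
    a try/except around the lookup.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", attrs={name: value})
    return tag.get_text(strip=True) if tag is not None else None


if __name__ == "__main__":
    print(find_by_attribute(SAMPLE_PROFILE_HTML, "data-field", "phone"))
    print(find_by_attribute(SAMPLE_PROFILE_HTML, "data-field", "address"))
```

Checking for None explicitly keeps a missing field from being silently swallowed by a bare except.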

I especially want to thank him for the suggestion.

This is how I did it:

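for item in all:
    try:
        # The profile link is the same anchor that holds the doctor's name.
        n = item.find("a", {"class": "search-item-doctor-link"})
        a = n.get('href')
        print("https://www.ratemds.com/"+a)
    except (AttributeError, TypeError):
        # No link found (n is None) or no href on it (a is None); skip this item.
        pass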

This gives me the links to all their profiles; the rest I know how to do.
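For anyone who wants the remaining step spelled out, the two-stage approach (collect the profile links, then request each one) might look roughly like the sketch below. It reuses the class names from the question; the function names, the `SITE_ROOT` constant, and the injectable `get` parameter are my own assumptions, and `urljoin` is swapped in for plain string concatenation so that an href with or without a leading slash yields a clean URL.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SITE_ROOT = "https://www.ratemds.com/"


def extract_profile_urls(listing_html, site_root=SITE_ROOT):
    """Return absolute profile URLs found on one search-results page."""
    soup = BeautifulSoup(listing_html, "html.parser")
    urls = []
    for item in soup.find_all("div", {"class": "search-item doctor-profile"}):
        link = item.find("a", {"class": "search-item-doctor-link"})
        if link is not None and link.get("href"):
            # urljoin copes with hrefs with or without a leading slash.
            urls.append(urljoin(site_root, link["href"]))
    return urls


def fetch_profile_soups(profile_urls, get=requests.get):
    """Yield (url, soup) for each profile page.

    `get` defaults to requests.get; passing a stand-in makes the function
    usable without network access.
    """
    for url in profile_urls:
        response = get(url)
        yield url, BeautifulSoup(response.content, "html.parser")


if __name__ == "__main__":
    listing = requests.get(SITE_ROOT + "best-doctors/?page=1")
    for url, soup in fetch_profile_soups(extract_profile_urls(listing.content)):
        # Pull the phone number and address out of `soup` here.
        print(url)
```

So a separate function is not strictly required, but splitting link collection from profile fetching keeps each piece easy to check on its own.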

Thanks to everyone who offered their help :)


 