
Python Web Scraper: My script is just printing the first one, instead of all?


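I'm making a Python web scraper for a project. It gets all the information I want, but the only problem is that it does this for the first profile only and doesn't get the others.

I tried to find the problem, but I'm stuck. Any advice would help.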

import requests
import pandas
from bs4 import BeautifulSoup


base_url = "https://www.ratemds.com/best-doctors/?page=1"
for page in range(1, 2, 1):
    r = requests.get(base_url)
    c = r.content
    soup = BeautifulSoup(c, 'html.parser')
    all = soup.find_all("div", {"class": "search-item doctor-profile"})
    l = []
    for item in all:
        d = {}
        d["Name"] = item.find("a", {"class": "search-item-doctor-link"}).text
        d["Phone Number"] = item.find("div", {"class": "search-item-specialty"}).text
        n = item.find("a", {"class": "search-item-doctor-link"})
        a = n.get('href')
        new_url = ("https://www.ratemds.com"+a)
        r1 = requests.get(new_url)
        c1 = r1.content
        soup1 = BeautifulSoup(c1, 'html.parser')
        sve = soup1.find_all("div", {"class": "col-sm-3 col-md-4 search-item-extra"})
        for profil in sve:
            try:
                d["Phone Number"] = profil.find("meta", itemprop = "telephone")["content"]
            except:
                d["Phone Number"] = None
            try:
                d["Adress"] =  profil.find("meta", itemprop = "streetAddress")["content"]
            except:
                d["Adress"] = None
            try:
                d["Website"] =  profil.find("a", itemprop = "sameAs")["href"]
            except:
                d["Website"] = None
            pass
l.append(d)
df = pandas.DataFrame(l)
df.to_csv("123.csv")
print(df)
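
Stripped of the network calls, the control flow of the script reduces to the sketch below. The `pages` data and the `buggy`/`fixed` function names are hypothetical stand-ins for the scraped profiles, chosen only for illustration. Because `l.append(d)` runs once after both loops have finished, and `l = []` is re-created on every page, only a single dict (the one built on the final iteration) ends up in the list.

```python
# Hypothetical stand-in for the scraped search results: two pages of profiles.
pages = [
    [{"Name": "A"}, {"Name": "B"}],
    [{"Name": "C"}, {"Name": "D"}],
]


def buggy(pages):
    """Mirrors the question: list reset per page, append after the loops."""
    for items in pages:
        l = []  # re-created on every page, discarding earlier pages
        for item in items:
            d = dict(item)  # a fresh dict per profile
    l.append(d)  # runs once, after both loops have finished
    return l


def fixed(pages):
    """Mirrors the answer: list created once, append inside the inner loop."""
    l = []
    for items in pages:
        for item in items:
            d = dict(item)
            l.append(d)  # one row per profile
    return l


print(buggy(pages))  # [{'Name': 'D'}]
print(fixed(pages))  # [{'Name': 'A'}, {'Name': 'B'}, {'Name': 'C'}, {'Name': 'D'}]
```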

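Here is your code with a few tweaks: the page number is substituted into the URL, the list of dicts is created once outside the main loop, and `l.append(d)` is indented into the item loop so that every profile is appended: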

import requests
import pandas as pd
from bs4 import BeautifulSoup

base_url = "https://www.ratemds.com/best-doctors/?page={}"  # Change base url to this; {} is filled with the page number
# Moved the list of dicts outside of the main loop so rows from every page accumulate
l = []

for page in range(1, 5):  # pages 1 to 4 (the stop value is exclusive)
    r = requests.get(base_url.format(page))  # substitute 'page' variable in base_url
    soup = BeautifulSoup(r.content, 'html.parser')
    items = soup.find_all("div", {"class": "search-item doctor-profile"})  # avoid shadowing the built-in all()
    for item in items:
        d = {}
        link = item.find("a", {"class": "search-item-doctor-link"})
        d["Name"] = link.text
        # This div holds the specialty, so store it under its own key
        # instead of a "Phone Number" value that gets overwritten below
        d["Specialty"] = item.find("div", {"class": "search-item-specialty"}).text
        new_url = "https://www.ratemds.com" + link.get('href')
        r1 = requests.get(new_url)
        soup1 = BeautifulSoup(r1.content, 'html.parser')
        sve = soup1.find_all("div", {"class": "col-sm-3 col-md-4 search-item-extra"})
        for profil in sve:
            # find() returns None for a missing tag (TypeError when indexed);
            # a tag without the attribute raises KeyError
            try:
                d["Phone Number"] = profil.find("meta", itemprop="telephone")["content"]
            except (TypeError, KeyError):
                d["Phone Number"] = None
            try:
                d["Address"] = profil.find("meta", itemprop="streetAddress")["content"]
            except (TypeError, KeyError):
                d["Address"] = None
            try:
                d["Website"] = profil.find("a", itemprop="sameAs")["href"]
            except (TypeError, KeyError):
                d["Website"] = None
        l.append(d)  # indented this line to append within this loop

df = pd.DataFrame(l)  # pd matches the "import pandas as pd" above
df.to_csv("123.csv")
print(df)
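
The loop-level changes (formatting the page number into the URL and keeping one list across all pages) can be checked offline. This is a minimal sketch, assuming a hypothetical `fetch_profile_links(url)` stub in place of the `requests`/BeautifulSoup calls; the fake hrefs it returns are invented. It also swaps in the standard library's `urllib.parse.urljoin` as an alternative to concatenating `"https://www.ratemds.com" + a`.

```python
from urllib.parse import urljoin

base_url = "https://www.ratemds.com/best-doctors/?page={}"


def fetch_profile_links(url):
    """Hypothetical stub standing in for requests.get + BeautifulSoup parsing.

    Returns fake root-relative hrefs so the loop logic can run offline.
    """
    page = url.rsplit("=", 1)[-1]
    return [f"/doctor-ratings/{page}01/", f"/doctor-ratings/{page}02/"]


def collect(pages):
    rows = []  # created once, so rows from every page accumulate
    for page in pages:
        page_url = base_url.format(page)  # ...?page=1, ...?page=2, ...
        for href in fetch_profile_links(page_url):
            # urljoin resolves a root-relative href against the page URL
            rows.append({"page": page, "profile_url": urljoin(page_url, href)})
    return rows


rows = collect(range(1, 5))
print(len(rows))  # 8
print(rows[0]["profile_url"])  # https://www.ratemds.com/doctor-ratings/101/
```

`range(1, 5)` yields pages 1 through 4, so with two stub links per page the list ends up with eight rows.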

