Python Webscrape with Beautiful Soup

Question

I am new to Python and working on a web scraper. My issue is that my list is only populated with the first link in each category. The length of the output is 9, but it should be 25. I am pretty sure my error has something to do with my l=[] and d={}, but I'm not sure.

Any help would be appreciated.

import requests
from bs4 import BeautifulSoup
import gspread
import re
#import pandas as pd

url = 'https://www.astro.org/Patient-Care-and-Research/Clinical-Practice-Statements/Clinical-Practice-Guidelines'

r=requests.get(url)
c=r.content

soup=BeautifulSoup(c,'lxml')

all=soup.find_all('div', {'class':'panel-body'})

l=[]
for item in all:
      
    try:
        links=item.find_all('a')
        for a in links:
            d={}
            d['link']=zurl= ("https://www.astro.org" + a['href'])
            r2=requests.get(zurl)
            c2=r2.content
            soup2=BeautifulSoup(c2,'html.parser')
            title=soup2.select('#form > div.wrapper.interior-page > section:nth-child(6) > div > div > div.col-md-8.col-md-offset-1.col-sm-8.col-sm-offset-1.col-xs-12.floatright > div:nth-child(1) > div > h1')
            titlelst = title[:len(title)]
            titleparagraph = []
            for x in titlelst:
                titleparagraph.append(str(x.text))
                d['title']=("".join(map(str,titleparagraph)))
            all3=soup2.select('#form > div.wrapper.interior-page > section:nth-child(6) > div > div > div.col-md-8.col-md-offset-1.col-sm-8.col-sm-offset-1.col-xs-12.floatright > div:nth-child(2) > div')
            lst = all3[:len(all3)]
            paragraphs = []
            for x in lst:
                paragraphs.append(str(x.text))
                d['full']=("".join(map(str,paragraphs)))
                lplinks=x.find_all('a')
                lplinklist = []
                for a in lplinks:
                    lplinklist.append(str(a['href'])+'\n')
                    d['link2']=("".join(map(str,lplinklist)))     
                    
    except:
        print(None)
     
    l.append(d)
    print(len(l))

Answer 1

You placed l.append(d) outside the inner for loop, so only the last d built for each item gets appended. Move it to the end of the inner loop, and it will work fine:

for item in all:

    try:
        links = item.find_all('a')
        for a in links:
            ... 
            ...    

            l.append(d)

    except:
        print(None)

print(len(l)) # prints 25
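
The effect of the append placement can be reproduced without any network requests. The following is only a minimal sketch: the categories list and the two helper functions are hypothetical stand-ins for the panels and links on the real page, and the paths in it are made up for illustration.

```python
# Hypothetical stand-in data: three "categories" holding 2, 3, and 1 links.
# These paths are invented for the example and are not from the real site.
categories = [
    ["/a1", "/a2"],
    ["/b1", "/b2", "/b3"],
    ["/c1"],
]


def collect_outside(categories):
    """Append after the inner loop: one record per category."""
    records = []
    for links in categories:
        for href in links:
            d = {"link": "https://www.astro.org" + href}
        # By this point d only holds the last link of the category.
        records.append(d)
    return records


def collect_inside(categories):
    """Append inside the inner loop: one record per link."""
    records = []
    for links in categories:
        for href in links:
            d = {"link": "https://www.astro.org" + href}
            records.append(d)
    return records


buggy = collect_outside(categories)
fixed = collect_inside(categories)
print(len(buggy), len(fixed))  # prints: 3 6
```

With the append outside the inner loop, the result has one entry per category (3 here, 9 in the question); with it inside, there is one entry per link (6 here, 25 in the question).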
