
Python scraping urls category

Hello, I'm just starting out in programming and I'm having some problems with scraping. I'm trying to get all the links in a category list that contains several names, but I can't isolate the cells because many of them have the same name. Could someone help me? Below are my code and the URL I want to scrape.

from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/index.html'
reqs = requests.get(url)
if reqs.ok:
    soup = BeautifulSoup(reqs.text, 'html.parser')
    ul = soup.find('ul', {'class': 'nav nav-list'})
    for cells in ul:
        a = cells.find('a')
        link = a['href']
        #print(link)
        [print(str(lis) + '\n\n') for lis in link]


I need to retrieve all the URLs inside the (li) elements.


I think this is what you are after. I have commented the code to explain what I've done and why. You could obviously write this in fewer lines, but this way it is a little easier to follow.

from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/index.html'
reqs = requests.get(url)
if reqs.ok:
    soup = BeautifulSoup(reqs.text, 'html.parser')

    # use multiple class selector list
    sidebar = soup.find('ul', {"class": ["nav", "nav-list"]})

    # find all the list item tags within the ul
    li = sidebar.find_all('li')

    for item in li:
        # iterate through results to find a link
        link = item.find('a', href=True)
        # if there is a link print it
        if link is not None:
            print(link['href'])

Alternatively, this can be done by going straight to the links. Note that this grabs every link on the page, not just the ones in the category sidebar:

from bs4 import BeautifulSoup
import requests

url = 'http://books.toscrape.com/index.html'
reqs = requests.get(url)
if reqs.ok:
    soup = BeautifulSoup(reqs.text, 'html.parser')
    links = soup.find_all('a')
    for link in links:
        print(link['href'])
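If you only want the sidebar links, a CSS selector is a compact middle ground between the two answers above. This is a sketch that runs against a small inline HTML snippet standing in for the sidebar markup on books.toscrape.com (the snippet and its hrefs are illustrative, not copied from the live page), so it works without a network request; with a real page you would pass `reqs.text` to `BeautifulSoup` instead:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the 'nav nav-list' sidebar on books.toscrape.com
html = """
<ul class="nav nav-list">
  <li><a href="catalogue/category/books_1/index.html">Books</a>
    <ul>
      <li><a href="catalogue/category/books/travel_2/index.html">Travel</a></li>
      <li><a href="catalogue/category/books/mystery_3/index.html">Mystery</a></li>
    </ul>
  </li>
</ul>
"""

soup = BeautifulSoup(html, 'html.parser')

# One CSS selector: every <a> inside an <li> inside the sidebar <ul>,
# so links elsewhere on the page are never matched
links = [a['href'] for a in soup.select('ul.nav-list li a')]
print(links)
```

`select()` returns the matching tags in document order, so this replaces the `find`/`find_all` loop and the `None` check in one expression.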
