
I want to get all links from a certain webpage using Python

I want to be able to pull all URLs from the following webpage using Python: https://yeezysupply.com/pages/all. I tried some other suggestions I found, but they didn't seem to work with this particular website; I would end up not finding any URLs at all.

import urllib.request
import lxml.html

# Fetch the page and parse the raw HTML
connection = urllib.request.urlopen('https://yeezysupply.com/pages/all')
dom = lxml.html.fromstring(connection.read())

# Print the href attribute of every anchor tag
for link in dom.xpath('//a/@href'):
    print(link)

Perhaps it would be useful for you to make use of modules specifically designed for this. Here's a quick and dirty script that gets the relative links on the page:

#!/usr/bin/python3

import requests, bs4

# Download the page and parse the HTML with BeautifulSoup
res = requests.get('https://yeezysupply.com/pages/all')
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# Find every anchor tag and print its href attribute
links = soup.find_all('a')
for link in links:
    print(link.attrs['href'])

It generates output like this:

/pages/jewelry
/pages/clothing
/pages/footwear
/pages/all
/cart
/products/womens-boucle-dress-bleach/?back=%2Fpages%2Fall
/products/double-sleeve-sweatshirt-bleach/?back=%2Fpages%2Fall
/products/boxy-fit-zip-up-hoodie-light-sand/?back=%2Fpages%2Fall
/products/womens-boucle-skirt-cream/?back=%2Fpages%2Fall
etc...
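
These are relative paths; if you need absolute URLs, the standard library's urllib.parse.urljoin can resolve them against the page URL. A small sketch (not part of the original script):

from urllib.parse import urljoin

base = 'https://yeezysupply.com/pages/all'

# Resolve one of the relative paths above against the page URL
print(urljoin(base, '/pages/jewelry'))
# https://yeezysupply.com/pages/jewelry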

Is this what you are looking for? Requests and Beautiful Soup are amazing tools for scraping.

There are no links in the page source; they are inserted using JavaScript after the page is loaded in the browser.
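
In that case, a tool that drives a real browser can render the page before extracting the links. A minimal sketch, assuming Selenium 4 and a local Chrome install (neither is mentioned in the original answers):

from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium 4 downloads a matching ChromeDriver automatically
driver = webdriver.Chrome()
driver.get('https://yeezysupply.com/pages/all')

# The anchor tags now exist in the rendered DOM
for a in driver.find_elements(By.TAG_NAME, 'a'):
    print(a.get_attribute('href'))

driver.quit()

Depending on how quickly the page hydrates, an explicit wait (selenium.webdriver.support.ui.WebDriverWait) may be needed before reading the DOM.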
