I want to get all links from a certain webpage using Python
I want to be able to pull all URLs from the following webpage using Python: https://yeezysupply.com/pages/all. I tried some other suggestions I found, but they didn't seem to work with this particular website; I would end up not finding any URLs at all.
import urllib.request
import lxml.html

connection = urllib.request.urlopen('https://yeezysupply.com/pages/all')
dom = lxml.html.fromstring(connection.read())
for link in dom.xpath('//a/@href'):
    print(link)
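For reference, the same extraction can be done with nothing but the standard library, which rules out installation problems as the cause. This is a minimal sketch using `html.parser` on an inline sample string (the sample HTML is illustrative, not taken from the site):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href attribute found on <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

sample = '<a href="/pages/all">All</a><a href="/cart">Cart</a><a>no href</a>'
parser = LinkCollector()
parser.feed(sample)
print(parser.links)  # ['/pages/all', '/cart']
```

If this works on a sample string but still finds nothing on the live page, the problem is the page itself, not the parser.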
Perhaps it would be useful for you to make use of modules specifically designed for this. Here's a quick and dirty script that gets the relative links on the page:
#!/usr/bin/python3
import requests, bs4
res = requests.get('https://yeezysupply.com/pages/all')
soup = bs4.BeautifulSoup(res.text,'html.parser')
links = soup.find_all('a')
for link in links:
    print(link.get('href'))  # .get() avoids a KeyError on <a> tags without an href
It generates output like this:
/pages/jewelry
/pages/clothing
/pages/footwear
/pages/all
/cart
/products/womens-boucle-dress-bleach/?back=%2Fpages%2Fall
/products/double-sleeve-sweatshirt-bleach/?back=%2Fpages%2Fall
/products/boxy-fit-zip-up-hoodie-light-sand/?back=%2Fpages%2Fall
/products/womens-boucle-skirt-cream/?back=%2Fpages%2Fall
etc...
Is this what you are looking for? requests and Beautiful Soup are amazing tools for scraping.
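Since the script above returns relative paths, you may want to turn them into full URLs before following them. The standard library's `urljoin` handles this; the list below just reuses a few of the paths from the sample output:

```python
from urllib.parse import urljoin

base = "https://yeezysupply.com/pages/all"
relative_links = [
    "/pages/jewelry",
    "/cart",
    "/products/womens-boucle-skirt-cream/?back=%2Fpages%2Fall",
]

# urljoin resolves each path against the page the links came from
absolute_links = [urljoin(base, link) for link in relative_links]
print(absolute_links[0])  # https://yeezysupply.com/pages/jewelry
```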
There are no links in the page source; they are inserted using JavaScript after the page is loaded into the browser.
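If that is the case, a plain HTTP fetch will never see the links, and you need a tool that runs the page's JavaScript first. A browser-driving library such as Selenium can do this. The sketch below assumes `selenium` is installed along with a matching Chrome driver (the function names `extract_hrefs` and `get_rendered_links` are my own, not from any library), so the browser call is left commented out:

```python
import re

def extract_hrefs(html):
    # Pull href values out of HTML with a simple regex
    # (fine for a quick scrape; prefer BeautifulSoup for robustness).
    return re.findall(r'href="([^"]+)"', html)

def get_rendered_links(url):
    # Requires: pip install selenium, plus a matching browser driver.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opts = Options()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)  # the page's JavaScript runs here, inserting the links
        return extract_hrefs(driver.page_source)
    finally:
        driver.quit()

# links = get_rendered_links("https://yeezysupply.com/pages/all")  # needs a browser
print(extract_hrefs('<a href="/cart">Cart</a>'))  # ['/cart']
```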