
How do I get all the links from multiple web pages in Python?


from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

#import re

req = Request("https://www.indiegogo.com/individuals/23489031")
html_page = urlopen(req)

soup = BeautifulSoup(html_page, "lxml")

links = []
for link in soup.findAll('a'):
    links.append(link.get('href'))
    
print(links)

This code works if I use only one URL, but it does not work with multiple URLs. How do I do the same thing with multiple URLs?

I haven't ever used bs4, but you may be able to just create a list containing all the URLs you want to check. Then you can use a loop to iterate over and work with each URL separately. Like:

urls = ["https://","https://","http://"] #But with actual links
for link in urls:
  #Work with each link separately here
  pass
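
Applied to the code from the question, that loop might look roughly like this (a minimal sketch; the second URL is just a placeholder for whatever other pages you want to scrape):

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

# The first URL comes from the question; the second is only a placeholder
urls = ["https://www.indiegogo.com/individuals/23489031",
        "https://www.example.com"]

all_links = []
for url in urls:
    req = Request(url)
    html_page = urlopen(req)
    soup = BeautifulSoup(html_page, "lxml")
    # Collect every href on this page, then move on to the next URL
    for link in soup.findAll('a'):
        all_links.append(link.get('href'))

print(all_links)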

Here is a small piece of code I wrote for a scraping task at some point.

You can adapt it to what you want to achieve. I hope it helps.

import requests
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin

url_list = ['https://www.example1.com', 'https://www.example2.com']

def getlinks(url):
    r = requests.get(url)
    soup = bs(r.text, 'html.parser')
    # Take every <a> tag that has an href and resolve relative links against the page URL
    links = [urljoin(url, a.attrs['href']) for a in soup.find_all('a', href=True)]
    return links

You can loop through url_list and call getlinks(url) for each entry.
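
For example, a minimal sketch that gathers the results from every URL into one list (using the url_list and getlinks defined above) could look like this:

all_links = []
for url in url_list:
    # getlinks returns the absolute hrefs found on that page
    all_links.extend(getlinks(url))

print(all_links)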
