Is there any way, using Python, to get all the links on a whole web site, not just on a single web page? I tried this code, but it only gives me the links on that one page:
import urllib2
import re
#connect to a URL
website = urllib2.urlopen('http://www.example.com/')
#read html code
html = website.read()
#use re.findall to get all the links
links = re.findall('"((http|ftp)s?://.*?)"', html)
print links
Visit the links you have gathered recursively and scrape those pages too:
import urllib2
import re

stack = ['http://www.example.com/']
results = []

while len(stack) > 0:
    url = stack.pop()
    # connect to a URL
    website = urllib2.urlopen(url)
    # read the html code
    html = website.read()
    # use re.findall to get all the links
    # you should not only gather absolute http/ftp links but also relative links
    # you could use Beautiful Soup for that (if you want <a> links)
    links = re.findall('"((?:http|ftp)s?://.*?)"', html)
    results.extend([link for link in links if is_not_relative_link(link)])  # this function has to be written
    for link in links:
        if link_is_valid(link):  # this function has to be written
            stack.append(link)
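For completeness, here is a minimal sketch of such a crawler using Beautiful Soup (assuming the `beautifulsoup4` package is installed) and `urlparse.urljoin` to resolve relative links. It keeps a `visited` set to avoid loops and only follows links on the same domain as the start URL; the domain check and error handling shown are just one reasonable choice, not the only one:

import urllib2
import urlparse
from bs4 import BeautifulSoup  # pip install beautifulsoup4

start_url = 'http://www.example.com/'
domain = urlparse.urlparse(start_url).netloc

stack = [start_url]
visited = set()
results = []

while stack:
    url = stack.pop()
    if url in visited:
        continue
    visited.add(url)
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError:
        continue  # skip pages that fail to load
    soup = BeautifulSoup(html, 'html.parser')
    for tag in soup.find_all('a', href=True):
        # urljoin turns relative hrefs into absolute URLs
        link = urlparse.urljoin(url, tag['href'])
        # stay on the same site and skip pages already seen
        if urlparse.urlparse(link).netloc == domain and link not in visited:
            results.append(link)
            stack.append(link)

print results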