
Reading in Content From URLs in a File

I'm trying to get other subset URLs from a main URL. However, as I print to see if I get the content, I noticed that I am only getting the HTML, not the URLs within it.

import urllib.request

file = 'http://example.com'

# Fetch the page; read() returns the raw HTML bytes, which are decoded to a string
with urllib.request.urlopen(file) as url:
    collection = url.read().decode('UTF-8')

I think this is what you are looking for. You can use Python's BeautifulSoup library, and this code should work with Python 3:

    from urllib.request import urlopen
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    def get_all_urls(url):
        # Fetch the page and parse the HTML
        page = urlopen(url)
        url_html = BeautifulSoup(page, 'html.parser')
        # Print the href of every <a> tag on the page
        for link in url_html.find_all('a'):
            href = link.get('href')
            if href is None:
                continue
            if href.startswith('http'):
                print(href)
            else:
                # Resolve relative links against the page URL
                print(urljoin(url, href))

    get_all_urls('http://example.com')
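
If you cannot install BeautifulSoup, the same extraction can be done with only the standard library's html.parser module. This is a minimal sketch under that assumption; the class name LinkCollector and the function get_all_urls_stdlib are illustrative names, not part of the original answer:

    from html.parser import HTMLParser
    from urllib.request import urlopen
    from urllib.parse import urljoin

    class LinkCollector(HTMLParser):
        # Illustrative helper: collects the href of every <a> tag it sees
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == 'a':
                for name, value in attrs:
                    if name == 'href' and value:
                        self.links.append(value)

    def get_all_urls_stdlib(url):
        # Fetch the page, feed it to the parser, and resolve relative links
        html = urlopen(url).read().decode('utf-8')
        parser = LinkCollector()
        parser.feed(html)
        return [urljoin(url, href) for href in parser.links]

    print(get_all_urls_stdlib('http://example.com'))

Both versions resolve relative links with urljoin, which handles hrefs like /about or ../index.html correctly instead of naively concatenating strings.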
