Reading content from URLs in a document

Question

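I am trying to get the other sub-URLs from a main URL. However, when I print the result to check what I got, I noticed that I only get the HTML, not the URLs inside it.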

    import urllib.request  # a bare "import urllib" does not reliably load the request submodule
    file = 'http://example.com'

    with urllib.request.urlopen(file) as url:
        collection = url.read().decode('UTF-8')

Answer 1

I think this is what you want. You can use Python's Beautiful Soup library, and this code should work with Python 3.

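    from urllib.parse import urljoin
    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    def get_all_urls(url):
        # Download the page and parse its HTML
        with urlopen(url) as response:
            soup = BeautifulSoup(response, 'html.parser')
        # Only <a> tags that actually carry an href attribute are links
        for link in soup.find_all('a', href=True):
            # urljoin keeps absolute URLs as-is and resolves relative ones against url
            print(urljoin(url, link['href']))

    # urlopen needs a full URL including the scheme
    get_all_urls('http://example.com')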
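
If installing Beautiful Soup is not an option, roughly the same idea can be sketched with only the standard library. This is an alternative to the approach above, not the answer's own method. The names `LinkCollector` and `extract_links` are made up for illustration, and the function takes HTML that has already been downloaded, so it can be tried without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Relative links are resolved; absolute links pass through unchanged
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return a list of absolute URLs found in the given HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

With the variables from the question, this could be called as `extract_links(collection, file)`, since `collection` already holds the decoded HTML. Returning a list rather than printing makes it easier to loop over the links and fetch each one afterwards.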


Solution 1
1 2018-07-26 05:51:47
