
Create a text file to save data from each scraped URL

I did the following:

import requests
from bs4 import BeautifulSoup

for url in url_list:
    print(url)
    # prepend http:// if the URL has no scheme
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)

with open("copy.txt", "w") as file:
    file.write(str(soup))

I want to create a separate text file for each scraped URL. Currently the code saves everything into a single file.

Open the file inside the for loop, using a different name each time:

id = 0
for url in url_list:
    print(url)
    # prepend http:// if the URL has no scheme
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)

    # Save to "copy_1.txt", "copy_2.txt", etc.
    id += 1
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))

You're almost there. You just need to create the file inside the for loop and add an identifier to the file name so the files don't overwrite each other.

for url in url_list:
    # print(url)
    # prepend http:// if the URL has no scheme
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # print(soup)

    # I would use an identifier derived from the URL, but an index works too.
    url_identifier = url_list.index(url)

    with open(f"copy_{url_identifier}.txt", "w") as file:
        file.write(str(soup))
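The comment in the answer above hints at using an identifier taken from the URL itself. A minimal sketch of that idea, assuming the same url_list from the question, is to slugify the host and path with urllib.parse; the slug rule below is only an illustration, not part of the original answer. It also sidesteps a caveat of url_list.index(url): that call returns the position of the first match, so duplicate URLs in the list would write to the same file.

    import re
    from urllib.parse import urlparse

    import requests
    from bs4 import BeautifulSoup

    for url in url_list:
        if url[:4] != 'http':
            url = 'http://' + url

        response = requests.get(url)
        soup = BeautifulSoup(response.text, "html.parser")

        # Build a file name from the URL itself, e.g. "example.com_some_page.txt".
        parsed = urlparse(url)
        slug = re.sub(r'[^A-Za-z0-9._-]+', '_', parsed.netloc + parsed.path).strip('_')

        with open(f"{slug or 'index'}.txt", "w") as file:
            file.write(str(soup))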
