
Create a text file to save data from each scraped URL

I did the following:

import requests
from bs4 import BeautifulSoup

for url in url_list:
    print(url)
    # prepend http:// if the URL does not already have a scheme, then send it
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)

with open("copy.txt", "w") as file:
    file.write(str(soup))

I want to create a text file for each scraped URL. At the moment it is saving everything into a single file.

Open the file inside the for loop, with a different name each time.

import requests
from bs4 import BeautifulSoup

id = 0
for url in url_list:
    print(url)
    # prepend http:// if the URL does not already have a scheme, then send it
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)

    # Save to "copy_1.txt", "copy_2.txt", etc.
    id += 1
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))
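
A small variation on the same counter step: Python's built-in enumerate can number the URLs for you, so id does not have to be incremented by hand. A minimal sketch, assuming the same url_list and scraping code as above:

import requests
from bs4 import BeautifulSoup

# enumerate supplies the running number, starting at 1
for id, url in enumerate(url_list, start=1):
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))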

You are almost there. You only need to create the file inside the for loop and add an identifier to the file name so it does not get overwritten.

import requests
from bs4 import BeautifulSoup

for url in url_list:
    # print(url)

    # I would use an identifier from the url, however you can use an index instead.
    # Look up the index before the URL is modified below; otherwise the
    # prefixed URL would no longer be found in url_list.
    url_identifier = url_list.index(url)

    # concat URL with http:// and send it
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # print(soup)

    with open(f"copy_{url_identifier}.txt", "w") as file:
        file.write(str(soup))
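
If you want the "identifier from the url" mentioned above rather than a list index, one option is to turn the URL itself into a filesystem-safe name. A minimal sketch, using a hypothetical helper url_to_filename that is not part of the original code:

import re
from urllib.parse import urlparse

def url_to_filename(url):
    # Build a name from the hostname and path, replacing characters
    # that are unsafe in file names with underscores.
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).strip('/')
    return re.sub(r'[^A-Za-z0-9._-]+', '_', raw) or 'page'

# Example: 'http://example.com/foo/bar' becomes 'example.com_foo_bar', so the
# scraped page would be written to 'copy_example.com_foo_bar.txt':
#     with open(f"copy_{url_to_filename(url)}.txt", "w") as file:
#         file.write(str(soup))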
