
Create a text file to save data from each scraped URL

I did the following:

import requests
from bs4 import BeautifulSoup

for url in url_list:
    print(url)
    # prepend http:// if the URL does not already have a scheme, then send it
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)

with open("copy.txt", "w") as file:
    file.write(str(soup))

I want to create a text file for each scraped URL. At the moment it is saving everything into a single file.

Open the file inside the for loop, with a different name each time.

import requests
from bs4 import BeautifulSoup

id = 0
for url in url_list:
    print(url)
    # prepend http:// if the URL does not already have a scheme, then send it
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup)

    # Save to "copy_1.txt", "copy_2.txt", etc.
    id += 1
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))
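
A small variation on the same counter step: Python's built-in enumerate can number the URLs for you, so id does not have to be incremented by hand. A minimal sketch, assuming the same url_list and scraping code as above:

import requests
from bs4 import BeautifulSoup

# enumerate supplies the running number, starting at 1
for id, url in enumerate(url_list, start=1):
    if url[:4] != 'http':
        url = 'http://' + url
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    with open(f"copy_{id}.txt", "w") as file:
        file.write(str(soup))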

You are almost there. You only need to create the file inside the for loop and add an identifier to the file name so it does not get overwritten.

import requests
from bs4 import BeautifulSoup

for url in url_list:
    # print(url)

    # I would use an identifier from the url, however you can use an index instead.
    # Look up the index before the URL is modified below; otherwise the
    # prefixed URL would no longer be found in url_list.
    url_identifier = url_list.index(url)

    # concat URL with http:// and send it
    if url[:4] != 'http':
        url = 'http://' + url

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # print(soup)

    with open(f"copy_{url_identifier}.txt", "w") as file:
        file.write(str(soup))
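
If you want the "identifier from the url" mentioned above rather than a list index, one option is to turn the URL itself into a filesystem-safe name. A minimal sketch, using a hypothetical helper url_to_filename that is not part of the original code:

import re
from urllib.parse import urlparse

def url_to_filename(url):
    # Build a name from the hostname and path, replacing characters
    # that are unsafe in file names with underscores.
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).strip('/')
    return re.sub(r'[^A-Za-z0-9._-]+', '_', raw) or 'page'

# Example: 'http://example.com/foo/bar' becomes 'example.com_foo_bar', so the
# scraped page would be written to 'copy_example.com_foo_bar.txt':
#     with open(f"copy_{url_to_filename(url)}.txt", "w") as file:
#         file.write(str(soup))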
