
Python Requests: read every line from a TXT file, send a request for each one, and save the responses to a new TXT file

This code fetches a list of URLs from web.archive.org and saves them to a new TXT file. Instead of typing a single URL at the input() prompt, I want to load many URLs from a TXT file. So x=input('URL:') must be replaced with code that reads the file one line at a time.

I've been trying for a few days now and I'm stuck. Please help!

The code:

import requests

x = input('Enter your url:-')
r = requests.get('http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey'.format(x))
with open('url.txt', 'a') as f:
    f.write('\n')
    f.write(r.text)
    f.write('\n')

First, put all the URLs in a urls.txt file, one per line, and then read it with the readlines() function, which returns a list of all the lines. Here is the complete code.

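import requests

with open('urls.txt') as file:
    # get the list of URLs, one per line
    urls_list = file.readlines()

with open('url.txt', 'a') as f:
    for x in urls_list:
        x = x.strip()  # readlines() keeps the trailing newline
        if not x:
            continue  # skip blank lines
        r = requests.get('http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey'.format(x))
        print(r.status_code)
        f.write(r.text + '\n')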
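
One detail to watch with readlines(): every string it returns still ends with its newline character, and a blank line in the file comes back as a bare "\n". If those strings are formatted straight into the query URL, the newline goes along with them. The following is a minimal sketch of the cleanup step, using an in-memory file in place of urls.txt; the helper name clean_lines is hypothetical and not part of any library.

```python
import io


def clean_lines(lines):
    """Strip surrounding whitespace from each line and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


# Simulate the contents of urls.txt without touching the disk.
fake_file = io.StringIO("google.com\nyahoo.com\n\n")

raw = fake_file.readlines()
print(raw)               # the raw strings still carry their "\n"
print(clean_lines(raw))  # cleaned values, safe to put into a URL
```

The same helper can be applied to the list returned by file.readlines() before the request loop starts.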

To read the URLs from a file, you can use the following example:

import requests

urls = []
with open("something.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)

archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"

with open("output.txt", "w") as f_out:
    for url in urls:
        print(url)
        r = requests.get(archive_url.format(url))
        print(r.text, file=f_out)
        print("\n", file=f_out)

something.txt contains domains, for example:

google.com
yahoo.com

output.txt contains the response text returned for each request.
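
With a long list of domains, a single timeout or HTTP error would stop the loop above partway through. The sketch below shows one possible way to keep going: each domain is fetched inside a try block, failures are collected, and the results that did succeed are still written. The function names fetch_archived_urls and save_archive_results, the 30-second timeout, and the choice to catch every exception are assumptions made for illustration, not part of the original answers. The fetch function is passed in as a parameter so the loop can be tried out without network access.

```python
ARCHIVE_URL = (
    "http://web.archive.org/cdx/search/cdx"
    "?url=*.{}&output=text&fl=original&collapse=urlkey"
)


def fetch_archived_urls(domain, timeout=30):
    """Return the archive listing for one domain as text (needs network access)."""
    # Third-party package, imported here so the loop below can be
    # exercised with a stand-in fetch function when requests is absent.
    import requests

    response = requests.get(ARCHIVE_URL.format(domain), timeout=timeout)
    response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
    return response.text


def save_archive_results(domains, out_path, fetch=fetch_archived_urls):
    """Write one block of results per domain; return the domains that failed."""
    failed = []
    with open(out_path, "w", encoding="utf-8") as f_out:
        for domain in domains:
            try:
                text = fetch(domain)
            except Exception as exc:  # broad on purpose: skip only this domain
                print("skipping {}: {}".format(domain, exc))
                failed.append(domain)
                continue
            # Separate each domain's results with a blank line.
            f_out.write(text.rstrip("\n") + "\n\n")
    return failed
```

With network access, save_archive_results(["google.com", "yahoo.com"], "output.txt") would use the real fetch function, and the returned list could be retried later or logged.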


 