This code gets a list of URLs from web.archive.org and saves them to a new TXT file. Instead of input() (typing one URL address), I want to load a batch of URLs from a TXT file. So x=input('URL:')
must be replaced with code that reads a txt file one line at a time.
I've been trying for a few days now and I'm stuck! Please help!
The code:
import requests

x = input('Enter your url:-')
r = requests.get('http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey'.format(x))
with open('url.txt', 'a') as f:
    f.write('\n')
    f.write(r.text)
    f.write('\n')
First, you need to have all the URLs in a urls.txt file, one per line, and then open it and call readlines(), which returns a list of all the URLs. Here is the complete code:
import requests

with open('urls.txt') as file:
    # get the list of urls
    urls_list = file.readlines()

for x in urls_list:
    # strip the trailing newline that readlines() keeps on each entry
    r = requests.get('http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey'.format(x.strip()))
    print(r.status_code)
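One detail worth isolating: readlines() keeps the trailing '\n' on every line, and a stray newline inside the query URL would corrupt the request. A minimal sketch of the read-and-clean step (the sample urls.txt written here is just a stand-in for your real file):

```python
# Create a stand-in urls.txt with a blank line in the middle
with open('urls.txt', 'w') as f:
    f.write('google.com\n\nyahoo.com\n')

with open('urls.txt') as f:
    # strip the trailing '\n' and skip blank lines
    urls_list = [line.strip() for line in f if line.strip()]

print(urls_list)  # ['google.com', 'yahoo.com']
```

Each entry in urls_list is then safe to drop into the .format(...) call.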
To read URLs from a file, you can use the following example:
import requests

urls = []
with open("something.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        if line == "":
            continue
        urls.append(line)

archive_url = "http://web.archive.org/cdx/search/cdx?url=*.{}&output=text&fl=original&collapse=urlkey"

with open("output.txt", "w") as f_out:
    for url in urls:
        print(url)
        r = requests.get(archive_url.format(url))
        print(r.text, file=f_out)
        print("\n", file=f_out)
something.txt contains the domains, for example:

google.com
yahoo.com

output.txt contains the responses from the requests.
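The CDX endpoint returns plain text with one matching URL per line, so the response body can be split into a Python list if you want to process the results rather than just dump them to output.txt. A small sketch, using a hard-coded sample body in place of a live r.text:

```python
# Sample CDX response body; in the real script this would be r.text
response_text = "http://google.com/\nhttp://mail.google.com/\n"

# One original URL per line of the response
urls_found = response_text.splitlines()
print(urls_found)  # ['http://google.com/', 'http://mail.google.com/']
```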