
Delete lines from a txt file after reading them

I'm trying to create a script that makes requests to random URLs from a txt file:

import urllib2

with open('urls.txt') as urls:
    for url in urls:
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!" 

But when a URL returns 404 Not Found, I want it erased from the file. There is one URL per line, so basically I need to delete every line whose URL returns 404. How can I do that?

You could write to a second file:

import urllib2

with open('urls.txt', 'r') as urls, open('urls2.txt', 'w') as urls2:
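    for url in urls:
        url = url.strip()  # drop the trailing newline so it is not written twice
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e  # an HTTPError carries a response code; a plain URLError does not

        code = getattr(r, 'code', None)  # None when there was no HTTP response
        if code in (200, 401):
            print '[{}]: '.format(url), "Up!"
            urls2.write(url + '\n')  # keep this URL in the new file
        elif code == 404:
            print '[{}]: '.format(url), "Not Found!"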
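Note that the snippet above leaves urls.txt itself untouched; the URLs that survive end up in urls2.txt. As a rough sketch of the same filter-and-copy idea with the network check factored out, so it can be tried offline, something like the following might work. The helper name copy_lines_if and its keep predicate are hypothetical, not part of the original answer, and the sketch is written in Python 3 syntax rather than the Python 2 used above.

```python
def copy_lines_if(src_path, dst_path, keep):
    """Copy each non-blank line of src_path to dst_path when keep(line) is true.

    `keep` is any callable taking the stripped line (here, a URL) and
    returning True to retain it. Returns the number of lines written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            url = line.strip()  # drop the trailing newline before testing
            if url and keep(url):
                dst.write(url + '\n')
                written += 1
    return written
```

In the real script, keep would be a function that performs the request and returns False for a 404; for a dry run, a stand-in such as lambda u: 'missing' not in u is enough.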

In order to delete lines from a file, you have to rewrite the entire contents of the file. The safest way to do that is to write out a new file in the same directory and then rename it over the old file. I'd modify your code like this:

import os
import sys
import tempfile
import urllib2

good_urls = set()

with open('urls.txt') as urls:
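    for url in urls:
        url = url.strip()  # drop the trailing newline; it is re-added on write
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e  # an HTTPError carries a response code; a plain URLError does not
        code = getattr(r, 'code', None)  # None when there was no HTTP response
        if code in (200, 401):
            sys.stdout.write('[{}]: Up!\n'.format(url))
            good_urls.add(url)
        elif code == 404:
            sys.stdout.write('[{}]: Not found!\n'.format(url))
        else:
            sys.stdout.write('[{}]: Unexpected response code {}\n'.format(url, code))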

tmp = None
try:
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', dir='.', delete=False)
    for url in sorted(good_urls):
        tmp.write(url + "\n")
    tmp.close()
    os.rename(tmp.name, 'urls.txt')
    tmp = None
finally:
    if tmp is not None:
        os.unlink(tmp.name)

You may want to add a good_urls.add(url) to the else clause in the first loop, so that URLs with an unexpected response code are kept rather than dropped from the file. If anyone knows a tidier way to do what I did with try-finally there at the end, I'd like to hear about it.
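One possibly tidier way to express the write-then-rename step is to move it into a small helper. This is only a sketch under stated assumptions, not the answerer's code: rewrite_file is a hypothetical name, and it relies on os.replace, which exists only in Python 3.3 and later (in the Python 2 code above, os.rename plays that role on POSIX systems).

```python
import os
import tempfile


def rewrite_file(path, lines):
    """Replace the contents of `path` with `lines`, one per line.

    The new contents go to a temporary file in the same directory, which
    is then renamed over `path`. If anything fails before the rename,
    the temporary file is removed and `path` is left as it was.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_name = tempfile.mkstemp(suffix='.tmp', dir=directory)
    try:
        with os.fdopen(fd, 'w') as tmp:
            for line in lines:
                tmp.write(line + '\n')
        # os.replace overwrites the destination if it already exists.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # clean up the leftover temp file
        raise
```

With such a helper, the final block of the script would shrink to a single call along the lines of rewrite_file('urls.txt', sorted(good_urls)).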
