
Reading URLs from a file in Python?

Hey guys, I am trying to read URLs from a .txt file and print whether each URL exists and is reachable. I am not sure why this code is not working.

The error I am getting is:

   name 'in_file' is not defined

Code:

from urllib.request import urlopen

def is_reachable(url):
   if urlopen(url): 
      return True
   else: 
      return False

in_file_name = input("Enter file name: ")
try:
   in_file = open(in_file_name, "r")
except:
   print("Cannot open " + in_file)

line = in_file.readline().replace(" ", "")
print(line)

counter = 0
while line != "":
  if is_reachable(line) == True:
    counter += 1
    print("The URL on line ", counter, "is unreachable!")
    line = in_file.readline()

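The "unreachable" message should be printed in an else branch, or behind a not check. Right now you print that the URL is unreachable precisely when it is reachable. The readline call also needs to sit outside the if, otherwise the loop never moves on when a URL is not reachable.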

counter = 0
while line != "":
    counter += 1
    if not is_reachable(line):
        print("The URL on line ", counter, "is unreachable!")
    line = in_file.readline()

There are other issues with your program:
1. If the file cannot be opened, your program still continues.
2. You maintain a counter variable by hand. enumerate does this for you.

A better approach would be:

from urllib.request import urlopen
import sys

def is_reachable(url):
    try:
        urlopen(url)
        return True
    except Exception:  # URLError, ValueError for a malformed URL, and so on
        return False

in_file_name = input("Enter file name: ")
lines = []
try:
    with open(in_file_name, 'r') as f:
        lines = f.read().splitlines()
except OSError:
    print("Cannot open " + in_file_name)
    sys.exit(1)

# start=1 so the reported numbers match the line numbers in the file
for counter, line in enumerate(lines, start=1):
    if is_reachable(line):
        print("The URL on line", counter, "is reachable!")
    else:
        print("The URL on line", counter, "is unreachable!")
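If the file may contain blank lines or stray spaces, the numbering and cleanup can be kept in one small helper. This is only a sketch: the name numbered_urls and the sample list are made up for illustration and are not part of the original code.

```python
def numbered_urls(lines):
    """Yield (line_number, url) pairs, numbering from 1 and skipping blank lines."""
    for number, raw in enumerate(lines, start=1):
        url = raw.strip()  # drop the trailing newline and surrounding spaces
        if url:
            yield number, url


# Hypothetical sample standing in for the lines of the .txt file
sample = ["http://example.com\n", "\n", "  http://example.org  \n"]
for number, url in numbered_urls(sample):
    print("Line", number, "holds", url)
```

Blank lines are skipped but still counted, so the reported number stays equal to the real line number in the file.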

You should exit the script if you can't open the file. As your code is currently written, you print an error message if the file can't be opened, then try to run the rest of the code anyway.

One quick fix:

import sys

in_file_name = input("Enter file name: ")
try:
    in_file = open(in_file_name, "r")
except OSError:
    print("Cannot open " + in_file_name)  # in_file is not defined when open fails
    sys.exit(1)

Also, your output is wrong. You're saying it's UNREACHABLE if is_reachable returns True, when you should be printing that it's REACHABLE.

Finally, in is_reachable, you need to handle a likely exception if there is a resolution problem with the URL you're trying to open:

from urllib.error import URLError
from urllib.request import urlopen

def is_reachable(url):
    try:
        urlopen(url)
        return True
    except URLError:
        return False
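A slightly more defensive variant might look like the sketch below. The timeout parameter and its 5 second default are assumptions added here, as is the decision to treat a malformed URL as unreachable instead of letting the ValueError propagate.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_reachable(url, timeout=5):
    """Return True if the URL can be opened, False otherwise."""
    try:
        # The with block closes the response as soon as we know it opened
        with urlopen(url, timeout=timeout):
            return True
    except (URLError, ValueError, OSError):
        # URLError: DNS failures, refused connections, HTTP error statuses
        # ValueError: malformed URLs, for example a missing scheme
        # OSError: lower-level socket problems such as a timeout
        return False
```

Without a timeout, a single unresponsive server can stall the whole loop, which is why one is passed here.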

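You have an error in your code. The except block uses in_file, which is never assigned when open fails, so Python reports that the name is not defined. Use in_file_name there instead:

except:
    print('yadaya ' + in_file_name) # your code used in_file here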

I have not tested this, but it should work:

from urllib.request import urlopen  # Python 3; in Python 2 this lived in urllib2

code = urlopen('http://google.com').getcode()
if 200 <= code < 400:
    print('Yes the URL exists and works.')

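Note that urlopen follows redirects on its own, so getcode() describes the final response. Inspecting the intermediate redirects yourself takes extra work.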
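To see which status code and final address a URL ends up with, something along these lines could work. It is a sketch only: the helper name url_status and the timeout default are made up for illustration. It relies on urlopen raising HTTPError for 4xx and 5xx responses, so those codes are read from the exception.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def url_status(url, timeout=5):
    """Return (status_code, final_url), or None if no HTTP response arrived."""
    try:
        with urlopen(url, timeout=timeout) as response:
            # Any redirects have already been followed at this point
            return response.status, response.url
    except HTTPError as err:
        # 4xx and 5xx responses surface as exceptions that still carry a code
        return err.code, err.url
    except (OSError, ValueError):
        # Connection problems (URLError is an OSError) or a malformed URL
        return None


if __name__ == "__main__":
    # Needs network access; the result depends on the server
    print(url_status("http://google.com"))
```

If the second item of the returned tuple differs from the URL you passed in, the request was redirected along the way.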
