Reading URLs from a file in Python?

Hey guys, I am trying to read URLs from a file and print whether each URL exists/is reachable. I am not sure why this code is not working (I am reading the URLs from a .txt file).

The error I am getting is:

   name 'in_file' is not defined

Code:

from urllib.request import urlopen

def is_reachable(url):
   if urlopen(url): 
      return True
   else: 
      return False

in_file_name = input("Enter file name: ")
try:
   in_file = open(in_file_name, "r")
except:
   print("Cannot open " + in_file)

line = in_file.readline().replace(" ", "")
print(line)

counter = 0
while line != "":
  if is_reachable(line) == True:
    counter += 1
    print("The URL on line ", counter, "is unreachable!")
    line = in_file.readline()

There should be an else before printing "unreachable", or a not in the condition so that only unreachable URLs are printed. Right now you print that the URL is unreachable even when it is reachable.

counter = 0
while line != "":
    counter += 1
    if not is_reachable(line):
        print("The URL on line ", counter, "is unreachable!")
    line = in_file.readline()

There are other issues with your program: 1. If the file cannot be opened, your program still continues. 2. You keep a counter variable and update it by hand; you can simply use enumerate instead.

A better approach would be:

from urllib.request import urlopen
import sys

def is_reachable(url):
    try: 
        urlopen(url)
        return True
    except Exception:  # any failure to open the URL counts as unreachable
        return False

in_file_name = input("Enter file name: ")
lines = []
try:
    with open(in_file_name, 'r') as f:
        lines = f.read().splitlines()
except OSError:  # file is missing or not readable
    print("Cannot open " + in_file_name)
    sys.exit(1)

# start=1 so the reported numbers match the line numbers in the file
for counter, line in enumerate(lines, start=1):
    if is_reachable(line):
        print("The URL on line ", counter, "is reachable!")
    else:
        print("The URL on line ", counter, "is unreachable!")
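Blank lines or stray spaces in the file would otherwise be checked as if they were URLs. A minimal sketch of cleaning the input first, assuming one URL per line; read_urls is a hypothetical helper name, not part of the original answer:

```python
def read_urls(path):
    """Return (line_number, url) pairs for the non-blank lines in a file."""
    urls = []
    with open(path, "r") as f:
        # start=1 so the numbers match what a text editor shows
        for number, line in enumerate(f, start=1):
            url = line.strip()  # drops the trailing newline and surrounding spaces
            if url:  # skip blank lines
                urls.append((number, url))
    return urls
```

Each pair can then be handed to is_reachable, using the stored number when printing the result.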

You should exit the script if you can't open the file. As your code is currently written, you print an error message if the file can't be opened, then try to run the rest of the code anyway.

One quick fix:

import sys

in_file_name = input("Enter file name: ")
try:
    in_file = open(in_file_name, "r")
except OSError:
    print("Cannot open " + in_file_name)  # in_file is undefined when open() fails
    sys.exit(1)  # stop here instead of running the rest of the script

Also, your output is wrong. You print that the URL is UNREACHABLE when is_reachable returns True, which is exactly when you should be printing that it's REACHABLE.

Finally, in is_reachable, you need to handle a likely exception if there is a resolution problem with the URL you're trying to open:

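from urllib.error import URLError
from urllib.request import urlopen

def is_reachable(url):
    try:
        urlopen(url)
        return True
    except URLError:  # raised for DNS/connection failures and HTTP errors
        return False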
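URLError is not the only way urlopen can fail: a line that is not a URL at all raises ValueError, and a slow server can stall the script unless a timeout is set. A hedged sketch that covers those cases; the 5-second default is an arbitrary assumption, not something from the answer:

```python
from http.client import HTTPException
from urllib.error import URLError
from urllib.request import urlopen


def is_reachable(url, timeout=5.0):
    """Return True if urlopen can open the URL, False otherwise."""
    try:
        # The context manager closes the connection once the check is done.
        with urlopen(url, timeout=timeout):
            return True
    except (URLError, ValueError, OSError, HTTPException):
        # URLError: DNS failure, refused connection, HTTP 4xx/5xx (HTTPError)
        # ValueError: input that is not a URL, such as an empty string
        # OSError: lower-level socket problems, including timeouts
        # HTTPException: protocol-level failures reported by http.client
        return False
```

Listing the exception types keeps unrelated problems, such as a KeyboardInterrupt, from being silently reported as "unreachable".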

You have an error in your code:

except:
    print('yadaya ' + in_file_name) # you have used in_file
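The reason the reported message is a NameError rather than "Cannot open ..." is that open() raised before in_file was ever assigned, so the except block refers to a name that does not exist yet. A small sketch reproducing this, assuming a hypothetical path that does not exist on your machine:

```python
missing_path = "/no/such/directory/urls.txt"  # hypothetical path, assumed not to exist
error_message = None

try:
    handle = open(missing_path, "r")  # raises before `handle` is assigned
except OSError:
    try:
        print("Cannot open " + handle)  # same mistake as using in_file in the question
    except NameError as err:
        error_message = str(err)

print(error_message)
```

Using the file name string in the message, as shown above with in_file_name, avoids the problem because that variable is always assigned before the try block.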

I have not tested this, but it should work:

from urllib.request import urlopen  # Python 3; urllib2 exists only in Python 2

status = urlopen('http://google.com').getcode()  # fetch once and reuse the status code
if 200 <= status < 400:
    print('Yes the URL exists and works.')

Note that urlopen follows redirects automatically, so the status code is that of the final response; detecting the redirect itself takes extra work.
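To see whether a redirect happened, one option is to compare the URL you requested with the URL of the final response. A minimal sketch, relying on the getcode() and geturl() methods of the response returned by urlopen; check_url and its opener parameter are hypothetical names added here so the function can be tried without network access:

```python
from urllib.request import urlopen


def check_url(url, opener=urlopen):
    """Return (status_code, final_url, was_redirected) for a URL."""
    response = opener(url)
    try:
        status = response.getcode()    # status of the final response
        final_url = response.geturl()  # URL actually retrieved, after any redirects
    finally:
        response.close()
    return status, final_url, final_url != url
```

Treat was_redirected as a hint rather than proof, since it only compares the two URL strings.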


 