
Python reading URLs from file until last line

I have a script which basically checks each domain from a text file and finds its email addresses. I want to add multiple domain names (line by line); the script should run the function on one domain and move on to the next line after finishing. I tried to Google for a specific solution, but I'm not sure how to find an appropriate answer.

f = open("demo.txt", "r")
    url = f.readline()
     extractUrl(url)


       def extractUrl(url):
            try:
            print("Searching emails... please wait")
        count = 0
        listUrl = []

        req = urllib.request.Request(
            url,
            data=None,
            headers={
                'User-Agent': ua.random
            })
        try:
        conn = urllib.request.urlopen(req, timeout=10)
        status = conn.getcode()
        contentType = conn.info().get_content_type()
        html = conn.read().decode('utf-8')
        emails = re.findall(
            r '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}', html)

        for email in emails:
            if (email not in listUrl):
                count += 1
        print(str(count) + " - " + email)
        listUrl.append(email)
        print(str(count) + " emails were found")

Python file objects are iterable, so it's basically as simple as:

for line in f:
    extractUrl(line)
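
One thing to keep in mind: each line obtained this way still ends with its newline character, which you generally don't want to pass on to urllib, so strip it first:

for line in f:
    extractUrl(line.strip())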

But you may want to do it right (ensure the file is closed whatever happens, ignore possible empty lines, etc.):

# use `with open(...)` to ensure the file will be correctly closed
with open("demo.txt", "r") as f:

    # use `enumerate` to get line numbers too -
    # we might need them for information
    for lineno, line in enumerate(f, 1):

        # make sure the line is clean (no leading / trailing whitespaces)
        # and not empty:
        line = line.strip()

        # skip empty lines
        if not line: 
            continue

        # ok, this one _should_ match - but something could go wrong
        try:
            extractUrl(line)
        except Exception as e:
            # mentioning the line number in error report might help debugging
            print("oops, failed to get urls for line {} ('{}') : {}".format(lineno, line, e))
