Python reading URLs from file until last line
I have a script which basically checks a domain from a text file and finds its email addresses. I want to add multiple domain names (line by line); the script should take each domain, run the function on it, and move on to the next line when it finishes. I tried to google for a specific solution, but I'm not sure how to find an appropriate answer.
import re
import urllib.request
from fake_useragent import UserAgent  # assuming fake_useragent, which provides ua.random below

ua = UserAgent()

def extractUrl(url):
    try:
        print("Searching emails... please wait")
        count = 0
        listUrl = []
        req = urllib.request.Request(
            url,
            data=None,
            headers={
                'User-Agent': ua.random
            })
        try:
            conn = urllib.request.urlopen(req, timeout=10)
            status = conn.getcode()
            contentType = conn.info().get_content_type()
            html = conn.read().decode('utf-8')
            emails = re.findall(
                r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}', html)
            for email in emails:
                if email not in listUrl:
                    count += 1
                    print(str(count) + " - " + email)
                    listUrl.append(email)
            print(str(count) + " emails were found")
        except Exception as e:
            print("Failed to fetch " + url + ": " + str(e))
    except Exception as e:
        print("Error: " + str(e))

# current approach: only reads the first line of the file
f = open("demo.txt", "r")
url = f.readline()
extractUrl(url)
Python files are iterable, so it's basically as simple as:
for line in f:
extractUrl(line)
But you may want to do it right (ensure you close the file whatever happens, ignore possible empty lines, etc.):
# use `with open(...)` to ensure the file will be correctly closed
with open("demo.txt", "r") as f:
    # use `enumerate` to get line numbers too -
    # we might need them for error reporting
    for lineno, line in enumerate(f, 1):
        # make sure the line is clean (no leading / trailing whitespace)
        line = line.strip()
        # skip empty lines
        if not line:
            continue
        # ok, this one _should_ work - but something could go wrong
        try:
            extractUrl(line)
        except Exception as e:
            # mentioning the line number in the error report helps debugging
            print("oops, failed to get urls for line {} ('{}') : {}".format(lineno, line, e))