Python - How to stop the loop
I have a script that reads files called source1.html, source2.html, source3.html, but when it can't find the next file (because it doesn't exist), it gives me an error. There can be any number of sourceX.html files, so I need something that says: if the next sourceX.html file cannot be found, stop the loop.
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    file = open(filename, "r")
IOError: [Errno 2] No such file or directory: 'source4.html'
How can I stop the script from looking for the next source file?
from bs4 import BeautifulSoup
import re
import os.path
n = 1
filename = "source" + str(n) + ".html"
savefile = open('OUTPUT.csv', 'w')
while os.path.isfile(filename):
    strjpgs = "Extracted Layers: \n \n"
    filename = "source" + str(n) + ".html"
    n = n + 1
    file = open(filename, "r")
    soup = BeautifulSoup(file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(filename)
    print(strjpgs)
savefile.close()
print "done"
Use a try / except and break:
while os.path.isfile(filename):
    try:  # try to do this
        file = open(filename, "r")
    except IOError:  # if this error occurs (Python 2 has no FileNotFoundError)
        break  # exit the loop
    # <rest of your code>
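Which exception name to catch depends on the Python version. A minimal check, assuming Python 3, where IOError is an alias of OSError and FileNotFoundError is one of its subclasses; that is why catching IOError works in both Python 2 and 3. The helper name open_error_type is hypothetical:

```python
import os
import tempfile

def open_error_type(path):
    """Try to open path; return the class of the exception raised, or None."""
    try:
        with open(path, "r"):
            return None
    except IOError as exc:  # in Python 3, IOError is the same class as OSError
        return type(exc)

# A name inside a fresh, empty temporary directory cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), "source4.html")
print(open_error_type(missing))                # <class 'FileNotFoundError'>
print(IOError is OSError)                      # True
print(issubclass(FileNotFoundError, IOError))  # True
```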
The reason your code doesn't currently work is that your while loop checks whether the previous file exists, not the next one. Hence you could also do:
while True:
    strjpgs = "Extracted Layers: \n \n"
    filename = "source" + str(n) + ".html"
    if not os.path.isfile(filename):
        break
    # <rest of your code, including n = n + 1>
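A self-contained version of this look-before-you-leap loop, sketched as a function. The name read_source_files and its directory parameter are hypothetical; the important detail is that n is incremented at the end of the body, so each iteration checks exactly the file it is about to open:

```python
import os

def read_source_files(directory="."):
    """Read source1.html, source2.html, ... in order; stop at the first missing one.

    Returns a list of (filename, contents) pairs. Hypothetical helper for illustration.
    """
    results = []
    n = 1
    while True:
        filename = os.path.join(directory, "source" + str(n) + ".html")
        if not os.path.isfile(filename):
            break  # sourceN.html does not exist: stop looking
        with open(filename, "r") as file:
            results.append((os.path.basename(filename), file.read()))
        n = n + 1  # move on to the next number only after this file is done
    return results
```

Note that a gap in the numbering (for example source4.html missing but source5.html present) also ends the loop.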
You can try opening the file, and break out of the while loop once you catch an IOError exception.
from bs4 import BeautifulSoup
import re
import os.path
n = 1
filename = "source" + str(n) + ".html"
savefile = open('OUTPUT.csv', 'w')
while os.path.isfile(filename):
    try:
        strjpgs = "Extracted Layers: \n \n"
        filename = "source" + str(n) + ".html"
        n = n + 1
        file = open(filename, "r")
    except IOError:
        print("file not found! breaking out of loop.")
        break
    soup = BeautifulSoup(file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile(r'/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(filename)
    print(strjpgs)
savefile.close()
print("done")
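The same try/except idea can be sketched without any os.path check: the loop simply attempts to open the next file and stops on the first IOError. This is a minimal sketch rather than the code above; the helper name read_until_missing and its directory parameter are hypothetical, and it uses with so each file gets closed:

```python
import itertools
import os

def read_until_missing(directory="."):
    """Open source1.html, source2.html, ... until open() fails; return the names read."""
    names = []
    for n in itertools.count(1):
        filename = os.path.join(directory, "source{}.html".format(n))
        try:
            file = open(filename, "r")
        except IOError:  # FileNotFoundError in Python 3 is a subclass of IOError
            break  # no sourceN.html: leave the loop
        with file:  # closes the file when this block ends
            file.read()  # <process the file here>
        names.append(os.path.basename(filename))
    return names
```

Keeping only the open() call inside try means an IOError raised while processing a file is not mistaken for "no more files".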
This appears to be a sequence error. Let's look at a small fragment of your code, specifically the lines dealing with filename:
filename = "source" + str(n) + ".html"
while os.path.isfile(filename):
    filename = "source" + str(n) + ".html"
    n = n + 1
    file = open(filename, "r")
You're generating the next filename before you open the file (or really, checking the old filename and then opening a new one). It's a little hard to see because you're updating n while filename still holds the previous number, but if we look at the steps in sequence it pops out:
n = 1
filename = "source1.html"  # before loop
while os.path.isfile(filename):
    filename = "source1.html"  # first time inside loop
    n = 2
    open(filename)
while os.path.isfile(filename):  # second time in loop - still source1
    filename = "source2.html"
    n = 3
    open(filename)  # We haven't checked if this file exists!
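To make the off-by-one visible without touching the disk, here is a small simulation of the original loop order. The set passed in stands in for the files on disk (an assumption replacing os.path.isfile), and the function name trace_original_loop is hypothetical; it records which name the while condition checked and which name was then opened:

```python
def trace_original_loop(existing):
    """Mimic the question's loop order and return (checked, opened) pairs.

    `existing` is a set of file names standing in for os.path.isfile().
    """
    steps = []
    n = 1
    filename = "source" + str(n) + ".html"
    while filename in existing:  # checks the name built on the previous pass
        checked = filename
        filename = "source" + str(n) + ".html"
        n = n + 1
        steps.append((checked, filename))  # the question's code would open() this name
        if filename not in existing:
            break  # the real code would raise IOError here
    return steps

steps = trace_original_loop({"source1.html", "source2.html", "source3.html"})
for checked, opened in steps:
    print("checked", checked, "-> opened", opened)
# checked source1.html -> opened source1.html
# checked source1.html -> opened source2.html
# checked source2.html -> opened source3.html
# checked source3.html -> opened source4.html
```

The last line corresponds to the traceback in the question: source3.html was checked, but source4.html was opened.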
We can fix this a few ways. One is to move the entire update (n first, then filename) to the end of the loop. Another is to let the loop mechanism update n, which is a good deal easier (the real fix here is that we only use one filename value in each iteration of the loop):
import itertools
import os.path

for n in itertools.count(1):
    filename = "source{}.html".format(n)
    if not os.path.isfile(filename):
        break
    file = open(filename, "r")
    #...
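The first fix mentioned above (moving the whole update, n first and then filename, to the end of the loop) keeps the original while loop, but rebuilds filename only after the current file has been processed, so the condition always checks the file that is about to be opened. A sketch, with a hypothetical function wrapper and directory parameter added so it can run in isolation:

```python
import os

def process_sources(directory="."):
    """Original while-loop shape with the update moved to the end of the body."""
    processed = []
    n = 1
    filename = os.path.join(directory, "source" + str(n) + ".html")
    while os.path.isfile(filename):  # now checks the file opened just below
        with open(filename, "r") as file:
            file.read()  # <rest of your code>
        processed.append(os.path.basename(filename))
        # Update n, then filename, at the very end of the loop body.
        n = n + 1
        filename = os.path.join(directory, "source" + str(n) + ".html")
    return processed
```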
At the risk of looking rather obscure, we can also express the steps functionally (I'm using six here to avoid a difference between Python 2 and 3: Python 2's map builds a list, so over the infinite count() it would never finish):
from six.moves import map
from itertools import count, takewhile
import os.path

numbers = count(1)
filenames = map('source{}.html'.format, numbers)
existingfiles = takewhile(os.path.isfile, filenames)
for filename in existingfiles:
    file = open(filename, "r")
    #...
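One property of this takewhile pipeline worth knowing: it stops at the first missing number, even if later files exist, so a gap at source3.html means source4.html is never reached. A small check, assuming Python 3 (whose built-in map is already lazy, so six is not needed) and using a hypothetical set of names in place of os.path.isfile:

```python
from itertools import count, takewhile

# Hypothetical directory contents: note the gap at source3.html.
present = {"source1.html", "source2.html", "source4.html"}

filenames = map("source{}.html".format, count(1))  # lazy in Python 3
existing = list(takewhile(present.__contains__, filenames))
print(existing)  # ['source1.html', 'source2.html']
```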
Other options include iterating over the numbers alone and using break when isfile returns False, or simply catching the exception when open fails (eliminating the need for isfile entirely).
I'd suggest using both os.path.exists() (which returns True/False) and os.path.isfile().
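For reference, the difference between the two checks: os.path.exists() is True for any existing path, including directories, while os.path.isfile() is True only for an existing regular file, so isfile() by itself already returns False for a missing path. A quick check in a temporary directory (the file and directory names are hypothetical):

```python
import os
import tempfile

base = tempfile.mkdtemp()
a_file = os.path.join(base, "source1.html")
a_dir = os.path.join(base, "source2.html")  # a directory with an .html name
missing = os.path.join(base, "source3.html")

open(a_file, "w").close()  # create an empty regular file
os.mkdir(a_dir)

for path in (a_file, a_dir, missing):
    print(os.path.basename(path), os.path.exists(path), os.path.isfile(path))
# source1.html True True
# source2.html True False
# source3.html False False
```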
Use the with statement to open files. It is the Pythonic way to open files and is widely preferred by professional programmers, because the file is closed automatically when the block ends.
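What the with statement buys you, concretely: the file is closed when the block ends, even if the code inside raises an exception. A minimal demonstration with a temporary file (the ValueError is raised deliberately, purely for illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "source1.html")
with open(path, "w") as out:
    out.write("<div class='cplayer'></div>")

try:
    with open(path, "r") as file:
        file.read()
        raise ValueError("simulated failure while parsing")
except ValueError:
    pass  # the error still propagates out of the with block...

print(file.closed)  # True: ...but the file was closed on the way out
```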
These are the contents of my current working directory:
H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles>dir
Volume in drive H is New Volume
Volume Serial Number is C867-828E
Directory of H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles
11/05/2018 16:12 <DIR> .
11/05/2018 16:12 <DIR> ..
11/05/2018 15:54 106 source1.html
11/05/2018 15:54 106 source2.html
11/05/2018 15:54 106 source3.html
11/05/2018 16:12 0 stopReadingIfNot.md
11/05/2018 16:11 521 stopReadingIfNot.py
5 File(s) 839 bytes
2 Dir(s) 196,260,925,440 bytes free
The Python code below shows how to read the files source1.html, source2.html, source3.html and stop when there are no more files of the form sourceX.html (where X is 1, 2, 3, 4, etc.).
from __future__ import print_function  # makes print() behave the same in Python 2 and 3
import os

n = 1
html_file_name = 'source%d.html'

# os.path.exists() is True for any existing path (file or directory);
# os.path.isfile() is True only for an existing regular file.
# Keep reading while sourceX.html exists and is a regular file.
while os.path.isfile(html_file_name % n) and os.path.exists(html_file_name % n):
    print("Reading", html_file_name % n)
    # The best (Pythonic) way to open a file:
    # you don't need to bother about closing the file,
    # the with statement takes care of it.
    with open(html_file_name % n, "r") as file:
        # Make sure it works
        print(html_file_name % n, "exists\n")
    n += 1
H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles>python stopReadingIfNot.py
Reading source1.html
source1.html exists
Reading source2.html
source2.html exists
Reading source3.html
source3.html exists
So, based on the above logic, you can modify your code, and it will work.
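As an illustration of applying this to the question's code, here is a sketch that writes the same style of OUTPUT.csv report. To stay dependency-free it skips BeautifulSoup and runs the question's regular expression over the raw HTML of each file; that is an assumption, since the original first narrows the search to the div with class cplayer. The function name extract_layers and its directory/output_path parameters are hypothetical:

```python
import os
import re

JPG_NAME = re.compile(r'/([^/]+)\.jpg')  # same pattern as the question, as a raw string

def extract_layers(directory=".", output_path="OUTPUT.csv"):
    """Read source1.html, source2.html, ... until one is missing; write a report."""
    n = 1
    with open(output_path, "w") as savefile:
        while True:
            filename = "source%d.html" % n
            path = os.path.join(directory, filename)
            if not os.path.isfile(path):
                break  # no more sourceX.html files
            with open(path, "r") as file:
                html = file.read()  # assumption: regex over the whole page, no BeautifulSoup
            jpgs = JPG_NAME.findall(html)
            strjpgs = "Extracted Layers: \n \n" + "\n".join(jpgs) + "\n \n"
            savefile.write(filename + "\n")
            savefile.write(strjpgs)
            n += 1
    return n - 1  # number of files processed
```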
Thanks.