
Python - How to stop the loop

I have a script that reads files named source1.html, source2.html, source3.html and so on, but when it can't find the next file (because it doesn't exist) it raises an error. There can be any number of sourceX.html files, so I need a way to stop the loop when the next sourceX.html file cannot be found.

Traceback (most recent call last):
  File "main.py", line 14, in <module>
    file = open(filename, "r")
IOError: [Errno 2] No such file or directory: 'source4.html'

How can I stop the script from looking for the next source file?

from bs4 import BeautifulSoup
import re
import os.path

n = 1
filename = "source" + str(n) + ".html"
savefile = open('OUTPUT.csv', 'w')

while os.path.isfile(filename):

    strjpgs = "Extracted Layers: \n \n"
    filename = "source" + str(n) + ".html"
    n = n + 1
    file = open(filename, "r")
    soup = BeautifulSoup(file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(filename)
    print(strjpgs)

savefile.close()
print "done"

Use try / except and break:

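while os.path.isfile(filename):
    filename = "source" + str(n) + ".html"
    n = n + 1
    try:  # try to do this
        file = open(filename, "r")
    except IOError:  # if this error occurs (FileNotFoundError on Python 3 is a subclass)
        break  # exit the loop
    # <rest of your code>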

Your code doesn't currently work because the while condition checks that the previous file exists, not the next one. Hence you could also do:

while True:
    strjpgs = "Extracted Layers: \n \n"
    filename = "source" + str(n) + ".html"
    if not os.path.isfile(filename):
        break
    # <rest of your code>

You can try opening the file and break out of the while loop once you catch an IOError exception.

from bs4 import BeautifulSoup
import re
import os.path

n = 1
filename = "source" + str(n) + ".html"
savefile = open('OUTPUT.csv', 'w')

while os.path.isfile(filename):

    try:
      strjpgs = "Extracted Layers: \n \n"
      filename = "source" + str(n) + ".html"
      n = n + 1
      file = open(filename, "r")
    except IOError:
      print("file not found! breaking out of loop.")
      break

    soup = BeautifulSoup(file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(filename)
    print(strjpgs)

savefile.close()
print("done")

This appears to be a sequencing error. Let's look at a small fragment of your code, specifically the lines dealing with filename:

filename = "source" + str(n) + ".html"

while os.path.isfile(filename):

    filename = "source" + str(n) + ".html"
    n = n + 1
    file = open(filename, "r")

You're generating the next filename just before you open the file (or really, checking the old filename and then opening a new one). It's a little hard to see because n is updated while filename still holds the previous number, but if we look at the steps in sequence it pops out:

n = 1
filename = "source1.html"   # before loop
while os.path.isfile(filename):
 filename = "source1.html"   # first time inside loop
 n = 2
 open(filename)
while os.path.isfile(filename):  # second time in loop - still source1
 filename = "source2.html"
 n = 3
 open(filename)    # We haven't checked if this file exists!
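
The same sequence can be replayed in code. This is only a sketch: trace_original_loop is a hypothetical helper, and a set of names stands in for the files on disk, so nothing is actually opened. It records which name the loop condition checked and which name the body would then open.

```python
def trace_original_loop(existing):
    """Replay the question's loop and record (name checked, name opened) pairs.

    `existing` is a set of file names standing in for the directory contents.
    """
    pairs = []
    n = 1
    filename = "source{}.html".format(n)
    while filename in existing:  # stands in for os.path.isfile(filename)
        checked = filename
        # Same order as the question: rebuild filename, then bump n.
        filename = "source{}.html".format(n)
        n = n + 1
        # The real script calls open(filename) here, without re-checking it.
        pairs.append((checked, filename))
    return pairs
```

With source1.html to source3.html present, the last pair checks source3.html but opens source4.html, which is exactly where the IOError in the question comes from.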

We can fix this in a few ways. One is to move the whole update (n first, then filename) to the end of the loop. Another is to let the loop mechanism update n, which is a sight easier (the real fix here is that we only use one filename value in each iteration of the loop):

import itertools
import os.path

for n in itertools.count(1):
    filename = "source{}.html".format(n)
    if not os.path.isfile(filename):
        break
    file = open(filename, "r")
    #...
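
For completeness, the first fix (moving the update to the end of the loop) might look like the sketch below. The function name find_sources and the directory parameter are my own additions for illustration; the point is only that the name tested by the while condition is the same name used in the body.

```python
import os.path


def find_sources(directory="."):
    """Return the consecutive sourceN.html paths that exist, starting at N=1."""
    found = []
    n = 1
    filename = os.path.join(directory, "source{}.html".format(n))
    while os.path.isfile(filename):
        # The name checked by the loop condition is the name used here.
        found.append(filename)
        # Update n first, then filename, as the last step of the iteration.
        n = n + 1
        filename = os.path.join(directory, "source{}.html".format(n))
    return found
```

In the original script, the parsing and writing code would go where found.append(filename) is.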

At the risk of looking rather obscure, we can also express the steps functionally (I'm using six here to avoid a difference between Python 2 and 3; Python 2's map is eager and would never finish on an infinite iterator):

import os.path
from six.moves import map
from itertools import count, takewhile

numbers = count(1)
filenames = map('source{}.html'.format, numbers)
existingfiles = takewhile(os.path.isfile, filenames)

for filename in existingfiles:
    file = open(filename, "r")
    #...

Other options include iterating over the numbers alone and using break when isfile returns False, or simply catching the exception when open fails (eliminating the need for isfile entirely).
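
The last option, catching the exception instead of calling isfile, could be sketched roughly as follows. read_sources and its directory parameter are hypothetical names, and the errno check is my own addition so that only a missing file ends the sequence while other I/O errors still surface.

```python
import errno
import os


def read_sources(directory="."):
    """Yield (filename, text) for source1.html, source2.html, ... until one is missing."""
    n = 1
    while True:
        filename = os.path.join(directory, "source{}.html".format(n))
        try:
            with open(filename, "r") as handle:
                text = handle.read()
        except IOError as error:
            if error.errno != errno.ENOENT:
                raise  # some other problem, e.g. permissions
            return  # no such file: the sequence has ended
        yield filename, text
        n += 1
```

A caller can then write for filename, text in read_sources(): and parse each text without any explicit existence check.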

I suggest using both os.path.exists() (which returns True/False) and os.path.isfile().
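
As a quick sketch of how the two checks relate (describe is a hypothetical helper, not part of os.path): os.path.isfile() is already False for a path that does not exist, so combining it with os.path.exists() mostly documents intent rather than adding a stricter test.

```python
import os.path


def describe(path):
    """Return (exists, isfile) for a path."""
    return os.path.exists(path), os.path.isfile(path)
```

A directory gives (True, False), a regular file gives (True, True), and a missing path gives (False, False).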

Use a with statement to open the file. It is the Pythonic way to open files.

The with statement is preferred by experienced programmers because it closes the file automatically when the block ends.
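
A small sketch of what the with statement does for you. Both function names are made up for this illustration, and the ValueError simply simulates a failure while processing the file.

```python
def read_and_report(path):
    """Read a file and report whether the handle was closed afterwards."""
    with open(path, "r") as handle:
        text = handle.read()
    # Leaving the with block has already closed the file.
    return text, handle.closed


def fails_midway(path):
    """Raise inside the with block and return the handle's state afterwards."""
    handle = None
    try:
        with open(path, "r") as handle:
            raise ValueError("simulated parsing failure")
    except ValueError:
        pass
    # The file was closed even though the block exited with an exception.
    return handle.closed
```

In both cases handle.closed is True afterwards, with no explicit close() call.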

These are the contents of my current working directory.

H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles>dir
 Volume in drive H is New Volume
 Volume Serial Number is C867-828E

 Directory of H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles

11/05/2018  16:12    <DIR>          .
11/05/2018  16:12    <DIR>          ..
11/05/2018  15:54               106 source1.html
11/05/2018  15:54               106 source2.html
11/05/2018  15:54               106 source3.html
11/05/2018  16:12                 0 stopReadingIfNot.md
11/05/2018  16:11               521 stopReadingIfNot.py
               5 File(s)            839 bytes
               2 Dir(s)  196,260,925,440 bytes free

The Python code below shows how to read the files source1.html, source2.html, source3.html and stop when there are no more files of the form sourceX.html (where X is 1, 2, 3, 4, etc.).

Sample code:

import os

n = 1
html_file_name = 'source%d.html'

# Check that sourceX.html exists and is a regular file (not a directory)
# before performing any operation (read/write etc.) on it.
while os.path.isfile(html_file_name % n) and os.path.exists(html_file_name % n):
    print("Reading  %s" % (html_file_name % n))

    # The best way (Pythonic way) to open a file.
    # You don't need to bother about closing the file;
    # the with statement takes care of it.
    with open(html_file_name % n, "r") as file:
        # Make sure it works
        print("%s  exists\n" % (html_file_name % n))

    n += 1

Output:

H:\RishikeshAgrawani\Projects\Stk\ReadHtmlFiles>python stopReadingIfNot.py
Reading  source1.html
source1.html  exists

Reading  source2.html
source2.html  exists

Reading  source3.html
source3.html  exists

So you can modify your code based on the above logic. It will work.

Thanks.
