
Python - file name not correct - loop error

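I have a script that reads HTML files and extracts the relevant lines from each one, but I have a problem printing the file names. The files are named source1.html, source2.html and source3.html, but the script prints source2.html, source3.html and source4.html instead.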

from bs4 import BeautifulSoup
import re
import os.path

n = 1
filename = "source"+str(n)+".html"
savefile = open('OUTPUT.csv', 'w')

while os.path.isfile(filename):
    n = n+1
    strjpgs = "Extracted Layers: \n \n"
    file = open(filename, "r")
    filename = "source"+str(n)+".html"


    soup = BeautifulSoup (file, "html.parser")

    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)

    DoRegEx = re.compile(r'/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(filename)
    print(strjpgs)

savefile.close()
print("done")

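You define `n` as 1 and then immediately increment it to 2 inside the `while` loop, and you rebuild `filename` from the new `n` right after opening the current file. So by the time `print(filename)` runs, `n` is 2 and `filename` has already changed to "source2.html", even though the file you just opened was "source1.html". Move the print, or move the variable increment.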
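To see the off-by-one without BeautifulSoup or real files, here is a minimal sketch that replays the question's loop order against a hypothetical set of existing names. `EXISTING` and `replay_question_loop` are illustrative only and not part of the original script; each returned pair records which file the loop opens and which name it then prints.

```python
# Hypothetical stand-in for the files on disk, so the sketch runs anywhere.
EXISTING = {"source1.html", "source2.html", "source3.html"}


def replay_question_loop(existing):
    """Replay the question's loop order and return (opened, printed) pairs."""
    pairs = []
    n = 1
    filename = "source" + str(n) + ".html"
    while filename in existing:                 # plays the role of os.path.isfile(filename)
        n = n + 1                               # the counter advances first...
        opened = filename                       # ...the current file is opened...
        filename = "source" + str(n) + ".html"  # ...then the name moves on
        pairs.append((opened, filename))        # print(filename) sees the NEXT name
    return pairs


for opened, printed in replay_question_loop(EXISTING):
    print("opened", opened, "but printed", printed)
```

Every printed name is one ahead of the file that was actually read, which is exactly the source2/source3/source4 output described in the question.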

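You just need to move the print statement to the start of the loop, and move the increment, together with the line that rebuilds `filename`, to the end: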

# n, filename and savefile are set up as in the question
while os.path.isfile(filename):
    print(filename)

    strjpgs = "Extracted Layers: \n \n"
    file = open(filename, "r")
    soup = BeautifulSoup(file, "html.parser")
    file.close()

    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)

    DoRegEx = re.compile(r'/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(strjpgs)

    # advance to the next file only after the current one is done;
    # rebuilding filename earlier would keep re-reading source1.html forever
    n = n+1
    filename = "source"+str(n)+".html"
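The corrected ordering can be checked with the standard library alone. This is a sketch under a few assumptions: the files live in one directory, `process_in_order` is a hypothetical helper, and `file.read()` merely stands in for the BeautifulSoup step. It creates three throwaway files in a temporary directory so it can be run as-is.

```python
import os
import tempfile


def process_in_order(directory):
    """Return the source<n>.html names in the order they are processed.

    The current name is recorded before n and filename advance.
    """
    processed = []
    n = 1
    filename = os.path.join(directory, "source" + str(n) + ".html")
    while os.path.isfile(filename):
        processed.append(os.path.basename(filename))  # report the current file first
        with open(filename, "r") as file:
            file.read()  # stand-in for the BeautifulSoup parsing step
        n = n + 1  # advance only at the end of the iteration...
        filename = os.path.join(directory, "source" + str(n) + ".html")  # ...then rebuild the name
    return processed


# Demo: three throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2, 3):
        with open(os.path.join(tmp, "source%d.html" % i), "w") as f:
            f.write("<div class='cplayer'></div>")
    print(process_in_order(tmp))  # → ['source1.html', 'source2.html', 'source3.html']
```

Because the name is only rebuilt after `n` is incremented at the bottom of the loop, the `while` condition always tests the file that the next iteration will actually open.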

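You made a logic error: you increment `n` (and rebuild `filename`) before you have finished using the current name. The simplest fix is to start `n` at 0 instead of 1 and build `filename` right after incrementing, so the name always matches the file you open. The second problem is that you never close the HTML files, so use `with open(filename, "r") as file:`, which closes the file automatically when the block exits and is more Pythonic.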
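To check the claim about `with`, this small sketch reads a file inside a `with` block and inspects the file object's `closed` attribute inside and after the block. The temporary path and the file contents are purely illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "source1.html")  # illustrative file name
    with open(path, "w") as out:
        out.write("<div class='cplayer'></div>")

    with open(path, "r") as file:
        inside = file.closed   # still open while inside the block
        contents = file.read()
    after = file.closed        # closed automatically once the block exits

print(inside, after, contents)
```

The same cleanup happens if an exception is raised inside the block, which is why `with` is preferred over a bare `open()` that relies on a later `close()` call.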

from bs4 import BeautifulSoup
import re
import os.path

n = 0
savefile = open('OUTPUT.csv', 'w')

while True:
    # increment first, then build the name, so n and filename always agree
    n += 1
    filename = "source"+str(n)+".html"
    if not os.path.isfile(filename):
        break

    # the with block closes each HTML file automatically
    with open(filename, "r") as file:
        soup = BeautifulSoup(file, "html.parser")

    strjpgs = "Extracted Layers: \n \n"
    # parsing things...

    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(filename)
    print(strjpgs)

savefile.close()
print("done")
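If the number of files isn't known in advance, one alternative (not taken from either answer) is to let a small generator produce the names, so the counter and the name cannot drift apart and no name such as `source3.html` has to be hard-coded. `numbered_files` is a hypothetical helper built on `itertools.count`; its default directory, prefix and suffix are assumptions matching the question's file names.

```python
import itertools
import os


def numbered_files(directory=".", prefix="source", suffix=".html"):
    """Yield paths to source1.html, source2.html, ... until one is missing."""
    for n in itertools.count(1):
        path = os.path.join(directory, prefix + str(n) + suffix)
        if not os.path.isfile(path):
            return  # stop at the first gap in the numbering
        yield path


# Usage, mirroring the answers' loop body:
# with open('OUTPUT.csv', 'w') as savefile:
#     for filename in numbered_files():
#         with open(filename, "r") as file:
#             soup = BeautifulSoup(file, "html.parser")
#         ...
```

The loop body then only ever sees the name of the file it is processing, so printing and writing `filename` anywhere in the body gives the right result.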



 