
Python - name for file not correct - loop error
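I have a script that reads HTML files and extracts the relevant lines from them. However, I have a problem when printing the file names. The files are named source1.html, source2.html and source3.html, but the script prints source2.html, source3.html and source4.html instead.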
from bs4 import BeautifulSoup
import re
import os.path
n = 1
filename = "source"+str(n)+".html"
savefile = open('OUTPUT.csv', 'w')
while os.path.isfile(filename):
    n = n+1
    strjpgs = "Extracted Layers: \n \n"
    file = open(filename, "r")
    filename = "source"+str(n)+".html"
    soup = BeautifulSoup (file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(filename)
    print(strjpgs)
savefile.close()
print("done")
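You define n as 1 and then immediately increment it to 2 inside the while loop. By the time print(filename) runs, n is 2 and filename has already been changed to "source2.html". Either move the print or move the increment.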
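The effect of the ordering can be reproduced without any HTML files. The sketch below is only an illustration: the helper names names_increment_first and names_increment_last are made up here, and the loops run a fixed number of times instead of checking os.path.isfile.

```python
def names_increment_first(count):
    """Mimic the question's loop: n is bumped before the name is printed."""
    n = 1
    filename = "source" + str(n) + ".html"
    printed = []
    for _ in range(count):
        n = n + 1
        filename = "source" + str(n) + ".html"  # already points at the NEXT file
        printed.append(filename)
    return printed


def names_increment_last(count):
    """Use the current name first, then advance n and rebuild the name."""
    n = 1
    filename = "source" + str(n) + ".html"
    printed = []
    for _ in range(count):
        printed.append(filename)
        n = n + 1
        filename = "source" + str(n) + ".html"
    return printed


print(names_increment_first(3))  # ['source2.html', 'source3.html', 'source4.html']
print(names_increment_last(3))   # ['source1.html', 'source2.html', 'source3.html']
```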
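You only need to move the print statement to the start of the loop and the increment to the end, so that the increment applies to the next iteration: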
while os.path.isfile(filename):
    print(filename)
    strjpgs = "Extracted Layers: \n \n"
    file = open(filename, "r")
    soup = BeautifulSoup (file, "html.parser")
    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)
    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)
    print(strjpgs)
    # Advance to the next file only after the current one has been written;
    # rebuilding filename here is what lets the while condition move on.
    n = n+1
    filename = "source"+str(n)+".html"
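You made a logic error: n is incremented before its value has been used for the current file. Starting n at 0 instead of 1 may look like the simplest fix, but the more robust one is to increment n only at the end of each iteration. The next mistake is that you never close the HTML file, so use with open(filename, "r") as file: instead. This closes the file automatically when the block exits, and it is more Pythonic.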
from bs4 import BeautifulSoup
import re
import os.path
n = 1
filename = "source"+str(n)+".html"
savefile = open('OUTPUT.csv', 'w')
if os.path.isfile(filename):
    while True:
        # Build the name first so the file that is opened and the name that is written always match
        filename = "source"+str(n)+".html"
        # Reset the buffer for each file so results do not accumulate
        strjpgs = "Extracted Layers: \n \n"
        with open(filename, "r") as file:
            # parsing things...
            savefile.write(filename + '\n')
            savefile.write(strjpgs)
            print(filename)
            print(strjpgs)
        if filename == "source3.html":
            break
        else:
            n+=1
savefile.close()
print ("done")
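As a possible variation, the hard-coded check for "source3.html" can be replaced by stopping at the first missing file, with both the input and the output file managed by with. This is a minimal sketch under assumptions, not the original poster's code: the function name extract_layers and its parameters are hypothetical, and to stay within the standard library it applies the thread's regular expression to the whole file text instead of to the BeautifulSoup "cplayer" div.

```python
import itertools
import os
import re

# Same pattern as in the thread, written as a raw string.
JPG_NAME = re.compile(r'/([^/]+)\.jpg')


def extract_layers(directory, out_path):
    """Process source1.html, source2.html, ... until a number is missing.

    Returns the list of file names that were processed, in order.
    """
    processed = []
    with open(out_path, "w") as savefile:
        for n in itertools.count(1):
            # The name is built once per iteration, before it is used.
            filename = "source" + str(n) + ".html"
            path = os.path.join(directory, filename)
            if not os.path.isfile(path):
                break
            # The HTML file is closed as soon as this block exits.
            with open(path, "r") as html_file:
                text = html_file.read()
            # Assumption: search the whole text rather than one parsed div.
            jpgs = JPG_NAME.findall(text)
            savefile.write(filename + "\n")
            savefile.write("Extracted Layers: \n \n" + "\n".join(jpgs) + "\n \n")
            processed.append(filename)
    return processed
```

Calling extract_layers(".", "OUTPUT.csv") in a directory that contains source1.html to source3.html would return those three names and write one block per file.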
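Disclaimer: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.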