
Open a website and edit the HTML with Python

I am a little stuck. The program is supposed to open a website and save it to a file. Then it is supposed to read everything up until it finds a string, delete everything before it, and save the result in a new file. But when I run it, I get the first file with the HTML in it, and the second file I am trying to make turns out to be blank. Can anyone point me in the right direction?

import fileinput
import re
import requests
import sys

#linkToGet=sys.argv[1]                  # How to get the link from the terminal
#r = requests.get(linkToGet)

#nameOfFile=sys.argv[2]

# Fetch the web page and save the source code as a text file
r = requests.get('https://www.bibel.no/Nettbibelen?query=ud8MMrJeKwHNJdqN05oJoRgo89+A24MHmKzQYWJRSygk2+FVqgPK3UvcYb+xB3j7')  # Hard-coded just so I can build easily from Atom
print (r.text)
f= open("kap3.txt","w+")
f.write(r.text)
f.close

# Remove all text up to a given line

TAG = """<A HREF="/Nettbibelen?query=ud8MMrJeKwHNJdqN05oJoc7CfBH5MjZKa4lw+sXwPrCzmbEZmCUXfQz2ApCFmHAq" class='versechapter'>50</A> """

tag_found = False
with open('kap3.txt') as in_file:
    with open('kap3ren.txt', 'w') as out_file:
        for line in in_file:
            if not tag_found:
                if line.strip() == TAG:
                    tag_found = True
            else:
                out_file.write(line)

It looks like you are only calling out_file.write(line) if you HAVE found the line you are looking for. Your else statement should be indented so that it belongs to the inner if:

for line in in_file:
    if not tag_found:
        if line.strip() == TAG:
            tag_found = True
        else:
            out_file.write(line)

Of course, this makes the outer if basically useless, so it can be simplified to this:

for line in in_file:
    if line.strip() == TAG:
        # you're done here so you can break the loop
        break
    else:
        out_file.write(line)
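
Two details in the question's code are worth checking whichever loop you use. First, f.close without parentheses only names the method and never calls it, so kap3.txt may not be fully flushed to disk when it is read back; a with block avoids that. Second, TAG ends with a space while line.strip() never does, so line.strip() == TAG cannot be true as written, and an exact comparison also fails if the anchor shares a line with other markup.

The loops above copy the lines that come before the tag. If the goal is what the question describes, keeping the marker and everything after it, a substring search over the whole text is one way to do it. This is only a minimal sketch: the helper names split_at_marker and save_from_marker are made up for illustration, and UTF-8 files are assumed.

```python
from pathlib import Path


def split_at_marker(text, marker):
    """Return (before, after) around the first occurrence of marker.

    `after` starts at the marker itself. If the marker is absent,
    `before` is the whole text and `after` is an empty string.
    """
    index = text.find(marker)
    if index == -1:
        return text, ""
    return text[:index], text[index:]


def save_from_marker(src_path, dst_path, marker):
    """Write src_path to dst_path without the text before marker.

    Hypothetical helper; assumes both files are UTF-8.
    Returns True if the marker was found, False otherwise.
    """
    text = Path(src_path).read_text(encoding="utf-8")
    _, after = split_at_marker(text, marker)
    Path(dst_path).write_text(after, encoding="utf-8")
    return bool(after)
```

With the files from the question, this could be called as save_from_marker("kap3.txt", "kap3ren.txt", TAG.strip()). Checking the return value tells you whether the marker was actually present, which helps when the output file comes out blank.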
