
BeautifulSoup Python saving Output links to txt file

I am attempting to use BeautifulSoup to gather links off a webpage. So far I have been able to do this and print them in the command prompt using a print statement that is currently commented out of the code. The problem I am running into is that when the links are saved to the Output.txt file, they all overwrite each other and only the last link is saved. Any help is most appreciated!

If you have any pointers on making this transition all in one program, see my end goal. My end goal is to then search through the links in the txt file to determine if they contain specific text. If they do, I want to return "Broken Link" or "Not Broken". (A rough sketch of this follow-up step is included after the code below.)

soup = BeautifulSoup(html_doc) #html doc is source code for website i am using

for link in soup.find_all(rel="bookmark"):
  Gamma =(link.get('href'))
  f =open('Output.txt','w')
  f.write(Gamma)
  f.close()
  #print(Gamma)
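
For the end goal described above, here is a minimal sketch, assuming Output.txt holds one URL per line and that "specific text" means a substring such as the hypothetical marker "page-not-found":

with open('Output.txt') as f:
    for line in f:
        url = line.strip()
        if not url:
            continue
        # "page-not-found" is a placeholder for whatever text marks a broken link
        if "page-not-found" in url:
            print(url, "Broken Link")
        else:
            print(url, "Not Broken")

Note that this assumes one link per line, which is why the write calls in the answers below append a newline.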

You need to open the file for writing before the loop and call write() inside it:

soup = BeautifulSoup(html_doc)

with open('Output.txt', 'w') as f:
    for link in soup.find_all(rel="bookmark"):
        f.write(link.get('href') + '\n')  # newline so the links don't run together

Also, note that using the with context manager means you don't have to worry about closing the file manually.
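
For illustration, the with block above is roughly equivalent to the following try/finally pattern, which is what you would otherwise have to write by hand:

f = open('Output.txt', 'w')
try:
    for link in soup.find_all(rel="bookmark"):
        f.write(link.get('href') + '\n')
finally:
    f.close()  # runs even if an exception is raised inside the loop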

Just use "append" mode, by replacing "w" with "a".

soup = BeautifulSoup(html_doc) #html doc is source code for website i am using

for link in soup.find_all(rel="bookmark"):
  Gamma = link.get('href')
  f = open('Output.txt', 'a')
  f.write("{gamma}\n".format(gamma=Gamma))
  f.close()
  #print(Gamma)

As others have stated, you need to append to your file. Also, it's more efficient to do a single open and close.

f = open('Output.txt', 'a')
for link in soup.find_all(rel="bookmark"):
    Gamma = link.get('href')
    f.write(Gamma + '\n')
f.close()
