
How to delete a part of a text in Python

I am pretty new to Python, so I got stuck on this problem:

There is a txt file like this:

blahh
blah
blah 
...
<start>
 some stuff
</start>
even more blah blah blah

I want to delete all the blah parts before the <start> and after the </start>. (The actual file comes from this link. I want to handle the HTML content in the page with bs4, so I think I must first delete all the non-HTML parts.)

Can someone please tell me what the best way to do this is? I appreciate any help!

Nope, you don't need to delete the non-relevant part of the file. Let BeautifulSoup parse the complete file as is and find the tag you need:

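from urllib.request import urlopen  # Python 3 home of the old urllib2.urlopen
from bs4 import BeautifulSoup

url = 'http://www.sec.gov/Archives/edgar/data/70858/000119312507058027/0001193125-07-058027.txt'
# Name the parser explicitly so bs4 does not have to pick one itself.
soup = BeautifulSoup(urlopen(url), 'html.parser')
# soup.document is the first <document> tag found in the parsed file.
print(soup.document)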
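
If you do still want to cut the text down yourself, plain string slicing is probably enough. This is only a minimal sketch: it assumes the markers are literally <start> and </start> as in the question's sample, that each appears at least once, and that you want to keep the markers themselves. The helper name extract_between is made up for illustration.

```python
def extract_between(text, start_tag="<start>", end_tag="</start>"):
    """Return the part of text from start_tag through end_tag, inclusive.

    extract_between is a hypothetical helper, not part of any library.
    """
    begin = text.find(start_tag)
    if begin == -1:
        raise ValueError(f"{start_tag!r} not found")
    # Search for the closing marker only after the opening one.
    end = text.find(end_tag, begin)
    if end == -1:
        raise ValueError(f"{end_tag!r} not found after {start_tag!r}")
    return text[begin:end + len(end_tag)]


# Sample text shaped like the file in the question.
sample = (
    "blahh\n"
    "blah\n"
    "<start>\n"
    " some stuff\n"
    "</start>\n"
    "even more blah blah blah\n"
)

trimmed = extract_between(sample)
print(trimmed)
```

Everything before the first <start> and after the first following </start> is dropped. For a real file you would read it with open(...).read() first and pass the result in.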
