BeautifulSoup: Get tag text up to a certain tag

I would like to get all of the displayed text on an HTML page up to a certain tag. For example, I would like to get all of the displayed text on a page up to the tag with the id "end_content".

Is there a way to do this with BeautifulSoup? It would be similar to the soup.get_text() method, except that it would stop collecting text once it reaches the tag with the id "end_content".

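I would do the following: select the end tag, collect every string that precedes it with find_all_previous(string=True), and reverse the result, since the strings come back in reverse document order: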

from bs4 import BeautifulSoup

html = (
    '<h1>HEY!</h1>'
    '<div>'
        'How are'
        '<h2>you?</h2>'
        '<div id="end_content">END</div>'
    '</div>'
    'Some other text'
)

soup = BeautifulSoup(html, 'lxml')

# Strings before the end tag, reversed back into document order
texts = soup.select_one('#end_content').find_all_previous(string=True)[::-1]
print(texts)  # ['HEY!', 'How are', 'you?']
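
To get a single string back, closer to what soup.get_text() returns, the list can be joined. The following is a minimal sketch rather than a definitive implementation: text_before is a hypothetical helper name, the separator default and the fallback for a missing end tag are assumptions, and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup


def text_before(soup, end_id, separator=" "):
    """Return the text preceding the element with id=end_id (hypothetical helper)."""
    end_tag = soup.find(id=end_id)
    if end_tag is None:
        # Assumption: with no end tag, fall back to the whole document's text.
        return soup.get_text(separator, strip=True)
    # find_all_previous() returns strings in reverse document order, so reverse them.
    strings = end_tag.find_all_previous(string=True)[::-1]
    # Strip each string and drop the ones that are only whitespace.
    return separator.join(s.strip() for s in strings if s.strip())


html = (
    '<h1>HEY!</h1>'
    '<div>'
        'How are'
        '<h2>you?</h2>'
        '<div id="end_content">END</div>'
    '</div>'
    'Some other text'
)

soup = BeautifulSoup(html, "html.parser")
result = text_before(soup, "end_content")
print(result)  # HEY! How are you?
```

Note that find_all_previous(string=True) matches every string node, so on a real page it may also pick up comments and the contents of script or style tags; those would need to be filtered out if only displayed text is wanted.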
