BeautifulSoup: Get tag text up to certain tag

Question

I would like to get all of the displayed text on an HTML page up until a certain tag is hit. For example, I would like to get all of the displayed text on a page up until a tag with the id "end_content" is hit.

Is there a way to do this with BeautifulSoup? This would be similar to the soup.get_text() method, except it would just stop fetching text after it hits a tag with the id "end_content".

Answer 1

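I would locate the end tag and collect every string that precedes it in the document with find_all_previous(string=True). The results come back nearest-first, so reverse the list to restore document order:

from bs4 import BeautifulSoup

html = (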
    '<h1>HEY!</h1>'
    '<div>'
        'How are'
        '<h2>you?</h2>'
        '<div id="end_content">END</div>'
    '</div>'
    'Some other text'
)

soup = BeautifulSoup(html, 'lxml')

>>> soup.select_one('#end_content').find_all_previous(string=True)[::-1]
['HEY!', 'How are', 'you?']
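
If a single string is wanted, closer to what soup.get_text() returns, the same idea can be wrapped in a small helper. This is only a sketch: the name text_before, its separator parameter, and the fallback to the full page text when the id is missing are my own assumptions, not part of BeautifulSoup. It uses the built-in 'html.parser' so that lxml is not required. Note that string=True matches every string node, so text inside script or style elements, and comments, may show up in the result and might need extra filtering.

```python
from bs4 import BeautifulSoup


def text_before(soup, end_id, separator=" "):
    """Return the text that precedes the element whose id is end_id.

    Hypothetical helper, not a BeautifulSoup API.
    """
    end_tag = soup.find(id=end_id)
    if end_tag is None:
        # Assumption: with no end marker, fall back to the whole page text.
        return soup.get_text(separator, strip=True)
    # find_all_previous returns the nearest strings first, so reverse the
    # list to get document order.
    strings = end_tag.find_all_previous(string=True)[::-1]
    # Strip each string and drop the ones that are only whitespace.
    return separator.join(s.strip() for s in strings if s.strip())


html = (
    '<h1>HEY!</h1>'
    '<div>'
        'How are'
        '<h2>you?</h2>'
        '<div id="end_content">END</div>'
    '</div>'
    'Some other text'
)

soup = BeautifulSoup(html, 'html.parser')
print(text_before(soup, 'end_content'))
```

Text inside the end tag itself ("END") and everything after it is left out, because only nodes that come before the tag in the document are collected.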
