beautifulsoup - Fetching text either side of a br tag

Question

I have unfortunately become stuck with the following problem:

 <a href="someurl"> 
"TEXT ONE"
 <br>
 "TEXT TWO"
 </a>

I need "TEXT ONE" and "TEXT TWO" separately. So far I can only get them as a single string, "TEXT ONE TEXT TWO", by using text = container.a.text, where container is a parent of the a tag. I have tried every approach I could find without success, and I can't work out how to use the br tag properly.
Thank you for any help.

Answer 1

I would avoid relying on the presence of the br element and would instead locate all the text nodes inside the a element:

In [1]: from bs4 import BeautifulSoup

In [2]: html = """ <a href="someurl"> 
    ...: "TEXT ONE"
    ...:  <br>
    ...:  "TEXT TWO"
    ...:  </a>"""

In [3]: soup = BeautifulSoup(html, "html.parser")

In [4]: [item.strip() for item in soup.a(text=True)]
Out[4]: ['"TEXT ONE"', '"TEXT TWO"']

Note that a(text=True) is shorthand for a.find_all(text=True). Newer Beautiful Soup releases spell the argument string=True; text is the older name for it.

You can, of course, unpack the result into separate variables if needed:

In [5]: text_one, text_two = [item.strip() for item in soup.a(text=True)]

In [6]: text_one
Out[6]: '"TEXT ONE"'

In [7]: text_two
Out[7]: '"TEXT TWO"'
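As a variation on the same idea, the sketch below uses the stripped_strings generator, which yields each text node already stripped and skips whitespace-only nodes. It assumes the same sample markup as above; the final step that removes the literal double quotes is an optional extra and assumes the quotes are really part of the page text.

```python
from bs4 import BeautifulSoup

html = """ <a href="someurl">
"TEXT ONE"
 <br>
 "TEXT TWO"
 </a>"""

soup = BeautifulSoup(html, "html.parser")

# stripped_strings yields every text node inside the tag with surrounding
# whitespace removed, and skips nodes that contain only whitespace.
parts = list(soup.a.stripped_strings)
print(parts)

# Optional: drop the literal double quotes that are part of the sample text.
unquoted = [part.strip('"') for part in parts]
print(unquoted)
```

This keeps working if the a element later contains more than one br, since it never looks at the br at all.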

Answer 2

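You could use the .previousSibling and .nextSibling attributes after finding the br tag (in bs4 these are also available under the snake_case names .previous_sibling and .next_sibling):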

>>> container.a.find("br").previousSibling
' \n"TEXT ONE"\n '
>>> container.a.find("br").nextSibling
'\n "TEXT TWO"\n '
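The raw sibling values still carry the surrounding whitespace, and a sibling can be missing or be another tag rather than text. A minimal sketch of a more defensive version follows; the helper name sibling_text is hypothetical, and the markup is assumed to match the question's sample.

```python
from bs4 import BeautifulSoup, NavigableString

html = """ <a href="someurl">
"TEXT ONE"
 <br>
 "TEXT TWO"
 </a>"""

soup = BeautifulSoup(html, "html.parser")
br = soup.a.find("br")


def sibling_text(node):
    """Return the stripped text of a sibling, or '' if it is not a text node."""
    # Hypothetical helper: a sibling may be None (no sibling) or a Tag.
    if isinstance(node, NavigableString):
        return node.strip()
    return ""


text_one = sibling_text(br.previous_sibling)
text_two = sibling_text(br.next_sibling)
print(text_one)
print(text_two)
```

This approach depends on the br being present and on the text sitting directly next to it, so it is more fragile than collecting all text nodes.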

Answer 3

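There are several ways to do this. Here is another one, which iterates over the .strings generator of each a tag and collapses the whitespace in every text node: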

from bs4 import BeautifulSoup

content='''
 <a href="someurl"> 
"TEXT ONE"
 <br>
 "TEXT TWO"
 </a>
'''
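soup = BeautifulSoup(content, 'lxml')  # requires the lxml package
for link in soup.select('a'):
    # .strings yields every text node; split/join collapses runs of whitespace
    elem = [' '.join(text.split()) for text in link.strings]
    print(elem)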

Output:

['"TEXT ONE"', '"TEXT TWO"']
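Yet another option, sketched here under the assumption that no individual text node contains a newline of its own, is to let get_text() join the stripped pieces with a separator and then split on it. The built-in html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

content = """
 <a href="someurl">
"TEXT ONE"
 <br>
 "TEXT TWO"
 </a>
"""

soup = BeautifulSoup(content, "html.parser")

results = []
for link in soup.find_all("a"):
    # strip=True trims each text node and drops empty ones; the pieces are
    # then joined with the separator, so splitting on it recovers them.
    pieces = link.get_text(separator="\n", strip=True).split("\n")
    results.append(pieces)

print(results)
```

If the text itself may contain newlines, pick a separator that cannot occur in the page, or fall back to iterating over the text nodes directly.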

3 answers

solution1
1 2017-12-16 17:02:38

solution2
0 ACCEPTED 2017-12-16 16:54:19

solution3
0 2017-12-16 19:22:55
