BeautifulSoup - How do I extract a substring of a string between tags?

Question

I would like to search the HTML for "Website:" and then return " http://www.aa.com "

<br>Website:  <a href="http://www.aa.com">http://www.aa.com</a><br>

I'm not sure what to do here since there is a tag in between the two strings.

Answer 1

You can search for the text; the result is a NavigableString object, which retains information about where it lives in the tree, so you can ask it for its next sibling:

>>> from bs4 import BeautifulSoup
>>> import re
>>> sample = '''\
... <br>Website:  <a href="http://www.aa.com">http://www.aa.com</a><br>
... '''
>>> soup = BeautifulSoup(sample, 'html.parser')  # name the parser explicitly
>>> soup.find(string=re.compile('Website:'))  # string= replaces the deprecated text=
'Website:  '
>>> soup.find(string=re.compile('Website:')).next_sibling
<a href="http://www.aa.com">http://www.aa.com</a>

Once you have the <a> element, getting either the href attribute or the contained text is trivial:

>>> soup.find(string=re.compile('Website:')).next_sibling['href']
'http://www.aa.com'
>>> soup.find(string=re.compile('Website:')).next_sibling.string
'http://www.aa.com'
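
The same lookup can be wrapped in a small helper that copes with a missing label or an unexpected sibling. This is only a sketch: the function name link_after_label is hypothetical, and it assumes the built-in html.parser backend and that the link is the node immediately following the label text.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import Tag


def link_after_label(html, label):
    """Return the href of the <a> tag directly following `label`, or None.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Find the first text node that contains the label.
    text_node = soup.find(string=re.compile(re.escape(label)))
    if text_node is None:
        return None
    sibling = text_node.next_sibling
    # The next sibling may be absent or may not be an <a> tag at all.
    if isinstance(sibling, Tag) and sibling.name == "a":
        return sibling.get("href")
    return None


sample = '<br>Website:  <a href="http://www.aa.com">http://www.aa.com</a><br>'
print(link_after_label(sample, "Website:"))
```

Returning None instead of raising keeps the caller from hitting an AttributeError or TypeError when the page does not contain the label or the label is not followed by a link.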

Answer 2

Think of your content as a tree rather than a string.
BeautifulSoup gives you access to the parse tree: call find_all('a'), then navigate the tree with the .parent and .contents attributes. You can navigate to siblings too.
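
A minimal sketch of that tree-based approach, starting from the <a> tags and looking back at the preceding sibling. The helper name links_labelled is hypothetical, and the html.parser backend is an assumption.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

sample = '<br>Website:  <a href="http://www.aa.com">http://www.aa.com</a><br>'
soup = BeautifulSoup(sample, "html.parser")


def links_labelled(soup, label):
    """Collect hrefs of <a> tags whose previous sibling is text containing `label`.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    hrefs = []
    for a in soup.find_all("a"):
        prev = a.previous_sibling
        # Only text nodes can carry the label; skip tags and missing siblings.
        if isinstance(prev, NavigableString) and label in prev:
            hrefs.append(a.get("href"))
    return hrefs


print(links_labelled(soup, "Website:"))
```

Starting from the links rather than the text is useful when a page has several labelled links and you want all of them in one pass.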

2 answers

solution1
3 ACCEPTED 2015-04-20 15:29:13

solution2
1 2015-04-20 15:27:03
