BeautifulSoup: extract between href and class?

Question

I want to store the dates from the following chunk of text:

newsoup = '''<html><body><a href="/president/washington/speeches/speech-3460">Proclamation 
of Pardons in Western Pennsylvania (July 10, 1795)</a>, <a class="transcript" href="/president/washington/speeches/speech-3460">Transcript</a>, 
<a href="/president/washington/speeches/speech-3939">Seventh Annual Message to Congress (December 8, 1795)</a></body></html>'''

However, I'm having trouble getting at the text between > and </a>. Once I can get Proclamation of Pardons in Western Pennsylvania (July 10, 1795), I'll be set. I've tried adapting another approach to my data, but I end up with an empty object.

I'm trying something like the following, but having little luck:

a = newsoup.findAll('a',attrs={'href'})
print a

I should have noted that newsoup was already a soup object.

Answer 1

Assuming newsoup is already a BeautifulSoup object, this should work:

(If it is still a plain string, parse it first with newsoup = BeautifulSoup(newsoup).)

a = newsoup.findAll('a')  # every <a> tag in the document
for x in a:
    print(x.text)  # the text between <a ...> and </a>
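
Since the goal in the question is to store the dates rather than the full link text, here is a minimal sketch of one way to do that. It assumes the bs4 package is installed and that every date appears in parentheses in the form "Month D, YYYY"; the names html, date_pattern and dates are illustrative and not from the original post.

```python
import re
from bs4 import BeautifulSoup

html = '''<html><body><a href="/president/washington/speeches/speech-3460">Proclamation 
of Pardons in Western Pennsylvania (July 10, 1795)</a>, <a class="transcript" href="/president/washington/speeches/speech-3460">Transcript</a>, 
<a href="/president/washington/speeches/speech-3939">Seventh Annual Message to Congress (December 8, 1795)</a></body></html>'''

soup = BeautifulSoup(html, "html.parser")

# Assumed date format: "Month D, YYYY" wrapped in parentheses.
date_pattern = re.compile(r"\(([A-Z][a-z]+ \d{1,2}, \d{4})\)")

dates = []
for link in soup.find_all("a"):
    match = date_pattern.search(link.get_text())
    if match:  # links such as "Transcript" carry no date and are skipped
        dates.append(match.group(1))

print(dates)  # ['July 10, 1795', 'December 8, 1795']
```

If the real pages use other date formats, the pattern would need adjusting.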

Answer 2

This will work for you:

a = newsoup.findAll('a')[0].contents[0]

where newsoup is a BeautifulSoup object.

If it is still a plain string, parse it first:

newsoup = BeautifulSoup(newsoup)

You can put that in a loop:

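a = newsoup.findAll('a')  # same object as above (was mistakenly called soup)
for x in a:
    print(x.contents[0])  # first child of each <a> tag, here its text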
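
The loop above also prints the "Transcript" link. A possible way to skip it and keep only the speech titles is sketched below, assuming the bs4 package and that the unwanted links always carry class="transcript" as in the sample; html and titles are illustrative names.

```python
from bs4 import BeautifulSoup

html = '''<html><body><a href="/president/washington/speeches/speech-3460">Proclamation 
of Pardons in Western Pennsylvania (July 10, 1795)</a>, <a class="transcript" href="/president/washington/speeches/speech-3460">Transcript</a>, 
<a href="/president/washington/speeches/speech-3939">Seventh Annual Message to Congress (December 8, 1795)</a></body></html>'''

soup = BeautifulSoup(html, "html.parser")

titles = []
# href=True matches only <a> tags that actually have an href attribute.
for link in soup.find_all("a", href=True):
    # link.get("class") is a list of class names, or None if there is no class.
    if "transcript" in (link.get("class") or []):
        continue
    # Collapse the line break inside the first title into a single space.
    titles.append(" ".join(link.get_text().split()))

print(titles)
```

Using get_text() instead of contents[0] also keeps working if a link ever contains nested tags, since contents[0] returns only the first child.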

2 answers

solution1
2 2015-10-10 17:30:12

solution2
0 2015-10-10 17:35:04