Can't find string after a tag with BeautifulSoup in Python?

Question

In this HTML I want to get the string (the text inside the anchor), but no matter what I try it doesn't work: .string is None.

      <a href="/analyze/default/index/49398962/1/34925733" target="_blank">
       <img alt="" class="ajax-tooltip shadow radius lazy" data-id="acctInfo:34925733_1" data-original="/upload/profileIconId/default.jpg" src="/images/common/transbg.png"/>
       Jue VioIe Grace
      </a>

There are a few of these on the page, and I tried this:

print([a.string for a in soup.findAll('td', class_='tou')])

The output is just a list of None values.

EDIT: Here is the entire page HTML; I hope this helps. Just to clarify, I need to find all instances like the one above and extract their string:

http://pastebin.com/4mvcMsJu

Answer 1

You need to select the a from the parent td and call .text; the text is inside the anchor, which is a child of the td:

print([td.a.text for td in soup.find_all('td', class_='tou')])

There obviously is a td with the class tou, or you would not be getting a list containing None:

In [10]: html = """<td class='tou'>
          <a href="/analyze/default/index/49398962/1/34925733" target="_blank">
       <img alt="" class="ajax-tooltip shadow radius lazy" data-id="acctInfo:34925733_1" data-original="/upload/profileIconId/default.jpg" src="/images/common/transbg.png"/>
       Jue VioIe Grace
      </a>
      </td>"""

In [11]: soup = BeautifulSoup(html,"html.parser")

In [12]: [a.string for a in soup.find_all('td', class_='tou')]
Out[12]: [None]

In [13]: [td.a.text for td in soup.find_all('td', class_='tou')]
Out[13]: [u'\n\n       Jue VioIe Grace\n      ']
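The same lookup can also be written as a single CSS selector passed to select(). This is a minimal sketch, assuming each name sits in an <a> that is a direct child of <td class="tou">, as in the sample above; the helper name extract_names and the second table cell are made up for illustration. get_text(strip=True) strips the surrounding whitespace from each text node, so no separate .strip() call is needed.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question: two hypothetical td.tou cells,
# each wrapping an anchor that holds an <img> followed by the name text.
HTML = """
<table><tr>
  <td class="tou">
    <a href="/analyze/default/index/49398962/1/34925733" target="_blank">
      <img alt="" src="/images/common/transbg.png"/>
      Jue VioIe Grace
    </a>
  </td>
  <td class="tou">
    <a href="#" target="_blank">
      <img alt="" src="/images/common/transbg.png"/>
      Pobelter
    </a>
  </td>
</tr></table>
"""


def extract_names(html):
    """Hypothetical helper: stripped text of every <a> directly inside a td.tou."""
    soup = BeautifulSoup(html, "html.parser")
    # "td.tou > a" matches <a> elements that are direct children of <td class="tou">
    return [a.get_text(strip=True) for a in soup.select("td.tou > a")]


print(extract_names(HTML))  # ['Jue VioIe Grace', 'Pobelter']
```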

You could also call .text on the td:

In [14]: [td.text for td in soup.find_all('td', class_='tou')]
Out[14]: [u'\n\n\n       Jue VioIe Grace\n      \n']

But that may get more than you want, since it returns all the text inside the td, not only the anchor's text.
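To see what "more than you want" can mean, here is a sketch with a hypothetical cell that contains a <span> next to the anchor (the span and its text are invented for illustration, not taken from the pastebin page). td.a only looks inside the anchor, while td.get_text() collects every text node in the cell.

```python
from bs4 import BeautifulSoup

# Hypothetical cell: besides the anchor, the td also holds a <span> with extra text.
HTML = """
<td class="tou">
  <a href="#"><img src="x.png"/> Jue VioIe Grace</a>
  <span>(Challenger)</span>
</td>
"""

soup = BeautifulSoup(HTML, "html.parser")
td = soup.find("td", class_="tou")

anchor_only = td.a.get_text(strip=True)    # text of the <a> only
whole_cell = td.get_text(" ", strip=True)  # every text node under the <td>, joined by spaces

print(anchor_only)  # Jue VioIe Grace
print(whole_cell)   # Jue VioIe Grace (Challenger)
```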

Using your full HTML from pastebin:

In [18]: import requests

In [19]: soup = BeautifulSoup(requests.get("http://pastebin.com/raw/4mvcMsJu").content,"html.parser")

In [20]: [td.a.text.strip() for td in soup.find_all('td', class_='tou')]
Out[20]: 
[u'KElTHMCBRlEF',
 u'game 5 loser',
 u'Cris',
 u'interestingstare',
 u'ApoIlo Price',
 u'Zary',
 u'Adrian Ma',
 u'Liquid Inori',
 u'focus plz',
 u'Shiphtur',
 u'Cody Sun',
 u'ApoIIo Price',
 u'Pobelter',
 u'Jue VioIe Grace',
 u'Valkrin',
 u'Piggy Kitten',
 u'1 and 17',
 u'BLOCK IT',
 u'JiaQQ1035716423',
 u'Twitchtv Flaresz']

In this case td.text.strip() gives you the same output:

In [23]: [td.text.strip() for td in soup.find_all('td', class_='tou')]
Out[23]: 
[u'KElTHMCBRlEF',
 u'game 5 loser',
 u'Cris',
 u'interestingstare',
 u'ApoIlo Price',
 u'Zary',
 u'Adrian Ma',
 u'Liquid Inori',
 u'focus plz',
 u'Shiphtur',
 u'Cody Sun',
 u'ApoIIo Price',
 u'Pobelter',
 u'Jue VioIe Grace',
 u'Valkrin',
 u'Piggy Kitten',
 u'1 and 17',
 u'BLOCK IT',
 u'JiaQQ1035716423',
 u'Twitchtv Flaresz']

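But you should understand that the two are not the same: td.text returns all the text inside the td, while td.a.text returns only the anchor's text. You should also understand the difference between .string and .text: .string is None when a tag has more than one child (here the anchor contains both an img and a text node), whereas .text concatenates all the text beneath the tag.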
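A small sketch of the .string/.text difference, using a trimmed-down version of the question's markup (attributes dropped for brevity). It relies on Beautiful Soup's rule that .string is only available when a tag has a single child; the anchor here has two, an <img> and a text node.

```python
from bs4 import BeautifulSoup

html = '<td class="tou"><a href="#"><img src="x.png"/>\n Jue VioIe Grace\n</a></td>'
soup = BeautifulSoup(html, "html.parser")
td = soup.td

# The <a> has two children (<img> and a text node), so .string is None.
# The <td>'s only child is that <a>, so the <td> inherits the same None.
print(td.a.string)  # None
print(td.string)    # None

# .text concatenates every string under the tag, however deeply nested.
print(repr(td.a.text))  # '\n Jue VioIe Grace\n'

# With the <img> removed, the <a> has a single text child and .string works.
td.a.img.decompose()
print(td.a.string.strip())  # Jue VioIe Grace
```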

Answer 2

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(open('input.html'), 'lxml')
>>> [tag.text.strip() for tag in soup]
[u'Jue VioIe Grace']

If we want to restrict the search to text in anchor tags:

>>> [tag.text.strip() for tag in soup.findAll('a')]
[u'Jue VioIe Grace']

Note that there are no td tags in your sample input, and no tag has the class tou.

Answer 3

Well, if your soup variable is made from that piece of HTML, then you get nothing back, because there is no td element inside it, and certainly no td element with class tou.

Now, if you want to get that text, you could call soup.findAll(text=True), which outputs something like ['\n', '\n Jue VioIe Grace\n '].
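A sketch of how to drop the whitespace-only strings from that result, assuming the question's snippet as input (img attributes trimmed). It uses string=True, which is the name Beautiful Soup 4.4.0 and later give to the text= argument, plus the stripped_strings generator, which does the stripping and filtering in one step.

```python
from bs4 import BeautifulSoup

html = """
<a href="/analyze/default/index/49398962/1/34925733" target="_blank">
  <img alt="" src="/images/common/transbg.png"/>
  Jue VioIe Grace
</a>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all(string=True) returns every text node, including whitespace-only ones,
# so strip each one and keep only the non-empty results.
names = [s.strip() for s in soup.find_all(string=True) if s.strip()]
print(names)  # ['Jue VioIe Grace']

# stripped_strings yields the same stripped, non-empty strings lazily.
print(list(soup.stripped_strings))  # ['Jue VioIe Grace']
```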

3 answers

solution1 3 ACCEPTED 2016-08-25 23:28:35

solution2 2 2016-08-25 23:29:16

solution3 0 2016-08-25 23:25:51
