How to print out all cells of a table row in Beautiful Soup

Question

I'm just starting to learn how to use Beautiful Soup.

As an exercise, I picked this page from ESPN (its URL is in the code below).

There's a table in there with NBA players and their fantasy ranks. I was able to print the whole row out and it shows everything I see in my browser.

However, when I print each cell by itself, the first cell prints out "None". For some reason, it can't seem to handle a cell that contains an anchor (<a>) tag.

Here's my code:

from bs4 import BeautifulSoup

import urllib2

if __name__ == '__main__':
    url = "http://www.espn.com/espn/print?id=20443164"
    resp = urllib2.urlopen(url)
    # Pass the parser explicitly; otherwise bs4 warns and picks whatever is installed
    soup = BeautifulSoup(resp.read(), "html.parser")

    tables = soup.find_all("table")
    mytable = tables[2]  # the third table on the page holds the rankings
    rows = mytable.find_all(['th', 'tr'])
    print rows
    for row in rows:
        cells = row.find_all('td')
        for cell in cells:
            # print cell.string  # line in question
            print cell  # line in question

If I use

print cell

I get the following output:

<td>1. <a href="http://www.espn.com/nba/player/_/id/3032977/giannis-antetokounmpo">Giannis Antetokounmpo</a>, SF/PF</td>
<td>PHI</td>
<td>C24</td>

If I use

print cell.string

I get the following output:

None
MIL
SF1

So how can I print every cell without the <td> tags, and get the full text of the first cell instead of "None"?
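A self-contained way to reproduce this, sketched in Python 3 and assuming the html.parser backend: parse only the row shown in the print cell output above (an inline sample, not the live ESPN page) and print each cell's .string. The mixed cell gives None, while the plain-text cells give PHI and C24.

```python
from bs4 import BeautifulSoup

# One row built from the <td> output shown above (sample data, not the live page).
SAMPLE_ROW = (
    "<table><tr>"
    '<td>1. <a href="http://www.espn.com/nba/player/_/id/3032977/giannis-antetokounmpo">'
    "Giannis Antetokounmpo</a>, SF/PF</td>"
    "<td>PHI</td>"
    "<td>C24</td>"
    "</tr></table>"
)

soup = BeautifulSoup(SAMPLE_ROW, "html.parser")

# .string of each cell: None for the cell that mixes text with an <a> tag
strings = [cell.string for cell in soup.find_all("td")]
for s in strings:
    print(s)
```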

Answer 1

Try this in your last loop: change cell.string to cell.text.

for cell in cells:
    print cell.text
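Applied to the loop from the question, this change looks roughly like the following Python 3 sketch. It runs on the same inline sample row instead of fetching the ESPN page, so the table lookup (soup.find("table") rather than table[2]) is an assumption that only fits the sample.

```python
from bs4 import BeautifulSoup

# Sample row copied from the question's "print cell" output (not the live ESPN page).
html = (
    "<table><tr>"
    '<td>1. <a href="http://www.espn.com/nba/player/_/id/3032977/giannis-antetokounmpo">'
    "Giannis Antetokounmpo</a>, SF/PF</td>"
    "<td>PHI</td><td>C24</td>"
    "</tr></table>"
)

soup = BeautifulSoup(html, "html.parser")
mytable = soup.find("table")

texts = []
for row in mytable.find_all("tr"):
    for cell in row.find_all("td"):  # rows made only of <th> yield no <td> here
        texts.append(cell.text)      # .text joins every string inside the cell
        print(cell.text)
```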

Answer 2

You can do something like this:

print (cell.text)

This gets you the text inside the cell, skipping all the tags in it.
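If you need the pieces of a mixed cell rather than one joined string, get_text() also accepts a separator and a strip flag, .stripped_strings yields each text fragment, and the <a> tag can be read on its own. A minimal Python 3 sketch on the first cell from the question (sample data only, html.parser assumed):

```python
from bs4 import BeautifulSoup

# The first <td> from the question's output (sample data only).
cell_html = (
    '<td>1. <a href="http://www.espn.com/nba/player/_/id/3032977/giannis-antetokounmpo">'
    "Giannis Antetokounmpo</a>, SF/PF</td>"
)
cell = BeautifulSoup(cell_html, "html.parser").td

joined = cell.get_text()                 # all text, tags removed
parts = cell.get_text("|", strip=True)   # each fragment stripped, joined with "|"
fragments = list(cell.stripped_strings)  # the same fragments as a list
name = cell.a.get_text()                 # text of the <a> tag only
link = cell.a["href"]                    # the link target

print(joined)
print(parts)
print(fragments)
print(name, link)
```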

Answer 3

From the official documentation regarding .string:

.string

If a tag has only one child, and that child is a NavigableString, the child is made available as .string

If a tag's only child is another tag, and that tag has a .string, then the parent tag is considered to have the same .string as its child

If a tag contains more than one thing, then it's not clear what .string should refer to, so .string is defined to be None

What they mean by "If a tag contains more than one thing" is that if a tag has more than one child, for example text next to another tag, tag.string evaluates to None. That's why you get None for the first <td> tag in your code: it contains the text "1. ", an <a> tag, and the text ", SF/PF".
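The three quoted rules can be checked on a tiny hypothetical document. This is a Python 3 sketch with html.parser assumed; the <p> elements and their ids are made up purely for the illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<p id='one'>plain text</p>"
    "<p id='two'><b>only a tag</b></p>"
    "<p id='three'>text and <b>a tag</b></p>",
    "html.parser",
)

one = soup.find(id="one").string      # rule 1: single NavigableString child
two = soup.find(id="two").string      # rule 2: single child tag that has a .string
three = soup.find(id="three").string  # rule 3: more than one child, so None

print(one)
print(two)
print(three)
```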

So, to get the complete text of a tag, use get_text(). In your code, that means cell.get_text().

Or, in this case, you could also use cell.text; .text is the same as get_text(), as you can see in the source code:

text = property(get_text)
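A quick check of that equivalence on the first cell from the question (Python 3 sketch, html.parser assumed). Because .text is a property it takes no arguments, so when you need a separator or stripping you have to call get_text() directly.

```python
from bs4 import BeautifulSoup

# First <td> from the question's output (sample data only).
cell = BeautifulSoup(
    '<td>1. <a href="http://www.espn.com/nba/player/_/id/3032977/giannis-antetokounmpo">'
    "Giannis Antetokounmpo</a>, SF/PF</td>",
    "html.parser",
).td

# .text returns the same result as calling get_text() with default arguments
same = cell.text == cell.get_text()
print(same)
```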

3 answers

Answer 1: score 2, posted 2018-05-15 03:12:10
Answer 2: score 1, posted 2018-05-15 03:17:09
Answer 3: score 1, accepted, posted 2018-05-15 08:20:39
