
HTML Parsing Table - BeautifulSoup

I am attempting to parse the second table shown below using BeautifulSoup. I am having trouble distinguishing the second table from the first because the tables' attributes are exactly the same. How do I select a table by information inside it, such as name = PATHWAY? What I have used so far to attempt to access the table is:

table = soup.find('table', {'name':'PATHWAY'})

I receive a response of "None" although I know the table is present. To me this means that my method to distinguish between the two is not working. Any suggestions?

<table border="0" cellspacing="0" cellpadding="0" bgcolor="#DCDCDC">
<tr><td>

  <table border="0" cellspacing="1" cellpadding="3">
<tr>
<td class=ue><a name="REACTION TYPE">REACTION TYPE</td><td class=ue>ORGANISM</td><td  class=ue>COMMENTARY</td><td class=ue>LITERATURE</td></tr>
<tr class=tr1>
<td class=g>condensation</td><td class=no>-</td><td class=no>-</td><td class=no>-</td></tr>
  </table>
</td></tr></table>
<br>

<table border="0" cellspacing="0" cellpadding="0" bgcolor="#DCDCDC">
<tr><td>


  <table border="0" cellspacing="1" cellpadding="3">
<tr>
<td class=ue><a name="PATHWAY">PATHWAY</td><td class=ue>KEGG Link</td><td class=ue>MetaCyc Link</td><td class=ue></td></tr>
  <table>

Mu Mind has it right: find the "a" element, then traverse back up to its parent table.

soup.find(attrs={"name":"PATHWAY"}).findParent('table')

That's the Python way. There is also a single XPath expression for this, but working with XPath axes is more complicated and only worth the effort if it has some specific use (XSLT or JavaScript requirements, for example).

>>> soup.find(attrs={"name":"PATHWAY"})
<a name="PATHWAY">PATHWAY</a>
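To see the whole lookup in one place, here is a minimal sketch. It assumes BeautifulSoup 4 (where `find_parent` is the current spelling of `findParent`) and uses a simplified, well-formed version of the markup from the question; the `html` string and the variable names are illustrative, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the page in the question (assumption: tags are closed properly).
html = """
<table bgcolor="#DCDCDC"><tr><td>
  <table cellspacing="1">
    <tr><td class="ue"><a name="REACTION TYPE">REACTION TYPE</a></td><td class="ue">ORGANISM</td></tr>
    <tr class="tr1"><td class="g">condensation</td><td class="no">-</td></tr>
  </table>
</td></tr></table>
<table bgcolor="#DCDCDC"><tr><td>
  <table cellspacing="1">
    <tr><td class="ue"><a name="PATHWAY">PATHWAY</a></td><td class="ue">KEGG Link</td></tr>
  </table>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# The attempt from the question: no <table> carries name="PATHWAY", so this is None.
direct = soup.find("table", {"name": "PATHWAY"})

# Find the <a> that carries the attribute, then climb to the nearest enclosing <table>.
anchor = soup.find("a", attrs={"name": "PATHWAY"})
table = anchor.find_parent("table") if anchor is not None else None

# Read the header cells of that table.
headers = [td.get_text(strip=True) for td in table.find_all("td", class_="ue")]
print(direct, headers)
```

Because `find_parent` stops at the nearest ancestor, this returns the inner data table rather than the grey wrapper table around it.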

First:

table = soup.find('table' {'name':'PATHWAY'}

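is not valid Python code: the comma after 'table' and the closing parenthesis are missing.

What should this match?

Even with the syntax fixed, this will match only a <table> element that itself has the attribute name="PATHWAY". In the markup above, that attribute is on the <a> element inside the table, not on the table.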

Either iterate over every table and perform the check inside each one, or iterate over every node of the tree until you find the relevant node and then walk up the node hierarchy (by following the parent nodes) until you reach a table element. recursiveChildGenerator() can be used to iterate over all nodes (as if they were a flat list).
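Both strategies can be sketched as follows. This is only an illustration under some assumptions: BeautifulSoup 4, where `.descendants` is the current name for what `recursiveChildGenerator()` provides; a simplified `html` sample; and two hypothetical helper names, `tables_containing` and `enclosing_table`.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Simplified stand-in for the nested tables in the question.
html = """
<table bgcolor="#DCDCDC"><tr><td>
  <table cellspacing="1"><tr><td><a name="REACTION TYPE">REACTION TYPE</a></td></tr></table>
</td></tr></table>
<table bgcolor="#DCDCDC"><tr><td>
  <table cellspacing="1"><tr><td><a name="PATHWAY">PATHWAY</a></td></tr></table>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

def tables_containing(soup, anchor_name):
    """Strategy 1: check every table for a matching <a> somewhere inside it."""
    return [t for t in soup.find_all("table")
            if t.find("a", attrs={"name": anchor_name}) is not None]

def enclosing_table(soup, anchor_name):
    """Strategy 2: scan all nodes, then walk up the parents to the first table."""
    for node in soup.descendants:
        if isinstance(node, Tag) and node.name == "a" and node.get("name") == anchor_name:
            for parent in node.parents:
                if parent.name == "table":
                    return parent
    return None

matches = tables_containing(soup, "PATHWAY")
nearest = enclosing_table(soup, "PATHWAY")
print(len(matches), nearest.get("cellspacing"))
```

With nested tables, the first strategy reports both the wrapper and the inner table, since each of them contains the anchor; the second strategy yields only the nearest one.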

You can use the function form of find:

soup.find(lambda tag: (tag.name=='table' and \
    (tag.find('a', attrs={'name': 'PATHWAY'}) is not None)))
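One caveat with nested tables: `find` returns the first match in document order, so with markup like the question's the lambda matches the outer wrapper table, which also contains the anchor. A possible refinement, sketched here under the assumption of BeautifulSoup 4 and a simplified `html` sample, is to also require that the anchor's nearest table is the tag being tested. The helper name `is_nearest_table` is hypothetical.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for the nested tables in the question.
html = """
<table bgcolor="#DCDCDC"><tr><td>
  <table cellspacing="1"><tr><td><a name="REACTION TYPE">REACTION TYPE</a></td></tr></table>
</td></tr></table>
<table bgcolor="#DCDCDC"><tr><td>
  <table cellspacing="1"><tr><td><a name="PATHWAY">PATHWAY</a></td></tr></table>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# The predicate from the answer: true for ANY table that contains the anchor.
first_match = soup.find(lambda tag: tag.name == "table"
                        and tag.find("a", attrs={"name": "PATHWAY"}) is not None)

def is_nearest_table(tag):
    """True only for the table that directly encloses the PATHWAY anchor."""
    if tag.name != "table":
        return False
    anchor = tag.find("a", attrs={"name": "PATHWAY"})
    return anchor is not None and anchor.find_parent("table") is tag

inner = soup.find(is_nearest_table)
print(first_match.get("bgcolor"), inner.get("cellspacing"))
```

If the wrapper table is acceptable, for example because you only need to iterate over its rows, the original lambda is enough.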


 