
Find a table by the h1 header before it

I want to find a table in an HTML document by the h1 that comes just before it, using BeautifulSoup.

<a name="playerlist"></a>
<div class="navbuttons">
<a href="#toc" class="linkbutton">up</a><a class="linkbutton" href="#players">next</a>
</div>
<h1>Participants</h1>
<table class="main">
<thead>
<tr>
<th>Name </th><th>Major</th><th>Class of</th><th>Ranking</th></tr>
</thead>
<tbody>
<tr>
<td>Mike Finge</td><td>Applied Maths</td><td>2015</td><td>155</td>
</tr>
</tbody>
</table>

In the example above, I would like to find the table just under the h1. How can I do this with BeautifulSoup? Thanks in advance.

I think you should use the h1+table selector in BeautifulSoup, since the table sits directly below the h1.
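A minimal sketch of that suggestion, assuming a trimmed copy of the question's markup and Python's built-in `html.parser` (both are choices made here for illustration, not part of the original answer). In CSS, `+` is the adjacent sibling combinator, so `h1 + table` only matches a table that is the very next element after an h1.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: shortened for brevity).
HTML = """
<h1>Participants</h1>
<table class="main">
<tbody>
<tr><td>Mike Finge</td><td>Applied Maths</td><td>2015</td><td>155</td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# 'h1 + table' matches a <table> that is the element immediately after an <h1>.
table = soup.select_one("h1 + table")

# Collect the text of every cell in the matched table.
cells = [td.get_text(strip=True) for td in table.select("td")]
print(cells)  # ['Mike Finge', 'Applied Maths', '2015', '155']
```

`select_one` returns the first match or `None`, so it is worth checking the result before indexing into it on real pages.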

Since the table element is a sibling of the h1, you can use the ~ (general sibling) combinator, which the select method supports. Note that it matches every table that follows an h1 at the same level, not only the one immediately after it.

>>> HTML = '''\
... <a name="playerlist"></a>
... <div class="navbuttons">
... <a href="#toc" class="linkbutton">up</a><a class="linkbutton" href="#players">next</a>
... </div>
... <h1>Participants</h1>
... <table class="main">
... <thead>
... <tr>
... <th>Name </th><th>Major</th><th>Class of</th><th>Ranking</th></tr>
... </thead>
... <tbody>
... <tr>
... <td>Mike Finge</td><td>Applied Maths</td><td>2015</td><td>155</td>
... </tr>
... </tbody>
... </table>
... '''
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(HTML, 'lxml')
>>> soup.select('h1 ~ table')
[<table class="main">
<thead>
<tr>
<th>Name </th><th>Major</th><th>Class of</th><th>Ranking</th></tr>
</thead>
<tbody>
<tr>
<td>Mike Finge</td><td>Applied Maths</td><td>2015</td><td>155</td>
</tr>
</tbody>
</table>]
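If the page contains several h1 headings, `soup.select('h1 ~ table')` returns every table that follows any of them. One possible alternative, sketched here under the assumption that the heading text "Participants" identifies the table you want, is to find that heading first and then walk to its next table sibling. The extra "Schedule" section below is hypothetical and only there to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two headings; only the second one is from the question.
HTML = """
<h1>Schedule</h1>
<table class="other"><tr><td>Monday</td></tr></table>
<h1>Participants</h1>
<table class="main"><tr><td>Mike Finge</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Locate the heading by its exact text.
heading = soup.find("h1", string="Participants")

# Take the first <table> among the siblings that follow the heading.
table = heading.find_next_sibling("table") if heading else None

print(table["class"] if table else None)  # ['main']
```

`find_next_sibling` only looks at elements on the same level as the heading; if the table might be nested deeper, `find_next("table")` searches everything that comes later in the document instead.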
