Capturing the text of a list item with bs4 when a table is nested inside the list

Question

For this HTML code:

<ul><li>Include these codes as defined in http://unitsofmeasure.org
    <table><tr><td><b>Code</b>
    </td><td><b>Display</b></td></tr>
    <tr><td>min</td><td>Minute</td><td></td></tr>
    <tr><td>h</td><td>Hour</td><td></td></tr><tr>
    <td>d</td><td>Day</td><td></td></tr>
    </table></li></ul>

I only want the text that belongs to the <li> itself, i.e. "Include these codes as defined in http://unitsofmeasure.org". But because </li> closes after the table, BS4 also captures the text inside the table. This is my code:

definition = [li.get_text() for li in ul.findAll("li")]

And this is the output:

[u'Include these codes as defined in http://unitsofmeasure.orgCodeDisplayminMinutehHourdDaywkWeekmoMonthaYear']

How can I change the code so that it does not capture the text in the table?

Answer 1

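You can use extract() to remove the table. Note that extract() returns the tag it removed, so call get_text() on the li after the table is gone, not on the result of extract().

tables = [t.extract() for t in ul.findAll("table")]  # detach every table from its li
definition = [li.get_text() for li in ul.findAll("li")]  # only the li's own text remains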
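The extract() idea can be tried end to end with a minimal sketch like the one below. It assumes a shortened copy of the question's markup stored in a hypothetical variable named html, and it uses the built-in "html.parser" backend; neither comes from the original post. Keep in mind that extract() modifies the parsed tree, so the tables are no longer available afterwards.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (an assumption for this sketch).
html = """<ul><li>Include these codes as defined in http://unitsofmeasure.org
    <table><tr><td><b>Code</b>
    </td><td><b>Display</b></td></tr>
    <tr><td>min</td><td>Minute</td><td></td></tr>
    </table></li></ul>"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul")

# Detach every nested table from the tree; extract() returns the removed tag,
# which is simply discarded here.
for table in ul.find_all("table"):
    table.extract()

# With the tables gone, get_text() only sees the li's own text.
# strip() drops the newline and indentation that preceded the table.
definition = [li.get_text().strip() for li in ul.find_all("li")]
print(definition)
```

If the tables are still needed later, parse the document a second time or use an approach that does not modify the tree.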

Answer 2

Try moving up from the table tag using previousSibling (spelled previous_sibling in bs4). More information about the available methods is at https://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names

t = soup.find('table')
print(t.previous_sibling)  # the text node that sits just before the table
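A non-destructive variant of the same idea is sketched below. It assumes a shortened copy of the question's markup in a hypothetical variable html, and own_text is a hypothetical helper, not part of bs4. The sketch reads the text node before the table with previous_sibling, and then shows a slightly more general version that joins all text nodes sitting directly inside the li, leaving the table in place.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the markup from the question (an assumption for this sketch).
html = """<ul><li>Include these codes as defined in http://unitsofmeasure.org
    <table><tr><td><b>Code</b>
    </td><td><b>Display</b></td></tr>
    <tr><td>min</td><td>Minute</td><td></td></tr>
    </table></li></ul>"""

soup = BeautifulSoup(html, "html.parser")

# Approach from the answer: the node right before the table is the li's text.
table = soup.find("table")
before_table = table.previous_sibling.strip()


def own_text(tag):
    """Join only the text nodes that are direct children of tag (hypothetical helper)."""
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


li = soup.find("li")
print(before_table)
print(own_text(li))
```

The previous_sibling form relies on the text coming immediately before the table, while the helper also picks up text that follows the table inside the same li.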

Answer 1: score 1, accepted, 2016-06-03 20:18:06

Answer 2: score 0, 2016-06-03 20:51:21
