How to remove a parent tag using BeautifulSoup

Question

I am trying to remove the header cells from an HTML table using BeautifulSoup. I have something like this:

<tr> <th> head1 </th> <th> head2 </th> </tr>

I am using the following code to remove all the header cells:

soup = BeautifulSoup(url)
for headless in soup.find_all('th'):
    headless.decompose()

This works, except that I am left with an empty row, which causes problems later:

<tr> </tr>

I tried the following code, but I get AttributeError: 'NoneType' object has no attribute 'decompose':

for headless in soup.find_all('th'):
    headless.parent.decompose()

How can I either remove the row containing the header cells, or remove the blank row afterwards? Thanks.

Answer 1

That's because you removed the outer <tr> on the first iteration (when headless is <th>head1</th>). Decomposing the row also destroys its children, so when the iteration reaches <th>head2</th>, its parent is None.
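To make the failure concrete, here is a minimal sketch that keeps the original loop but skips any header cell whose row has already been destroyed. The sample markup, the 'html.parser' choice, and the variable names are assumptions for illustration, not part of the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one header row and one data row.
html = "<table><tr><th>head1</th><th>head2</th></tr><tr><td>a</td><td>b</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

for th in soup.find_all("th"):
    row = th.parent
    # The second <th> was destroyed together with its row on the first
    # pass, so its parent is None and must be skipped.
    if row is not None:
        row.decompose()

print(soup)
```

The guard avoids the AttributeError, but it still visits cells that no longer exist in the tree, so looping over rows is usually the cleaner choice.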

Instead, you could iterate over the <tr> elements that contain a <th>, like so:

for headless in (tr for tr in soup.find_all('tr') if tr.find('th')):
    headless.decompose()
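For reference, a self-contained version of this approach might look like the sketch below. The sample table, the 'html.parser' argument, and the names html and remaining_rows are assumptions added so the snippet can run on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<table>
<tr> <th> head1 </th> <th> head2 </th> </tr>
<tr> <td> a </td> <td> b </td> </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list built up front, so removing rows while
# looping over it does not disturb the iteration.
for row in soup.find_all("tr"):
    if row.find("th") is not None:
        row.decompose()  # removes the row and every cell inside it

remaining_rows = soup.find_all("tr")
print(len(remaining_rows))
```

Note that row.find('th') searches all descendants, so a row that wraps a nested table with its own header cells would also be removed. If that matters, row.find('th', recursive=False) restricts the check to direct children.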

Answer 1: accepted, 2015-05-27 08:47:54