How to remove parent tag with BeautifulSoup

Question

I am trying to remove the header cells from an HTML table using BeautifulSoup. I have something like this:

<tr> <th> head1 </th> <th> head2 </th> </tr>

I am using the following code to remove all the header cells:

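from bs4 import BeautifulSoup

# `url` is expected to hold the HTML markup itself, not a URL string
soup = BeautifulSoup(url, 'html.parser')
for headless in soup.find_all('th'):
    headless.decompose()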

This works great, except I am left with an empty row, which causes problems later:

<tr> </tr>

I tried the following code, but I get AttributeError: 'NoneType' object has no attribute 'decompose'.

for headless in soup.find_all('th'):
    headless.parent.decompose()

How can I either get rid of the row containing header cells or remove the blank row later? Thanks.

Answer 1

That's because you removed the outer <tr> on the first iteration (when headless is <th> head1 </th>), which also destroys every cell inside it. When the loop reaches <th> head2 </th>, its parent is None.
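As a small illustration of the cause, here is a sketch that assumes a one-row table like the one in the question. The html string and the variable names are made up for this example. It only shows that both header cells share the same parent object, which is why decomposing that parent once already takes the second cell with it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the row from the question
html = "<table><tr><th>head1</th><th>head2</th></tr></table>"
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("th")

# Both header cells sit in the same <tr>, so .parent is one and the same Tag.
same_parent = first.parent is second.parent
print(same_parent)  # True
```

Since find_all('th') has already collected both cells before the loop starts, the second one is still visited even though its row was destroyed on the first pass.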

You could instead iterate over the <tr> elements that have a <th> child, like so:

for headless in (tr for tr in soup.find_all('tr') if tr.find('th')):
    headless.decompose()
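For reference, the same idea as a self-contained sketch. The sample table, the html variable and the names header_rows and remaining_rows are assumptions for illustration, and html.parser is just one possible parser choice.

```python
from bs4 import BeautifulSoup

# Hypothetical table: one header row followed by one data row
html = """
<table>
  <tr><th>head1</th><th>head2</th></tr>
  <tr><td>a</td><td>b</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Collect every row that contains at least one <th>, then remove each row once.
header_rows = [tr for tr in soup.find_all("tr") if tr.find("th")]
for row in header_rows:
    row.decompose()

# Only the data row should be left in the tree.
remaining_rows = soup.find_all("tr")
print(len(remaining_rows))
```

Because each row is decomposed exactly once, no header cell is ever asked for the parent of an already destroyed row, and no empty <tr> is left behind.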

Answer 1: accepted, 2015-05-27 08:47:54
