I am trying to remove the header cells from an HTML table using BeautifulSoup. I have something like:
<tr> <th> head1 </th> <th> head2 </th> </tr>
I am using the following code to remove all the header cells:
soup = BeautifulSoup(url)
for headless in soup.find_all('th'):
    headless.decompose()
This works great, except I am left with an empty row, which messes things up later:
<tr> </tr>
I tried the following code, but it raises AttributeError: 'NoneType' object has no attribute 'decompose'.
for headless in soup.find_all('th'):
    headless.parent.decompose()
How can I either get rid of the row containing header cells or remove the blank row later? Thanks.
That's because you removed the outer <tr> at the first iteration (when headless is <th> head1 </th>), which also destroys every cell inside it, so when the iteration reaches <th> head2 </th> its parent is None.
You could instead iterate through the <tr>s that have a <th> child, like so:
for headless in (tr for tr in soup.find_all('tr') if tr.find('th')):
    headless.decompose()
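For reference, here is a minimal self-contained sketch of the same idea. The sample markup, the "html.parser" choice, and the names header_rows and remaining_rows are assumptions made for illustration; they are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one header row followed by one data row.
html = """
<table>
<tr> <th> head1 </th> <th> head2 </th> </tr>
<tr> <td> a </td> <td> b </td> </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect every row that contains a header cell, so each row is
# visited once no matter how many <th> cells it holds.
header_rows = [tr for tr in soup.find_all("tr") if tr.find("th")]

# Remove each header row together with the cells inside it.
for row in header_rows:
    row.decompose()

remaining_rows = soup.find_all("tr")
```

Note that tr.find('th') searches all descendants, so with nested tables an outer row could match because of a <th> deeper down. If that matters, tr.find('th', recursive=False) restricts the check to direct children.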