I have a list of lists, and I am trying to remove a particular element from each inner list whenever it is present.
Code:
import requests
from bs4 import BeautifulSoup
# get link and parse
page = requests.get('https://www.finviz.com/screener.ashx?v=111&ft=4')
soup = BeautifulSoup(page.text, 'html.parser')
print('List of filters\n')
# collect the title of each filter
titles = soup.find_all('span', attrs={'class': 'screener-combo-title'})
title_list = []
for t in titles:
    title_list.append(t.contents)
print(title_list)
Sample output:
[['Price/Free Cash Flow'], ['EPS growth', <br/>, 'this year'], ['EPS growth', <br/>, 'next year']]
Desired output:
[['Price/Free Cash Flow'], ['EPS growth', 'this year'], ['EPS growth', 'next year']]
The issue I have been running into is that my checks for whether the element is present aren't working. I have tried if '<br/>' in whatever: and whatever.remove('<br/>'), which fails with "NoneType is not callable". I see that I am passing <br/> in as a string, but I also see that it is not a string in the list. I have tried dropping the quotes, and that came back as an unresolved reference. I have also tried checking whether each list has multiple elements and, if so, removing the 2nd element, but that also came back with "NoneType is not callable".
Maybe you can try appending only the objects that are instances of str:
for t in titles:
    title_sublist = []
    for content in t.contents:
        if isinstance(content, str):
            title_sublist.append(content)
    title_list.append(title_sublist)
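As a rough, self-contained sketch of the isinstance approach above: the HTML string below is a made-up stand-in for the Finviz page (an assumption, so the example runs without a network request), and the loop is condensed into a list comprehension. It relies on the fact that BeautifulSoup's text nodes (NavigableString) subclass str, while <br/> is a Tag and does not.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, shaped like the sample output.
html = (
    '<span class="screener-combo-title">Price/Free Cash Flow</span>'
    '<span class="screener-combo-title">EPS growth<br/>this year</span>'
    '<span class="screener-combo-title">EPS growth<br/>next year</span>'
)
soup = BeautifulSoup(html, 'html.parser')
titles = soup.find_all('span', attrs={'class': 'screener-combo-title'})

# Keep only the children that are strings; the <br/> Tag objects are skipped.
title_list = [
    [content for content in t.contents if isinstance(content, str)]
    for t in titles
]
print(title_list)
```

The kept items are still NavigableString objects; wrapping each one in str(content) should give plain Python strings if the reference to the parse tree is not needed.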
The elements of your lists aren't plain strings. They are instances of classes from bs4.element, so comparing them with the string '<br/>' never matches. You have to filter them like this:
title_list = []
for t in titles:
    title_list.append([])
    for c in t.contents:
        if c.string is not None:
            title_list[-1].append(c)  # or c.string if you need only names
.string of <br/> is None, and for the other elements it is the text you see in the output.
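To make the point about element types concrete, here is a small sketch on a made-up one-line snippet (the HTML and the variable names kinds, strings and has_literal are illustrative assumptions, not part of the original code). It inspects what .contents actually holds and why a check against the string '<br/>' does not match.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet shaped like one of the filter titles.
soup = BeautifulSoup('<span>EPS growth<br/>this year</span>', 'html.parser')
contents = soup.span.contents

# Class name of each child: text nodes vs. the <br/> tag.
kinds = [type(c).__name__ for c in contents]

# .string is the text for text nodes and None for the empty <br/> tag.
strings = [c.string for c in contents]

# The list holds a Tag object, not the literal string '<br/>'.
has_literal = '<br/>' in contents

print(kinds)
print(strings)
print(has_literal)
```

Since the <br/> entry is a Tag rather than a str, membership tests and remove() calls that pass '<br/>' as text have nothing to match against.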
In this case .strings and .stripped_strings should be preferred over .contents, because they yield only the text pieces and skip tags such as <br/>. So change
for t in titles:
    title_list.append(t.contents)
to
for t in titles:
    title_list.append(list(t.stripped_strings))
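A minimal sketch of the difference between the two generators, using a made-up span with extra whitespace around the text (the HTML and the names raw, clean and label are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical title with padding around each text piece.
html = '<span class="screener-combo-title"> EPS growth <br/> this year </span>'
soup = BeautifulSoup(html, 'html.parser')
span = soup.find('span', attrs={'class': 'screener-combo-title'})

# .strings yields each text piece as-is, whitespace included.
raw = list(span.strings)

# .stripped_strings trims each piece and drops whitespace-only ones.
clean = list(span.stripped_strings)

# Joining the pieces gives a single label per filter, if that is preferred.
label = ' '.join(clean)

print(raw)
print(clean)
print(label)
```

Either generator avoids the <br/> problem entirely, since only text is yielded; .stripped_strings is usually the more convenient of the two for scraped labels.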