I am using the code below to extract text from the Bloomberg website:
import requests
from bs4 import BeautifulSoup

url = 'https://www.bloomberg.com/news/articles/2020-01-19/welcome-to-peak-decade-from-globalization-to-central-banks'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
p_tags = soup.find_all('p')

# Collect the text of every <p> tag that holds a single string
sent_list = []
for p in p_tags:
    if p.string:
        sent_list.append(p.string)

# Join the collected paragraph strings into one block of text
sent = ' '.join(sent_list)
print(sent)
The output I get is:
To continue, please click the box below to let us know you're not a robot.
Is there any way I can get around this issue and extract the text from the website?
You got a CAPTCHA. The Bloomberg site is very strict against crawlers.
A second important point: the site is behind a paywall, so you can see the full text of only a few pages.
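To tell the robot-check page apart from a real article before parsing it, one option is to look for the challenge message in the response body. This is a minimal sketch, assuming the challenge page always contains the phrase from the output in the question; `CHALLENGE_MARKER` and `is_bot_challenge` are hypothetical names for illustration, not part of requests or BeautifulSoup:

```python
# Phrase taken from the output shown in the question; assumed to be stable.
CHALLENGE_MARKER = "let us know you're not a robot"


def is_bot_challenge(html: str) -> bool:
    """Return True if the HTML looks like the robot-check page."""
    # Normalise typographic apostrophes and letter case before matching.
    normalised = html.replace("\u2019", "'").lower()
    return CHALLENGE_MARKER in normalised
```

Calling `is_bot_challenge(r.text)` right after `requests.get(url)` lets the script stop early instead of printing the challenge message. It only detects the block; it does not get around it.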