
Problem extracting data from Bloomberg using bs4

I am using the code below to extract text from a Bloomberg article:

import requests
from bs4 import BeautifulSoup

url = 'https://www.bloomberg.com/news/articles/2020-01-19/welcome-to-peak-decade-from-globalization-to-central-banks'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

# Collect the text of every <p> tag on the page
p_tags = soup.find_all('p')
sent_list = []
for p in p_tags:
    if p.string:
        sent_list.append(p.string)

sent = ' '.join(sent_list)

print(sent)

The output I get is:

To continue, please click the box below to let us know you're not a robot.

Is there any way I can get around this issue and extract the text from the website?

You got a captcha. The Bloomberg site is very strict about crawlers.

A second important note: the site is behind a paywall, so you can only see the full text of a few pages.
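As a hedged aside (not something the answer guarantees will work): one thing commonly tried is sending browser-like request headers so the request looks less like a bare script. The header values below are illustrative assumptions, and given the captcha and paywall described above, the response may still be the robot-check page rather than the article.

import requests
from bs4 import BeautifulSoup

url = 'https://www.bloomberg.com/news/articles/2020-01-19/welcome-to-peak-decade-from-globalization-to-central-banks'

# Browser-like headers; these values are illustrative and not guaranteed to bypass the captcha.
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/80.0.3987.122 Safari/537.36'),
    'Accept-Language': 'en-US,en;q=0.9',
}

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

# If the captcha page still comes back, the <p> tags will contain the robot-check text instead of the article.
paragraphs = [p.get_text(strip=True) for p in soup.find_all('p')]
print(' '.join(paragraphs))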
