Using Python to scrape information from a Cloudflare site?

Question

I work at a marketing firm, and I need to compile and sort several hundred email addresses from a web page. I know a bit of Python, so I often write a simple scraping tool to make life easier, but Cloudflare is hiding the email addresses in the page source.

How can I get around this? An automated tool is obviously a lot faster than manually copying and pasting all of the emails. Here's the program I've been testing with:

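import requests
from bs4 import BeautifulSoup

website = ""  # target URL (left blank in the post)
r = requests.get(website)
soup = BeautifulSoup(r.text, 'html.parser')

numb = 799

# With these bounds the loop body runs exactly once.
while numb < 800:
    numb += 1
    print(r.status_code)    # HTTP status of the response
    print(soup.prettify())  # parsed HTML, indented for reading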

In the page source, the email address is replaced with this:

<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="">[email&#160;protected]</a>

Is there any way to automate copying and pasting a certain line on the webpage? I've checked the source in a regular browser and it shows the same thing.

Thanks for the help.

Answer 1

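I know it's an old thread, but this function will decode the string in the data-cfemail attribute. The first two hex digits are a key, and each following pair of hex digits is one character of the address XORed with that key: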

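def cfDecodeEmail(encodedString):
    # The first byte (two hex digits) is the XOR key.
    r = int(encodedString[:2], 16)
    # Every following byte is one character of the address XORed with the key.
    email = ''.join([chr(int(encodedString[i:i+2], 16) ^ r) for i in range(2, len(encodedString), 2)])
    return email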
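
To tie this back to the question, here is a rough sketch of how the decoder could be applied to every protected address on a page. It is only an illustration under a few assumptions: the names `cf_decode_email`, `extract_emails` and `CFEMAIL_RE` are made up for this example, the sample markup and the address `a@b.co` are invented test data, and the attribute is assumed to be written with double quotes as in the snippet from the question. A regular expression is used here only to keep the example free of third-party dependencies; with BeautifulSoup you would more likely read the `data-cfemail` attribute from the parsed tags instead.

```python
import re

# Assumed pattern: the attribute is double-quoted and holds only hex digits.
CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')


def cf_decode_email(encoded):
    """Decode one data-cfemail value (same logic as cfDecodeEmail above)."""
    key = int(encoded[:2], 16)  # first byte is the XOR key
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )


def extract_emails(html):
    """Return the decoded address for every data-cfemail attribute found."""
    return [cf_decode_email(value) for value in CFEMAIL_RE.findall(html)]


# Invented sample markup; the hex string encodes "a@b.co" with key 0x2a.
sample_html = (
    '<a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
    'data-cfemail="2a4b6a48044945">[email&#160;protected]</a>'
)

print(extract_emails(sample_html))  # ['a@b.co']
```

In the program from the question, you would pass `r.text` to `extract_emails` instead of `sample_html`.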

solution1
0 2018-12-03 15:41:18
