
Using Python to scrape information from a Cloudflare site?

I work at a marketing firm, and I need to compile and sort several hundred email addresses from a web page. I know a bit of Python, so I often write a simple scraping tool to make life easier, but Cloudflare hides the email addresses in the page source.

How can I get around this? An automated tool like this is obviously much faster than copying and pasting every email by hand. Here's the program I've been testing it with:

import requests
from bs4 import BeautifulSoup

website = ""  # URL of the page to scrape (left blank in the question)
r = requests.get(website)
soup = BeautifulSoup(r.text, 'html.parser')

# The original while-loop ran exactly once (numb went from 799 to 800),
# so printing once is equivalent.
print(r.status_code)
print(soup.prettify())

In the source, this replaces the email:

<a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="">[email protected]</a>
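The encoded address is not lost; it sits hex-encoded in the data-cfemail attribute of each such element, so the first step is to collect those attribute values. The sketch below uses the standard library's html.parser.HTMLParser instead of BeautifulSoup so it has no third-party dependency; the class name CfEmailCollector, the function collect_cf_emails, and the sample markup are hypothetical. With BeautifulSoup, soup.select("[data-cfemail]") should select the same elements.

```python
from html.parser import HTMLParser


class CfEmailCollector(HTMLParser):
    """Collect the data-cfemail attribute of every element that carries one."""

    def __init__(self):
        super().__init__()
        self.encoded = []  # hex strings, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; empty or missing values are skipped
        for name, value in attrs:
            if name == "data-cfemail" and value:
                self.encoded.append(value)


def collect_cf_emails(html):
    """Return every non-empty data-cfemail value found in an HTML string."""
    parser = CfEmailCollector()
    parser.feed(html)
    parser.close()
    return parser.encoded


# Hypothetical sample markup in the same shape as the snippet above.
sample = (
    '<p>Contact: <a href="/cdn-cgi/l/email-protection" class="__cf_email__" '
    'data-cfemail="422302206c212d">[email protected]</a></p>'
)
print(collect_cf_emails(sample))  # prints ['422302206c212d']
```

In practice you would feed it r.text from the requests call instead of the sample string.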

Is there any way to automate copying and pasting a certain line on the webpage? I've checked the source in a regular browser and it shows the same thing.

Thanks for the help.

I know it's an old thread, but this function will decode the hex string stored in the data-cfemail attribute:

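def cfDecodeEmail(encodedString):
    # The first two hex digits are the XOR key
    r = int(encodedString[:2], 16)
    # Every following pair of hex digits is one character XORed with that key
    email = ''.join([chr(int(encodedString[i:i+2], 16) ^ r) for i in range(2, len(encodedString), 2)])
    return email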
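The decoder works because the scheme is a single-byte XOR: the first byte of the hex string is the key, and each remaining byte is one character of the address XORed with that key. The sketch below restates the same idea with bytes.fromhex and then dedupes and sorts the results, which covers the compile-and-sort step from the question. Two assumptions go beyond the original answer: the decoded bytes are treated as UTF-8 (for plain ASCII addresses this gives the same result as cfDecodeEmail), and cf_encode_email is a hypothetical inverse included only to generate test input, with an arbitrary default key.

```python
def cf_decode_email(encoded):
    """Decode a data-cfemail hex string using the same XOR scheme as cfDecodeEmail."""
    data = bytes.fromhex(encoded)
    key = data[0]  # first byte is the XOR key
    # Assumption: the decoded bytes are UTF-8; for ASCII this matches cfDecodeEmail.
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def cf_encode_email(email, key=0x42):
    """Hypothetical inverse, used only to build sample input for the demo."""
    data = email.encode("utf-8")
    return bytes([key]).hex() + bytes(b ^ key for b in data).hex()


def compile_emails(encoded_values):
    """Decode every non-empty value, drop duplicates, and return addresses sorted."""
    return sorted({cf_decode_email(v) for v in encoded_values if v})


print(cf_decode_email("422302206c212d"))  # prints a@b.co
encoded = [cf_encode_email(e) for e in ["zoe@example.com", "adam@example.com", "zoe@example.com"]]
print(compile_emails(encoded))  # prints ['adam@example.com', 'zoe@example.com']
```

Empty attribute values (like data-cfemail="" in the snippet from the question) are skipped, since bytes.fromhex("") would leave no key byte to read.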
