Can't scrape a certain field from a webpage using requests even when that very field is available in page source

Question

I'm trying to scrape an email address from a webpage. The email address is available in the page source (Ctrl + U). However, I still can't fetch it using requests. All I get is an AttributeError. Any help on this would be appreciated.

webpage link

My current attempt:

import requests
from bs4 import BeautifulSoup

link = "https://www.facebook.com/pg/theultimatecollectionco/about/?ref=page_internal"

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    try:
        email = soup.select_one("a[href^='mailto:']").get("href")
    except AttributeError:
        email = ""
    print(email)

Answer 1

The page is constructed with the help of JavaScript, so BeautifulSoup alone cannot see the email link (Selenium helps here).

The easiest way is just to grep the page for any mailto: hrefs:

import re
import html
import requests


link = "https://www.facebook.com/pg/theultimatecollectionco/about/?ref=page_internal"

html_doc = requests.get(link).text
for email in re.findall(r'"mailto:([^"]+)"', html_doc):
    print(html.unescape(email))

Prints:

support@theultimatecollection.co
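
To illustrate why the regex approach works where the CSS selector fails, here is a minimal offline sketch. It assumes the mailto link appears in the raw source only inside an HTML comment (or a script string), so it never becomes an element in the parsed tree. That is an assumption about how such pages are served, not something confirmed for this page. The sample markup, the example.com address, and the helper name extract_mailto_addresses are all hypothetical.

```python
import html
import re

# Hypothetical page source: the anchor exists only inside an HTML comment,
# so an HTML parser would not expose it as an <a> element.
SAMPLE_SOURCE = """
<html><body>
<div id="content"></div>
<!-- <a href="mailto:support&#064;example.com">Email us</a> -->
</body></html>
"""

# Same pattern as in the answer: a double-quoted attribute value
# that starts with "mailto:".
MAILTO_RE = re.compile(r'"mailto:([^"]+)"')


def extract_mailto_addresses(page_source):
    """Return unique, HTML-unescaped mailto addresses in order of appearance."""
    addresses = []
    for raw in MAILTO_RE.findall(page_source):
        address = html.unescape(raw)  # turns &#064; into @
        if address not in addresses:
            addresses.append(address)
    return addresses


print(extract_mailto_addresses(SAMPLE_SOURCE))
```

Because the regex scans the raw text rather than a parsed tree, it finds the address wherever it appears in the response body. The trade-off is that it only matches double-quoted mailto: values, so a page that single-quotes or JSON-escapes the link would need an adjusted pattern.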

1 answer

solution1
1 ACCEPTED 2021-04-19 18:27:14