I'm trying to scrape an email address from a webpage. The address is visible in the page source (Ctrl+U), but I still can't fetch it using requests: all I get is an AttributeError. Any help would be appreciated.
My current attempt:
import requests
from bs4 import BeautifulSoup

link = "https://www.facebook.com/pg/theultimatecollectionco/about/?ref=page_internal"

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    try:
        email = soup.select_one("a[href^='mailto:']").get("href")
    except AttributeError:
        email = ""
    print(email)
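The page is constructed with the help of JavaScript, so BeautifulSoup alone cannot see the link. BeautifulSoup only parses the raw HTML that requests downloads and never runs scripts, so select_one() finds no <a href="mailto:..."> element and returns None; calling .get() on None is what raises the AttributeError. (Selenium, which drives a real browser, helps here.) The mailto: string is still present somewhere in the raw source, which is why it shows up under Ctrl+U.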
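To see why the CSS selector comes up empty while Ctrl+U still shows the address, here is a minimal, stdlib-only sketch. The HTML string is a made-up stand-in for a script-built page, not Facebook's actual markup: the mailto: link exists only inside a JavaScript string, so a tag-based parser (here html.parser.HTMLParser, standing in for what an a[href] selector sees) collects no matching <a> element, while a plain-text regex over the same source still finds it. The class name AnchorHrefCollector is hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for a JavaScript-built page: the link exists only
# inside a script string, not as a real <a> element in the HTML.
SAMPLE_HTML = """
<html><body>
<div id="root"></div>
<script>
  render({"href":"mailto:support@example.com","label":"Email"});
</script>
</body></html>
"""


class AnchorHrefCollector(HTMLParser):
    """Collects the href of every <a> tag, roughly what an a[href] selector sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


parser = AnchorHrefCollector()
parser.feed(SAMPLE_HTML)
tag_hits = [h for h in parser.hrefs if h.startswith("mailto:")]

# A raw-text search ignores document structure, so it also sees script contents.
text_hits = re.findall(r'"mailto:([^"]+)"', SAMPLE_HTML)

print(tag_hits)   # []
print(text_hits)  # ['support@example.com']
```

The empty tag_hits list is the same situation that makes select_one() return None in the question.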
The easiest way is just to search the raw page text for any mailto: hrefs with a regular expression:
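import re
import html
import requests

link = "https://www.facebook.com/pg/theultimatecollectionco/about/?ref=page_internal"
html_doc = requests.get(link).text

# Match every quoted "mailto:..." string anywhere in the raw source, including script data
for email in re.findall(r'"mailto:([^"]+)"', html_doc):
    # The raw source may contain HTML entities, so decode them before printing
    print(html.unescape(email))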
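The regex approach can be tried offline on an invented snippet. The sketch below wraps the same pattern in a hypothetical helper, extract_mailto_addresses, that decodes HTML entities with html.unescape (for example &#064; becomes @), strips any ?subject=... query from the mailto target, and drops duplicates while keeping their first-seen order, since a page may repeat the same link. The sample markup and addresses are made up for illustration.

```python
import html
import re

# Same pattern as the answer: capture everything between "mailto: and the closing quote.
MAILTO_RE = re.compile(r'"mailto:([^"]+)"')


def extract_mailto_addresses(html_doc):
    """Hypothetical helper: unique mailto: addresses in order of first appearance."""
    seen = {}
    for raw in MAILTO_RE.findall(html_doc):
        # Decode entities such as &#064; left in the raw source,
        # then drop a mailto query part like ?subject=...
        address = html.unescape(raw).split("?", 1)[0]
        seen.setdefault(address, None)  # dicts keep insertion order
    return list(seen)


# Invented sample: one entity-encoded address, its plain duplicate, one other address.
sample = (
    '{"link":"mailto:support&#064;example.com"}'
    '<a href="mailto:support@example.com">Email</a>'
    '<a href="mailto:sales@example.com?subject=Hi">Sales</a>'
)

print(extract_mailto_addresses(sample))  # ['support@example.com', 'sales@example.com']
```

Against the live page you would pass requests.get(link).text instead of sample.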
Prints:
support@theultimatecollection.co