

Unable to scrape an email address from a webpage using requests module

I'm trying to scrape an email address from this webpage using the requests module, not Selenium. The email address is obfuscated and does not appear in the page source; instead, a JavaScript function generates it. How can I use the following snippet to get the email address that is visible on the webpage?

document.write("\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e".replace(/[a-zA-Z]/g, function(c){return String.fromCharCode((c<="Z"?90:122)>=(c=c.charCodeAt(0)+13)?c:c-26);}));
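The callback passed to replace() rotates every ASCII letter by 13 positions (ROT13) and leaves all other characters alone, so the string can be decoded without running any JavaScript. Below is a minimal Python port of that callback, offered as a sketch: shift_letter is a hypothetical name, and the string literal is copied from the snippet above.

```python
import codecs
import re

# String literal from the page's document.write call; Python turns \u003c / \u003e into < and >
obfuscated = "\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e"


def shift_letter(c: str) -> str:
    """Port of the page's replace() callback: rotate one ASCII letter by 13."""
    limit = 90 if c <= "Z" else 122  # 90 = ord("Z"), 122 = ord("z")
    code = ord(c) + 13
    return chr(code if code <= limit else code - 26)


# Equivalent of the JS .replace(/[a-zA-Z]/g, callback): apply the callback to every letter
decoded = re.sub(r"[a-zA-Z]", lambda m: shift_letter(m.group(0)), obfuscated)
print(decoded)  # <a href="mailto:taaron@mofo.com">taaron@mofo.com</a>

# Python ships the same transform as the "rot13" text codec
assert decoded == codecs.decode(obfuscated, "rot13")
```

The decoded string is an ordinary mailto link, which is why the address shows up in the browser but not in the raw HTML.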

So far I've tried:

import requests
from bs4 import BeautifulSoup

link = 'https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
res = requests.get(link,headers=headers)
soup = BeautifulSoup(res.text,"html.parser")
email = soup.select_one("dt:-soup-contains('Email') + dd")
print(email)

Expected output:

taaron@mofo.com

For tasks like this I recommend the js2py module, which can evaluate the JavaScript expression from Python:

import js2py
import requests
from bs4 import BeautifulSoup

link = "https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}
res = requests.get(link, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
# The <dd> after the "Email" <dt> contains a <script> that writes the address
email = soup.select_one("dt:-soup-contains('Email') + dd")

# Drop the document.write call so only the string expression is left to evaluate
js_code = email.script.contents[0].replace("document.write", "")
# js2py returns the decoded <a href="mailto:..."> markup; .text keeps just the address
email = BeautifulSoup(js2py.eval_js(js_code), "html.parser").text
print(email)

Prints:

taaron@mofo.com
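If you would rather not depend on js2py, the same result can be sketched with the standard library alone, assuming the script always has the shape shown in the question: a single double-quoted literal passed to document.write, using only \uXXXX and \" escapes, and decoded with ROT13. In the code above, the script text would come from email.script.contents[0]; here it is inlined so the example runs offline, and email_from_script is a hypothetical helper name.

```python
import codecs
import json
import re
from typing import Optional

# Contents of the <script> inside the "Email" <dd> (raw string keeps the JS escapes intact)
script_text = r'document.write("\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e".replace(/[a-zA-Z]/g, function(c){return String.fromCharCode((c<="Z"?90:122)>=(c=c.charCodeAt(0)+13)?c:c-26);}));'


def email_from_script(js: str) -> Optional[str]:
    """Decode a document.write(<ROT13 string>) snippet without running JavaScript."""
    # Capture the double-quoted JS string literal passed to document.write
    match = re.search(r'document\.write\(("(?:[^"\\]|\\.)*")', js)
    if match is None:
        return None
    # json.loads handles the \uXXXX and \" escapes used in this literal
    # (assumption: no JS-only escapes such as \x3c appear)
    obfuscated = json.loads(match.group(1))
    # The page's replace() callback is ROT13, available in Python as a text codec
    markup = codecs.decode(obfuscated, "rot13")
    # Pull the address out of the mailto: link
    found = re.search(r'mailto:([^"]+)', markup)
    return found.group(1) if found else None


print(email_from_script(script_text))  # taaron@mofo.com
```

The trade-off is that the decoding rule is hard-coded: if the site changes its obfuscation, the js2py approach, which runs whatever expression the page supplies, is more likely to keep working.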
