Unable to scrape an email address from a webpage using the requests module
I'm trying to scrape an email address from this webpage using the requests module, not Selenium. The email address is obfuscated and does not appear in the page source; instead, a JavaScript function generates it. How can I use the following snippet to get the email address that is visible on the webpage?
document.write("\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e".replace(/[a-zA-Z]/g, function(c){return String.fromCharCode((c<="Z"?90:122)>=(c=c.charCodeAt(0)+13)?c:c-26);}));
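The callback passed to .replace(/[a-zA-Z]/g, ...) shifts every ASCII letter 13 places and wraps around within its case, which is ROT13. That suggests the string can probably be decoded in plain Python without running any JavaScript. A minimal sketch, assuming the literal is copied verbatim from the script above (JS_LITERAL is a hypothetical name):

```python
import codecs
import json

# Hypothetical constant: the string literal passed to document.write(...), copied from the question.
JS_LITERAL = r'"\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e"'

# \uXXXX and \" are also valid JSON escapes, so json.loads unescapes the JS literal.
obfuscated = json.loads(JS_LITERAL)
# The .replace() callback rotates each ASCII letter by 13, i.e. ROT13; undo it.
decoded = codecs.decode(obfuscated, "rot_13")
print(decoded)  # <a href="mailto:taaron@mofo.com">taaron@mofo.com</a>
```

ROT13 is its own inverse, so "decoding" and "encoding" are the same operation here.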
So far I've tried:
import requests
from bs4 import BeautifulSoup
link = 'https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
res = requests.get(link,headers=headers)
soup = BeautifulSoup(res.text,"html.parser")
email = soup.select_one("dt:-soup-contains('Email') + dd")
print(email)  # prints the <dd> element with its <script>, not the decoded address
Expected output:
taaron@mofo.com
For these tasks I recommend the js2py module:
import js2py
import requests
from bs4 import BeautifulSoup
link = "https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron"
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}
res = requests.get(link, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
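# The <dd> after the "Email" <dt> holds only a <script> tag
email = soup.select_one("dt:-soup-contains('Email') + dd")
# Drop the document.write call so the remaining expression evaluates to the decoded HTML string
js_code = email.script.contents[0].replace("document.write", "")
# Evaluate it with js2py, then strip the <a> tag to keep only the address text
email = BeautifulSoup(js2py.eval_js(js_code), "html.parser").text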
print(email)
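If you'd rather not depend on js2py, a standard-library-only sketch can do the same decoding. It assumes the page always embeds the address as a single double-quoted document.write(...) literal obfuscated with the ROT13 callback shown in the question. email_from_html and SAMPLE_HTML are hypothetical names; SAMPLE_HTML is built from the question's snippet, not fetched from the site.

```python
import codecs
import json
import re
from typing import Optional

# Hypothetical sample built from the question's snippet, wrapped in <dd><script>.
SAMPLE_HTML = r'<dd><script>document.write("\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e".replace(/[a-zA-Z]/g, function(c){return String.fromCharCode((c<="Z"?90:122)>=(c=c.charCodeAt(0)+13)?c:c-26);}));</script></dd>'


def email_from_html(html: str) -> Optional[str]:
    """Hypothetical helper: recover a ROT13 document.write(...) email without a JS engine."""
    # Capture the double-quoted JS string literal passed to document.write(...).
    literal = re.search(r'document\.write\(("(?:[^"\\]|\\.)*")', html)
    if literal is None:
        return None
    # The \uXXXX and \" escapes in the literal are also valid JSON escapes.
    obfuscated = json.loads(literal.group(1))
    # Undo the 13-place letter rotation performed by the .replace() callback.
    decoded = codecs.decode(obfuscated, "rot_13")
    # Extract the address from href="mailto:...".
    mailto = re.search(r'mailto:([^"]+)', decoded)
    return mailto.group(1) if mailto else None


print(email_from_html(SAMPLE_HTML))  # taaron@mofo.com
```

Against the live page you would pass res.text from the requests.get call instead of SAMPLE_HTML. This is more brittle than executing the script: if the site changes its obfuscation, the ROT13 step will silently produce the wrong string, whereas the js2py approach simply runs whatever the page ships.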
Prints:
taaron@mofo.com