Unable to scrape an email address from a webpage using the requests module

Question

I'm trying to scrape an email address from this webpage using the requests module, not Selenium. The email address is obfuscated and does not appear in the page source; a JavaScript function generates it at render time. How can I use the following snippet to recover the email address shown on that page?

document.write("\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e".replace(/[a-zA-Z]/g, function(c){return String.fromCharCode((c<="Z"?90:122)>=(c=c.charCodeAt(0)+13)?c:c-26);}));

Here is what I've tried so far:

import requests
from bs4 import BeautifulSoup

link = 'https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
res = requests.get(link,headers=headers)
soup = BeautifulSoup(res.text,"html.parser")
email = soup.select_one("dt:-soup-contains('Email') + dd")
print(email)

Expected output:

taaron@mofo.com

Answer 1

For tasks like this I recommend the js2py module:

import js2py
import requests
from bs4 import BeautifulSoup

link = "https://www.californiatoplawyers.com/lawyer/311805/tobyn-yael-aaron"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}
res = requests.get(link, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
email = soup.select_one("dt:-soup-contains('Email') + dd")

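# The <dd> holds an inline <script>; dropping document.write leaves an
# expression that evaluates to the decoded HTML string.
js_code = email.script.contents[0].replace("document.write", "")
# Run the JavaScript, then parse the resulting <a> tag and take its text.
email = BeautifulSoup(js2py.eval_js(js_code), "html.parser").text
print(email)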

Prints:

taaron@mofo.com
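
The replace callback in the question's script shifts each ASCII letter by 13 places, which is ROT13, so the address can probably also be recovered without a JavaScript interpreter. The following is a minimal sketch under that assumption: it works on the snippet copied from the question rather than on a live response, and the helper name decode_rot13_email is made up for illustration. It also assumes the string literal only uses escapes that are valid in JSON, which holds for this snippet but may not hold for other pages.

```python
import codecs
import json
import re

# The inline script from the question, copied verbatim as a raw string.
JS_SNIPPET = r'''document.write("\u003cn uers=\"znvygb:gnneba@zbsb.pbz\"\u003egnneba@zbsb.pbz\u003c/n\u003e".replace(/[a-zA-Z]/g, function(c){return String.fromCharCode((c<="Z"?90:122)>=(c=c.charCodeAt(0)+13)?c:c-26);}));'''


def decode_rot13_email(js_code: str) -> str:
    """Return the email hidden in a document.write("...".replace(...)) call.

    Hypothetical helper, not part of requests, bs4, or js2py.
    """
    # Grab the first double-quoted JS string literal, escapes included.
    match = re.search(r'document\.write\(("(?:[^"\\]|\\.)*")', js_code)
    if match is None:
        raise ValueError("no document.write string literal found")
    # Assumption: the literal's escapes (\u003c, \") are also valid JSON,
    # so json.loads can unescape it.
    encoded = json.loads(match.group(1))
    # ROT13 shifts only ASCII letters, matching the JS replace callback.
    html = codecs.decode(encoded, "rot_13")
    # Pull the address out of the decoded mailto: link.
    email = re.search(r'mailto:([^"]+)', html)
    if email is None:
        raise ValueError("no mailto link in decoded markup")
    return email.group(1)


print(decode_rot13_email(JS_SNIPPET))
```

With the scraping code above, the same helper could be given email.script.contents[0] instead of the hard-coded snippet. If the site changes its obfuscation scheme, evaluating the script with js2py remains the more general route.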

Solution 1
2, accepted, 2022-09-22 21:36:45
