Jsoup: how to get a script-generated email ID within a page
I have a Document object:
Document secDoc = Jsoup.connect(a.attr("abs:href")).timeout(30*1000).get();
String txt = secDoc.text();
Now, when I debug the code above and inspect the value of secDoc, I get the plain page source, which contains the following:
For questions about your order, including anything shipping or billing related, please email <script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.
If you look at the page in a browser yourself, you see this line instead:
For questions about your order, including anything shipping or billing related, please email oatmealsupport@gmail.com. We only do email support at this time.
Interestingly, this script generates the email ID on the page. When I use Inspect Element, I get:
<p>
For questions about your order, including anything shipping or billing related, please email <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a><script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.
We only do email support at this time.<br><br>
Hours of operation: <strong>Monday-Friday 8am - 6pm PT.</strong>
<br>
<strong>Shipping Times</strong>:
We strive to fulfill the orders within 3-5 working days. When we are really busy we may take a day or two longer.
We ship orders Monday - Friday, so if your order is placed Friday evening we may not be able to process it until the following Monday.
If we are behind, it may be a few days before we respond. The Oatmeal is an extremely small operation so please be patient.
<br>
<a href="http://shop.theoatmeal.com/pages/shipping">More Shipping Info</a><br><br>
Questions about shirt sizes? <a href="http://shop.theoatmeal.com/pages/shipping#shirts">Shirt Sizing Info</a>
</p>
So the anchor <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a> is generated by the script.
Is there any way I can get this anchor using Jsoup (or some other way)?
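For this particular site, the user and domain parts of the address are in the script tag, so select the script tag, get its contents, parse them with a regular expression, and join the user and domain parts with an @. Your selector could simply be script:containsData(write_email), assuming write_email is not used elsewhere on the page. (Jsoup stores the contents of script elements as data rather than text, so :containsData is the pseudo-selector that matches them, and Element.data() rather than text() returns them.) This only works because the address is exposed in the page source, even though it is split into two parts.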
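A minimal sketch of the regex step in plain Java, assuming the page calls write_email with two string-literal arguments exactly as in the snippet from the question. WriteEmailParser, extractEmails and toMailtoAnchor are hypothetical names, and the pattern is just one possible regex, not something taken from the site or from jsoup. It works on the script's contents as a string; with jsoup you could obtain that string with secDoc.select("script:containsData(write_email)").first().data(), checking first() for null in case no script matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rebuilds addresses from write_email('user','domain') calls. */
final class WriteEmailParser {

    // Matches write_email('user','domain') with optional whitespace and either quote style.
    // Assumption: both arguments are plain string literals without escaped quotes.
    private static final Pattern WRITE_EMAIL = Pattern.compile(
            "write_email\\(\\s*(['\"])([^'\"]+)\\1\\s*,\\s*(['\"])([^'\"]+)\\3\\s*\\)");

    private WriteEmailParser() {}

    /** Returns every address found in the given script data, in source order. */
    static List<String> extractEmails(String scriptData) {
        List<String> emails = new ArrayList<>();
        Matcher m = WRITE_EMAIL.matcher(scriptData);
        while (m.find()) {
            // Group 2 is the user part, group 4 the domain part; join them with '@'.
            emails.add(m.group(2) + "@" + m.group(4));
        }
        return emails;
    }

    /** Builds an anchor shaped like the one Inspect Element showed (no HTML escaping). */
    static String toMailtoAnchor(String email) {
        return "<a href=\"mailto:" + email + "\">" + email + "</a>";
    }

    public static void main(String[] args) {
        // With jsoup this string would come from the selected script element's data().
        String scriptData = "write_email('oatmealsupport','gmail.com')";
        for (String email : extractEmails(scriptData)) {
            System.out.println(email);
            System.out.println(toMailtoAnchor(email));
        }
    }
}
```

Running main prints oatmealsupport@gmail.com followed by the rebuilt anchor. Collecting every match into a list, rather than returning only the first, keeps the helper usable if a page calls write_email more than once.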
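In general, Jsoup is not a JavaScript engine: it parses the HTML it downloads but does not execute the scripts in it. If you want to see the page the way a person using a web browser would, you could try a browser automation tool such as Selenium.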
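Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.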