
Jsoup: how to return a script-generated email ID within a page

I have a Document object:

Document secDoc = Jsoup.connect(a.attr("abs:href")).timeout(30*1000).get();
String txt = secDoc.text();

When I debug the code above and inspect the value of secDoc, I get the plain page source, which contains the following element:

For questions about your order, including anything shipping or billing related, please email <script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.

If you view the web page in a browser, you see this line instead: For questions about your order, including anything shipping or billing related, please email oatmealsupport@gmail.com. We only do email support at this time. Interestingly, the script generates the email ID on the page. When I use Inspect Element, I get:

<p>
                For questions about your order, including anything shipping or billing related, please email <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a><script type="text/javascript">write_email('oatmealsupport','gmail.com')</script>.
                We only do email support at this time.<br><br>
                Hours of operation: <strong>Monday-Friday 8am - 6pm PT.</strong>
                <br>
                <strong>Shipping Times</strong>:
                We strive to fulfill the orders within 3-5 working days. When we are really busy we may take a day or two longer. 
              We ship orders Monday - Friday, so if your order is placed Friday evening we may not be able to process it until the following Monday. 
                If we are behind, it may be a few days before we respond.  The Oatmeal is an extremely small operation so please be patient. 
                <br>
                <a href="http://shop.theoatmeal.com/pages/shipping">More Shipping Info</a><br><br>
                Questions about shirt sizes? <a href="http://shop.theoatmeal.com/pages/shipping#shirts">Shirt Sizing Info</a>
            </p>

So the anchor <a href="mailto:oatmealsupport@gmail.com">oatmealsupport@gmail.com</a> is generated by the script.

Is there any way I can get this anchor using Jsoup (or by any other means)?

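For this particular site, the user and domain parts of the address are in the script tag, so select the script tag, get its text, parse that text with a regular expression, and join the user and domain with @. Your selector could be just script:contains(write_email), assuming write_email is not used elsewhere on the page. This works only because the address is exposed in the page source, even though it is split into two parts.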
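The parsing step can be sketched as follows. This is a minimal illustration, not the site's or jsoup's own method: the class name WriteEmailParser, the method name parse, and the exact regular expression are assumptions made for this example, and it uses only the Java standard library so it runs without jsoup on the classpath.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts an address from a write_email('user','domain') call.
class WriteEmailParser {
    // Assumed call shape: two quoted arguments (single or double quotes),
    // with optional whitespace around them.
    private static final Pattern WRITE_EMAIL = Pattern.compile(
        "write_email\\(\\s*['\"]([^'\"]+)['\"]\\s*,\\s*['\"]([^'\"]+)['\"]\\s*\\)");

    /** Returns user@domain if the script text contains a write_email(...) call. */
    static Optional<String> parse(String scriptText) {
        Matcher m = WRITE_EMAIL.matcher(scriptText);
        if (!m.find()) {
            return Optional.empty();
        }
        // group(1) is the user part, group(2) is the domain part.
        return Optional.of(m.group(1) + "@" + m.group(2));
    }

    public static void main(String[] args) {
        // Script body copied from the page source shown in the question.
        String scriptText = "write_email('oatmealsupport','gmail.com')";
        System.out.println(parse(scriptText).orElse("no address found"));
    }
}
```

In the program from the question, the input would be the body of each script element. Jsoup treats script contents as data rather than text, so one option is to loop over secDoc.select("script") and pass each element's data() to parse, keeping the first non-empty result.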

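In general, Jsoup is not a JavaScript engine. If you want to see the page as someone using a web browser would see it, you can try a browser automation tool such as Selenium.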
