Java - JSOUP: selecting a specific part of a website

I am trying to read out the Office 365 website in order to compare its contents with a proxy configuration. But I can't get the selector right, so that it returns only one specific section of those URLs and IP addresses.

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Office365WebsiteParser {

    Document doc = null;

    // Extracted page content
    String WebseitenInhalt;

    public void Parser() {
        // Route HTTP and HTTPS requests through the proxy ("xxx" is a placeholder host)
        System.setProperty("http.proxyHost", "xxx");
        System.setProperty("http.proxyPort", "8081");
        System.setProperty("https.proxyHost", "xxx");
        System.setProperty("https.proxyPort", "8081");

        // Try to fetch the page up to 5 times
        for (int i = 1; i <= 5; i++) {
            try {
                doc = Jsoup.connect("https://technet.microsoft.com/de-de/library/hh373144.aspx").userAgent("Mozilla").get();
                break; // Break immediately if successful
            } catch (IOException e) {
                // Swallow the exception and try again
                System.out.println("jsoup Timeout occurred " + i + " time(s)");
            }
        }

        if (doc == null) {
            System.out.println("Connection timeout after 5 tries");
        } else { // If everything worked, evaluate the web page
            // Select HTML elements of the page by div class.
            // Note: this matches every code snippet container on the page, not just one section.
            Elements urls_Office365_URLs = doc.select("div.codeSnippetContainerCode");
            WebseitenInhalt = urls_Office365_URLs.text();
        }
    }

    public void Print() {
        System.out.println(WebseitenInhalt);
    }

    public String get() {
        return WebseitenInhalt;
    }
}

I only want to select one specific container, like this one:

<div class="codeSnippetContainerCodeContainer">
  <div class="codeSnippetToolBar">
    <div class="codeSnippetToolBarText">
      <a name="CodeSnippetCopyLink" style="display: none;" title="In Zwischenablage kopieren" href="javascript:if (window.epx.codeSnippet)window.epx.codeSnippet.copyCode('CodeSnippetContainerCode_0f6f9acf-6aa4-471f-8600-f8d059f95493');">Kopieren</a>
    </div>
  </div>
  <div id="CodeSnippetContainerCode_0f6f9acf-6aa4-471f-8600-f8d059f95493" class="codeSnippetContainerCode" dir="ltr">
    <div style="color:Black;"><pre>
*.live.com
*.officeapps.live.com
*.microsoft.com
*.glbdns.microsoft.com
*.microsoftonline.com
*.office365.com
*.office.com
Portal.Office.com
*.onmicrosoft.com
*.microsoftonline-p.com^
*.microsoftonline-p.net^
*.microsoftonlineimages.com^
*.microsoftonlinesupport.net^
*.msecnd.net^
*.msocdn.com^
*.msn.com^
*.msn.co.jp^
*.msn.co.uk^
*.office.net^
*.aadrm.com^^
*.cloudapp.net^^
*.activedirectory.windowsazure.com^^^
*.phonefactor.net^^^
</pre></div>
  </div>
</div>
</div>
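The `<pre>` element in such a container holds all entries as one block of text, so comparing it with a proxy configuration still requires splitting it into individual host patterns. Below is a minimal plain-Java sketch of that step, assuming the entries are whitespace-separated and that the trailing `^` characters are footnote markers on the page rather than part of the host names. The class `HostListParser`, its methods `parseHostList` and `missingFromProxy`, and the sample proxy list are hypothetical illustrations, not part of jsoup or the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class HostListParser {

    // Hypothetical helper: splits the text of a <pre> block into host patterns.
    // Assumption: entries are whitespace-separated and trailing '^' characters
    // are footnote markers on the page, not part of the host name.
    public static List<String> parseHostList(String preText) {
        List<String> hosts = new ArrayList<>();
        for (String token : preText.trim().split("\\s+")) {
            String host = token.replaceAll("\\^+$", ""); // drop trailing footnote markers
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    // Hypothetical helper: returns the published entries missing from the proxy list.
    // Host names are compared case-insensitively; wildcards are compared literally
    // (no pattern matching), which is a simplifying assumption.
    public static List<String> missingFromProxy(List<String> published, List<String> proxyEntries) {
        Set<String> proxy = new HashSet<>();
        for (String entry : proxyEntries) {
            proxy.add(entry.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String host : published) {
            if (!proxy.contains(host.toLowerCase(Locale.ROOT))) {
                missing.add(host);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String pre = " *.live.com *.msecnd.net^ *.aadrm.com^^ Portal.Office.com ";
        List<String> published = parseHostList(pre);
        List<String> proxy = Arrays.asList("*.live.com", "portal.office.com"); // made-up proxy config
        System.out.println(published);                         // [*.live.com, *.msecnd.net, *.aadrm.com, Portal.Office.com]
        System.out.println(missingFromProxy(published, proxy)); // [*.msecnd.net, *.aadrm.com]
    }
}
```

A literal set comparison keeps the sketch simple; if the proxy configuration uses its own wildcard syntax, the comparison would need to translate both sides into a common pattern form first.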

Try this CSS selector:

table:has(th:matches(.+-URLs?)) td:first-of-type pre
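This selector relies on jsoup-specific pseudo-selectors: `:has(...)` keeps only the tables that contain a matching element, and `:matches(regex)` tests an element's text against a Java regular expression, here a header cell whose text matches `.+-URLs?` (for example `Office 365-URLs`). `td:first-of-type pre` then picks the `<pre>` in the first cell of each row of those tables. A minimal sketch applying it to a small, made-up HTML fragment; the table layout is an assumption about the page, not copied from it, and the class and method names are hypothetical. It needs jsoup on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {

    // Selector from the answer: <pre> blocks in the first column of tables
    // that have a header cell whose text matches ".+-URLs?"
    static final String SELECTOR = "table:has(th:matches(.+-URLs?)) td:first-of-type pre";

    // Hypothetical sample markup; the real page layout may differ.
    static final String SAMPLE_HTML =
        "<table><tr><th>Office 365-URLs</th><th>IP addresses</th></tr>"
        + "<tr><td><pre>*.live.com *.office.com</pre></td><td><pre>192.0.2.0/24</pre></td></tr></table>"
        + "<table><tr><th>Other</th></tr><tr><td><pre>ignored</pre></td></tr></table>";

    // Returns the text of every <pre> element matched by SELECTOR.
    static List<String> selectUrlLists(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element pre : doc.select(SELECTOR)) {
            result.add(pre.text());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(selectUrlLists(SAMPLE_HTML)); // [*.live.com *.office.com]
    }
}
```

In the question's `Parser()` method, the same selector string could be passed to `doc.select(...)` in place of `div.codeSnippetContainerCode`.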

