Parsing nested HTML with Jsoup

Question

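I have an HTML page that I need to parse with Jsoup, and I got lost in its odd structure. The HTML can be summarized like this (each line is nested one level inside the line above it):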

<html>
  <body class="page3078">
    <div id="mainCapsule">
      <div id="contentCapsule" class="capsule">
        <div id="content">
          <div id="subCapsule" class="clearFix" xmlns="">
            <div id="contentLeft">
              <iframe width="635" height="1000" frameborder="0" src="apps/Results.aspx">
                #document
                <html xmlns="http://www.w3.org/1999/xhtml">
                  <body style="background:none;">
                    <form id="form1" action="Results.aspx" method="post" name="form1">
                      <div class="pressContent">
                        <div class="tableCapsule details">
                          <table width="100%" border="0" cellspacing="0" cellpadding="0">
                            <tbody>
                              <tr class="even">

Basically, I want to get the text inside the tag with the class "even". I even tried selecting the class directly, like this:

doc.getElementsByClass("even")

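It didn't work. I then tried building a parent > child chain with the selector method. That didn't work either. I tried this, starting from the second html tag: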

doc.select("body.page3078 > html > body > #form1 > th");

That didn't work either. Where am I going wrong?

Answer 1

A comment sums up where the solution starts:

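As noted here, you need to fetch the page in the iframe with a separate jsoup parser. The page isn't odd at all; it's just a separate page displayed in an iframe. – Boris the Spider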
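
To make the comment concrete, here is a minimal sketch of the two-step approach, not a definitive implementation. It assumes the outer page is reachable at a placeholder URL (pageUrl is hypothetical; the question does not give the real address) and that the iframe's page can be fetched with a plain GET. The class and method names (IframeRows, iframeUrl, evenRowTexts) are invented for illustration. The content under #document is not part of the outer document that Jsoup downloads, so neither getElementsByClass("even") nor a selector chain on the outer document can reach it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.List;

public class IframeRows {

    // Find the iframe inside #contentLeft and resolve its src
    // (e.g. "apps/Results.aspx") against the outer page's base URI.
    // Returns null if there is no such iframe.
    static String iframeUrl(Document outer) {
        Element iframe = outer.selectFirst("#contentLeft iframe[src]");
        return iframe == null ? null : iframe.absUrl("src");
    }

    // Collect the text of every <tr class="even"> in the inner page,
    // which must have been parsed as its own Document.
    static List<String> evenRowTexts(Document inner) {
        return inner.select("tr.even").eachText();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical address; replace with the real outer page URL.
        String pageUrl = "http://example.com/page.html";

        // Step 1: parse the outer page and locate the iframe's URL.
        Document outer = Jsoup.connect(pageUrl).get();
        String innerUrl = iframeUrl(outer);
        if (innerUrl == null || innerUrl.isEmpty()) {
            System.err.println("No iframe URL could be resolved");
            return;
        }

        // Step 2: fetch and parse the iframe's page separately.
        Document inner = Jsoup.connect(innerUrl).get();
        for (String text : evenRowTexts(inner)) {
            System.out.println(text);
        }
    }
}
```

If Results.aspx only returns its rows after a form POST or with session cookies, the second request would need to supply them; that depends on the site and is not covered by this sketch.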
