jSoup: get the value of an HTML tag

Question

I am reading an HTML file from the internet, and when I read the file, the output to my console is as follows:

<string>
    <String1>
        text
    </String1>
    <level2>
        text2
    </level2>
    <level3>
        text3
    </level3>
    <level4>
        text4
    </level4>
    <level5>
        TEXT
    </level5>
</string>
<string>
    <String2>
        text
    </String2>
    <level2>
        text2
    </level2>
    <level3>
        text3
    </level3>
    <level4>
        text4
    </level4>
    <level5>
        THIS TEXT
    </level5>
</string>

How can I access the level5 text in the second string? I have been trying all day with no luck and would really appreciate some input from someone who knows more about this.

Here is my code:

String line = null;

try {
    // FileReader reads text files in the default encoding.
    // Note: doc appears to be a jsoup Document, so String.valueOf(doc) is its HTML,
    // not a file path; this constructor will typically throw FileNotFoundException.
    FileReader fileReader = new FileReader(String.valueOf(doc));

    // Always wrap FileReader in BufferedReader.
    BufferedReader bufferedReader = new BufferedReader(fileReader);

    while ((line = bufferedReader.readLine()) != null) {
        // Note: line is never used; the same doc is queried on every pass,
        // and the output above contains no <level1> tag, so nothing is printed.
        Elements tdElements = doc.getElementsByTag("level1");
        for (Element element : tdElements) {
            // Print the value of the element
            System.out.println(element.text());
        }
    }

    // Always close files.
    bufferedReader.close();
} catch (FileNotFoundException ex) {
    System.out.println("Unable to open file '" + doc + "'");
} catch (IOException ex) {
    System.out.println("Error reading file '" + doc + "'");
    // Or we could just do this:
    // ex.printStackTrace();
}
} // end of an enclosing try block whose opening is not shown
catch (IOException e) {
    e.printStackTrace();
}

Answer 1

The code below uses jsoup to parse the text you were referring to. The variable 'textToParse' holds the HTML you provided above. You can use jsoup's pseudo-selectors to find elements at a specific position in the DOM tree. Hope this is what you were looking for.

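import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// 'textToParse' holds the HTML from the question
Document document = Jsoup.parse(textToParse);
// string:eq(1) = a <string> whose 0-based element sibling index is 1 (the second one here)
Elements stringTags = document.select("string:eq(1)");
for (Element e : stringTags) {
    System.out.println(e.select("level5").text());
}

//Output: THIS TEXT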
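One caveat about ":eq(1)": in jsoup, ":eq(n)" matches elements whose sibling index equals n, and that index is 0-based and counts every element sibling, not only other <string> tags. In the question's output the two <string> elements are the only siblings, so index 1 is the second one; an unrelated element in front of them would shift the count. The sketch below mimics that rule with the JDK's W3C DOM (so it runs without jsoup) for the children of a single parent. The class EqIndexDemo, its methods, and the extra <note/> element are hypothetical, and the code illustrates the counting rule rather than jsoup's own implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EqIndexDemo {

    // Parses an XML string into a W3C DOM, rethrowing checked exceptions unchecked.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Like "tag:eq(n)" among one parent's children: the element must have the
    // given tag AND sit at 0-based index n among ALL element siblings.
    public static Element eq(Node parent, String tag, int n) {
        int index = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) { // text and whitespace nodes are not counted
                if (index == n) {
                    return ((Element) c).getTagName().equals(tag) ? (Element) c : null;
                }
                index++;
            }
        }
        return null; // fewer than n + 1 element children
    }

    // Trimmed text of the first <level5> descendant, or null if there is none.
    public static String level5Text(Element e) {
        if (e == null) {
            return null;
        }
        NodeList hits = e.getElementsByTagName("level5");
        return hits.getLength() == 0 ? null : hits.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Trimmed-down copy of the question's output: two <string> siblings only.
        String flat = "<root><string><level5>TEXT</level5></string>"
                + "<string><level5>THIS TEXT</level5></string></root>";
        // Same data with a hypothetical <note/> sibling in front.
        String shifted = "<root><note/><string><level5>TEXT</level5></string>"
                + "<string><level5>THIS TEXT</level5></string></root>";

        System.out.println(level5Text(eq(parse(flat).getDocumentElement(), "string", 1)));    // THIS TEXT
        System.out.println(level5Text(eq(parse(shifted).getDocumentElement(), "string", 1))); // TEXT
    }
}
```

With the extra <note/> in front, index 1 now points at the first <string>, which is one reason a same-tag count such as ":nth-of-type(2)" can be the safer choice when other tags may precede the strings.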

Answer 2

You can use a CSS selector here:

string:nth-of-type(2) > level5

DEMO: http://try.jsoup.org/~8w_pfCxDhJwIseTKiKsQjQJOBRs

DESCRIPTION

string:nth-of-type(2) /* Select each <string> that is the 2nd <string> among its siblings... */
> level5                /* ... then select its direct <level5> children                      */
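":nth-of-type(2)" counts differently from ":eq(n)": it is 1-based and counts only siblings with the same tag name, and ">" restricts the match to direct children. The sketch below mimics both rules with the JDK's W3C DOM for the children of a single parent, so it runs without jsoup. The class NthOfTypeDemo, its methods, and the extra <note/> and <wrap> elements are hypothetical; the code illustrates the selector semantics, not how jsoup evaluates them internally.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NthOfTypeDemo {

    // Parses an XML string into a W3C DOM, rethrowing checked exceptions unchecked.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Like "tag:nth-of-type(n)" among one parent's children: 1-based,
    // and only siblings with the same tag name are counted.
    public static Element nthOfType(Node parent, String tag, int n) {
        int count = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && ((Element) c).getTagName().equals(tag)) {
                count++;
                if (count == n) {
                    return (Element) c;
                }
            }
        }
        return null; // fewer than n siblings with that tag
    }

    // Like "> childTag": trimmed text of DIRECT element children only.
    public static List<String> childTexts(Element parent, String childTag) {
        List<String> texts = new ArrayList<>();
        if (parent == null) {
            return texts;
        }
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && ((Element) c).getTagName().equals(childTag)) {
                texts.add(c.getTextContent().trim());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Trimmed-down sample with a hypothetical <note/> sibling in front.
        String xml = "<root><note/>"
                + "<string><level5>TEXT</level5></string>"
                + "<string><level5>THIS TEXT</level5></string></root>";
        Element second = nthOfType(parse(xml).getDocumentElement(), "string", 2);
        System.out.println(childTexts(second, "level5")); // [THIS TEXT]
    }
}
```

The <note/> does not affect the count here, and a <level5> nested one level deeper (for example inside a <wrap> element) would not be matched by ">".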

SAMPLE CODE

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = ...
Element level5Node = doc.select("string:nth-of-type(2) > level5").first();
if (level5Node == null) {
    throw new RuntimeException("Unable to locate level5 text...");
}

System.out.println(level5Node.text()); // THIS TEXT

Answer 3

Solution 1: your HTML is valid XML once it is wrapped in a single root element, so you can use XML tools:

You can get your second level5 with XPath: "//string[2]/level5"
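A detail worth knowing about "//string[2]": the predicate applies per parent, so it selects every <string> that is the second <string> child of its own parent. In the question both strings share one parent, so it works; if each string had its own parent, "(//string)[2]" (the second <string> in document order) would be needed instead. The JDK-only sketch below compares the two; the class XPathIndexDemo and the <group> wrapper elements are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XPathIndexDemo {

    // Parses xml and returns the trimmed string value of an XPath expression
    // ("" when the expression selects nothing).
    public static String eval(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, doc).trim();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Flat, like the question: both <string> elements share one parent.
        String flat = "<root><string><level5>TEXT</level5></string>"
                + "<string><level5>THIS TEXT</level5></string></root>";
        // Hypothetical variant: each <string> has its own <group> parent.
        String grouped = "<root><group><string><level5>TEXT</level5></string></group>"
                + "<group><string><level5>THIS TEXT</level5></string></group></root>";

        System.out.println(eval(flat, "//string[2]/level5"));      // THIS TEXT
        System.out.println(eval(grouped, "//string[2]/level5"));   // empty line: no parent has two <string>s
        System.out.println(eval(grouped, "(//string)[2]/level5")); // THIS TEXT
    }
}
```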

Solution 2: parse it with Jsoup to get the document, then use XPath as in Solution 1.

See "Jsoup with XPath / XSoup" and "Does jsoup support xpath?"

Solution 1:

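import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document; // the W3C DOM Document, not org.jsoup.nodes.Document
import org.xml.sax.InputSource;

// yourXml (hypothetical name) holds the question's text; <root> gives both <string>s one parent
String xml = "<root>" + yourXml + "</root>";
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
XPath xPath = XPathFactory.newInstance().newXPath();
String expression = "//string[2]/level5";
String value = xPath.evaluate(expression, document);
System.out.println("EVALUATE:" + value);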
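Because the question's output is indented, the string value that xPath.evaluate returns for <level5> still carries the surrounding newlines and spaces. XPath 1.0's normalize-space() strips leading and trailing whitespace and collapses inner runs to a single space, so wrapping the expression in it yields just the text. A minimal JDK-only sketch; the class NormalizeSpaceDemo and the trimmed-down stand-in for yourXml are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NormalizeSpaceDemo {

    // Parses xml and returns the untrimmed string result of an XPath expression.
    public static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, doc);
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, trimmed-down stand-in for the question's indented text.
        String yourXml = "<string>\n  <level5>\n    TEXT\n  </level5>\n</string>\n"
                + "<string>\n  <level5>\n    THIS TEXT\n  </level5>\n</string>";
        String xml = "<root>" + yourXml + "</root>";

        // Plain string value: keeps the newlines and indentation around the text.
        System.out.println("[" + evaluate(xml, "//string[2]/level5") + "]");
        // normalize-space() trims the ends and collapses inner whitespace runs.
        System.out.println("[" + evaluate(xml, "normalize-space(//string[2]/level5)") + "]"); // [THIS TEXT]
    }
}
```

Calling trim() on the Java side would also work for this sample; normalize-space() additionally collapses whitespace inside the text and keeps the cleanup inside the XPath expression.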

Question

3 answers

Answer 1
1 2016-01-10 14:11:59

Answer 2
1 2016-01-11 11:33:45

DESCRIPTION

SAMPLE CODE

Answer 3
0 2016-01-10 17:37:18
