
I would like to parse an HTML source string to find a specific tag in Java

So I have the following HTML source:

<form action='http://example.com' method='get'>

        <P>Some example text here.</P>
        <input type='text' class='is-input' id='agent_name' name='deviceName' placeholder='Device Name'>
        <input type='hidden' name='p' value='firefox'>
        <input type='hidden' name='email' value='example@example.com'>
        <input type='hidden' name='k' value='cITBk236gyd56oiY0fhk6lpuo9nt61Va'>
        <p><input type='submit' class='btn-blue' style='margin-top:15px;' value='Install'></p>
</form>

Unfortunately, this HTML source is stored as a string. I would like to parse it using something like jsoup and obtain the following string: <input type='hidden' name='k' value='cITBk236gyd56oiY0fhk6lpuo9nt61Va'>

Or, better yet, grab only the following value: cITBk236gyd56oiY0fhk6lpuo9nt61Va

The problem I'm running into is this:

a) The value cITBk236gyd56oiY0fhk6lpuo9nt61Va changes constantly, so I cannot search for the entire HTML tag.

So I am looking for a better way to do this. Here is what I currently have, which does not seem to be working:

// Tried to use this, but Java was angry for some reason.
Jsoup.parse(myString);

// So I used this instead.
org.jsoup.nodes.Document doc = Jsoup.parse(myString);

// In this case I just tried to select the entire tag.
Elements elements = doc.select("<input name=\"k\" value=\"cITBkdxJTFd56oiY0fhk6lUu8Owt61Va\" type=\"hidden\">");

// Yeah, this does not seem to work. I assume it's not a string anymore but a document.
// Not sure if it would attempt to print anyway.
System.out.println(elements);

So I guess I can't use select like this. Even if it did work, I was not sure how to select that part of the tag and put it into a new string.

You can try it this way:

Document doc = Jsoup.parse(myString);
Elements elements = doc.select("input[name=k]");
System.out.println(elements.attr("value"));

Output:

cITBk236gyd56oiY0fhk6lpuo9nt61Va
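
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes jsoup is on the classpath (a version recent enough to have `selectFirst`); the class name `TokenExtractor` and the method `extractToken` are made up for this example. `selectFirst` returns `null` when nothing matches, so the missing-field case can be handled explicitly.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TokenExtractor {

    // Hypothetical helper: returns the value of the input named "k",
    // or null if the HTML contains no such input.
    public static String extractToken(String html) {
        Document doc = Jsoup.parse(html);
        Element input = doc.selectFirst("input[name=k]");
        return input == null ? null : input.attr("value");
    }

    public static void main(String[] args) {
        // Shortened version of the form from the question.
        String html = "<form action='http://example.com' method='get'>"
                + "<input type='hidden' name='p' value='firefox'>"
                + "<input type='hidden' name='k' value='cITBk236gyd56oiY0fhk6lpuo9nt61Va'>"
                + "</form>";
        System.out.println(extractToken(html));
    }
}
```

Because the selector matches on the `name` attribute only, it keeps working when the value changes between requests.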

Try this call to select to get the elements:

elements = doc.select("input[name=k][value=cITBkdxJTFd56oiY0fhk6lUu8Owt61Va]");

In this context, elements must be an Elements object. If you need to extract data from elements, you can use one of these (among others, obviously):

elements.html(); // HTML of all elements
elements.text(); // Text contents of all elements
elements.get(i).html(); // HTML of the i-th element
elements.get(i).text(); // Text contents of the i-th element
elements.get(i).attr("value"); // The contents of the "value" attribute of the i-th element
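
Since the question also asks for the whole tag as a string, note that `html()` returns an element's inner HTML, which is empty for an `input`, while `outerHtml()` returns the element's own markup. Below is a rough sketch, assuming jsoup is on the classpath; the names `TagAsString` and `tagFor` are invented for illustration, and `name` is assumed to be a simple identifier that is safe to splice into a selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class TagAsString {

    // Hypothetical helper: the serialized tag of the first input with the
    // given name, or an empty string if there is no such input.
    public static String tagFor(String html, String name) {
        Element input = Jsoup.parse(html).selectFirst("input[name=" + name + "]");
        return input == null ? "" : input.outerHtml();
    }

    public static void main(String[] args) {
        String html = "<form><input type='hidden' name='k' value='abc123'></form>";
        // Prints the whole <input ...> tag as jsoup serializes it.
        System.out.println(tagFor(html, "k"));
    }
}
```

jsoup re-serializes the element, so details such as the quoting of attributes may differ from the original source text.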

To iterate over elements, you can use any of these:

for(Element element : elements)
    element.html(); // Or whatever you want

for(int i=0;i<elements.size();i++)
    elements.get(i).html(); // Or whatever you want
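
As a slightly fuller illustration of the for-each form, the sketch below collects every hidden input of a form into a map from name to value. It assumes jsoup is on the classpath; `HiddenFields` and `hiddenFields` are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class HiddenFields {

    // Hypothetical helper: maps each hidden input's name to its value,
    // preserving document order.
    public static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element input : Jsoup.parse(html).select("input[type=hidden]")) {
            fields.put(input.attr("name"), input.attr("value"));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<form>"
                + "<input type='text' name='deviceName'>"
                + "<input type='hidden' name='p' value='firefox'>"
                + "<input type='hidden' name='k' value='cITBk236gyd56oiY0fhk6lpuo9nt61Va'>"
                + "</form>";
        // Prints the name/value pairs of the hidden inputs only.
        System.out.println(hiddenFields(html));
    }
}
```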

Jsoup is an excellent library. The select method uses (lightly) modified CSS selectors for document queries. You can check the valid syntax for the method in the Jsoup javadocs.
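
To give a feel for the selector syntax, here is a small sketch that runs a handful of selectors against the form from the question and prints how many elements each one matches. It assumes jsoup is on the classpath; `SelectorExamples`, `FORM_HTML`, and `count` are names invented for this example, and the selectors shown are only a small sample of what is supported.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorExamples {

    // The form from the question, trimmed to the attributes used below.
    public static final String FORM_HTML =
            "<form action='http://example.com' method='get'>"
            + "<p>Some example text here.</p>"
            + "<input type='text' id='agent_name' name='deviceName' placeholder='Device Name'>"
            + "<input type='hidden' name='p' value='firefox'>"
            + "<input type='hidden' name='email' value='example@example.com'>"
            + "<input type='hidden' name='k' value='cITBk236gyd56oiY0fhk6lpuo9nt61Va'>"
            + "<p><input type='submit' value='Install'></p>"
            + "</form>";

    // Hypothetical helper: number of elements matched by a selector.
    public static int count(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "input[type=hidden]",          // attribute equals a value
            "input[type=hidden][name=k]",  // several attribute conditions combined
            "input[name^=device]",         // attribute value starts with a prefix
            "input[placeholder]",          // attribute is present
            "#agent_name",                 // element by id
            "p > input[type=submit]"       // direct child of a <p>
        };
        for (String query : queries) {
            System.out.println(query + " -> " + count(FORM_HTML, query));
        }
    }
}
```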
