Using jsoup to get a particular id from an HTML file

I have an HTML file like this:

<div class="student">
<h4 id="Classnumber100" class="studentheading">
   <a id="studentlink22" href="/grade8/greg">22. Greg</a>
</h4>
<div class="studentcategories">
<div class="studentneighborhoods">
</div>
</div>
</div>

I want to use jsoup to get the URL /grade8/greg and the text "22. Greg".

I tried this selector:

    Elements listo = doc.select("h4 #studentlink22");

I am not able to get the values.

Actually, I want to select based on Classnumber100. There are 300 records in the HTML page, and the only thing consistent across them is Classnumber100.

So I want my selector to select all the hrefs and text that come after Classnumber100. How can I do that?

I tried doc.select("class#studentheading"); and many other possibilities, but none of them work.

A selector passed to the select method names the HTML tag (here h4 and a) and can then narrow the match by attributes if you tell it to. Have you looked at the jsoup site? The use of select for this situation is well described there.

For example:

// code not tested
Elements listo = doc.select("h4[id=Classnumber100]").select("a");

String text = listo.text(); // for  "22. Greg"
String path = listo.attr("href"); // for  "/grade8/greg"
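
Since the page is said to hold about 300 records, it may help to read each link separately: Elements.attr returns the attribute of the first match only, and Elements.text joins the text of all matches. The following is a minimal sketch of that idea, not a tested solution for the real page. The class name StudentLinkExtractor, the extract method, and the second record ("23. Anna") are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; the name and structure are illustrative only.
class StudentLinkExtractor {

    // Returns one {href, text} pair for every <a> found inside an
    // <h4 id="Classnumber100"> heading, in document order.
    static List<String[]> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> pairs = new ArrayList<>();
        for (Element link : doc.select("h4[id=Classnumber100] a")) {
            pairs.add(new String[] { link.attr("href"), link.text() });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Two records sharing the same id, as the question describes.
        // The second record is invented sample data.
        String html =
              "<div class=\"student\"><h4 id=\"Classnumber100\" class=\"studentheading\">"
            + "<a id=\"studentlink22\" href=\"/grade8/greg\">22. Greg</a></h4></div>"
            + "<div class=\"student\"><h4 id=\"Classnumber100\" class=\"studentheading\">"
            + "<a id=\"studentlink23\" href=\"/grade8/anna\">23. Anna</a></h4></div>";

        for (String[] pair : extract(html)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Looping over the matched elements keeps each href paired with its own text, which the single attr and text calls on the whole Elements collection would not do once more than one record matches.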

.

First of all, multiple elements should not share the same id, so these elements should not all have the id Classnumber100. However, if that is the case, you can still select them using the selector [id=Classnumber100].

If you're only interested in the a tags inside, you can use [id=Classnumber100] > a.

Upon re-reading the question, it appears that the h4 tags you're interested in share the class attribute studentheading. In that case you can use the class selector, i.e.

doc.select(".studentheading > a")
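
As a rough illustration of what the > in that selector does, the sketch below runs the class selector twice, once with the child combinator and once with a plain descendant selector. The class name StudentHeadingLinks, the linksByText method, and the second record (whose link is wrapped in a span) are assumptions added for the example and are not part of the original page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; names are illustrative only.
class StudentHeadingLinks {

    // Maps link text to href for every element matched by the given
    // selector, preserving document order.
    static Map<String, String> linksByText(String html, String selector) {
        Document doc = Jsoup.parse(html);
        Map<String, String> links = new LinkedHashMap<>();
        for (Element link : doc.select(selector)) {
            links.put(link.text(), link.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // The second heading wraps its link in a <span>; this record is
        // invented to show the difference between the two combinators.
        String html =
              "<h4 id=\"Classnumber100\" class=\"studentheading\">"
            + "<a href=\"/grade8/greg\">22. Greg</a></h4>"
            + "<h4 id=\"Classnumber100\" class=\"studentheading\">"
            + "<span><a href=\"/grade8/anna\">23. Anna</a></span></h4>";

        // Child combinator: only <a> elements that are direct children of the heading.
        System.out.println(linksByText(html, ".studentheading > a"));
        // Descendant selector: <a> elements at any depth inside the heading.
        System.out.println(linksByText(html, ".studentheading a"));
    }
}
```

If the real page nests the links more deeply than the sample in the question, the descendant form (.studentheading a) is likely the safer choice.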
