
Is there a way to access specific attributes of HTML tags using Java/jsoup?

For example, the code

Elements linksOnPage = htmlDocument.select("a[href]");

will return all <a> tags that have an href attribute. But I want the <a> tags that have a title attribute equal to 'XXX'. I also want all <span> tags that have a title attribute equal to 'XXX', and I want the actual text inside each of those span tags.

Is there an easy way to do this?

You can simply use a[title=XXX], and likewise span[title=XXX].

If you want to find them in a single select(..) query, you can group multiple selectors by separating them with a comma, like this:

Elements linksOnPage = htmlDocument.select("a[title=XXX], span[title=XXX]");

If you want the text contained in the selected tags, call the text() method on them.

You can find more information about selectors in the official tutorial: http://jsoup.org/cookbook/extracting-data/selector-syntax
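Putting the pieces of this answer together, a minimal sketch might look like the following. The class name, the method name and the sample HTML are made up for illustration; only Jsoup.parse, select, tagName and text come from jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class TitleSelectorExample {

    // Collects "tagName: text" for every <a> or <span> whose title is XXX.
    static List<String> textsWithTitleXxx(String html) {
        Document htmlDocument = Jsoup.parse(html);

        // One query, two selectors grouped with a comma.
        Elements matches = htmlDocument.select("a[title=XXX], span[title=XXX]");

        List<String> texts = new ArrayList<>();
        for (Element match : matches) {
            // text() returns the visible text inside the tag, without markup.
            texts.add(match.tagName() + ": " + match.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the question.
        String html = "<a href='/x' title='XXX'>link</a>"
                + "<a href='/y' title='other'>skipped</a>"
                + "<span title='XXX'>span text</span>";
        System.out.println(textsWithTitleXxx(html));
    }
}
```

The second <a> is left out because its title is not XXX, even though it has an href.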

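To check whether the attribute matches XXX, loop over linksOnPage and compare each element's attr("title") with XXX. Note that calling attr("title") on the whole Elements collection only returns the value from the first matched element that has the attribute. The text of the span tags can be extracted with jsoup's text() method, and you can get the entire tag, markup included, from outerHtml().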
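As a rough illustration of this approach, the sketch below selects every span that has a title attribute, compares the attribute value per element, and keeps the full markup of the matches. The class name, method name and sample HTML are assumptions made for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class AttrCheckExample {

    // Returns the outer HTML of each <span> whose title equals the wanted value.
    static List<String> outerHtmlWithTitle(String html, String wanted) {
        Document htmlDocument = Jsoup.parse(html);

        // Select spans that have any title attribute, then filter in Java.
        Elements spans = htmlDocument.select("span[title]");

        List<String> result = new ArrayList<>();
        for (Element span : spans) {
            // attr() on a single Element returns that element's own value.
            // String.equals is case-sensitive.
            if (span.attr("title").equals(wanted)) {
                result.add(span.outerHtml());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String html = "<span title='XXX'>hello</span><span title='YYY'>bye</span>";
        for (String markup : outerHtmlWithTitle(html, "XXX")) {
            System.out.println(markup);
        }
    }
}
```

Filtering in Java like this is mostly useful when the comparison is more involved than a selector can express; for a plain equality test the span[title=XXX] selector is shorter.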

Well, according to this documentation:

You can select the <a> tags whose title equals XXX, where XXX is a String variable holding the title value, with: htmlDocument.select("a[title=" + XXX + "]");

For the text inside a tag: tag.text().
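When the title comes from a variable, one alternative to concatenating it into a selector string is jsoup's getElementsByAttributeValue, which takes the attribute name and value as separate arguments, so the value never has to be embedded in selector syntax. This is only a sketch: the class name, method name and sample HTML are hypothetical, and the tag-name check is added here to keep just the <a> tags.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TitleFromVariableExample {

    // Returns the text of every <a> whose title attribute matches the given value.
    static List<String> linkTextsWithTitle(String html, String title) {
        Document htmlDocument = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();

        // Finds elements of any tag whose title attribute has the given value.
        for (Element tag : htmlDocument.getElementsByAttributeValue("title", title)) {
            // Keep only <a> tags; jsoup's HTML parser reports tag names in lower case.
            if (tag.tagName().equals("a")) {
                texts.add(tag.text());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the title contains a space on purpose.
        String html = "<a title='My Title'>one</a><span title='My Title'>two</span>";
        System.out.println(linkTextsWithTitle(html, "My Title"));
    }
}
```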
