I would like to select the text inside the strong-tag but without the div under it...
Is there a possibility to do this with jsoup directly?
My try for the selection (doesn't work, selects the full content inside the strong-tag):
Elements selection = htmlDocument.select("strong").select("*:not(.dontwantthatclass)");
HTML:
<strong>
I want that text
<div class="dontwantthatclass">
</div>
</strong>
您正在寻找ownText()方法。
String txt = htmlDocument.select("strong").first().ownText();
Have a look at various methods jsoup have to deal with it https://jsoup.org/apidocs/org/jsoup/nodes/Element.html . You can use remove()
, removeChild()
etc. One thing you can do is use regex. Here is a sample regex that matches start and end tag also appended by </br>
tag https://www.debuggex.com/r/1gmcSdz9s3MSimVQ
So you can do it like
selection.replace(/<([^ >]+)[^>]*>.*?<\/\1>|<[^\/]+\/>/ig, "");
You can further modify this regex to match most of your cases.
Another thing you can do is, further process your variable using javascript or vbscript:-
Elements selection = htmlDocument.select("strong")
jquery code here:-
var removeHTML = function(text, selector) {
var wrapped = $("<div>" + text + "</div>");
wrapped.find(selector).remove();
return wrapped.html();
}
With regular expression you can use ownText() methods of jsoup to get and remove unwanted string.
I guess you're using jQuery, so you could use "innerText" property on your "strong" element:
var selection = htmlDocument.select("strong")[0].innerText;
https://jsfiddle.net/scratch_cf/8ds4uwLL/
PS: If you want to wrap the retrieved text into a "strong" tag, I think you'll have to build a new element like $('<strong>retrievedText</strong>');
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.