
How to select text in HTML tag without a tag around it (JSoup)

I would like to select the text inside the strong tag, but without the div nested in it.

Is it possible to do this directly with jsoup?

My attempt at the selection (it doesn't work; it selects the full content inside the strong tag):

Elements selection = htmlDocument.select("strong").select("*:not(.dontwantthatclass)");

HTML:

<strong>
   I want that text
   <div class="dontwantthatclass">
   </div>
</strong>

You are looking for the ownText() method.

String txt = htmlDocument.select("strong").first().ownText();
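A slightly fuller sketch of the same idea, assuming jsoup is on the classpath. The class name OwnTextExample and the helper ownTextOfFirst are made up for illustration; the null check is there because first() returns null when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class OwnTextExample {

    // Hypothetical helper: returns the text that belongs directly to the
    // first element matching cssQuery (text of child elements is left out),
    // or an empty string if nothing matches.
    static String ownTextOfFirst(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element first = doc.select(cssQuery).first(); // null when there is no match
        return first == null ? "" : first.ownText();
    }

    public static void main(String[] args) {
        // Same shape as the HTML in the question, with some text in the div
        // to show that it is not picked up.
        String html = "<strong>\n"
                + "   I want that text\n"
                + "   <div class=\"dontwantthatclass\">not this</div>\n"
                + "</strong>";
        System.out.println(ownTextOfFirst(html, "strong"));
    }
}
```

Note that ownText() only skips text inside child elements; text sitting directly in the strong tag after the div would still be included.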

Have a look at the methods jsoup offers for this: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html. You can use remove() and related methods to drop the unwanted child. Another option is a regex. Here is a sample regex that matches an element's start and end tags, as well as self-closing tags such as <br/>: https://www.debuggex.com/r/1gmcSdz9s3MSimVQ

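So, on the HTML string of the selection, you could do something like this (using Java's replaceAll; the (?is) flags make the match case-insensitive and let . span line breaks):

String cleaned = selection.html().replaceAll("(?is)<([^ >]+)[^>]*>.*?</\\1>|<[^/]+/>", "");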

You can further modify this regex to match most of your cases.
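The remove() route mentioned above can be sketched in Java as follows, assuming jsoup is on the classpath. The class name RemoveChildExample and the helper textWithout are hypothetical names chosen for this example. It deletes the unwanted children from the parsed document and then reads the remaining text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class RemoveChildExample {

    // Hypothetical helper: finds the first element matching cssQuery,
    // removes every descendant matching unwantedQuery, and returns the
    // text that is left. Returns an empty string if nothing matches.
    static String textWithout(String html, String cssQuery, String unwantedQuery) {
        Document doc = Jsoup.parse(html);   // parsed locally, so mutating it is safe here
        Element first = doc.select(cssQuery).first();
        if (first == null) {
            return "";
        }
        first.select(unwantedQuery).remove(); // detach the unwanted elements from the DOM
        return first.text();
    }

    public static void main(String[] args) {
        String html = "<strong>\n"
                + "   I want that text\n"
                + "   <div class=\"dontwantthatclass\">not this</div>\n"
                + "</strong>";
        System.out.println(textWithout(html, "strong", ".dontwantthatclass"));
    }
}
```

Unlike ownText(), this keeps text from any other nested elements (an em tag, for instance) and drops only what the second selector matches.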

Another thing you can do is process the variable further using JavaScript or VBScript:

Elements selection = htmlDocument.select("strong");

jQuery code:

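// Wraps the HTML string in a temporary div, removes every element
// matching the selector, and returns the remaining HTML.
var removeHTML = function(text, selector) {
    var wrapped = $("<div>" + text + "</div>");
    wrapped.find(selector).remove();
    return wrapped.html();
};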

Besides regular expressions, you can use jsoup's ownText() method to get the text you want and leave out the unwanted content.

I guess you're using jQuery, so you could use the "innerText" property of your "strong" element:

var selection = $("strong")[0].innerText;

https://jsfiddle.net/scratch_cf/8ds4uwLL/

PS: If you want to wrap the retrieved text into a "strong" tag, I think you'll have to build a new element like $('<strong>retrievedText</strong>');


 