How to select text in an HTML tag without a tag around it (jsoup)
I would like to select the text inside the strong tag, but without the div under it...
Is there a way to do this with jsoup directly?
My attempt at the selection (it doesn't work; it selects the full content inside the strong tag):
Elements selection = htmlDocument.select("strong").select("*:not(.dontwantthatclass)");
HTML:
<strong>
I want that text
<div class="dontwantthatclass">
</div>
</strong>
You are looking for the ownText() method.
String txt = htmlDocument.select("strong").first().ownText();
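To make the difference between ownText() and text() concrete, here is a minimal sketch built on the HTML from the question. The class name OwnTextExample and the helper ownTextOfFirst are made up for illustration, the word "unwanted" was added inside the div so the difference is visible, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextExample {

    // Hypothetical helper: returns the text that belongs directly to the
    // first element matching cssQuery, ignoring text inside child elements.
    static String ownTextOfFirst(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element first = doc.select(cssQuery).first();
        // first() returns null when nothing matched
        return first == null ? "" : first.ownText().trim();
    }

    public static void main(String[] args) {
        String html = "<strong>\nI want that text\n"
                + "<div class=\"dontwantthatclass\">unwanted</div>\n"
                + "</strong>";
        Element strong = Jsoup.parse(html).select("strong").first();
        System.out.println(strong.text());                   // own text plus the div's text
        System.out.println(ownTextOfFirst(html, "strong"));  // only the strong tag's own text
    }
}
```

The null check matters because first() on an empty selection returns null, so the one-liner above would throw a NullPointerException if the page had no strong tag.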
Have a look at the various methods jsoup provides for this: https://jsoup.org/apidocs/org/jsoup/nodes/Element.html . You can use remove(), removeChild(), etc. Another option is a regex. Here is a sample regex that matches a paired start and end tag, and also a self-closing tag such as <br/>: https://www.debuggex.com/r/1gmcSdz9s3MSimVQ
So you can do it like this:
selection.replace(/<([^ >]+)[^>]*>.*?<\/\1>|<[^\/]+\/>/ig, "");
You can further modify this regex to match most of your cases.
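The replace call above uses JavaScript regex-literal syntax, which does not compile in Java. A rough Java equivalent, using only java.util.regex, might look like the sketch below. The class and method names are hypothetical, and the DOTALL flag is an added assumption so that the pattern can span the line break inside the div from the question. Like any regex over HTML it is fragile (nested tags of the same name, or unusual markup, can defeat it), so treat it as an illustration rather than a robust solution.

```java
import java.util.regex.Pattern;

public class RegexStripExample {

    // Java translation of the regex above. DOTALL is an added assumption
    // so that ".*?" can match across line breaks.
    private static final Pattern TAGS = Pattern.compile(
            "<([^ >]+)[^>]*>.*?</\\1>|<[^/]+/>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: removes paired tags (with their contents) and
    // self-closing tags, then trims the remaining text.
    static String stripTags(String innerHtml) {
        return TAGS.matcher(innerHtml).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Inner HTML of the strong tag from the question
        String inner = "\nI want that text\n<div class=\"dontwantthatclass\">\n</div>\n";
        System.out.println(stripTags(inner));
    }
}
```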
Another option is to process the variable further using JavaScript or VBScript:
Elements selection = htmlDocument.select("strong");
jQuery code:
var removeHTML = function(text, selector) {
    // Wrap the markup so find() can search inside it
    var wrapped = $("<div>" + text + "</div>");
    // Drop every element that matches the selector
    wrapped.find(selector).remove();
    return wrapped.html();
};
Alongside the regular expression, you can also use jsoup's ownText() method to get the text and then remove any unwanted strings.
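The remove() route mentioned earlier in this answer can also be done entirely inside jsoup, without a regex or a round trip through JavaScript. The following is a sketch under assumptions: the class name RemoveChildExample and the helper textWithout are hypothetical, and the word "unwanted" was added to the div for demonstration. It clones the matched element so the parsed document is left untouched, deletes the unwanted children from the copy, and reads the remaining text.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class RemoveChildExample {

    // Hypothetical helper: text of the first match of cssQuery after
    // deleting every descendant that matches unwantedQuery.
    static String textWithout(String html, String cssQuery, String unwantedQuery) {
        Element first = Jsoup.parse(html).select(cssQuery).first();
        if (first == null) {
            return "";
        }
        Element copy = first.clone();         // work on a copy, keep the parsed tree intact
        copy.select(unwantedQuery).remove();  // detach each matching descendant from the copy
        return copy.text().trim();
    }

    public static void main(String[] args) {
        String html = "<strong>\nI want that text\n"
                + "<div class=\"dontwantthatclass\">unwanted</div>\n"
                + "</strong>";
        System.out.println(textWithout(html, "strong", ".dontwantthatclass"));
    }
}
```

Unlike ownText(), this approach keeps text from child elements that are not explicitly removed, which may be preferable when only one specific class has to be excluded.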
I guess you're using jQuery, so you could use the "innerText" property on your "strong" element:
var selection = htmlDocument.select("strong")[0].innerText;
https://jsfiddle.net/scratch_cf/8ds4uwLL/
PS: If you want to wrap the retrieved text in a "strong" tag, I think you'll have to build a new element like $('<strong>retrievedText</strong>');