
Finding non-HTML tag words in paragraph

I've got something like this:

<p id="tire">I need new tires for my car</p>

I'm trying to write something that highlights the word(s) I specify, but NOT anything that is part of a tag. For example, if I want to highlight "tire", I'd expect to see:

<p id="tire">I need new <strong>tire</strong>s for my car</p>

But unfortunately, I see:

<p id="<strong>tire</strong>">I need new <strong>tire</strong>s for my car</p>

I'm using just a simple replaceAll(oldWord, newFormat). Is there a library that can help? I am using jsoup to grab the HTML I would be searching through.

You can use the selection method getElementsContainingOwnText(String searchText) to select the elements whose own text contains the word you are looking for, in this case "tire". Attribute values such as id="tire" are not part of an element's text, so they are left alone.

Here is an example of how it works:

Dummy HTML:

<html>
 <head></head>
 <body> 
  <p id="tire">I need new tires for my car</p>
 </body>
</html>

Our jsoup code:

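// Assumes doc is an org.jsoup.nodes.Document parsed from the HTML above.
Elements e = doc.getElementsContainingOwnText("tire");
for (Element el : e) {
    // Use html(), not text(): text() would escape the markup as &lt;strong&gt;.
    // Note that ownText() drops any child elements of el.
    el.html(el.ownText().replace("tire", "<strong>tire</strong>"));
}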

The resulting document printout:

<html>
 <head></head>
 <body> 
  <p id="tire">I need new <strong>tire</strong>s for my car</p>
 </body>
</html>

Use find and replace, but add a space in front of the word, like this: " tires"

and replace it with " <strong>tire</strong>s" (keeping the leading space).

Try:

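// Assumes html is a String holding the markup; strings are immutable, so reassign.
html = html.replaceAll("tire", "<strong>tire</strong>");
// Undo the replacement where it landed inside the id attribute.
html = html.replaceAll("id=\"<strong>tire</strong>\"", "id=\"tire\"");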

This solves this particular problem, but I think you can run into others.
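The replaceAll idea can also be made to skip tag contents in a single pass by using a negative lookahead. What follows is only a minimal sketch: it assumes the HTML has no stray '<' or '>' inside text or attribute values, and the class name HighlightOutsideTags and the helper highlight are hypothetical names, not part of the original post or of any library. Regular expressions over HTML are fragile, so a parser such as jsoup is probably the safer choice for real pages.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightOutsideTags {

    // Hypothetical helper: wraps each occurrence of word in <strong>,
    // skipping occurrences that sit inside a tag (for example in an attribute).
    static String highlight(String html, String word) {
        // (?![^<>]*>) rejects a match when the next angle bracket after it is '>',
        // which means the match is inside a tag such as <p id="tire">.
        Pattern p = Pattern.compile(Pattern.quote(word) + "(?![^<>]*>)");
        String replacement = Matcher.quoteReplacement("<strong>" + word + "</strong>");
        return p.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String html = "<p id=\"tire\">I need new tires for my car</p>";
        System.out.println(highlight(html, "tire"));
    }
}
```

With the paragraph from the question, the id attribute is left untouched and only the occurrence in the visible text is wrapped. Pattern.quote and Matcher.quoteReplacement keep a search word containing regex or replacement metacharacters from being misinterpreted.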
