How can I match the content of an HTML tag with a regex when the same tag repeats?

Suppose I have these tags:

<td class="cit-borderleft cit-data">437</td>
<td class="cit-borderleft cit-data">394</td>
<td class="cit-borderleft cit-data">12</td>
<td class="cit-borderleft cit-data">**12**</td>

But I need to match only the number 12 in the last tag. I am using the regex "<td class=\"cit-borderleft cit-data\">(.*?)</td>", but it matches all four tags.

Don't use regex. Use a proper XML/HTML parser such as jsoup. If you simply want the text of the last td element with the classes cit-borderleft and cit-data, you can use:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = 
        "<table>" +
        "<td class=\"cit-borderleft cit-data\">437</td>\r\n" + 
        "<td class=\"cit-borderleft cit-data\">394</td>\r\n" + 
        "<td class=\"cit-borderleft cit-data\">12</td>\r\n" + 
        "<td class=\"cit-borderleft cit-data\">**12**</td>" +
        "</table>";
Document doc = Jsoup.parse(html);
// select every td that carries both classes, then keep only the last one
Element last = doc.select("td.cit-borderleft.cit-data").last();
System.out.println(last.text());

Output: **12**

If you then want to remove the asterisks, call replace("*", "") on that string; it returns a new string without them.

Try this:

<td class=\"cit-borderleft cit-data\">\*\*(.*?)\*\*<\/td>

Or, more simply:

\*\*(\d+)\*\*
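
For illustration, here is a minimal sketch of how the two patterns above might be used from Java with java.util.regex. The class and method names (StarredNumber, starredCell, starredNumber) are made up for this example. Note that both patterns depend on the asterisks around the value: the second one returns the first starred number anywhere in the input, whether or not it sits in the last cell.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarredNumber {
    // First pattern from the answer: a starred value inside a cit-data cell.
    private static final Pattern STARRED_CELL = Pattern.compile(
            "<td class=\"cit-borderleft cit-data\">\\*\\*(.*?)\\*\\*</td>");

    // Second pattern from the answer: any number wrapped in double asterisks.
    private static final Pattern STARRED_DIGITS = Pattern.compile("\\*\\*(\\d+)\\*\\*");

    // Returns the text between the asterisks of the first starred cell, or null.
    static String starredCell(String html) {
        Matcher m = STARRED_CELL.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Returns the first starred number found anywhere in the input, or null.
    static String starredNumber(String html) {
        Matcher m = STARRED_DIGITS.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
                "<td class=\"cit-borderleft cit-data\">437</td>\n" +
                "<td class=\"cit-borderleft cit-data\">394</td>\n" +
                "<td class=\"cit-borderleft cit-data\">12</td>\n" +
                "<td class=\"cit-borderleft cit-data\">**12**</td>";
        System.out.println(starredCell(html));   // 12
        System.out.println(starredNumber(html)); // 12
    }
}
```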

Based on your attempt:

<td class=\"cit-borderleft cit-data\">(.*?)<\/td>(?![\s\S]*<\/td>)

I added this part: (?![\s\S]*<\/td>). It is a negative lookahead that fails whenever another </td> appears later in the input, so only the last cell can match.

(?!             # Negative Look-Ahead
  [\s\S]        # Character in [\s\S] Character Class
  *             # (zero or more)(greedy)
  <             # "<"
  \/            # "/"
  td>           # "td>"
)               # End of Negative Look-Ahead
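
As a rough sketch, the lookahead pattern could be applied in Java like this. The class and method names (LastCellRegex, lastCell) are hypothetical. It assumes each cell sits on its own line, as in the question: without DOTALL, "." does not cross line breaks, so the lazy group cannot stretch over several cells. If all cells were on one line, the group could span from the first cell to the last, and a stricter group such as ([^<]*) would be safer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastCellRegex {
    // The pattern from the answer, written as a Java string literal.
    // The trailing negative lookahead rejects any cell that is followed,
    // anywhere later in the input, by another </td>.
    private static final Pattern LAST_CELL = Pattern.compile(
            "<td class=\"cit-borderleft cit-data\">(.*?)</td>(?![\\s\\S]*</td>)");

    // Returns the content of the last matching cell, or null if none matches.
    static String lastCell(String html) {
        Matcher m = LAST_CELL.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
                "<td class=\"cit-borderleft cit-data\">437</td>\n" +
                "<td class=\"cit-borderleft cit-data\">394</td>\n" +
                "<td class=\"cit-borderleft cit-data\">12</td>\n" +
                "<td class=\"cit-borderleft cit-data\">**12**</td>";
        String last = lastCell(html);
        System.out.println(last);                  // **12**
        System.out.println(last.replace("*", "")); // 12
    }
}
```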
