How do I match information inside HTML tags with a regex when the tags repeat?

Question

Suppose I have these tags:

<td class="cit-borderleft cit-data">437</td>
<td class="cit-borderleft cit-data">394</td>
<td class="cit-borderleft cit-data">12</td>
<td class="cit-borderleft cit-data">**12**</td>

I need to match only the number 12 in the last tag. I am using the regex "<td class=\"cit-borderleft cit-data\">(.*?)</td>", but it matches all four tags.

Answer 1

Don't use regex. Use a proper XML/HTML parser such as jsoup. If you simply want the text of the last td element that has the classes cit-borderleft and cit-data, you can use:

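import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LastCellExample {
    public static void main(String[] args) {
        String html =
                "<table>" +
                "<td class=\"cit-borderleft cit-data\">437</td>\r\n" +
                "<td class=\"cit-borderleft cit-data\">394</td>\r\n" +
                "<td class=\"cit-borderleft cit-data\">12</td>\r\n" +
                "<td class=\"cit-borderleft cit-data\">**12**</td>" +
                "</table>";
        Document doc = Jsoup.parse(html);
        // Select every td carrying both classes, then keep only the last one.
        Element last = doc.select("td.cit-borderleft.cit-data").last();
        System.out.println(last.text());
    }
}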

Output: **12**

If you then want to remove the asterisks, call replace("*", "") on that string and you will get a new string without them.
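As a minimal sketch of that last step, the snippet below strips the asterisks with String.replace, which treats its first argument as literal text rather than as a regex. The class and method names (AsteriskStripper, stripAsterisks) are made up for illustration.

```java
final class AsteriskStripper {
    // Removes every literal '*' from the text; replace() does not use regex.
    static String stripAsterisks(String text) {
        return text.replace("*", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAsterisks("**12**"));
    }
}
```

Note that replaceAll("*", "") would not work here: replaceAll interprets "*" as a regex and throws a PatternSyntaxException.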

Answer 2

Try this:

<td class=\"cit-borderleft cit-data\">\*\*(.*?)\*\*<\/td>

Or, more simply:

\*\*(\d+)\*\*
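For reference, here is a rough sketch of how both patterns could be used from Java, assuming the input is the four-cell snippet from the question. The class, field, and method names (StarredValueFinder, FULL, SHORT, firstGroup) are invented for this example; in a Java string literal every regex backslash has to be doubled, and the \/ escape is optional.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StarredValueFinder {
    // Java string form of: <td class="cit-borderleft cit-data">\*\*(.*?)\*\*</td>
    static final Pattern FULL = Pattern.compile(
            "<td class=\"cit-borderleft cit-data\">\\*\\*(.*?)\\*\\*</td>");
    // Java string form of: \*\*(\d+)\*\*
    static final Pattern SHORT = Pattern.compile("\\*\\*(\\d+)\\*\\*");

    // Returns capture group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
                "<td class=\"cit-borderleft cit-data\">437</td>\n"
              + "<td class=\"cit-borderleft cit-data\">394</td>\n"
              + "<td class=\"cit-borderleft cit-data\">12</td>\n"
              + "<td class=\"cit-borderleft cit-data\">**12**</td>";
        System.out.println(firstGroup(FULL, html));
        System.out.println(firstGroup(SHORT, html));
    }
}
```

Both patterns depend on the asterisks being present in the markup; the plain 12 in the third cell is skipped because it is not wrapped in **.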

Answer 3

Based on your attempt:

<td class=\"cit-borderleft cit-data\">(.*?)<\/td>(?![\s\S]*<\/td>)

I added this part: (?![\\s\\S]*<\\/td>)

(?!             # Negative Look-Ahead
  [\s\S]        # Character in [\s\S] Character Class
  *             # (zero or more)(greedy)
  <             # "<"
  \/            # "/"
  td>           # "td>"
)               # End of Negative Look-Ahead
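A small sketch of this pattern in Java, assuming each cell sits on its own line as in the question. The names LastCellMatcher, LAST_CELL, and lastCellText are hypothetical and only serve the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastCellMatcher {
    // The original pattern plus a negative lookahead that rejects any match
    // followed, anywhere later in the input, by another </td>.
    static final Pattern LAST_CELL = Pattern.compile(
            "<td class=\"cit-borderleft cit-data\">(.*?)</td>(?![\\s\\S]*</td>)");

    // Returns the text of the last matching cell, or null when nothing matches.
    static String lastCellText(String html) {
        Matcher m = LAST_CELL.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
                "<td class=\"cit-borderleft cit-data\">437</td>\n"
              + "<td class=\"cit-borderleft cit-data\">394</td>\n"
              + "<td class=\"cit-borderleft cit-data\">12</td>\n"
              + "<td class=\"cit-borderleft cit-data\">**12**</td>";
        System.out.println(lastCellText(html));
    }
}
```

One caveat: by default . does not match line terminators, so this relies on the cells being separated by line breaks. If all cells were on a single line, the lazy group could stretch from the first opening tag to the last closing tag.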

Answer 4

I don't get why you're using regex to parse an HTML tag, but here it is:


(?<=<td class=\"cit-borderleft cit-data\">\\*\\*)\\d*(?=\\*\\*<\\/td>)
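To illustrate, the lookbehind/lookahead version might be wired up in Java roughly as follows. The names LookaroundMatcher, STARRED_DIGITS, and starredDigits are assumptions made for this sketch. Because the tag and the asterisks sit inside lookarounds, the match itself contains only the digits, so no capture group is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundMatcher {
    // Lookbehind: the opening tag followed by **; lookahead: ** and the closing tag.
    static final Pattern STARRED_DIGITS = Pattern.compile(
            "(?<=<td class=\"cit-borderleft cit-data\">\\*\\*)\\d*(?=\\*\\*</td>)");

    // Returns the digits between the asterisks, or null when nothing matches.
    static String starredDigits(String html) {
        Matcher m = STARRED_DIGITS.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html =
                "<td class=\"cit-borderleft cit-data\">12</td>\n"
              + "<td class=\"cit-borderleft cit-data\">**12**</td>";
        System.out.println(starredDigits(html));
    }
}
```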

Answer 1: 2, 2016-07-30 15:32:13
Answer 2: 0, 2016-07-30 15:03:42
Answer 3: 0, 2016-07-30 15:22:02
Answer 4: 0, 2016-07-30 15:36:49