Regular expression to get HTML start tags

Question

I want to get only the starting HTML tags. Let's say I have HTML like this:

<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>
<ul><li>List Item</li></ul>

From the above HTML I want to extract this information:

<div
<br
<p
<ul
<li

Note that I don't need the closing '>' of the tags.

Answer 1

Try the regex /<[a-zA-Z]+[1-6]?/g. I added the [1-6] for the heading tags (<h1> to <h6>); I think they're the only ones with numbers. If you want to be sure, you could use /<[a-zA-Z0-9]+/g, since in HTML a < always starts a tag (unless it's a comment, <!--), because a literal < in text is escaped as &lt;.
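A quick way to compare the two patterns from this answer is to run both over the sample markup. The <h1>Title</h1> and the stray <6> appended to the input are hypothetical test data added only for illustration; they are not part of the original question.

```javascript
// Sample markup from the question, plus a heading tag and a stray "<6>"
// (both added for illustration only).
const html =
  '<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>' +
  '<ul><li>List Item</li></ul><h1>Title</h1><6>';

// Letters, optionally followed by a single digit 1-6 (covers h1..h6).
const lettersThenHeadingDigit = /<[a-zA-Z]+[1-6]?/g;
// Any run of letters or digits after "<".
const lettersOrDigits = /<[a-zA-Z0-9]+/g;

// Closing tags such as "</p" never match: "/" is neither a letter nor a digit.
const strict = html.match(lettersThenHeadingDigit);
const loose = html.match(lettersOrDigits);

console.log(strict.join(' ')); // <div <br <p <ul <li <h1
console.log(loose.join(' '));  // <div <br <p <ul <li <h1 <6
```

The looser pattern also picks up "<6", which is not a valid tag name; the stricter one requires at least one letter first.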

Answer 2

The following will return an array of matches containing the HTML fragments you want.

'<div class="some">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul>'.match(/<\w+/g)
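The one-liner can be wrapped and checked directly. In JavaScript, \w is shorthand for [A-Za-z0-9_], so this pattern behaves like the looser variant from the first answer plus the underscore. The startTags helper below is a hypothetical name used for this minimal sketch; the "|| []" fallback is there because match() with the g flag returns null when nothing matches.

```javascript
// Hypothetical helper: collect the "<tagname" prefix of every start tag.
// Closing tags ("</p") and comments ("<!--") are skipped because
// "/" and "!" are not word characters.
function startTags(html) {
  // match() with the g flag returns null on no match, so fall back to [].
  return html.match(/<\w+/g) || [];
}

const sample =
  '<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>' +
  '<ul><li>List Item</li></ul>';

console.log(startTags(sample).join(' ')); // <div <br <p <ul <li
console.log(startTags('<!-- comment --></p>').length); // 0
```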

Answer 3

How about this:

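import java.util.Scanner;
public class StartTags {
    public static void main(String[] args) {
        String input = "<div class=\"some\">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul><6>";
        Scanner scanner = new Scanner(input);
        String result;
        while ((result = scanner.findInLine("<\\w+")) != null) { // null once no further match exists
            System.out.println(result);
        }
        scanner.close();
    }
}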

3 answers

Answer 1: score 1, accepted, 2012-01-20 05:43:36

Answer 2: score 1, 2012-01-20 05:50:26

Answer 3: score 0, 2012-01-20 08:48:37
