regex for getting html starting tags
I want to get only the starting HTML tags. Let's say I have HTML like this:
<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>
<ul><li>List Item</li></ul>
From the above HTML I want to extract this information:
<div
<br
<p
<ul
<li
Note that I don't need the ending '>' of the tags.
Try the regex /<[a-zA-Z]+[1-6]?/g. I added the [1-6] for the HTML heading tags (h1 to h6); I think they're the only ones with numbers. If you want to be sure, you could use /<[a-zA-Z0-9]+/g instead, since in HTML a < is always the start of a tag (unless it's a comment, <!--), because an inline < gets converted to &lt;.
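As a rough sketch of how the two patterns behave, the snippet below runs both against the sample HTML from the question with an h1 heading added in front. The added heading and the variable names are assumptions made for illustration only.

```javascript
// Sample HTML from the question, with an <h1> added in front (an assumption, for illustration)
const sample =
  '<h1>Title</h1><div class="some">Here is a sample text<br /><p>A paragraph here</p></div>' +
  '<ul><li>List Item</li></ul>';

// Letters followed by an optional heading digit 1-6
const lettersThenDigit = sample.match(/<[a-zA-Z]+[1-6]?/g);

// Any run of letters and digits after "<"
const lettersAndDigits = sample.match(/<[a-zA-Z0-9]+/g);

console.log(lettersThenDigit);
console.log(lettersAndDigits);
```

On this input both patterns find the same starting tags. Closing tags such as </div> are skipped because the character after < is /, which neither pattern accepts.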
The following will return an array of matches containing the parts of the HTML you want.
'<div class="some">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul>'.match(/<\w+/g)
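One thing to watch for with this approach: match() with the g flag returns null rather than an empty array when nothing matches. A minimal sketch of a wrapper that guards against that is shown below; the function name startingTags is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: returns the starting-tag prefixes found in an HTML string.
function startingTags(html) {
  // match() returns null when nothing matches, so fall back to an empty array
  return html.match(/<\w+/g) || [];
}

console.log(startingTags('<div class="some">Text<br /><p>A paragraph</p></div>'));
console.log(startingTags('plain text, no tags'));
```

The caller can then loop over the result without a null check.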
How about this:
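import java.util.Scanner;

String input = "<div class=\"some\">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul><6>";
Scanner scanner = new Scanner(input);
String result;
// findInLine returns the next match on the current line, or null when none remain.
// Note: \w also matches digits, so the trailing <6> in this input is reported as <6.
while ((result = scanner.findInLine("<\\w+")) != null) {
    System.out.println(result);
}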