
Regex for getting HTML starting tags

I want to get only the starting HTML tags. Let's say I have HTML like this:

<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>
<ul><li>List Item</li></ul>

From the above HTML I want to extract this information:

<div
<br
<p
<ul
<li

Note that I don't need the closing '>' of the tags.

Try the regex /<[a-zA-Z]+[1-6]?/g . I added the [1-6] for the heading tags (h1 to h6); I think they're the only ones with numbers. If you want to be sure, you could use /<[a-zA-Z0-9]+/g , since in well-formed HTML a < always starts a tag (unless it starts a comment, <!-- ), because a literal < in text gets escaped as &lt; . Closing tags are skipped by both patterns because the / that follows < is not in the character class.
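As a quick check of the two patterns above, here is a minimal sketch run against the sample HTML from the question. The variable names (html, strict, loose and so on) are only for illustration and are not part of the original answer.

```javascript
// Sample HTML from the question, joined onto one line.
const html =
  '<div class="some">Here is a sample text<br /><p>A paragraph here</p></div>' +
  '<ul><li>List Item</li></ul>';

// Letters, then an optional digit 1-6 (covers h1 to h6).
const strict = /<[a-zA-Z]+[1-6]?/g;
// Letters and digits in any order.
const loose = /<[a-zA-Z0-9]+/g;

// With the g flag, match() returns a plain array of all matches.
const strictTags = html.match(strict);
const looseTags = html.match(loose);
console.log(strictTags); // [ '<div', '<br', '<p', '<ul', '<li' ]
console.log(looseTags);  // [ '<div', '<br', '<p', '<ul', '<li' ]

// A heading shows the digit being kept; "</h2>" is not matched.
const headingTags = '<h2>Title</h2>'.match(strict);
console.log(headingTags); // [ '<h2' ]
```

On this input both patterns give the same result; they only differ on tag-like text that contains digits somewhere other than a single trailing 1-6.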

The following returns an array of matches containing the starting tags you want from the HTML.

'<div class="some">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul>'.match(/<\w+/g)
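One thing to keep in mind with /<\w+/g is that \w is shorthand for [A-Za-z0-9_], so it also accepts a digit or underscore right after the <. The sketch below illustrates this with a made-up input; the letterFirst pattern is a hypothetical stricter variant, not something proposed in the original answers.

```javascript
// Made-up input containing "<6>", which is not a real HTML tag.
const sample = '<div><h1>Title</h1><6><br /></div>';

// \w is [A-Za-z0-9_], so "<6" is reported as if it were a tag.
const withWordClass = sample.match(/<\w+/g);

// Hypothetical stricter variant: require a letter right after "<".
const letterFirst = sample.match(/<[a-zA-Z]\w*/g);

console.log(withWordClass); // [ '<div', '<h1', '<6', '<br' ]
console.log(letterFirst);   // [ '<div', '<h1', '<br' ]
```

Whether the stricter form is worth it depends on the input; for HTML where every literal < in text is already escaped as &lt; the two behave the same.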

How about this:

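import java.util.Scanner;

String input = "<div class=\"some\">Here is a sample text<br /><p>A paragraph here</p></div><ul><li>List Item</li></ul><6>";
Scanner scanner = new Scanner(input);
String result;
// findInLine returns null once the pattern no longer matches in the current line.
// Note that \w+ also matches the trailing "<6", which is not a real tag.
while ((result = scanner.findInLine("<\\w+")) != null) {
    System.out.println(result);
}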
