
Regular expression for retrieving tags

In my project I want to retrieve tags from a web page, and for that I used DOM methods.

But tags can also be created dynamically, for example: document.write('<a href="http://somedomain.com">');

Here the tags are given as a string, so I am trying to use regular expressions.

I want a regular expression that matches all tags and their attributes, and that can also extract a specific attribute.

It is very hard to understand what you are asking; the question is very unclear.

First off: never use regex to parse HTML if you have another option. It looks simple, right? No. You'll run into a problem sooner or later.

Second: what David said.

Now, here's a regex to match any HTML tag (I have not tested it, so try it out first if you must use it):

\<[^>]*\>
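As a rough illustration of how that pattern behaves on a string like the one passed to document.write, here is a minimal sketch. The helper name findTags and the sample strings are assumptions made for this example, and the backslashes before < and > are dropped because they are not needed in a JavaScript regex literal. The second call shows the kind of problem mentioned above: a ">" inside an attribute value cuts the match short.

```javascript
// The tag-matching pattern from above, with the global flag so that
// every match is returned instead of only the first one.
const TAG_PATTERN = /<[^>]*>/g;

// Hypothetical helper: returns every substring that looks like a tag.
function findTags(source) {
  return source.match(TAG_PATTERN) || [];
}

// A string as it might be passed to document.write.
const written = '<a href="http://somedomain.com">link</a>';
console.log(findTags(written));
// [ '<a href="http://somedomain.com">', '</a>' ]

// Pitfall: a ">" inside an attribute value ends the match too early.
const tricky = '<a title="a > b" href="x">';
console.log(findTags(tricky));
// [ '<a title="a >' ]
```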

Be warned that it will match a script tag too. Do not let users write arbitrary tags to your page; whitelist a few if you must, and be prepared for trouble if you don't use a library.
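If you do go the regex route, pulling out a specific attribute and checking the tag name against a whitelist could look roughly like the sketch below. The names parseTag, isAllowed and ALLOWED_TAGS, as well as the choice of allowed tags, are assumptions for illustration only. The attribute pattern handles quoted values only (no unquoted or valueless attributes), so treat this as a sketch and not as a sanitizer.

```javascript
// Hypothetical whitelist of tag names that may be written to the page.
const ALLOWED_TAGS = new Set(["a", "b", "i", "p"]);

// Hypothetical helper: takes one tag string such as '<a href="x">' and
// returns its lower-cased name plus a map of its quoted attributes,
// or null if the string does not start like a tag.
function parseTag(tag) {
  const nameMatch = /^<\s*\/?\s*([a-zA-Z][a-zA-Z0-9]*)/.exec(tag);
  if (!nameMatch) return null;

  const attributes = {};
  // name = "value" or name = 'value'; unquoted values are not handled.
  const attrPattern = /([a-zA-Z_:][-a-zA-Z0-9_:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const m of tag.matchAll(attrPattern)) {
    attributes[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
  }
  return { name: nameMatch[1].toLowerCase(), attributes };
}

// True only if the tag parses and its name is on the whitelist.
function isAllowed(tag) {
  const parsed = parseTag(tag);
  return parsed !== null && ALLOWED_TAGS.has(parsed.name);
}

const link = parseTag('<a href="http://somedomain.com">');
console.log(link.attributes.href);            // http://somedomain.com
console.log(isAllowed('<script src="x.js">')); // false
```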

You can try these out at RegExr, for example. Keep in mind that it uses ActionScript regexes, which sometimes differ from JavaScript ones; for example, JavaScript supports lookahead but had no lookbehind until ES2018.
