
Using Regex to extract strings from HTML tags


Hello,

I'm trying to get all the variables from an Underscore template, so from this string:

    <%=userID %> </td><td><%=username %> </td><td><%=firstname %>

I'd like to get an array:

    ["userID", "username", "firstname"]

Some notes:

  1. I can't assume there are any spaces in the string.

  2. Variable names can repeat in the template.

  3. HTML tags can vary; this is simply an example. The template can be based on other elements or anything else.

What I tried:

    var regexp = /<%=(.+)%>/;

Why it failed:

The above regex matches the whole initial string as well, because the entire string fits the pattern, so I get one big match instead of three names. I'm not very experienced with regexes, and I'm afraid I'm missing something really simple.
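To see the failure concretely, here is a minimal sketch that runs the pattern from the question (no g flag, greedy .+) against the example string. Because .+ is greedy, it runs to the end of the string and then backtracks only as far as the last %>, so there is one match whose capture group spans all three tags. The variable names greedy, template and m are illustrative only.

```javascript
// The pattern from the question: greedy .+ and no g flag
const greedy = /<%=(.+)%>/;
const template = "<%=userID %> </td><td><%=username %> </td><td><%=firstname %>";

const m = greedy.exec(template);

// The whole string is consumed by a single match...
console.log(m[0] === template); // true
// ...and the capture group swallows everything between the first "<%="
// and the LAST "%>", including the inner tags.
console.log(JSON.stringify(m[1]));
// "userID %> </td><td><%=username %> </td><td><%=firstname "
```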

I also know that it's generally bad practice to parse HTML with regex; however, this specific example isn't really HTML parsing (in my opinion), since I don't need any specific HTML tag.

Thanks in advance!

You need to use parentheses for grouping (to capture the name) and a character class to limit the matched characters, so that a match cannot run past the closing %>. Try:

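    // ([\w\s]+) captures only the name: the class cannot match "%",
    // so each match stops at the nearest "%>". The g flag makes exec()
    // resume from regexp.lastIndex on every call.
    var regexp = /<%=([\w\s]+)%>/g;
    var html = "<%=userID %> </td><td><%=username %> </td><td><%=firstname %>";
    var match, result = [];

    // exec() returns null when there are no more matches
    while ((match = regexp.exec(html)) !== null)
        result.push(match[1].trim());   // strip the space before "%>"
    console.log("Result = " + result);

    // Result = userID,username,firstname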
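The loop above pushes a name once per occurrence, so a variable that repeats in the template (note 2 in the question) appears several times in the result. If a unique list is wanted, one option is to collect the names in a Set, sketched here with String.prototype.matchAll (ES2020+). The function name extractTemplateVars is hypothetical, and the pattern is a variation that assumes names consist only of word characters (\w) with optional whitespace around them; a name such as obj.prop would need a wider character class.

```javascript
// Hypothetical helper: returns each template variable name once,
// in order of first appearance.
function extractTemplateVars(template) {
  // \s* makes the spaces around the name optional; \w+ matches
  // letters, digits and underscores only, so it cannot cross "%>".
  const pattern = /<%=\s*(\w+)\s*%>/g;
  const names = new Set();
  // matchAll requires the g flag and yields one match array per occurrence
  for (const match of template.matchAll(pattern)) {
    names.add(match[1]); // a Set silently ignores names already seen
  }
  return Array.from(names);
}

const tpl = "<%=userID%><td><%= username %></td><td><%=userID %>";
console.log(extractTemplateVars(tpl));
// [ 'userID', 'username' ]
```

A Set keeps insertion order, so the returned array lists the names in the order they first appear in the template.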

Change your regex to:

    <%=(.+?)%>

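The ? after .+ makes the quantifier lazy: .+? matches as few characters as possible, so each match ends at the first %> that follows instead of running to the last one in the string.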
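A minimal sketch of the lazy pattern in use, assuming the example string from the question and adding the g flag so that exec() can be called in a loop; the variable names lazy, source and names are illustrative only. Unlike the character-class version, .+? accepts any characters between the delimiters, so trim() is used to remove the surrounding spaces.

```javascript
// Lazy quantifier: .+? stops at the first "%>" after each "<%="
const lazy = /<%=(.+?)%>/g;
const source = "<%=userID %> </td><td><%=username %> </td><td><%=firstname %>";

const names = [];
let match;
// With the g flag, each exec() call resumes at lazy.lastIndex
while ((match = lazy.exec(source)) !== null) {
  names.push(match[1].trim()); // remove the space before "%>"
}
console.log(names);
// [ 'userID', 'username', 'firstname' ]
```

Note that . does not match line breaks, so a tag split across lines would be skipped unless the s (dotAll) flag is added.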

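Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.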

 