
RegEx: Detect string interpolation but not inside attribute

I am working on creating Web Components and I need a regular expression that captures instances of string interpolation in a template string.
For example, with the following string:

<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>

The instances of string interpolation are inside ${} and can be captured with (this(\.\w+)).
But I do not want to capture the first instance because it is inside an attribute.

I have tried the expression ((?<!".+)this(\.\w+)+(?!.+")), which works with a multiline string (each tag on its own line) but not on a single line.

Here is my RegExr demo.
Perhaps someone with more experience in RegEx can help me out.

Edit

To keep the question simple and to the point, I didn't mention this...

The reason I want to do this is that I am using Lit to create Web Components. I have already created an interpolator function that returns a Lit TemplateResult. Now I want to highlight data with <b> tags, so I want to wrap the RegEx matches in the unsafeHTML directive, but unsafeHTML throws an error when used inside attributes.
Here is my interpolator function:

export function FillTemplate(templateString: string, data: any): TemplateResult {
    let regex = /((?<!".+)this(\.\w+)+(?!.+"))/g;
    if (regex.test(templateString)) {
        templateString = templateString.replace(/((?<!".+)this(\.\w+)+(?!.+"))/g, "unsafeHTML($1)");
    }
    return new Function('html', 'unsafeHTML', "return html`"+templateString +"`;").call(data, html, unsafeHTML);
};

... I will also give this some thought; maybe it's better for me to test the object keys and not the template string...

I think this should work for you:

[^"]\$\{(this\.\w+)

This will only match interpolations that are not preceded by a double quote (").
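As a quick check of the pattern above, here is a minimal sketch that runs it against the sample string from the question. The variable names are made up for illustration, and the g flag is added only so that matchAll can be used.

```javascript
// Regex from the answer above, with the global flag added so matchAll can be used.
const notAfterQuote = /[^"]\$\{(this\.\w+)/g;

const template =
  '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>';

// Capture group 1 holds the expression; the full match also includes
// the single character that precedes "${".
const found = [...template.matchAll(notAfterQuote)].map((m) => m[1]);

console.log(found); // [ 'this.title', 'this.description' ]
```

Because [^"] consumes one character, an interpolation at the very start of the string would not be matched, and attributes written with single quotes are not excluded.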

This one will account for attributes too (contrary to what's asked).
An alternative to Regex (if you trust the data) is to use the Function constructor and let JavaScript's parser interpret and evaluate the string as a template literal, doing the desired job for you:

const interpolate = (str, data) => Function("return (`" + str + "`);").call(data);

// Use like:
const str = '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>';
const data = {
  title: "Lorem ipsum",
  description: "Dolor sit amet",
  image: "https://i.stack.imgur.com/zH7ZS.jpg?s=64&g=1",
};
document.body.insertAdjacentHTML("beforeend", interpolate(str, data));

Additionally, if you find the this in your template overly repetitive, you could use the Object keys directly and .apply() the values, as in this solution:

const interpolate = (str, data) =>
  Function(...Object.keys(data), "return (`" + str + "`);").apply(null, Object.values(data));

// Use like:
const str = '<img src="${image}"/><h5>${title}</h5><p>${description}</p>';
const data = {
  title: "Lorem ipsum",
  description: "Dolor sit amet",
  image: "https://i.stack.imgur.com/zH7ZS.jpg?s=64&g=1",
};
document.body.insertAdjacentHTML("beforeend", interpolate(str, data));

Or, similar to the above (without this, by using the Object keys) but without unsafe evaluation, you could use String.prototype.replace() and a Regex like /\$\{([^}]+)\}/g:

const interpolate = (str, data) => str.replace(/\$\{([^}]+)\}/g, (_, k) => data[k]);

const str = '<img src="${image}"/><h5>${title}</h5><p>${description}</p>';
const data = {
  title: "Lorem ipsum",
  description: "Dolor sit amet",
  image: "https://i.stack.imgur.com/zH7ZS.jpg?s=64&g=1",
};
document.body.insertAdjacentHTML("beforeend", interpolate(str, data));
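If the template still uses the this. prefix or dotted paths, the same replace-based idea can be stretched a little. The sketch below is only an illustration: interpolatePaths is a hypothetical name, and leaving unknown placeholders untouched is an assumption about the desired behavior, not something stated in the answer.

```javascript
// Hypothetical variant of the replace-based interpolator: accepts an optional
// "this." prefix and dotted paths, and leaves unknown placeholders as they are.
const interpolatePaths = (str, data) =>
  str.replace(/\$\{\s*(?:this\.)?([\w.]+)\s*\}/g, (match, path) => {
    // Walk the path one key at a time, stopping early on null/undefined.
    const value = path
      .split('.')
      .reduce((obj, key) => (obj == null ? undefined : obj[key]), data);
    return value === undefined ? match : String(value);
  });

const rendered = interpolatePaths(
  '<h5>${this.title}</h5><p>${author.name}</p><i>${missing}</i>',
  { title: 'Lorem', author: { name: 'Ann' } }
);

console.log(rendered); // <h5>Lorem</h5><p>Ann</p><i>${missing}</i>
```

Like the original one-liner, this only looks values up and never evaluates the placeholder as code.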

Use the following regex:

[^="]{2}\${(\S+?)}

  1. Attributes will always have a = and their value will be in quotes. So [^="]{2} ensures that the two characters before the interpolation are anything but = and ".
  2. (\S+?) then lazily captures the required data in a capturing group.
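The two points above can be tried out with a short sketch against the sample string from the question. The variable names are invented here, the braces are escaped for clarity, and the g flag is added so that matchAll works.

```javascript
// Same pattern as above, with the braces escaped and the global flag added.
const outsideAttribute = /[^="]{2}\$\{(\S+?)\}/g;

const sample =
  '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p>';

// Group 1 is the lazily captured expression between "${" and "}".
const captured = [...sample.matchAll(outsideAttribute)].map((m) => m[1]);

console.log(captured); // [ 'this.title', 'this.description' ]
```

Since the pattern needs two preceding characters, an interpolation within the first two characters of the string is skipped, and so is text content that places a double quote directly before ${.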

Demo

You can use a negative lookbehind to account for a quoted attribute: (?<!["'])\$\{this(?:\.\w+)+\}. This will exclude the src="${this.image}" in your example, but you'll get a false positive for quoted HTML text, such as <p>Quote: "${this.quote}"</p>, which is excluded as well.

You can use a negative lookbehind to account for a quoted attribute in an HTML tag: (?<!<\w+ (\w+=["'][^"']*["'] )*\w+=["'])\$\{this(?:\.\w+)+\}.

Here is an example with both regexes:

const regex1 = /(?<!["'])\$\{this(?:\.\w+)+\}/g;
const regex2 = /(?<!<\w+ (\w+=["'][^"']*["'] )*\w+=["'])\$\{this(?:\.\w+)+\}/g;
[
  '<img src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p><p>Quote: "${this.quote}"</p>',
  '<img foo="bar" src="${this.image}"/><h5>${this.title}</h5><p>${this.description}</p><p>Quote: "${this.quote}"</p>'
].forEach(str => {
  console.log(str);
  console.log('- regex1:', str.match(regex1));
  console.log('- regex2:', str.match(regex2));
});

Explanation of regex2:

  • (?<! -- negative lookbehind start
  • <\w+ -- start of HTML tag followed by a space, such as <img
  • (\w+=["'][^"']*["'] )* -- 0+ attributes of form attr="value", with trailing space
  • \w+=["'] -- attribute start, such as src=" or src='
  • ) -- negative lookbehind end
  • \$\{this -- literal ${this
  • (?:\.\w+)+ -- non-capture group for 1+ patterns of .something
  • \} -- literal }
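To connect this back to the question, the sketch below shows one way regex2 might be used to wrap only the interpolations outside attributes in an unsafeHTML(...) call. The helper name wrapUnsafe is hypothetical, and the regex is slightly adapted: the expression is captured in group 1 and the inner attribute group is made non-capturing.

```javascript
// regex2 from the answer, adapted: group 1 now captures the expression and
// the attribute group inside the lookbehind is non-capturing.
const outsideAttr =
  /(?<!<\w+ (?:\w+=["'][^"']*["'] )*\w+=["'])\$\{(this(?:\.\w+)+)\}/g;

// Hypothetical helper: rewrites ${this.x} to ${unsafeHTML(this.x)} unless the
// interpolation is the start of an attribute value.
const wrapUnsafe = (template) =>
  template.replace(outsideAttr, (_, expr) => '${unsafeHTML(' + expr + ')}');

const input =
  '<img foo="bar" src="${this.image}"/><h5>${this.title}</h5><p>Quote: "${this.quote}"</p>';

console.log(wrapUnsafe(input));
// <img foo="bar" src="${this.image}"/><h5>${unsafeHTML(this.title)}</h5><p>Quote: "${unsafeHTML(this.quote)}"</p>
```

The rewritten string could then be passed to an interpolator such as the FillTemplate function from the question, which already supplies unsafeHTML as a parameter.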

Note: If your regex engine does not support negative lookbehind (notably Safari), you can change that to a capture group and restore it with a .replace().
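A minimal sketch of that fallback, assuming the simpler quote check from regex1 is good enough: the lookbehind becomes an optional capture group, and the replace callback puts the match back unchanged whenever a quote was captured. The names used here are hypothetical.

```javascript
// Optional capture group instead of a lookbehind: group 1 is the quote (if any)
// directly before "${", group 2 is the expression.
const quoteOrInterpolation = /(["'])?\$\{(this(?:\.\w+)+)\}/g;

const wrapWithoutLookbehind = (template) =>
  template.replace(quoteOrInterpolation, (match, quote, expr) =>
    // A captured quote suggests an attribute value: restore the match unchanged.
    quote ? match : '${unsafeHTML(' + expr + ')}'
  );

const source = '<img src="${this.image}"/><h5>${this.title}</h5>';

console.log(wrapWithoutLookbehind(source));
// <img src="${this.image}"/><h5>${unsafeHTML(this.title)}</h5>
```

This shares the limitation of regex1: quoted text content such as "${this.quote}" is left unwrapped too.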
