
RegExp for BBCode tags in JavaScript

I have this RegExp, and I don't know what's wrong with it:

tag = new RegExp('(\\['+tag+'=("|'|)(.*?)\1\\])((?:.|\\r?\\n)*?)\\[/'+tag+']','g');

The bbcode tags can have double quotation marks, single quotation marks or no quotation marks.

[tag="teste"]123[/tag]
[tag='teste']123[/tag]
[tag=teste]123[/tag]

Desired output in captures: teste and 123

To match the optional quotation marks, should it be ("|'|), (["|\']*) or ("|\'?)?

What's wrong with the string

First, let's correct the syntax in your string:

  • You need to define the var tag

     tag = 'tag'; result = new RegExp( <...> ); 
  • You have unbalanced quotes in '("|'|) <...> ': the inner single quote needs to be escaped, as in ("|\'|)

  • Also, escape \1 as \\1, because a backslash inside a string literal must itself be escaped

So now we have the string expression '(\\['+tag+'=("|\'|)(.*?)\\1\\])((?:.|\\r?\\n)*?)\\[/'+tag+']', which evaluates to:

(\[tag=("|'|)(.*?)\1\])((?:.|\r?\n)*?)\[/tag]
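As a quick check of the escaping rules above, the following minimal sketch builds the corrected string and prints the pattern text that the RegExp constructor would receive. The names `tag` and `pattern` are only illustrative.

```javascript
// Sketch: how the escaped string literal turns into the pattern text.
// `tag` and `pattern` are illustrative names, not part of the original question.
const tag = 'tag';

// Inside a string literal, "\\" produces one backslash and "\'" produces a quote.
const pattern =
  '(\\[' + tag + '=("|\'|)(.*?)\\1\\])((?:.|\\r?\\n)*?)\\[/' + tag + ']';

console.log(pattern);
// (\[tag=("|'|)(.*?)\1\])((?:.|\r?\n)*?)\[/tag]
```

Printing the string before passing it to `new RegExp(...)` is a cheap way to confirm that each backslash survived the string-literal step.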

What's wrong with the RegEx

Only one thing, really: in ("|'|)(.*?)\1 you're using \1 to match the same quotation mark that opened the value. However, \1 refers to the first capturing group (counting opening parentheses from left to right), and ("|'|) is actually the second set of parentheses, so it is group 2. All you need to do is change \1 to \2 (written as \\2 inside the string).

(\[tag=("|'|)(.*?)\2\])((?:.|\r?\n)*?)\[/tag]

That's it!
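To make the group-numbering point concrete, here is a small sketch that runs both versions against the three sample inputs from the question. The names `buggy`, `fixed`, `samples` and `rows` are illustrative. It relies on the JavaScript rule that a backreference to a group that has not finished capturing matches the empty string, so the \1 version still matches but cannot force the closing quote to pair with the opening one.

```javascript
// Sketch comparing the \1 and \2 versions (illustrative names throughout).
const buggy = new RegExp('(\\[tag=("|\'|)(.*?)\\1\\])((?:.|\\r?\\n)*?)\\[/tag]');
const fixed = new RegExp('(\\[tag=("|\'|)(.*?)\\2\\])((?:.|\\r?\\n)*?)\\[/tag]');

const samples = [
  '[tag="teste"]123[/tag]',
  "[tag='teste']123[/tag]",
  '[tag=teste]123[/tag]'
];

// Collect capture 3 (the attribute value) from each version, plus capture 4.
const rows = samples.map(function (s) {
  return {
    input: s,
    buggyValue: buggy.exec(s)[3],
    fixedValue: fixed.exec(s)[3],
    content: fixed.exec(s)[4]
  };
});

console.log(rows);
```

For the quoted samples, the \1 version should leave the closing quote inside the third capture, while the \2 version yields teste and 123 in every case.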

Let's add some final suggestions

  • Instead of .*? I would use [^\]]+ (one or more characters other than "]")
  • Use the i modifier (case-insensitive match, so that "[tag]...[/TaG]" also matches)
  • ("|'|) is the same as ("|'?)
  • Instead of (?:.|\r?\n)*? I would use [\s\S]*?, as @nhahtdh suggested

Code:

var tag = 'tag';
// Captures: 1 = opening tag, 2 = quote (if any), 3 = attribute value, 4 = content
var result = new RegExp('(\\['+tag+'=("|\'?)([^\\]]+)\\2\\])([\\s\\S]*?)\\[/'+tag+']','gi');
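One possible way to use this expression is sketched below: it loops with `exec` over a block of text and collects the attribute value (capture 3) and the content (capture 4). The sample text and the names `text` and `found` are assumptions made for illustration.

```javascript
// Sketch: extracting every [tag=...]...[/tag] pair from a piece of text.
const tag = 'tag';
const result = new RegExp(
  '(\\[' + tag + '=("|\'?)([^\\]]+)\\2\\])([\\s\\S]*?)\\[/' + tag + ']',
  'gi'
);

// Hypothetical input: three tags, the last one upper-case and spanning two lines.
const text = [
  '[tag="teste"]123[/tag]',
  "[tag='teste']123[/tag]",
  '[TAG=teste]line 1\nline 2[/tag]'
].join('\n');

const found = [];
let m;
// With the g flag, exec() resumes from lastIndex on every call.
while ((m = result.exec(text)) !== null) {
  found.push({ value: m[3], content: m[4] });
}

console.log(found);
```

Because of the g flag the RegExp object keeps state in `lastIndex`, so reusing `result` on another string is safest after the loop has run to completion or after resetting `result.lastIndex = 0`.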

Alternative, which also accepts the tag without an attribute [EDIT: from info added in comments]:

result = new RegExp('\\['+tag+'(?:=("|\'?)([^\\]]+)\\1)?\\]([\\s\\S]*?)\\[/'+tag+']', 'gi');
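The sketch below exercises this alternative with and without an attribute. Here the quote is capture 1, the value is capture 2, and the content is capture 3. The helper `parseTag` is a hypothetical name, and the g flag is dropped only so that each call is stateless.

```javascript
// Sketch: the alternative pattern, where the "=value" part is optional.
const tag = 'tag';

// parseTag is a hypothetical helper, not part of the original answer.
function parseTag(text) {
  const re = new RegExp(
    '\\[' + tag + '(?:=("|\'?)([^\\]]+)\\1)?\\]([\\s\\S]*?)\\[/' + tag + ']',
    'i'
  );
  const m = re.exec(text);
  // m[2] is undefined when the tag carries no attribute.
  return m ? { value: m[2], content: m[3] } : null;
}

console.log(parseTag('[tag="teste"]123[/tag]'));
console.log(parseTag('[tag]123[/tag]'));
console.log(parseTag('[other]123[/other]'));
```

When the optional group does not participate in the match, its captures are `undefined`, so calling code should check for that before using the value.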

As for your second question: although both (["|\']*) and ("|\'?) will match, the latter is the correct way to express what you're trying to match. In the former, * allows zero to unlimited repetitions, and | is treated as a literal character inside a character class, so it also accepts pipes and runs of mixed quotes. ("|\'?), by contrast, matches exactly one double quote, one single quote, or nothing.
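The difference between the two candidates can be checked in isolation. The following sketch anchors each one with ^ and $ and tests a few strings. The names `loose`, `strict`, `samples` and `table` are illustrative.

```javascript
// Sketch: character class with * versus alternation with an optional quote.
const loose = /^(["|']*)$/;   // any run of ", | or ' characters
const strict = /^("|'?)$/;    // one ", one ', or nothing

const samples = ['"', "'", '', '|', '""', '"\''];

const table = samples.map(function (s) {
  return { sample: s, loose: loose.test(s), strict: strict.test(s) };
});

console.log(table);
```

Both expressions accept a single quote character or the empty string, but only the character-class version should also accept a pipe, a doubled quote, or a mix of quote types.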
