Regex: Match string with substrings with the same pattern

I'm trying to match strings that follow a pattern, where a match can itself contain substrings that follow the same pattern.

Here's an example string:

Nicaragua [[NOTE|s|note|Congo was a member of ICCROM from 1999 and Nicaragua from 1971. Both were suspended by the ICCROM General Assembly in November 2013 having omitted to pay contributions for six consecutive calendar years (ICCROM [[Statutes|s|url|www.iccrom.org/about/statutes/]], article 9).]]. Another [[link|s|url|google.com]] that might appear.

and here's the pattern:

[[display_text|code|type|content]]

So what I want is to get each string within the brackets, and then look inside the top-level match for more strings that match the same pattern.

What I want to match is this:

  1. [[NOTE|s|note|Congo was a member of ICCROM from 1999 and Nicaragua from 1971. Both were suspended by the ICCROM General Assembly in November 2013 having omitted to pay contributions for six consecutive calendar years (ICCROM [[Statutes|s|url|www.iccrom.org/about/statutes/]], article 9).]]

  1.1 [[Statutes|s|url|www.iccrom.org/about/statutes/]]

  2. [[link|s|url|google.com]]

I was using /(\[\[.*]])/, but it matches everything up to the last ]].

The goal is to identify the matched strings and convert them to HTML elements, where |note| becomes a blockquote tag and |url| becomes an a tag. So a blockquote tag can have a link tag inside it.

BTW, I'm using CoffeeScript to do that.

Thanks in advance.

In general, regex is not good at dealing with nested expressions. If you use greedy patterns, they'll match too much, and if you use non-greedy patterns, as @bjfletcher suggests, they'll match too little, stopping inside the outer content. The "traditional" approach here is a token-based parser, where you step through characters one by one and build an abstract syntax tree (AST) which you then reformat as desired.
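To make that concrete, here is a minimal sketch rather than a definitive implementation. The first lines reproduce the greedy and non-greedy behaviour on a shortened sample string. The function `parseMarkup` is a hypothetical name, not something from the question. It uses a regex only as a tokenizer to find tag openers and closers, and tracks nesting with an explicit stack, which is a shortcut compared with stepping through characters one by one. It assumes the three header fields never contain `|`, `[` or `]`.

```javascript
const sample = 'A [[NOTE|s|note|see [[Statutes|s|url|www.iccrom.org]], art 9]]. ' +
    'B [[link|s|url|google.com]] end';

// Greedy: runs from the first "[[" to the last "]]" in the whole string.
const greedy = sample.match(/\[\[.*\]\]/)[0];
// Non-greedy: stops at the first "]]", which belongs to the inner tag.
const lazy = sample.match(/\[\[.*?\]\]/)[0];

// Hypothetical helper: returns an array of plain strings and tag nodes.
function parseMarkup(input) {
    const root = { content: [] };
    const stack = [root]; // the innermost open tag is always on top
    // Token 1: an opener with its three header fields. Token 2: a closer.
    const tokenRe = /\[\[([^|\[\]]+)\|([^|\[\]]+)\|([^|\[\]]+)\||\]\]/g;
    let last = 0;
    let m;
    while ((m = tokenRe.exec(input)) !== null) {
        const top = stack[stack.length - 1];
        // Text between the previous token and this one belongs to the open tag.
        if (m.index > last) top.content.push(input.slice(last, m.index));
        if (m[0] === ']]') {
            if (stack.length > 1) stack.pop();  // close the innermost tag
            else top.content.push(']]');        // unmatched closer: keep as text
        } else {
            const node = { display_text: m[1], code: m[2], type: m[3], content: [] };
            top.content.push(node);
            stack.push(node);
        }
        last = tokenRe.lastIndex;
    }
    // Trailing text goes to whatever is still open (normally the root).
    if (last < input.length) stack[stack.length - 1].content.push(input.slice(last));
    return root.content;
}

const tree = parseMarkup(sample);
console.log(JSON.stringify(tree, null, 2));
```

Tags that are never closed simply keep whatever content was collected for them; a stricter parser would probably want to report that as an error instead.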

One slightly hacky approach I've used here is to convert the string to a JSON string, and let the JSON parser do the hard work of converting into nested objects: http://jsfiddle.net/t09q783d/1/

function toPoorMansAST(s) {
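    // Escape backslashes and double quotes, which would otherwise break the JSON
    // string. Quotes become the JSON escape \u0022; the backslash has to be doubled
    // in the JS literal, since "\u0022" on its own is just another way to write ".
    s = s.replace(/\\/g, '\\\\').replace(/"/g, '\\u0022');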
    // Transform to a JSON string!
    s =
        // Wrap in array delimiters
        ('["' + s + '"]')
        // replace token starts
        .replace(/\[\[([^\|]+)\|([^\|]+)\|([^\|]+)\|/g,
             '",{"display_text":"$1","code":"$2","type":"$3","content":["')
        // replace token ends
        .replace(/\]\]/g, '"]},"');

    return JSON.parse(s);
}

This gives you an array of strings and structured objects, which you can then run through a formatter to spit out the HTML you'd like. The formatter is left as an exercise for the user :).
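For anyone who wants a starting point, a formatter could look roughly like the sketch below. The names `toHtml` and `escapeHtml` are made up for illustration. It also makes assumptions the question doesn't settle: a note renders only its content inside the blockquote (its display_text is ignored), and for a url the content is the link target and display_text is the link label.

```javascript
// Escape characters that are special in HTML text and attribute values.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical formatter: nodes is an array of strings and tag objects,
// in the shape returned by toPoorMansAST.
function toHtml(nodes) {
    return nodes.map(function (node) {
        if (typeof node === 'string') return escapeHtml(node);
        if (node.type === 'note') {
            // Assumption: a note shows its (possibly nested) content only.
            return '<blockquote>' + toHtml(node.content) + '</blockquote>';
        }
        if (node.type === 'url') {
            // Assumption: the plain-text content is the link target.
            var href = node.content.filter(function (part) {
                return typeof part === 'string';
            }).join('');
            return '<a href="' + escapeHtml(href) + '">' +
                escapeHtml(node.display_text) + '</a>';
        }
        // Unknown type: render the content without a wrapper element.
        return toHtml(node.content);
    }).join('');
}

// Sample input, shortened from the question's example.
var ast = [
    'Nicaragua ',
    { display_text: 'NOTE', code: 's', type: 'note', content: [
        'see ',
        { display_text: 'Statutes', code: 's', type: 'url',
          content: ['www.iccrom.org/about/statutes/'] },
        ', article 9'
    ] },
    '.'
];
console.log(toHtml(ast));
```

The recursion on `content` is what lets a blockquote contain a link. Note that the href is used as written, so a target without a scheme such as `www.iccrom.org/...` would be treated as a relative URL by browsers.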

