
Regex with variable number of groups in ruby or a workaround

I know that regular expressions generally don't support a variable number of groups, but since there seems to be a way to do this in C#, I'm asking whether there is any way to make this work in Ruby. I don't have deep knowledge of Ruby, so I can't really work this out myself.

If it is not possible, is there a way to change my logic so I can get what I want?

What I want to do is parse the bezier information of SVG files.

Here is my regex:

/(C)\s*(?:(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)[-,\s]{1}){5,}/

Here is an example of the SVG:

<path d="M 15.43,29.45 C 23.73,28.89 38,25.96 44.2,25.42 46.48,25.22 47.41,27 47.16,29.29 46.59,34.67 45.5,46 44.14,53.63"/>
<path d="M 16.91,41.07 C 19.61,40.8 36.25,38.5 45.64,37.7"/>
<path d="M70.28,15.94c1.21,1.21,1.24,2.32,1.24,3.97c0,7.59-0.01,55.22-0.01,60.22C71.5,91,73,92.23,83.94,92.23c10.31,0,11.56-1.73,11.56-8.68"/>
<path d="M72.67,56.84c0.04,0.3,0.08,0.77-0.07,1.19C-0.9,2.52-6.07,8.03-13.15,11.41"/>

A bezier can have 6*n points. My regex matches the C and then 5 successive points (I don't need the 6th), repeating if there are more than 6. When I match it like this, the capture group only gives me the last point it matched (the 5th point of the bezier) instead of all of them.
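
To make the symptom concrete, here is a minimal sketch of what happens with the second path from the example above. The variable names `re`, `d` and `m` are only for illustration.

```ruby
# The pattern from above, unchanged: one capture group inside a repeated group.
re = /(C)\s*(?:(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)[-,\s]{1}){5,}/

# The "d" attribute of the second example path.
d = "M 16.91,41.07 C 19.61,40.8 36.25,38.5 45.64,37.7"
m = re.match(d)

# Group 2 took part in five iterations, but only the last one is kept,
# so the earlier numbers (19.61, 40.8, 36.25, 38.5) are not available.
p m.captures
```

The capture list contains only the C and the fifth number, `45.64`.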

So now, is there a feature in ruby that allows me to not overwrite the group every time?

If not, is there another way to match every point of a variable-length bezier? I could just repeat the point-matching part of the regex 100 times to cover most real-world cases, but that would be silly and difficult to work with.

My Ruby version is 1.9.3; updating would be no problem as long as it doesn't break compatibility.

Your example doesn't make it clear why you need the C in the regex. Why is that, exactly? Is there some other place where you can have 6+ points in a row?

Would something like this work?

(?:[\.\d]+\,[\.\d]+\s*?){5,5}

https://regex101.com/r/0VtdjW/1

Anyway, this one works using the \G construct for Ruby 1.9.3 on Rubular.
In a single match, it grabs the first 5 pts and skips the 6th; each following match then picks up where the previous one left off.

(?:(?!^)\G[-,\s]|C)\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)[-,\s](-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)[-,\s](-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)[-,\s](-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)[-,\s](-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)(?:[-,\s]-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)?

Explained

 (?:
      (?! ^ )                       # Not BOS
      \G                            # Start where last match left off to get next 5  pts.
      [-,\s]                        # required separator
   |                             # or,
      C                             # C - the start of a block of pts.
 )
                               # The first/next 5 pts. captured
 \s* 
 (                             # (1 start)
      -? \d+ 
      (?: \. \d+ )?
      (?: [eE] [+-]? \d+ )?
 )                             # (1 end)
 [-,\s] 
 (                             # (2 start)
      -? \d+ 
      (?: \. \d+ )?
      (?: [eE] [+-]? \d+ )?
 )                             # (2 end)
 [-,\s] 
 (                             # (3 start)
      -? \d+ 
      (?: \. \d+ )?
      (?: [eE] [+-]? \d+ )?
 )                             # (3 end)
 [-,\s] 
 (                             # (4 start)
      -? \d+ 
      (?: \. \d+ )?
      (?: [eE] [+-]? \d+ )?
 )                             # (4 end)
 [-,\s] 
 (                             # (5 start)
      -? \d+ 
      (?: \. \d+ )?
      (?: [eE] [+-]? \d+ )?
 )                             # (5 end)

 (?:                           # Skip the 6th pt.
      [-,\s] 
      -? \d+ 
      (?: \. \d+ )?
      (?: [eE] [+-]? \d+ )?
 )?
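
As a rough sketch of how this pattern could be used from Ruby, `String#scan` can run it repeatedly over a path's `d` attribute, returning one set of five captures per match. The constant names `NUM`, `SEP` and `CURVE_POINTS` and the helper `curve_points` are made up for this example, and the pattern is simply the one above assembled from smaller pieces.

```ruby
# Building blocks of the pattern above (names are illustrative assumptions).
NUM = /-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/
SEP = /[-,\s]/

# Either continue right after the previous match (\G) or start at a "C",
# capture five numbers, then optionally skip a sixth.
CURVE_POINTS = /(?:(?!^)\G#{SEP}|C)\s*(#{NUM})#{SEP}(#{NUM})#{SEP}(#{NUM})#{SEP}(#{NUM})#{SEP}(#{NUM})(?:#{SEP}#{NUM})?/

# Hypothetical helper: one array of five Floats per matched group of points.
def curve_points(d)
  d.scan(CURVE_POINTS).map { |nums| nums.map(&:to_f) }
end

# The "d" attribute of the first example path from the question.
d = "M 15.43,29.45 C 23.73,28.89 38,25.96 44.2,25.42 46.48,25.22 47.41,27 47.16,29.29 46.59,34.67 45.5,46 44.14,53.63"
curve_points(d).each { |segment| p segment }
```

For this path the helper yields three arrays, one for each group of six numbers after the C. Note that the pattern only looks for an uppercase C, and because `-` is accepted as a separator, a minus sign that directly follows a number (as in `7.59-0.01` in the third example path) is consumed as the separator rather than kept as the sign.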
