
Regex to match string literal

I'm currently writing my own language, and it has three kinds of string literal, each delimited by a different symbol:

1) "Hello" is a simple string literal that is compiled as Hello.

2) 'Hello' is a compressed string that accesses the string compression function. (This returns gibberish.)

3) `Hello` returns a number constructed from each character's code point.

I am trying to use regex to match a piece of code like

`Hel"lo` 2* "Hel`lo"

but can't come up with one that only matches when the first and last characters are the same. I have currently got

[`'\"]([\s\S]+|[^`'\"]+)['`\"]

but this doesn't produce the result I want.

The expected result for the example should be

['`Hel"lo`', ' ', '2', '*', ' ', '"Hel`lo"']

but my regex returns

['`Hel"lo` 2* "Hel`lo"']

In case you couldn't guess, I am kinda inexperienced at regex and so I'd appreciate any help.

If you just want to get the contents between the first delimiter and the closest identical trailing delimiter, you may use

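import re

s = '`Hel"lo` 2* "Hel`lo"'
# Group 1 is the opening delimiter; \1 requires the same character to close the literal.
print([x.group(2) for x in re.finditer(r"([\"'`])(.*?)\1", s)])
# ['Hel"lo', 'Hel`lo']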

See the Python demo

Details:

  • ([\"'`]) - Group 1, matching a double quote, a single quote, or a backtick
  • (.*?) - Group 2, capturing any 0+ chars other than line breaks, as few as possible, up to the first occurrence of
  • \1 - the same value as captured in Group 1 (\1 is a backreference to the Group 1 value).
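To make the difference concrete, here is a small sketch that runs the pattern from the question next to the backreference pattern on the same input. The variable names (greedy, paired, and so on) are just for illustration and are not from the original posts.

```python
import re

s = '`Hel"lo` 2* "Hel`lo"'

# Pattern from the question: any opening quote, a greedy body, any closing quote.
greedy = re.compile(r"""[`'"]([\s\S]+|[^`'"]+)['`"]""")
# Backreference pattern: the closing quote must be the same character as the opening one.
paired = re.compile(r"""([`'"])(.*?)\1""")

greedy_matches = [m.group(0) for m in greedy.finditer(s)]
paired_matches = [m.group(0) for m in paired.finditer(s)]  # whole literals, delimiters included
paired_bodies = [m.group(2) for m in paired.finditer(s)]   # contents only

print(greedy_matches)
print(paired_matches)
print(paired_bodies)
```

The greedy body swallows everything up to the last quote character of any kind, which is why the question's pattern returns the whole line as one match. Use group(0) if you want the literal with its delimiters, and group(2) if you only want the contents.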

Using a capture group isn't necessary; you can simply write your pattern like this:

`[^`]*`|"[^"]*"|'[^']*'|\w+|\s+|[^`"'\s\w]

One alternative per quote.

demo
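As a rough sketch of how the alternation pattern could be used as a tokenizer, the snippet below feeds it to re.findall on the example from the question. The names TOKEN and tokens are my own and only illustrative.

```python
import re

# One alternative per quote type, then words, whitespace runs, and any other single character.
TOKEN = re.compile(r"""`[^`]*`|"[^"]*"|'[^']*'|\w+|\s+|[^`"'\s\w]""")

s = '`Hel"lo` 2* "Hel`lo"'

# The pattern has no capture groups, so findall returns the full text of each match.
tokens = TOKEN.findall(s)
print(tokens)
```

On this input the list is the one the question asks for: the two string literals come out whole, and the space, 2, and * come out as separate tokens. Note that an unterminated quote character matches none of the alternatives, so findall would silently skip it.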

Building on Wiktor Stribiżew's answer, this version handles multiline strings and escaped quotes:

([\"'`])(?:[\s\S])*?(?:(?<!\\)\1)

I tested this and am using it in JavaScript, but it works in Python as is:

Python Demo

Javascript Demo
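Here is a minimal Python sketch of that last pattern on an input with escaped quotes and a line break inside a literal. The sample input and the names LITERAL, source and literals are assumptions for illustration.

```python
import re

# [\s\S] matches any character including newlines; (?<!\\) rejects a closing
# quote that is immediately preceded by a backslash.
LITERAL = re.compile(r"""([\"'`])(?:[\s\S])*?(?:(?<!\\)\1)""")

# Sample text: "say \"hi\"" + `line1<newline>line2`
source = '"say \\"hi\\"" + `line1\nline2`'

literals = [m.group(0) for m in LITERAL.finditer(source)]
for lit in literals:
    print(repr(lit))
```

One caveat with the lookbehind approach: it only checks the single character before the closing quote, so a literal that ends in an escaped backslash, such as "a\\", is not closed where you would expect. If your language allows that, a pattern that consumes escape sequences explicitly is probably a safer choice.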
