
Regex to match string literal

I'm currently writing my own language and have a few different types of string literals, which use three different delimiter symbols. They are listed below.

1) "Hello" is a simple string literal that is compiled as Hello.

2) 'Hello' is a compressed string that is passed through the string compression function. (This returns gibberish.)

3) `Hello` returns a number constructed from each character's code point.

I am trying to use a regex to match a piece of code like

`Hel"lo` 2* "Hel`lo"

but I can't come up with one that only matches when the first and last characters are the same. I currently have

[`'\"]([\s\S]+|[^`'\"]+)['`\"]

but this doesn't produce the result I want.

The expected result for the example should be

['`Hel"lo`', ' ', '2', '*', ' ', '"Hel`lo"']

but my regex returns

['`Hel"lo` 2* "Hel`lo"']

In case you couldn't guess, I am fairly inexperienced with regex, so I'd appreciate any help.

If you just want to get the contents between the first delimiter and the closest identical trailing delimiter, you may use

import re
s = """`Hel"lo` 2* "Hel`lo\""""
print([x.group(2) for x in re.finditer(r"([\"'`])(.*?)\1", s)])

See the Python demo.

Details:

  • ([\"'`]) - Group 1, matching a double quote, a single quote, or a backtick
  • (.*?) - Group 2, capturing any 0+ chars other than line breaks, as few as possible, up to the first occurrence of
  • \1 - the same value as kept in Group 1 (\1 is a backreference to the Group 1 value).
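The snippet above prints only the contents (Group 2). As a minimal sketch of how the three parts can be read off each match, the following reuses the same pattern on the question's sample input; the variable names `literal`, `whole`, and `parts` are just illustrative.

```python
import re

# Same sample input as in the question.
s = '`Hel"lo` 2* "Hel`lo"'

# Group 1 is the opening delimiter, Group 2 the body,
# and \1 requires the closing delimiter to equal the opening one.
literal = re.compile(r"([\"'`])(.*?)\1")

# group(0) is the whole literal, delimiters included.
whole = [m.group(0) for m in literal.finditer(s)]

# group(1) says which kind of literal it is; group(2) is its content.
parts = [(m.group(1), m.group(2)) for m in literal.finditer(s)]

print(whole)
print(parts)
```

Keeping Group 1 around is handy here, since the delimiter decides how the literal should be compiled (plain, compressed, or numeric).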

Using a capture group isn't necessary; you can simply write your pattern like this:

`[^`]*`|"[^"]*"|'[^']*'|\w+|\s+|[^`"'\s\w]

One alternative per quote type.

Demo
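As a rough illustration, here is that alternation applied to the question's sample input with `re.findall`. Because the pattern contains no capture groups, `findall` returns the full text of each match; the names `token_pattern` and `tokens` are only for this sketch.

```python
import re

# Same sample input as in the question.
s = '`Hel"lo` 2* "Hel`lo"'

# One alternative per quote type, then words, whitespace runs,
# and finally any single character that is none of the above.
token_pattern = r"""`[^`]*`|"[^"]*"|'[^']*'|\w+|\s+|[^`"'\s\w]"""

# No capture groups, so findall yields the whole match for each token.
tokens = re.findall(token_pattern, s)

print(tokens)
```

One thing to keep in mind with this approach: a quote character that is never closed matches none of the alternatives, so `findall` would silently skip it rather than report an error.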

Building on Wiktor Stribiżew's answer, this handles multiline strings and escaped quotes:

([\"'`])(?:[\s\S])*?(?:(?<!\\)\1)

I tested and am using this in JavaScript, but it works in Python as is:

Python demo

JavaScript demo
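For reference, a small Python sketch of that pattern on a made-up input (the sample string below is an assumption for illustration, not taken from the question). It contains a double-quoted literal with backslash-escaped quotes inside it and a single-quoted literal that spans two lines.

```python
import re

# Hypothetical input: escaped quotes inside the first literal,
# and a line break inside the second one.
s = 'say "a \\"quoted\\" word" and \'two\nlines\''

# [\s\S] matches any character including newlines, and the
# negative lookbehind (?<!\\) rejects a closing delimiter that
# is directly preceded by a backslash.
escaped_literal = re.compile(r"""([\"'`])(?:[\s\S])*?(?:(?<!\\)\1)""")

found = [m.group(0) for m in escaped_literal.finditer(s)]

print(found)
```

The lookbehind only checks the single character before the closing delimiter, so a literal whose content ends in an escaped backslash (for example `"abc\\"`) would not be closed where you expect; a full tokenizer would need to handle that case separately.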
