Regex to match escaped apostrophes

Question

$str = "'ei-1395529080',0,0,1,1,'Name','email@domain.com','Sentence with \'escaped apostrophes\', which \'should\' be on one line!','no','','','yes','6.50',NULL";

preg_match_all("/(')?(.*?)(?(1)(?!\\\\)'),/s", $str.',', $values);
print_r($values);

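I am trying to write a regular expression with these goals:

Return an array of ,-separated values (note that I append a comma to $str on the second line)
If an array item starts with ', match the closing '
But if a ' is escaped as \', keep capturing the value until a ' with no preceding \ is found

If you try the lines above, the pattern misbehaves when it encounters \',

Can anyone explain what is happening and how to fix it? Thanks.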

Answer 1

This is how I would go about solving this:

('(?>\\.|.)*?'|[^\,]+)

Regex101

Explanation:

(              Start capture group
    '          Match an apostrophe
    (?>        Atomically match the following
        \\.    Match \ literally and then any single character
        |.     Or match just any single character
    )          Close atomic group
    *?'        Match previous group 0 or more times until the first '
    |[^\,]     OR match any character that is not a comma (,)
    +          Match the previous regex [^\,] one or more times
)              Close capture group
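To see the pattern at work, here is a minimal sketch that runs it against the $str from the question. It assumes PHP with the standard PCRE functions; the variable names $pattern and $values are only illustrative. Note that each quoted match still includes its surrounding apostrophes.

```php
<?php
// Sample input copied from the question. Inside a double-quoted PHP string,
// \' is not an escape sequence, so the backslash stays in the data.
$str = "'ei-1395529080',0,0,1,1,'Name','email@domain.com','Sentence with \'escaped apostrophes\', which \'should\' be on one line!','no','','','yes','6.50',NULL";

// The pattern from this answer, with every backslash doubled because it
// is written inside a double-quoted PHP string.
$pattern = "/('(?>\\\\.|.)*?'|[^\\,]+)/";

preg_match_all($pattern, $str, $values);

echo count($values[0]), "\n"; // 14
echo $values[0][7], "\n";     // the whole sentence, quotes and backslashes included
```

Because the commas between fields match neither alternative, preg_match_all simply skips over them.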

A note on how the atomic group works:

Say I have this string: 'a \' b'

The atomic group (?>\\.|.) will match this string in the following way at each step:

'
a
\'
b
'

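If the match fails later on, the engine will not go back and try to match \' as \ followed by '. Whenever the first option matches, it is always the one used.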
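The difference can be observed with a subject whose only closing apostrophe is escaped. This is a small sketch, assuming PHP's PCRE functions; the non-atomic variant in $plain is a hypothetical comparison pattern, not part of the answer.

```php
<?php
// The subject is the four characters  ' a \ '  and the last apostrophe is escaped.
$subject = "'a\\'";

// Atomic group, as in the answer: once the backslash-plus-character branch
// has consumed the escaped apostrophe, it is never given back.
$atomic = "/'(?>\\\\.|.)*?'/";

// Hypothetical variant with an ordinary non-capturing group: the engine may
// backtrack and re-read the backslash as a plain character.
$plain = "/'(?:\\\\.|.)*?'/";

$atomicHit = preg_match($atomic, $subject, $a);
$plainHit  = preg_match($plain, $subject, $p);

var_dump($atomicHit); // int(0): the escaped apostrophe cannot close the string
var_dump($plainHit);  // int(1): backtracking lets the final apostrophe close it
```

With the ordinary group the escaped apostrophe ends up being treated as the closing one, which is exactly what the atomic group prevents.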

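If you need help escaping the regex, here is the escaped version: ('(?>\\\\.|.)*?'|[^\\,]+)

Although I spent about 10 hours writing regex yesterday, I am not very experienced with it. I have read up on escaping backslashes but was confused by what I found. Why wasn't your original answer escaped? Does it depend on the language/platform? ~OP

On why you have to escape a regex in a programming language:

When you write the following string: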

"This is on one line.\nThis is on another line."

Your program will interpret \n as an escape sequence and see the string like this:

"This is on one line.
 This is on another line."

In a regex, this can cause problems. Say you want to match all characters that are not line breaks. This is how you would do that:

"[^\n]*"

However, when written in a programming language, \n is interpreted as an escape sequence, so the pattern is seen like this:

"[^
 ]*"

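As I'm sure you can tell, this is not the pattern you typed. (For \n the regex engine happens to treat a literal newline inside the class the same way, but for other sequences, such as \\, the difference matters.) So, to fix this, we escape the string. Placing a backslash in front of the first backslash tells the programming language to look at \n (or any other escape sequence: \r, \t, \\, etc.) differently. On a basic level, escaping trades the original escape sequence \n for another escape sequence followed by a plain character: \\, n. This is how escaping affects the regex above: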

"[^\\n]*"

The programming language will then see it as follows:

"[^\n]*"

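This is because \\ is an escape sequence that means "when you see \\, interpret it as a literal \". Because \\ has already been consumed and interpreted, the next character read is n, which is therefore no longer part of an escape sequence.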
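The effect is easy to check by counting characters. The following is a small sketch in PHP, assuming double-quoted strings (which process escape sequences); the variable names are only illustrative.

```php
<?php
// Double-quoted PHP strings process escape sequences before any regex
// engine gets to see the text.
$interpreted = "[^\n]*";  // 5 characters: [ ^ <newline> ] *
$escaped     = "[^\\n]*"; // 6 characters: [ ^ \ n ] *

var_dump(strlen($interpreted)); // int(5)
var_dump(strlen($escaped));     // int(6)

// Single-quoted PHP strings leave \n alone, so this is the same 6 characters.
var_dump($escaped === '[^\n]*'); // bool(true)
```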

So why are there 4 backslashes in my escaped version? Let's take a look:

(?>\\.|.)

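So this is the original regex we wrote. It has two consecutive backslashes. This part of the regex (\\.) means "whenever you see a backslash followed by any character, match it". To preserve this interpretation for the regex engine, we have to escape each individual backslash: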

\\ \\ .

So all together it looks like this:

(?>\\\\.|.)
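As a quick check of the two layers of escaping, this sketch (PHP, with illustrative variable names) prints what the regex engine actually receives when four backslashes are typed in the source code, and then uses it to match.

```php
<?php
// Four backslashes in the source code become two in the string, which the
// regex engine then reads as "one literal backslash".
$source = '\\\\.';
echo $source, "\n";          // prints \\.
echo strlen($source), "\n";  // prints 3

// The subject is the three characters  a \ b
$subject = 'a\\b';
preg_match('/' . $source . '/', $subject, $m);
echo $m[0], "\n";            // prints \b (the backslash and the character after it)
```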

Answer 2

Like this: (?:'([^'\\]*(?:\\.[^'\\]*)*)'|([^,]+))

# (?:'([^'\\]*(?:\\.[^'\\]*)*)'|([^,]+))
# 
# Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Greedy quantifiers
# 
# Match the regular expression below «(?:'([^'\\]*(?:\\.[^'\\]*)*)'|([^,]+))»
#    Match this alternative (attempting the next alternative only if this one fails) «'([^'\\]*(?:\\.[^'\\]*)*)'»
#       Match the character “'” literally «'»
#       Match the regex below and capture its match into backreference number 1 «([^'\\]*(?:\\.[^'\\]*)*)»
#          Match any single character NOT present in the list below «[^'\\]*»
#             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
#             The literal character “'” «'»
#             The backslash character «\\»
#          Match the regular expression below «(?:\\.[^'\\]*)*»
#             Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
#             Match the backslash character «\\»
#             Match any single character that is NOT a line break character (line feed) «.»
#             Match any single character NOT present in the list below «[^'\\]*»
#                Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
#                The literal character “'” «'»
#                The backslash character «\\»
#       Match the character “'” literally «'»
#    Or match this alternative (the entire group fails if this one fails to match) «([^,]+)»
#       Match the regex below and capture its match into backreference number 2 «([^,]+)»
#          Match any character that is NOT a “,” «[^,]+»
#             Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

https://regex101.com/r/pO0cQ0/1

preg_match_all('/(?:\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\'|([^,]+))/', $subject, $result, PREG_SET_ORDER);
foreach ($result as $match) {
    if (isset($match[2])) {
        // Unquoted value: group 2 is only set when the second alternative matched
        $value = $match[2];
    } else {
        // Quoted value: group 1 holds the text between the apostrophes (escaped quotes still need processing)
        $value = $match[1];
    }
}
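Put together with the $str from the question, the loop could look like the sketch below. The unescaping step with preg_replace is an assumption about what "process escaped quotes" should mean here (drop the backslash in front of any escaped character); $values is an illustrative name.

```php
<?php
// Sample input copied from the question.
$str = "'ei-1395529080',0,0,1,1,'Name','email@domain.com','Sentence with \'escaped apostrophes\', which \'should\' be on one line!','no','','','yes','6.50',NULL";

$pattern = '/(?:\'([^\'\\\\]*(?:\\\\.[^\'\\\\]*)*)\'|([^,]+))/';
preg_match_all($pattern, $str, $result, PREG_SET_ORDER);

$values = array();
foreach ($result as $match) {
    if (isset($match[2])) {
        // Unquoted field such as 0 or NULL: take it as-is.
        $values[] = $match[2];
    } else {
        // Quoted field: remove the backslash in front of each escaped character.
        $values[] = preg_replace('/\\\\(.)/', '$1', $match[1]);
    }
}

print_r($values);
```

Testing isset($match[2]) rather than checking for an empty value keeps unquoted fields such as 0 from being mistaken for missing ones.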

Answer 1
3, accepted, 2015-12-11 03:42:52

Answer 2
2, 2015-12-11 03:45:26