Regex to match non-whitespace OR a space prefixed with '\'

I have a space-delimited list of file names, where spaces inside a file name are prefixed by '\'.

e.g. "first\ file second\ file"

How can I get my regex to match each file name?

(\\ |[^ ])+

This matches everything except spaces, unless the space is escaped. It should work; sorry for misunderstanding your question initially.
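For anyone who wants to try it, here is a minimal sketch in Python (my choice of language; the question doesn't name one). The sample string is the one from the question, and `finditer` is used so that the whole match is read rather than the repeated group.

```python
import re

# Sample input from the question: names are separated by spaces,
# and a space inside a name is escaped as "\ ".
text = r"first\ file second\ file"

# Try an escaped space first, otherwise take any single non-space character.
pattern = re.compile(r"(\\ |[^ ])+")

# group(0) is the whole match, i.e. one complete file name.
names = [m.group(0) for m in pattern.finditer(text)]
print(names)
```

The order of the alternatives matters here: `[^ ]` would also accept a backslash, so the escaped-space branch has to be tried first.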

(\S|(?<=\\) )+

Explanation:

You are looking for either a non-whitespace character (\S) or a space preceded by a backslash, repeated one or more times.

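Each overall match is one file name; apply the pattern globally to get all of them in the string. Note that because the group is repeated, match group 1 only keeps the last character it matched, so read the whole match rather than the group.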

EDIT

Thinking about it, you don't even need to capture into a sub-group. The match alone is enough, so this could be a tiny bit more efficient (the ?: turns the group into a non-capturing one):

(?:\S|(?<=\\) )+
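A quick way to see the difference between the two variants, sketched in Python as an assumption about the target language (any engine with lookbehind should behave similarly). In Python specifically, `findall` returns the group's content when the pattern has a capturing group, which makes the non-capturing form the more convenient one.

```python
import re

text = r"first\ file second\ file"  # sample input from the question

non_capturing = re.compile(r"(?:\S|(?<=\\) )+")
capturing = re.compile(r"(\S|(?<=\\) )+")

# No groups in the pattern: findall returns the whole matches.
names = non_capturing.findall(text)

# One repeated group: findall returns only what the group matched last,
# which is the final character of each file name.
last_chars = capturing.findall(text)

# The capturing pattern still finds the same spans; read them via group(0).
whole_matches = [m.group(0) for m in capturing.finditer(text)]

print(names)
print(last_chars)
```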

I would do it like this:

/[^ \\]*(?:\\ [^\\ ]*)*/

This is Friedl's "unrolled loop" idiom. There will probably be very few escaped spaces in the target string relative to the other characters, so you gobble up as many of the other characters as you can each time you get a chance. This is much more efficient than an alternation that matches one character at a time.
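Here is a rough Python sketch of the unrolled pattern on the question's sample string (Python is an assumption; the slashes are just delimiters and are dropped). Since every part of the pattern is optional, it can also match the empty string, so the sketch filters those matches out.

```python
import re

text = r"first\ file second\ file"  # sample input from the question

# normal* (special normal*)*
#   normal  = any character that is neither a space nor a backslash
#   special = a backslash followed by a space
unrolled = re.compile(r"[^ \\]*(?:\\ [^\\ ]*)*")

raw_matches = unrolled.findall(text)

# Every part of the pattern is optional, so empty matches show up
# between the names; drop them.
names = [m for m in raw_matches if m]
print(names)
```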

Edit: (Tomalak) I put slashes around the regex because the syntax highlighter seems to recognize them and paints the whole regex in one color. Without them, it can pick up on other characters, like quotation marks, and incorrectly (and confusingly) paint parts of the regex in different colors.

(Brad) The OP only mentioned spaces, so I only allowed for quoting them, but you're right. The original unrolled-loop example in the book was for double-quoted strings, which may contain any of several escape sequences, one of which is an escaped quotation mark. Here's the regex:

/"[^\\"]*(?:\\.[^\\"]*)*"/
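To illustrate, a small Python sketch of the double-quoted-string pattern (the sample text is made up for the example, and Python is an assumption). Here the special case is a backslash followed by any character, so escaped quotes stay inside the match.

```python
import re

# Hypothetical sample text containing two double-quoted strings,
# the first of which has escaped quotes inside it.
text = r'say "hello \"world\"" and "bye"'

# "  normal*  (special normal*)*  "
#   normal  = anything except a backslash or a double quote
#   special = a backslash followed by any one character
quoted = re.compile(r'"[^\\"]*(?:\\.[^\\"]*)*"')

strings = quoted.findall(text)
print(strings)
```

Note that `.` does not match a newline by default in Python, so a backslash followed by a line break would need the `re.DOTALL` flag.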

(Tomalak) I don't know what you mean when you say that it doesn't match "the file name at the start of the string." It seems to match both of the file names in the OP's example. However, it also matches an empty string, which isn't good. That can be fixed, but unless efficiency is proved to be a problem, it isn't worth the effort. Stefan's solution works fine.
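For completeness, one possible way to rule out the empty match, sketched in Python. This variant is my own assumption and not something taken from the book: it requires each match to start with either at least one ordinary character or an escaped space.

```python
import re

text = r"first\ file second\ file"  # sample input from the question

# Hypothetical non-empty variant of the unrolled loop:
#   either  normal+ (special normal*)*
#   or      (special normal*)+      for a name that starts with "\ "
non_empty = re.compile(r"[^ \\]+(?:\\ [^\\ ]*)*|(?:\\ [^\\ ]*)+")

names = non_empty.findall(text)
print(names)
```

It is noticeably harder to read than the alternation-based patterns, which rather supports the point that the fix is not worth it unless speed really matters.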
