regex for capturing path from a string with optional character ~ (perl|awk|sed|..)

I want to match everything between the first and last slash /, including an optional ~ before the first slash.

I used this for the first part:

echo ~~a~/dir1/di r2/b.c \
| perl -pe 's/[^\/]*(\/.*\/).*/\1/'

which produces /dir1/di r2/.

This match includes the tilde:

perl -pe 's/.*(~\/.*\/).*/\1/'

but adding ? for the optional character doesn't seem to work, as in these cases:

perl -pe 's/.*(~?\/.*\/).*/\1/' -> /di r2/
perl -pe 's/.*((?:~)\/.*\/).*/\1/' -> ~~a/dir1/di r2/bc

What am I doing wrong?

If I understood the desired output correctly, this works for me with or without the tilde:

echo "path /d1/d2/43a/" | perl -nE 'm{ ( ~? (?: /.*/ | /) ) }x; say "$1"'

Prints

/d1/d2/43a/

Same Perl code, with a tilde before the first slash in the input:

echo "path ~/d1/d2/43a/" | perl -nE 'm{ ( ~? (?: /.*/ | /) ) }x; say "$1"'

prints

~/d1/d2/43a/

Notes

Use of \1 in the replacement part of a substitution is discouraged; use $1 instead.

With {} for the delimiters we don't have to escape /, which makes the pattern more readable (with delimiters other than // we can't leave out the m in front). Otherwise, the same works when using / as the delimiter and escaping it inside the pattern.


Update

To also catch a lone ~/ (or /), the simplest change was to add that explicitly: /.*/ | / . In order to capture the (optional) ~ in both cases, there is a (non-capturing) grouping around this alternation. Removed the -w flag so that no warnings are issued when the input string has no slashes at all; in that case only an empty line is printed.

Original requirements

File data

~~a~/dir1/di r2/b.c
/dir1/di r2/z.y
~/dir1/di r3/p.q
gobbledegook~/name/more/still/more/notwanted.c
xxx~//yyy

Script

perl -ple 's%(?:^.*?)((?:^|~)/.*/).*%$1%' data

Example output

~/dir1/di r2/
/dir1/di r2/
~/dir1/di r3/
~/name/more/still/more/
~//

Is that what you needed?

Dissecting the regex

s%(?:^.*?)((?:^|~)/.*/).*%$1%

The first part, (?:^.*?), is a non-capturing, non-greedy match for an arbitrary sequence of characters at the start of the line.

The second part, ((?:^|~)/.*/), is a capturing expression. It contains a non-capturing term that matches either the start of the line or a tilde, followed by a slash and then a greedy match of anything up to the last slash on the line.

The trailing .* matches everything after the second part.

The replacement is simply what was captured; the rest is Perl being Perl.


Revised requirements

The original problem statement was incomplete, it seems. Apparently:

For a single slash it should output just / (with accompanying tilde if present). For no slashes, preferably an empty string, as there is no match. … And for the case ~ab/c/df it returns the full string; instead it should return /c/.

So, here is a revised script to deal with the special extra cases (what happened to 'learning how to fish'?). The ~ab/c/df case was a missing ? quantifier on the 'start of string or tilde' grouping.

Revised data file

~~a~/dir1/di r2/b.c
/dir1/di r2/z.y
~/dir1/di r3/p.q
gobbledegook~/name/more/still/more/notwanted.c
xxx~//yyy
not-a-slash-in-sight
just-the-one/with-extra-info
just-the~/with-more-info
~/one-slash-at-start-with-tilde
/one-slash-at-start-without-tilde
~a b/c/d.f

Revised script

perl -ple 's%^[^/]*$%%; s%(?:^[^/]*?)((?:^|~)?/)[^/]*$%$1%; s%(?:^[^/]*?)((?:^|~)?/.*/).*%$1%' data

A mildly modified version of the original expression comes last.

The first s/// looks for lines without any / and replaces them with nothing.

The second s/// looks for lines with exactly one slash, possibly preceded by a tilde, followed by non-slashes to the end of the line. It replaces the line with the optional tilde and the slash.

When either of the first two substitutions matches, its output cannot match the third s///, because that one needs at least two slashes. At most one of the three therefore rewrites any given line.

Revised output

~/dir1/di r2/
/dir1/di r2/
~/dir1/di r3/
~/name/more/still/more/
~//

/
~/
~/
/
/c/
