
regex for capturing path from a string with optional character ~ (perl|awk|sed|..)

I want to match everything between the first and last slash /, including an optional ~ immediately before the first slash.

I used this for the first part:

echo ~~a~/dir1/di r2/b.c \
| perl -pe 's/[^\/]*(\/.*\/).*/\1/'

which produces /dir1/di r2/ .

This match includes the tilde:

perl -pe 's/.*(~\/.*\/).*/\1/'

but adding ? to make the tilde optional doesn't seem to work, as in these cases:

perl -pe 's/.*(~?\/.*\/).*/\1/' -> /di r2/
perl -pe 's/.*((?:~)\/.*\/).*/\1/' -> ~~a/dir1/di r2/bc

What am I doing wrong?

If I understood the desired output right, this works for me with or without tilde

echo "path /d1/d2/43a/" | perl -nE 'm{ ( ~? (?: /.*/ | /) ) }x; say "$1"'

Prints

/d1/d2/43a/

Same Perl code, with a tilde before the first slash in the input

echo "path ~/d1/d2/43a/" | perl -nE 'm{ ( ~? (?: /.*/ | /) ) }x; say "$1"'

prints

~/d1/d2/43a/

Notes: Using \1 in the replacement part of a substitution is discouraged; use $1 instead. With {} as delimiters we don't have to escape / , which makes the regex more readable (though with delimiters other than // the leading m cannot be left out). The same regex also works with / as the delimiter, provided every / inside it is escaped.


Update

To also catch a lone ~/ (or / ), the simplest change was to add that case explicitly: /.*/ | / . A non-capturing group around the alternation lets the (optional) ~ be captured in both cases. I removed the -w flag so that no warning is issued when the input string has no slashes at all; in that case only an empty line is printed.

Original requirements

File data

~~a~/dir1/di r2/b.c
/dir1/di r2/z.y
~/dir1/di r3/p.q
gobbledegook~/name/more/still/more/notwanted.c
xxx~//yyy

Script

perl -ple 's%(?:^.*?)((?:^|~)/.*/).*%$1%' data

Example output

~/dir1/di r2/
/dir1/di r2/
~/dir1/di r3/
~/name/more/still/more/
~//

Is that what you needed?

Dissecting the regex

s%(?:^.*?)((?:^|~)/.*/).*%$1%

The first part, (?:^.*?) is a non-capturing non-greedy match for an arbitrary sequence of characters at the start of the line.

The second part, ((?:^|~)/.*/) , is a capturing expression that contains a non-capturing term that matches at the start of a line, or a tilde, followed by a slash and a greedy anything up to the last slash on the line.

The trailing .* matches everything after the second part.

The replacement is simply what was captured; the rest is Perl being Perl.


Revised requirements

The original problem statement was incomplete, it seems. Apparently:

for single slash it should output just / (with accompanying tilde if present). For no slashes preferably empty string as there is no match. … And for this case ~ab/c/df it returns full string; instead it should return /c/ .

So, here is a revised script to deal with the special extra cases (what happened to 'learning how to fish'?). The ~ab/c/df case was caused by a missing ? quantifier on the 'start of string or tilde' grouping.

Revised data file

~~a~/dir1/di r2/b.c
/dir1/di r2/z.y
~/dir1/di r3/p.q
gobbledegook~/name/more/still/more/notwanted.c
xxx~//yyy
not-a-slash-in-sight
just-the-one/with-extra-info
just-the~/with-more-info
~/one-slash-at-start-with-tilde
/one-slash-at-start-without-tilde
~a b/c/d.f

Revised script

perl -ple 's%^[^/]*$%%; s%(?:^[^/]*?)((?:^|~)?/)[^/]*$%$1%; s%(?:^[^/]*?)((?:^|~)?/.*/).*%$1%' data

A mildly modified version of the original expression comes last.

The first s/// looks for lines without any / and replaces them with nothing.

The second s/// looks for lines containing exactly one slash, possibly preceded by a tilde, with only non-slashes from there to the end of the line. It replaces the whole line with the optional tilde and the slash.

When either of the first two matches, its output cannot match the third s/// , which requires at least two slashes.

Revised output

~/dir1/di r2/
/dir1/di r2/
~/dir1/di r3/
~/name/more/still/more/
~//

/
~/
~/
/
/c/
