
Simple filtering with `grep`, `awk`, `sed`, or whatever else is capable

I have a file, each line of which can be described by this grammar:

<text> <colon> <fullpath> <comma> <"by"> <text> <colon> <text> <colon> <text> <colon> <text>

Eg.,

needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>

How do I get the <fullpath> portion, which lies between the first <colon> and the first <comma>?

(I'm not very inclined to write a program to parse this, though it looks like it could be done easily with javacc. I'm hoping to use built-in tools like sed, awk, ...)

Or with a regex substitution:

sed -n 's/^[^:]*: *\([^:,]*\),.*/\1/p' file

This is written for the Linux (GNU) sed dialect. On a different platform you may need the -E option, in which case take out the backslashes before the parentheses; or just use Perl instead:

perl -nle 'print $1 if m/:\s*(.*?),/' file
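
For reference, here is a minimal sketch of what the extended-regex (-E) form of the sed substitution might look like. The function name `extract_path_sed` and the variable `line` are made up for illustration; the pattern assumes the path contains no comma and that the text before it contains no colon.

```shell
# Sample line taken from the question.
line='needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>'

# -E enables extended regular expressions, so the capture group is written
# with plain ( ) instead of \( \).
# ": *" consumes the first colon and any spaces that precede the path.
extract_path_sed() {
  sed -n -E 's/^[^:]*: *([^,]*),.*/\1/p'
}

printf '%s\n' "$line" | extract_path_sed
```

Lines that do not match the pattern print nothing, because -n suppresses the default output and only the p flag prints.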

Assuming the input always looks like the example above, so that the path is the fourth whitespace-separated field:

awk '{print $4}' | tr -d ,

To process an entire file, put the file name right after the awk program, before the pipe: awk '{print $4}' file | tr -d ,
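
If the number of words before the first colon varies, counting whitespace-separated fields no longer works. A possible alternative, sketched here under the assumption that no comma appears before the first colon and that the path itself contains neither a colon nor a comma, is to let awk split on colons and commas instead. The function name `extract_path_awk` is hypothetical.

```shell
# Split each line on ':' or ',' so that field 2 is the text between the
# first colon and the next separator; then strip leading blanks from it.
# "$@" passes file names through; with no arguments awk reads stdin.
extract_path_awk() {
  awk -F'[:,]' '{ sub(/^[ \t]+/, "", $2); print $2 }' "$@"
}

printf '%s\n' 'needs fixing (Sunday): src/foo/io.c, by Smith : in progress : x' | extract_path_awk
```

This also removes the need for the separate tr step, since the comma is treated as a field separator.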

If you're using a bash script to parse this stuff, you don't even need tools like awk or sed.

$ text="needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... comment ...>"
$ text=${text%%,*}
$ text=${text#*: }
$ echo "$text"
src/foo/io.c

Read about this on the bash man page under Parameter Expansion.
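
To apply the same two expansions to every line of a file rather than to a single variable, a loop along these lines might work. This is only a sketch; the function name `extract_paths_bash` is invented for the example.

```shell
# Read lines from stdin and apply the two parameter expansions to each one.
extract_paths_bash() {
  local line
  while IFS= read -r line; do
    line=${line%%,*}   # drop everything from the first comma onward
    line=${line#*: }   # drop everything up to and including the first ": "
    printf '%s\n' "$line"
  done
}

printf '%s\n' \
  'needs fixing (Sunday): src/foo/io.c, by Smith : in progress : x' \
  'todo: lib/bar.h, by Jones : open : y' | extract_paths_bash
```

To run it on a file, redirect the file into the function, e.g. extract_paths_bash < file. Note that a line containing neither a comma nor ": " is printed unchanged rather than skipped.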

With GNU grep:

grep -oP '(?<=: ).*?(?=,)'

Because of -o, this may print more than one substring per line if a later ": " is followed by another comma.
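
One way to limit the output to the first match might be to anchor the pattern at the start of the line and use \K to discard the prefix. This is a sketch that assumes a GNU grep built with PCRE support (-P); the function name `extract_path_grep` is hypothetical.

```shell
# ^[^:]*:<space> matches everything up to and including the first ": ".
# \K drops that prefix from the reported match.
# [^,]*(?=,) then takes the text up to, but not including, the next comma.
# The ^ anchor keeps the pattern from matching again later in the line.
extract_path_grep() {
  grep -oP '^[^:]*: \K[^,]*(?=,)' "$@"
}

printf '%s\n' 'needs fixing (Sunday): src/foo/io.c, by Smith : in progress : note, more' | extract_path_grep
```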

