I have a file, each line of which can be described by this grammar:
<text> <colon> <fullpath> <comma> <"by"> <text> <colon> <text> <colon> <text> <colon> <text>
E.g.,
needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>
How do I get the <fullpath> portion, which lies between the first <colon> and the first <comma>?
(I'm not very inclined to write a program to parse this, though it looks like it could be done easily with javacc. I'm hoping to use built-in tools like sed, awk, ...)
With a regex substitution:
sed -n 's/^[^:]*: *\([^:,]*\),.*/\1/p' file
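This was written for GNU (Linux) sed, but it only uses POSIX basic regular expressions, so it should also work with other sed implementations. If you prefer extended regular expressions, add the -E option and remove the backslashes before the round parentheses. Or just go with Perl instead: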
perl -nle 'print $1 if m/:\s*(.*?),/' file
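A minimal sketch of the sed approach wrapped in a shell function, to show how it behaves on mixed input. The function name extract_path_sed is hypothetical. The pattern has ' *' after the colon so the space following the colon is not captured, and the sample input shows that -n combined with the trailing p flag silently drops lines that don't match at all.

```shell
#!/bin/sh
# extract_path_sed is a hypothetical helper name, not a standard tool.
# It reads stdin (or the files given as arguments) and prints the
# <fullpath> field of every line that matches the grammar.
extract_path_sed() {
    # -n suppresses automatic printing; the p flag prints a line only when
    # the substitution succeeded, so non-matching lines produce no output.
    # ': *' consumes the first colon and any spaces after it.
    sed -n 's/^[^:]*: *\([^:,]*\),.*/\1/p' "$@"
}

# Prints only "src/foo/io.c"; the second line has no colon and is dropped.
printf '%s\n' \
    'needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>' \
    'a line without the expected shape' |
    extract_path_sed
```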
Assuming every line looks like your example, with exactly three whitespace-separated words before the path:
awk '{print $4}' file | tr -d ,
awk prints the fourth field of each line of file, and tr deletes the trailing comma.
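The $4 approach breaks as soon as the text before the colon has a different number of words. A sketch of an alternative (not what the answer above does) that cuts by position instead, using awk's sub() to delete the prefix up to the first colon and the suffix from the first comma; the function name extract_path_awk is hypothetical.

```shell
#!/bin/sh
# extract_path_awk is a hypothetical helper name.
extract_path_awk() {
    awk '{
        s = $0
        # Delete everything up to the first colon plus any spaces after it;
        # sub() returns 0 when nothing matched, so such lines are skipped.
        if (!sub(/^[^:]*: */, "", s)) next
        # Delete everything from the first comma to the end of the line.
        if (!sub(/,.*/, "", s)) next
        print s
    }' "$@"
}

# Works regardless of how many words precede the colon.
printf '%s\n' 'urgent: lib/util.h, by Doe : done : shipped' | extract_path_awk
```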
If you're parsing this in a bash script, you don't even need external tools like awk or sed:
$ text="needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... comment ...>"
$ text=${text%%,*}
$ text=${text#*: }
$ echo "$text"
src/foo/io.c
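Read about this in the bash man page under Parameter Expansion: ${text%%,*} deletes the longest suffix that starts with a comma (everything from the first comma on), and ${text#*: } deletes the shortest prefix that ends in ": " (everything up to and including the first colon and the space after it).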
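The two expansions work on a single string. To apply them to every line of a file, a minimal sketch could wrap them in a read loop; the function name extract_path_sh is hypothetical, and only POSIX features are used, so it also runs under plain sh. Unlike the sed -n ... p version, it prints something for every line: a line without a comma or ": " comes out partly or wholly unchanged.

```shell
#!/bin/sh
# extract_path_sh is a hypothetical helper name.
# Applies the same two parameter expansions to every line read from stdin.
extract_path_sh() {
    # IFS= keeps leading/trailing blanks, -r keeps backslashes literal;
    # the [ -n "$line" ] test also handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%%,*}   # remove from the first comma to the end
        line=${line#*: }   # remove up to and including the first ": "
        printf '%s\n' "$line"
    done
}

# Usage with the file from the question: extract_path_sh < file
printf '%s\n' 'needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... comment ...>' |
    extract_path_sh
```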
With GNU grep (the -P option requires a grep built with PCRE support):
grep -oP '(?<=: ).*?(?=,)' file
Because -o prints every match, this can print more than one substring per line if another ": " appears later in the line with a comma somewhere after it.
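If the extra matches are a problem, one option (assuming GNU grep built with PCRE support) is to anchor the pattern at the start of the line and use \K, which drops the text matched so far from what -o prints. Since ^ can only match at the beginning of the line, each line yields at most one match. The function name extract_path_grep is hypothetical, and the sample line deliberately contains a later ": a," that the unanchored pattern would also print.

```shell
#!/bin/sh
# extract_path_grep is a hypothetical helper name; requires grep -P (PCRE).
extract_path_grep() {
    # ^[^:]*:   everything up to and including the first colon, then a space
    # \K        forget that prefix, so -o prints only what follows
    # [^,]*     the path itself, up to (not including) the first comma
    # (?=,)     require that a comma actually follows
    grep -oP '^[^:]*: \K[^,]*(?=,)' "$@"
}

# Prints only "src/foo/io.c", not the later "a".
printf '%s\n' 'needs fixing (Sunday): src/foo/io.c, by Smith : a, b : c' |
    extract_path_grep
```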