I am using egrep to search for patterns in a file that contains URLs, and I want to match only the first URL on each line. This is my regex:
egrep -io '^\<http(s)://home\>+\..+\.gov(\.au)?' input.txt
It outputs this match:
https://home.xxx.gov/uuu.aspx?url=https://home.xxx.gov
But what I actually want in this example is:
https://home.xxx.gov
I don't care about anything after the .gov and want to trim it off. How can I do this?
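Your .+ is greedy: it matches as much as it can, so the match runs to the last .gov on the line. You need a lazy quantifier (.+?), which stops at the first .gov. egrep (that is, grep -E) has no lazy quantifiers, so you need Perl-compatible regexes via GNU grep's -P option: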
grep -P -io '^https?://home\..+?\.gov(\.au|\.uk)?' input.txt
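To see the difference, here is a minimal sketch that runs the greedy and the lazy pattern on the sample line from the question. It assumes GNU grep built with PCRE support (grep -P); the function names greedy and lazy are hypothetical helpers for the demo, and the line is piped in rather than read from input.txt.

```shell
#!/bin/sh
# Assumes GNU grep with PCRE support (grep -P); greedy/lazy are demo helpers.
line='https://home.xxx.gov/uuu.aspx?url=https://home.xxx.gov'

# Greedy .+ grabs as much as possible, so the match ends at the LAST ".gov".
greedy() { grep -P -io '^https?://home\..+\.gov(\.au|\.uk)?'; }

# Lazy .+? grabs as little as possible, so the match ends at the FIRST ".gov".
lazy() { grep -P -io '^https?://home\..+?\.gov(\.au|\.uk)?'; }

printf '%s\n' "$line" | greedy   # prints the whole line
printf '%s\n' "$line" | lazy     # prints https://home.xxx.gov
```

The optional (\.au|\.uk)? group still applies after the lazy part, so a host such as home.abc.gov.au keeps its .au suffix.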
If your grep doesn't support Perl-compatible regexes (-P is a GNU grep option, and not every grep has it), you need a different approach, for example
egrep -io '^https?://home\.[A-Za-z0-9.]+\.gov(\.au|\.uk)?' input.txt
or
egrep -io '^https?://home\.[^/]+\.gov(\.au|\.uk)?' input.txt
Both patterns limit which characters may be matched between home. and .gov: neither allows a /, so the match cannot run past the host name into the rest of the URL. See also @sshashank124's solution.
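A minimal sketch of the character-restricted approach, using grep -E (the option egrep is equivalent to) on two made-up sample lines piped in instead of input.txt. The helper names by_class and by_host are hypothetical; no lazy quantifier is needed because neither character class can cross the / that ends the host name.

```shell
#!/bin/sh
# Portable ERE version: the character class stops at the first "/",
# so the match cannot reach a second URL later on the same line.
sample='https://home.xxx.gov/uuu.aspx?url=https://home.xxx.gov
https://home.abc.gov.au/index.html?next=https://home.def.gov'

# Letters, digits and dots only (hypothetical helper name).
by_class() { grep -E -io '^https?://home\.[A-Za-z0-9.]+\.gov(\.au|\.uk)?'; }

# Anything except "/" (hypothetical helper name).
by_host() { grep -E -io '^https?://home\.[^/]+\.gov(\.au|\.uk)?'; }

printf '%s\n' "$sample" | by_class
printf '%s\n' "$sample" | by_host
# Both print:
# https://home.xxx.gov
# https://home.abc.gov.au
```

The [A-Za-z0-9.] version is stricter (it also rejects characters such as ? and =), while [^/] accepts anything up to the first slash; note that neither class as written includes a hyphen explicitly in the first case, so hosts containing - only match the [^/] version.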
You can also do it like this:
egrep -io '^\<https?://home\.\w+\.gov(\.au|\.uk)?' input.txt
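A quick check of the \w pattern, again a sketch assuming GNU grep (\< and \w are GNU extensions, not POSIX ERE). Since \w matches only letters, digits and underscores, it cannot match a dot, so it handles a single label between home. and .gov but not a host such as home.a.b.gov. The helper name by_word is hypothetical.

```shell
#!/bin/sh
# Assumes GNU grep: \< (start of word) and \w (word character) are extensions.
by_word() { grep -E -io '^\<https?://home\.\w+\.gov(\.au|\.uk)?'; }

# One label between "home." and ".gov": prints https://home.xxx.gov
printf '%s\n' 'https://home.xxx.gov/uuu.aspx?url=https://home.xxx.gov' | by_word

# Two labels: \w+ cannot match the ".", so grep finds nothing.
printf '%s\n' 'https://home.a.b.gov/' | by_word || echo 'no match'
```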