How to use egrep to find only the first occurrence of a pattern in each line

Question

I am using an egrep regex to search for patterns in a file that contains URLs. I want to find only the first instance in each line. For example, this is my regex:

egrep -io '^\<http(s)://home\>+\..+\.gov(\.au)?' input.txt

It outputs this match:

https://home.xxx.gov/uuu.aspx?url=https://home.xxx.gov

But what I am really looking for in this specific example is:

https://home.xxx.gov

I do not care what comes after the .gov and I want to trim it. How can I do this?

Answer 1

You'll need a lazy quantifier, and for that you need Perl-style regexes:

egrep -P -io '^https?://home\..+?\.gov(\.au|\.uk)?' input.txt

If your egrep doesn't support Perl regexes, you need to find a different way, for example

egrep -io '^https?://home\.[A-Za-z0-9.]+\.gov(\.au|\.uk)?' input.txt

or

egrep -io '^https?://home\.[^/]+\.gov(\.au|\.uk)?' input.txt

limiting the range of characters that may be matched by the regex. See also @sshashank124's solution.
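To see why restricting the character range helps, here is a minimal sketch that runs a greedy pattern and the `[^/]+` variant against the sample line from the question. It assumes a grep that supports `-o` (GNU grep does); `grep -E` is used as the equivalent of `egrep`, and the variable names `line`, `greedy`, and `restricted` are illustrative only.

```shell
# Sample line from the question, held in a variable so no file is needed.
line='https://home.xxx.gov/uuu.aspx?url=https://home.xxx.gov'

# Greedy: .+ takes as much as it can, so the match runs to the last ".gov".
greedy=$(printf '%s\n' "$line" | grep -Eio '^https?://home\..+\.gov(\.au|\.uk)?')

# Restricted: [^/]+ cannot cross a slash, so the match stops at the first ".gov".
restricted=$(printf '%s\n' "$line" | grep -Eio '^https?://home\.[^/]+\.gov(\.au|\.uk)?')

printf 'greedy:     %s\n' "$greedy"      # the whole line
printf 'restricted: %s\n' "$restricted"  # https://home.xxx.gov
```

The restricted form works here because a host name never contains a slash, so the first `/` after the host bounds the match without needing a lazy quantifier.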

Answer 2

You can do it like this:

^\<https?://home\.\w+\.gov(\.au|\.uk)?
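As a rough sketch of how this pattern behaves when placed in a full command, the snippet below tries it on the question's sample line and on a second, hypothetical host with more than one label between `home.` and `.gov`. It assumes GNU grep, since `\<` and `\w` are GNU extensions; the variable names and the second URL are made up for illustration.

```shell
# First line is from the question; the second host name is hypothetical.
single='https://home.xxx.gov/uuu.aspx?url=https://home.xxx.gov'
multi='https://home.dept.agency.gov/index.html'

pattern='^\<https?://home\.\w+\.gov(\.au|\.uk)?'

# \w+ matches letters, digits, and underscores only, so it stops at the next dot.
one_label=$(printf '%s\n' "$single" | grep -Eio "$pattern" || true)

# With two labels before ".gov", \w+ cannot span the inner dot and nothing matches.
two_labels=$(printf '%s\n' "$multi" | grep -Eio "$pattern" || true)

printf 'one label:  [%s]\n' "$one_label"   # [https://home.xxx.gov]
printf 'two labels: [%s]\n' "$two_labels"  # []
```

So this form is the strictest of the options: it trims the match as requested, but only for hosts with exactly one label between `home.` and `.gov`.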

Answer 1
2 accepted 2014-04-25 08:47:52

Answer 2
1 2014-04-25 08:47:04
