
awk/sed extract string from between patterns

I know there have probably been a few hundred forms of this question asked on Stack Overflow, but I can't seem to find a suitable answer to mine.

I'm trying to parse the /etc/ldap.conf file on a Linux box so that I can pick out the description fields from between (description= and ):

-bash-3.2$ grep '^nss_base_passwd' /etc/ldap.conf
nss_base_passwd ou=People,dc=ca,dc=somecompany,dc=com?one?|(description=TD_FI)(description=TD_F6)(description=TD_F6)(description=TRI_142)(description=14_142)(description=REX5)(description=REX5)(description=1950)

I'm looking to extract these into their own list with no duplicates:

TD_FI
TD_F6
TRI_142
14_142
REX5
1950

(or all on one line with a proper delimiter)

I played with sed for a few hours but couldn't get it to work; I'm not entirely sure how to use the global option.

You could use grep with the -P option:

$ grep '^nss_base_passwd' /etc/ldap.conf | grep -oP '(?<=description\=)[^)]*' | uniq
TD_FI
TD_F6
TRI_142
14_142
REX5
1950

Explanation:

A positive lookbehind makes grep print every character that follows description= up to the next ) bracket. The uniq command then removes the duplicates, which works here because they are adjacent.
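The -P option relies on PCRE support, which GNU grep usually has but other grep implementations may lack. Since the question asked about awk, here is a minimal awk sketch of the same idea: find each "(description=" prefix, print what follows up to the next ")", and skip values already seen. The sample line and the function name extract_descriptions are illustrative assumptions, not part of the original answers; in practice the function would read from /etc/ldap.conf.

```shell
# Sample line modelled on the one in the question (a stand-in for /etc/ldap.conf).
line='nss_base_passwd ou=People,dc=ca,dc=somecompany,dc=com?one?|(description=TD_FI)(description=TD_F6)(description=TD_F6)(description=REX5)'

# Hypothetical helper: reads lines on stdin, prints each distinct description once.
extract_descriptions() {
  awk '/^nss_base_passwd/ {
    rest = $0
    # Repeatedly find "(description=...", print the value, then continue after it.
    while (match(rest, /\(description=[^)]*/)) {
      value = substr(rest, RSTART + 13, RLENGTH - 13)   # 13 = length of "(description="
      if (!seen[value]++) print value                   # skip values already printed
      rest = substr(rest, RSTART + RLENGTH)
    }
  }'
}

descriptions=$(printf '%s\n' "$line" | extract_descriptions)
printf '%s\n' "$descriptions"
# TD_FI
# TD_F6
# REX5
```

Unlike a sort-based approach, this keeps the values in the order they first appear on the line.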

Try this:

grep '^nss_base_passwd' /etc/ldap.conf |
grep -oE '[(]description=[^)]*' | sort -u |
cut -f2- -d=

Explanations:

  1. With bash, if you end a line with | (or || or &&), the shell knows that the command continues on the next line, so you don't need to use \ .

  2. The second grep uses the -o flag to indicate that only the matching parts should be printed, one per line. It also uses the -E flag to indicate that the pattern is an "Extended" (i.e. normal) regular expression.

  3. Since -o prints the entire match, we need to extract the part after the prefix, for which we use cut, specifying a delimiter of = . -f2- means "all the fields starting with the second field", which we need in case there is an = in the description.
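To illustrate point 3, the following small sketch compares -f2 with -f2- on a match whose description itself contains an = sign. The value a=b is a made-up example, not taken from the question's file.

```shell
# Hypothetical match text, as grep -o would print it, with an '=' inside the value.
match='(description=a=b'

only_second=$(printf '%s\n' "$match" | cut -d= -f2)    # second field only
from_second=$(printf '%s\n' "$match" | cut -d= -f2-)   # second field and everything after it

printf '%s\n' "$only_second"   # a
printf '%s\n' "$from_second"   # a=b
```

With -f2 the part after the second = is lost, while -f2- keeps the whole value.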

perl -nE 'say join(",", /description=\K([^)]+)/g) if /^nss_base_passwd/' /etc/ldap.conf
TD_FI,TD_F6,TD_F6,TRI_142,14_142,REX5,REX5,1950

Avinash's answer was very close. Here is my improved version:

grep '^nss_base_passwd' /etc/ldap.conf | grep -Po '\(description=\K[^)]+' | sort -u

There is no need to use lookaround syntax when you can simply use \K (which is actually a shortcut for the corresponding zero-width assertion: it keeps everything matched before it out of the reported match).
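As a quick check of that claim, this sketch runs both patterns over the same made-up input and stores the results for comparison. It assumes a grep built with PCRE support (the -P option), as GNU grep usually is; the sample string is illustrative only.

```shell
# Made-up sample input with two description fields.
sample='(description=TD_FI)(description=REX5)'

# Lookbehind form: assert that "(description=" precedes the match.
with_lookbehind=$(printf '%s\n' "$sample" | grep -oP '(?<=\(description=)[^)]+')

# \K form: match the prefix, then drop it from the reported match.
with_keep=$(printf '%s\n' "$sample" | grep -oP '\(description=\K[^)]+')

printf '%s\n' "$with_lookbehind"
printf '%s\n' "$with_keep"
```

Both commands should print TD_FI and REX5 on separate lines.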

Also, you said that you want NO duplicates, but uniq only removes adjacent duplicate lines; it will not remove duplicates that have something in between. That's why I am using sort -u instead.
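The difference can be seen with a short sketch. The list below is a made-up variation of the question's values in which TD_F6 reappears after another entry; LC_ALL=C is set only to make the sort order predictable.

```shell
# Made-up list: TD_F6 occurs twice in a row and once more after REX5.
values=$(printf '%s\n' TD_FI TD_F6 TD_F6 REX5 TD_F6)

adjacent_only=$(printf '%s\n' "$values" | uniq)               # collapses only the adjacent pair
sorted_unique=$(printf '%s\n' "$values" | LC_ALL=C sort -u)   # sorts, then drops every duplicate

printf '%s\n' "$adjacent_only"
# TD_FI
# TD_F6
# REX5
# TD_F6
printf '%s\n' "$sorted_unique"
# REX5
# TD_F6
# TD_FI
```

Note that sort -u also reorders the list, so the output no longer follows the order in the file.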

