
Extract few matching strings from matching lines in file using sed

I have a file with strings similar to this:

abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'

I have to find current_count and total_count for each line of the file. I am trying the command below, but it's not working. Please help.

grep current_count file | sed "s/.*\('current_count': u'\d+'\).*/\1/"

It is outputting the whole line, but I want something like this:

'current_count': u'3', 'total_count': u'3'

It's printing the whole line because the pattern in the s command doesn't match, so no substitution happens.

sed regexes don't support \d for digits, or x+ as shorthand for xx*. GNU sed has a -r option to enable extended-regex support, so + becomes a meta-character, but \d still doesn't work. GNU sed also allows \+ as a meta-character in basic regex mode, but that's not POSIX standard.
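As a small sketch of the difference, here is the same extraction written once in portable basic-regex syntax and once with extended regexes. The variable names (line, bre, ere) are just for illustration, and I use -E here, which GNU sed accepts as a synonym for -r.

```shell
# Sample line taken from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# POSIX basic regex: \( \) for groups, and "one or more digits" spelled [0-9][0-9]*.
bre=$(printf '%s\n' "$line" |
  sed -n "s/.*\('current_count': u'[0-9][0-9]*'\).*/\1/p")

# Extended regex: plain ( ) for groups and + works, but \d still does not.
ere=$(printf '%s\n' "$line" |
  sed -nE "s/.*('current_count': u'[0-9]+').*/\1/p")

printf '%s\n%s\n' "$bre" "$ere"
# both lines print: 'current_count': u'2'
```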

So anyway, this will work:

echo -e "foo\nabcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'" |
sed -nr "s/.*('current_count': u'[0-9]+').*/\1/p"
# output:  'current_count': u'2'
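The question asks for both counts, so here is one way the same idea might be extended with a second capture group. This is only a sketch: it assumes total_count always appears after current_count on the line, as in the sample, and the variable names are mine.

```shell
# Sample line taken from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# One capture group per field; -n plus the p flag prints only the
# lines where the substitution succeeded, so "foo" is dropped.
both=$(printf 'foo\n%s\n' "$line" |
  sed -nE "s/.*('current_count': u'[0-9]+').*('total_count': u'[0-9]+').*/\1, \2/p")

printf '%s\n' "$both"
# output: 'current_count': u'2', 'total_count': u'3'
```

If the fields can appear in either order, a single s command like this won't be enough.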

Notice that I skip the grep by using sed -n s///p. I could also have used /current_count/ as an address:

sed  -r -e '/current_count/!d' -e "s/.*('current_count': u'[0-9]+').*/\1/"

Or with just grep, printing only the matching part of the pattern instead of the whole line:

grep -E -o "'current_count': u'[[:digit:]]+'"

(or egrep instead of grep -E). I forget if grep -o is POSIX-required behaviour.
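The grep approach can probably cover both fields with an alternation. A minimal sketch, assuming a grep that supports -o (GNU and BSD grep do); each match is printed on its own line rather than joined with a comma:

```shell
# Sample line taken from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# (current|total) matches either key; -o prints every match separately.
matches=$(printf '%s\n' "$line" |
  grep -E -o "'(current|total)_count': u'[[:digit:]]+'")

printf '%s\n' "$matches"
# output:
# 'current_count': u'2'
# 'total_count': u'3'
```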

To me, this looks like some sort of serialized Python data. I would first try to find out where that data comes from and parse it properly.

However, while hackish, sed can also be used here:

sed "s/.*current_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
sed "s/.*total_count': [a-z]'\([0-9]\+\).*/\1/" input.txt
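Note that without -n and the p flag, the two commands above print non-matching lines unchanged, and \+ is a GNU extension. A possible single-pass variant that prints both numbers and skips non-matching lines, again assuming total_count follows current_count on the line (variable names are illustrative):

```shell
# Sample line taken from the question.
line="abcd u'current_count': u'2', u'total_count': u'3', u'order_id': u'90'"

# Portable basic regex: [0-9][0-9]* instead of the GNU-only [0-9]\+.
counts=$(printf 'foo\n%s\n' "$line" |
  sed -n "s/.*current_count': [a-z]'\([0-9][0-9]*\)'.*total_count': [a-z]'\([0-9][0-9]*\)'.*/\1 \2/p")

printf '%s\n' "$counts"
# output: 2 3
```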
