How to extract a substring and numbers only using grep/sed

I have a text file containing both text and numbers, and I want to use grep to extract only the numbers I need. For example, given a file as follows:

miss rate 0.21  
ipc 222  
stalls n shdmem 112

Say I only want to extract the value for miss rate, which is 0.21. How do I do it with grep or sed? Also, I need more than one number, not only the one after miss rate. That is, I may want to get both 0.21 and 112. A sample output might look like this:

0.21 222 112

I need the data for plotting later.

Using the special lookaround regex trick \K with the PCRE engine in GNU grep:

grep -oP 'miss rate \K.*' file.txt

or with Perl:

perl -lne 'print $& if /miss rate \K.*/' file.txt
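Note that .* captures everything after the label, including any trailing whitespace (the sample lines as shown here appear to end in spaces). A minimal sketch of a tighter variant, assuming GNU grep with -P support; the temporary file and the variable name miss_rate are only for illustration:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# \K discards "miss rate " from the reported match;
# [0-9.]+ stops at the first character that is not a digit or a dot.
miss_rate=$(grep -oP '^miss rate \K[0-9.]+' "$sample")
echo "$miss_rate"   # 0.21

rm -f "$sample"
```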

The grep-and-cut solution would look like this.

To get the 3rd field of every line that grep matches, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3

Or, to get the 3rd field and everything after it, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3-

Or, if you use bash and "miss rate" occurs only once in your file, you can also just do:

a=( $(grep -m 1 "miss rate" yourfile) )
echo ${a[2]}

where ${a[2]} is your result.

If "miss rate" occurs more than once, you can loop over the grep output (in bash), reading only what you need.
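Such a loop could be sketched as follows. This is only one possible shape, not necessarily what the answer had in mind; the second "miss rate" line and the variable names are made up for illustration:

```shell
# Hypothetical input with two "miss rate" lines.
sample=$(mktemp)
printf 'miss rate 0.21\nipc 222\nmiss rate 0.35\n' > "$sample"

values=""
# read splits each matching line on whitespace; the 3rd word is the number.
while read -r word1 word2 value rest; do
    values="${values:+$values }$value"
done <<EOF
$(grep '^miss rate ' "$sample")
EOF

echo "$values"   # 0.21 0.35
rm -f "$sample"
```

Feeding the loop from a here-document instead of a pipe keeps it in the current shell, so the collected values are still available after the loop ends.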

Use awk instead:

awk '/^miss rate/ { print $3 }' yourfile
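The question also asks for several values on one line (for example 0.21 and 112). A rough sketch of how the awk approach might be extended for that, assuming the wanted number is always the last field on its line; the variable name picked is arbitrary:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# $NF is the last whitespace-separated field, so the number of words
# in the label does not matter. Values are joined with single spaces.
picked=$(awk '/^miss rate / || /^stalls n shdmem / { out = out sep $NF; sep = " " }
              END { print out }' "$sample")
echo "$picked"   # 0.21 112

rm -f "$sample"
```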

To do it with just grep, you need non-standard extensions, as here with GNU grep using PCRE (-P) with a positive lookbehind (?<=..) and match-only output (-o):

grep -Po '(?<=miss rate ).*' yourfile

If you really want to use only grep for this, then you can try:

grep "miss rate" file | grep -oe '\([0-9.]*\)'

It first finds the line that matches, and then outputs only the digits (and dots).

Sed might be a bit more readable, though:

sed -n 's#miss rate ##p' file
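The substitution above removes the label and prints whatever is left on the line. If only the number itself is wanted, a capture group can be used instead. A sketch assuming a sed that supports -E for extended regular expressions (GNU and BSD sed both do); the variable name miss_rate is arbitrary:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# Replace the whole line with the captured number; -n plus the p flag
# prints only lines where the substitution succeeded.
miss_rate=$(sed -nE 's/^miss rate ([0-9]+(\.[0-9]+)?).*$/\1/p' "$sample")
echo "$miss_rate"   # 0.21

rm -f "$sample"
```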

You can use:

grep -P "miss rate \d+(\.\d+)?" file.txt

or:

grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt

Both of those commands will print miss rate 0.21. If you want to extract only the number, why not use Perl, sed, or awk?

If you really want to avoid those, maybe this will work:

grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt | xargs -n 1 basename | tail -n 1

I believe

sed 's|[^0-9]*\([0-9\.]*\)|\1 |g' filename

will do the trick. However, every entry will be on its own line, if that is OK. I am sure there is a way for sed to produce a comma- or space-delimited list, but I am not a master of all things sed.
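For the single-line output shown in the question (0.21 222 112), one option is to let grep print each number on its own line and then join the lines with paste. This is a sketch, not the only way to do it, and it assumes the labels themselves contain no digits; the variable name numbers is arbitrary:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# grep -o prints every integer or decimal on its own line;
# paste -s joins those lines, using a space as the delimiter.
numbers=$(grep -oE '[0-9]+(\.[0-9]+)?' "$sample" | paste -sd ' ' -)
echo "$numbers"   # 0.21 222 112

rm -f "$sample"
```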
