How to extract a substring and numbers only using grep/sed
I have a text file containing both text and numbers, and I want to use grep to extract only the numbers I need. For example, given a file as follows:
miss rate 0.21
ipc 222
stalls n shdmem 112
Say I only want to extract the data for miss rate, which is 0.21. How do I do it with grep or sed? Also, I need more than one number, not only the one after miss rate. That is, I may want to get both 0.21 and 112. A sample output might look like this:
0.21 222 112
I need the data for plotting later.
The grep-and-cut solution would look like this. To get the 3rd field of every line that grep matches, use:
grep "^miss rate " yourfile | cut -d ' ' -f 3
Or, to get the 3rd field and everything after it, use:
grep "^miss rate " yourfile | cut -d ' ' -f 3-
Or, if you use bash and "miss rate" occurs only once in your file, you can also just do:
a=( $(grep -m 1 "miss rate" yourfile) )
echo "${a[2]}"
where ${a[2]} is your result.
If "miss rate" occurs more than once, you can loop over the grep output and read only what you need (in bash).
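A minimal sketch of such a loop, assuming each matching line has the form "miss rate <value>". The sample file, the second miss rate line, and the function name miss_rates are made up for illustration and are not part of the original answer:

```shell
# Hypothetical sample file: the question's data plus a second "miss rate" line.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'stalls n shdmem 112' 'miss rate 0.35' > "$sample"

# Print the third whitespace-separated field of every line starting with "miss rate ".
miss_rates() {
    grep '^miss rate ' "$1" | while read -r _ _ value _; do
        printf '%s\n' "$value"
    done
}

miss_rates "$sample"
```

The first two words of each line go into the throwaway variable _, and anything after the value is discarded the same way, so only the number is printed.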
Use awk instead:
awk '/^miss rate/ { print $3 }' yourfile
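The question also asks for several numbers on one line, as in the sample output 0.21 222 112. One possible sketch with awk, assuming the wanted number is always the last field of its line (the sample file and the function name last_fields are hypothetical):

```shell
# Hypothetical sample file with the data from the question.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'stalls n shdmem 112' > "$sample"

# Print the last field of every line on a single line, separated by spaces.
last_fields() {
    awk '{ printf "%s%s", sep, $NF; sep = " " } END { print "" }' "$1"
}

last_fields "$sample"    # prints: 0.21 222 112
```

To restrict the output to certain labels, a pattern such as /^miss rate|^stalls/ could be placed in front of the action block.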
To do it with grep alone, you need non-standard extensions, such as GNU grep's PCRE mode (-P) with a positive lookbehind (?<=..) and only-matching output (-o):
grep -Po '(?<=miss rate ).*' yourfile
If you really want to use only grep for this, then you can try:
grep "miss rate" file | grep -oE '[0-9.]+'
It first finds the line that matches, and then outputs only the digits and decimal point.
Sed might be a bit more readable, though:
sed -n 's#miss rate ##p' file
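The same substitution can be repeated for each label when more than one value is needed, such as the 0.21 and 112 mentioned in the question. A rough sketch, assuming the labels are known in advance and start their lines (the sample file and the function name pick_values are hypothetical):

```shell
# Hypothetical sample file with the data from the question.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'stalls n shdmem 112' > "$sample"

# Strip each known label and print only the lines where a substitution happened.
pick_values() {
    sed -n -e 's/^miss rate //p' -e 's/^stalls n shdmem //p' "$1"
}

pick_values "$sample"
```

With -n, sed prints nothing by default, so the ipc line is dropped because neither substitution matches it.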
You can use:
grep -P "miss rate \d+(\.\d+)?" file.txt
or:
grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt
Both of those commands will print out the whole line, miss rate 0.21. If you want to extract the number only, why not use Perl, Sed or Awk?
If you really want to avoid those, maybe this will work?
grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt | xargs -n 1 basename | tail -n 1
I believe
sed 's|[^0-9]*\([0-9.]*\)|\1 |g' filename
will do the trick. However, every entry will be on its own line, if that is OK. I am sure there is a way for sed to produce a comma- or space-delimited list, but I am not a super master of all things sed.
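One way to get a space-delimited list without more sed is to join the lines afterwards with paste. This is only a sketch under the assumption that each line holds one number after its label. It uses a slightly different sed expression from the one above, and the sample file and the function name numbers_on_one_line are hypothetical:

```shell
# Hypothetical sample file with the data from the question.
sample=$(mktemp)
printf '%s\n' 'miss rate 0.21' 'ipc 222' 'stalls n shdmem 112' > "$sample"

# Keep the first run of digits and dots on each line, then join the lines with spaces.
numbers_on_one_line() {
    sed 's/[^0-9]*\([0-9.]*\).*/\1/' "$1" | paste -s -d ' ' -
}

numbers_on_one_line "$sample"    # prints: 0.21 222 112
```

Changing the delimiter passed to paste -d to a comma would give a comma-separated list instead.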