How to extract substrings and numbers using grep / sed

Question

I have a text file containing both text and numbers, and I want to use grep to extract only the numbers I need. For example, given a file as follows:

miss rate 0.21  
ipc 222  
stalls n shdmem 112

Say I only want to extract the value for miss rate, which is 0.21. How do I do it with grep or sed? I also need more than one number, not only the one after miss rate. That is, I may want to get both 0.21 and 112. A sample output might look like this:

0.21 222 112

I need the data for plotting later.

Answer 1

Use the \K escape with grep's PCRE engine (-P). \K drops everything matched so far from the reported match, so it behaves much like a lookbehind:

grep -oP 'miss rate \K.*' file.txt

or with perl:

perl -lne 'print $& if /miss rate \K.*/' file.txt
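Because .* runs to the end of the line, both commands above also return any trailing whitespace. A minimal sketch that narrows the match to the number itself, assuming GNU grep built with PCRE support and a temporary copy of the sample file (the variable names are only illustrative):

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# \K discards "miss rate " from the match; [0-9.]+ stops at the first
# character that is neither a digit nor a dot.
miss_rate=$(grep -oP 'miss rate \K[0-9.]+' "$sample")
echo "$miss_rate"

rm -f "$sample"
```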

Answer 2

The grep-and-cut solution would look like this.

To get the 3rd field of every line that grep matches, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3

Or, to get the 3rd field and everything after it, use:

grep "^miss rate " yourfile | cut -d ' ' -f 3-

Or, if you use bash and "miss rate" occurs only once in your file, you can also just do:

a=( $(grep -m 1 "miss rate" yourfile) )
echo "${a[2]}"

where ${a[2]} is your result.

If "miss rate" occurs more than once, you can loop over the grep output and read only what you need (in bash).
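Such a loop could look like the following sketch. It assumes a hypothetical input in which "miss rate" appears twice; the file contents and variable names are made up for illustration.

```shell
# Hypothetical input with two "miss rate" lines.
sample=$(mktemp)
printf 'miss rate 0.21\nipc 222\nmiss rate 0.35\n' > "$sample"

# read splits each matching line on whitespace: the first two words go
# into word1 and word2, and the rest of the line lands in value.
values=$(grep '^miss rate ' "$sample" | while read -r word1 word2 value; do
    printf '%s\n' "$value"
done)
echo "$values"

rm -f "$sample"
```

The loop body is where you would store or process each value; here it only prints it, one value per line.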

Answer 3

Use awk instead:

awk '/^miss rate/ { print $3 }' yourfile

To do it with just grep, you need non-standard extensions, as here with GNU grep using PCRE (-P), a positive lookbehind (?<=..), and match-only output (-o):

grep -Po '(?<=miss rate ).*' yourfile
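Since the question asks for more than one value, the awk approach can be extended with a second pattern. This is only a sketch: it assumes the number is always the last field on the line, and the choice of "miss rate" and "stalls" as the wanted labels is an example.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# $NF is the last whitespace-separated field, so this works for labels
# made of any number of words and ignores trailing spaces.
picked=$(awk '/^miss rate/ || /^stalls/ { print $NF }' "$sample")
echo "$picked"

rm -f "$sample"
```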

Answer 4

If you really want to use only grep for this, then you can try:

grep "miss rate" file | grep -o '[0-9.]*'

It first finds the line that matches, and then outputs only the digits and decimal points.

Sed might be a bit more readable, though:

sed -n 's#miss rate ##p' file
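The substitution above removes only the label, so anything after the number (such as trailing spaces) stays in the output. A variant that keeps just the number, sketched with a capture group and assuming the value consists only of digits and dots:

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# \( \) captures the number, .* swallows the rest of the line, and \1
# puts back only the captured part. -n with the p flag prints matching
# lines only.
miss_rate=$(sed -n 's/^miss rate \([0-9.]*\).*/\1/p' "$sample")
echo "$miss_rate"

rm -f "$sample"
```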

Answer 5

You can use:

grep -P "miss rate \d+(\.\d+)?" file.txt

or:

grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt

Both of those commands print the whole matching line, miss rate 0.21. If you want to extract the number only, why not use Perl, Sed or Awk?

If you really want to avoid those, maybe this will work?

grep -E "miss rate [0-9]+(\.[0-9]+)?" file.txt | xargs -n 1 basename | tail -n 1

Answer 6

I believe

sed 's|[^0-9]*\([0-9.]*\)|\1 |g' filename

will do the trick. However, every entry will be on its own line, if that is OK. I am sure there is a way for sed to produce a comma- or space-delimited list, but I am not a master of all things sed.
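One way to get the single-line output shown in the question is to leave the joining to paste rather than sed. This is a sketch, not the only option: it assumes a grep that supports -o (GNU and BSD grep both do) and that every number in the file is wanted.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'miss rate 0.21  \nipc 222  \nstalls n shdmem 112\n' > "$sample"

# grep -o prints each number on its own line; paste -s joins those
# lines into one, separated by a single space.
numbers=$(grep -oE '[0-9]+(\.[0-9]+)?' "$sample" | paste -sd ' ' -)
echo "$numbers"

rm -f "$sample"
```

Passing ',' instead of ' ' to paste -d gives a comma-delimited list.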


6 answers

Answer 1
4 2013-03-12 21:03:59

Answer 2
4 2013-03-12 22:05:17

Answer 3
3 accepted 2013-03-12 20:35:50

Answer 4
3 2013-03-12 20:43:21

Answer 5
0 2013-03-12 20:36:11

Answer 6
0 2013-03-13 00:01:12
