
Wildcard symbol with grep -F

I have the following file:

0 0
0 0.001
0 0.032
0 0.1241
0 0.2241
0 0.42
0.0142 0
0.0234 0
0.01429 0.01282
0.001 0.224
0.098 0.367
0.129 0
0.123 0.01282
0.149 0.16
0.1345 0.216
0.293 0
0.2439 0.01316
0.2549 0.1316
0.2354 0.5
0.3345 0
0.3456 0.0116
0.3462 0.316
0.3632 0.416
0.429 0
0.42439 0.016
0.4234 0.3
0.5 0
0.5 0.33
0.5 0.5

Notice that the two columns are sorted in ascending order, first by the first column and then by the second one. The minimum value is 0 and the maximum is 0.5.

I would like to count the number of lines that are:

0 0

and store that number in a file called "0_0". In this case, this file should contain "1".

Then do the same for the lines of the form:

0 0.0*

For example,

0 0.032

And call that file "0_0.0" (it should contain "2"), and do this for all combinations, considering only the first decimal digit (0 0.1*, 0 0.2* ... 0.0* 0, 0.0* 0.0* ... 0.5 0.5).

I am using this loop:

for i in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
    for j in 0 0.0 0.1 0.2 0.3 0.4 0.5
    do
        grep -F ""$i" "$j"" file | wc -l > "$i"_"$j"
    done
done

rm 0_0 # the 0_0 count from the loop is wrong; recompute it with the next command, which can match \n
pcregrep -M "0 0\n" file | wc -l > 0_0

The problem is that, for example, the line

0.0142 0

will not be recognized by the iteration "0.0 0", since there are digits after the "0.0". Removing the -F option from grep so that it matches all numbers that start with "0.0" will not work either, since the dot will then be treated as a wildcard symbol, and therefore, for example, in the iteration "0.1 0" the line

 0.0142 0

will be counted, because 0.0142 is a 0"anything"1.

I hope I am making myself clear!

Is there any way to include a wildcard symbol with grep -F, like in:

for i in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
    for j in 0 0.0 0.1 0.2 0.3 0.4 0.5
    do
        grep -F ""$i"* "$j"*" file | wc -l > "$i"_"$j"
    done
done

(Please notice the asterisks after the variables in the grep command.)

Thank you!

Don't use shell loops just to manipulate text; that's what the guys who invented shell also invented awk to do. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

It sounds like all you need is:

awk '{cnt[substr($1,1,3)"_"substr($2,1,3)]++} END{ for (pair in cnt) {print cnt[pair] > pair; close(pair)} }' file

That will be vastly more efficient than your nested shell loops approach.
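To make the logic of that one-liner easier to follow, here is the same awk program spread over several lines with comments. This is only a sketch: the scratch directory and the shortened sample file are assumptions made so the example is self-contained, and the name `key` is just a more descriptive stand-in for `pair`.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not clutter the current one (assumption).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# A few rows taken from the question's data.
cat > file <<'EOF'
0 0
0 0.001
0 0.032
0.0142 0
0.0234 0
0.5 0.5
EOF

awk '
{
    # Key = first three characters of each column,
    # e.g. "0.0142" -> "0.0", "0.5" -> "0.5", "0" -> "0".
    key = substr($1, 1, 3) "_" substr($2, 1, 3)
    cnt[key]++
}
END {
    for (key in cnt) {
        print cnt[key] > key   # one output file per key, named after the key
        close(key)             # close each file so many keys do not exhaust open files
    }
}' file

# prints 1, 2, 2, 1 (one per line)
cat 0_0 0_0.0 0.0_0 0.5_0.5
```

Note that, unlike the nested loops, this only creates files for combinations that actually occur in the input; a combination with no matching lines gets no file rather than a file containing 0.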

Here are the files the awk command creates and the count each one contains:

$ awk '{cnt[substr($1,1,3)"_"substr($2,1,3)]++} END{for (pair in cnt) print pair "\t" cnt[pair]}' file
0.0_0.3 1
0_0.4   1
0.5_0   1
0.2_0.5 1
0.4_0.3 1
0.0_0   2
0.1_0.0 1
0.3_0   1
0.1_0.1 1
0.1_0.2 1
0.3_0.0 1
0_0     1
0.1_0   1
0.5_0.3 1
0.4_0   1
0.3_0.3 1
0.2_0.0 1
0_0.0   2
0.5_0.5 1
0.3_0.4 1
0.2_0.1 1
0.0_0.0 1
0_0.1   1
0_0.2   1
0.4_0.0 1
0.2_0   1
0.0_0.2 1
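As for the literal question: grep -F treats every character of the pattern as plain text, `*` included, so there is no wildcard available in that mode. If you do want to keep the grep loop, one possible workaround is to drop -F, escape the dot yourself, and anchor the pattern to the whole line. The following is a minimal sketch, assuming the two fields are separated by a single space and contain only digits and a dot; the variable names `ri` and `rj` and the shortened sample file are made up for the demo.

```shell
#!/bin/sh
# Work in a throwaway directory so the demo does not clutter the current one (assumption).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# A few rows taken from the question's data.
cat > file <<'EOF'
0 0
0 0.001
0 0.032
0.0142 0
0.0234 0
0.129 0
0.5 0.5
EOF

for i in 0 0.0 0.1 0.2 0.3 0.4 0.5
do
    for j in 0 0.0 0.1 0.2 0.3 0.4 0.5
    do
        # Escape each dot so it matches a literal "." instead of any character.
        ri=$(printf '%s' "$i" | sed 's/\./\\./g')
        rj=$(printf '%s' "$j" | sed 's/\./\\./g')
        # [0-9]* plays the role of the trailing wildcard; ^ and $ anchor the
        # match to the whole line, so "0" cannot match the start of "0.0142".
        grep -c "^${ri}[0-9]* ${rj}[0-9]*\$" file > "${i}_${j}"
    done
done
```

Because the pattern is anchored, the "0 0" case no longer needs the separate pcregrep step, and grep -c replaces the pipe to wc -l. It still reads the input 49 times, so the awk version remains the better choice for large files.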
