
How to grep multiple strings within N lines

I was wondering if there is any way I could use grep (or any other command) to search for multiple strings within N lines.

Example

Search for "orange", "lime", and "banana", all within 3 lines.

If the input file is:

xxx
a lime
b orange
c banana
yyy
d lime
foo
e orange
f banana

I want to print the three lines starting with a, b, c. The lines with the searched strings can appear in any order.

I do not want to print the lines d, e, f, because there is a line in between, so the three strings are not grouped together.

Your question is rather unclear. Here is a simple Awk script which collects consecutive matching lines and prints them if and only if the array holds at least three elements.

awk '/orange|lime|banana/ { a[++n] = $0; next }
    { if (n>=3) for (i=1; i<=n; i++) print a[i]; delete a; n=0 }
    END { if (n>=3) for (i=1; i<=n; i++) print a[i] }' file

It's not clear whether you require all of your expressions to match; this script doesn't attempt to check that. If you see three successive lines with orange, that's a match, and they will be printed.

The logic should be straightforward. The array a collects matches, with n indexing into it. When we see a non-match, we check the array's length and print it if it holds 3 or more lines, then start over with an empty array and index. This is (clumsily) repeated at end of file as well, in case the file ends with a match.
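If every expression does have to appear, one possible variation is sketched below. It keeps the same collect-and-flush logic but also remembers which words were seen in the current run. The function names grep_run_all and emit_run and the seen_* flags are made up for this illustration, and the sketch prints the whole run however long it is, which may or may not be what the question intends.

```shell
# Hypothetical helper: print a run of consecutive matching lines
# only when the run contains orange, lime and banana at least once.
grep_run_all() {
    awk '
    function emit_run(   i) {
        # print the collected run only if every word was seen in it
        if (seen_o && seen_l && seen_b)
            for (i = 1; i <= n; i++) print a[i]
        split("", a); n = 0              # portable way to empty the array
        seen_o = seen_l = seen_b = 0
    }
    /orange|lime|banana/ {
        a[++n] = $0
        if (/orange/) seen_o = 1
        if (/lime/)   seen_l = 1
        if (/banana/) seen_b = 1
        next
    }
    { emit_run() }                       # a non-match ends the run
    END { emit_run() }                   # the file may end inside a run
    ' "$@"
}
```

Called as grep_run_all file (or with input on stdin), this prints only the a, b, c lines for the sample input in the question.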

If you want to permit gaps (so, if there are three successive lines where one matches "orange" and "banana", then one which doesn't match, then one which matches "lime", should those three lines be printed? Your question is unclear), you could change to always keeping an array of the last three lines, though then you also need to specify how to deal with, e.g., a sequence of five lines which match by these rules.

Similar to tripleee's answer, I would also use awk for this purpose. The main idea is to implement a simple state machine.

Simple example

As a simple example, first try to find three consecutive lines containing banana. Consider the pattern-action statement

/banana/ { bananas++ }

For every line matching the regex banana, it increments the variable bananas (in awk, an uninitialised variable evaluates to 0 in a numeric context).

Of course, you want bananas to be reset to 0 when there is a non-matching line, so your search starts from the beginning:

/banana/ { bananas++; next }
{ bananas = 0 }

You can also test the values of variables, both in patterns and inside actions. For example, if you want to print "Found" after three consecutive lines containing banana, extend the rule:

/banana/ {
    bananas++
    if (bananas >= 3) {
        print "Found"
        bananas = 0
    }
    next
}

Once three consecutive lines have matched, this prints the string "Found" and resets the variable bananas to 0.
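To try the rules above from a shell, they can be wrapped as in the following sketch. The function name find_three_bananas is invented here, and appending the line number NR to the message is an addition for illustration, not part of the rule as written.

```shell
# Hypothetical wrapper around the rules above; reads stdin or files.
find_three_bananas() {
    awk '
    /banana/ {
        bananas++                        # one more consecutive banana line
        if (bananas >= 3) {
            print "Found at line " NR    # NR added for illustration
            bananas = 0                  # start counting afresh
        }
        next
    }
    { bananas = 0 }                      # any other line breaks the run
    ' "$@"
}

# prints: Found at line 6
printf '%s\n' banana banana x banana banana banana | find_three_bananas
```

The x on the third line resets the counter, so only the run on lines 4 to 6 is reported.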

How to proceed further

Using this basic idea, you should be able to write your own awk script that handles all the cases. First, you should familiarise yourself with awk (patterns, actions, program execution).

Then, extend and adapt my example to fit your needs.

  • In particular, you probably need an associative array matched, with indices "banana", "orange", "lime".
  • You set matched["banana"] = $0 when the current line matches /banana/. This saves the current line for later output.
  • You clear that whole array when the current line does not match any of your expressions.
  • When all strings are found (matched[s] is not empty for every string s), you can print the contents of matched[s].

I leave the actual implementation to you. As others have said, your description leaves many corner cases unclear. You should figure them out for yourself and adapt your implementation accordingly.

I think you want this:
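For readers who want a starting point, one possible reading of the steps listed above is sketched here. It is only one interpretation: the names grep_all_adjacent, remember, forget and the extra array at (which records the line number of each saved line so output can follow file order) are assumptions of this sketch, and it resolves the unclear corner cases in one particular way (the most recent line per word wins, and a line matching several words is printed once).

```shell
# Hypothetical implementation of the matched[] idea; reads stdin or files.
grep_all_adjacent() {
    awk '
    function forget() { split("", matched); split("", at) }
    function remember(word) { matched[word] = $0; at[word] = NR }
    {
        hit = 0
        if (/banana/) { remember("banana"); hit = 1 }
        if (/orange/) { remember("orange"); hit = 1 }
        if (/lime/)   { remember("lime");   hit = 1 }
        if (!hit) { forget(); next }     # a non-match clears the state

        if (("banana" in matched) && ("orange" in matched) && ("lime" in matched)) {
            # find the earliest saved line, then print in file order
            lo = NR
            for (w in at) if (at[w] < lo) lo = at[w]
            for (i = lo; i <= NR; i++)
                for (w in at)
                    if (at[w] == i) { print matched[w]; break }
            forget()
        }
    }' "$@"
}
```

On the sample input from the question this prints the a, b, c lines and nothing else, because the foo line clears the state before the second group is complete.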

awk '
  /banana/ {banana=3}
  /lime/   {lime=3}
  /orange/ {orange=3}
 (orange>0)&&(lime>0)&&(banana>0){print l2,l1,$0}
 {orange--;lime--;banana--;l2=l1;l1=$0}' OFS='\n' yourFile

So, if you see the word banana you set banana=3, which keeps it valid for the current line and the two lines after it. Likewise, if you see lime, it gets a window of 3 lines in which to form a group, and similarly for orange.

Now, if all of orange, lime and banana have been seen within the last three lines (the current one included), print the second to last line (l2), the last line (l1) and the current line $0.

Then decrement the count for each fruit before moving to the next line, save the current line as l1, and shift the previous line back into l2.
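The same countdown idea can be generalised from 3 to an arbitrary N. The sketch below is a variation, not the script above: the function name grep_within, the ring buffer buf and the printed marker are all inventions of this example. It assumes N is a positive integer, records the line number where each word was last seen, and prints only from the oldest of those three lines up to the current line, skipping lines it has already printed when windows overlap.

```shell
# Hypothetical helper. Usage: grep_within N [file...]
grep_within() {
    window=$1; shift
    awk -v n="$window" '
    {
        buf[NR % n] = $0                 # ring buffer holding the last n lines
        hit = 0
        if (/banana/) { banana = NR; hit = 1 }
        if (/lime/)   { lime   = NR; hit = 1 }
        if (/orange/) { orange = NR; hit = 1 }
        if (!hit) next

        # oldest of the three most recent sightings (0 means never seen)
        lo = banana
        if (lime < lo)   lo = lime
        if (orange < lo) lo = orange

        if (lo > 0 && lo > NR - n) {     # all three lie inside the window
            if (lo <= printed) lo = printed + 1   # skip lines already shown
            for (i = lo; i <= NR; i++) print buf[i % n]
            printed = NR
        }
    }' "$@"
}
```

With the sample input from the question, grep_within 3 yourFile prints the a, b, c lines only. A larger N admits gap lines inside a group, and those gap lines are printed along with the matches.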
