
Grep the last occurrence of different elements in a big file

I have a file where different elements are repeated on several lines. My file contains lines like this:

1  $element_(1)
10 $element_(2)
20 $element_(1)
30 $element_(3)
40 $element_(1)
50 $element_(2)
60 $element_(3)
70 $element_(1)

I want to get the last occurrence of each of these elements and put them in a file resultfile:

50 $element_(2)
60 $element_(3)
70 $element_(1)

I tried:

for  i in {1..8000} do 
     grep $element_\($i\) sourcefile | tail -1 >> resultfile 
done

But it is giving me errors. Besides, how do I distinguish between the $ that is part of the string and the $ that expands the number of the element I am searching for?

Also, I don't know exactly how many elements the file will contain, so I took 8000 as a maximum value, but there can be fewer or more.

Output sorted by element index

You can tell grep to stop after finding the first match ( -m 1 ). To make that first match the last occurrence in your file, pipe the file to grep in reverse:

for i in {1..8000}; do
    tac sourcefile | grep -m 1 "\$element_($i)"
done > resultfile
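
As a quick check, here is a minimal sketch that runs the same loop against the sample lines from the question. It assumes GNU tac and grep are available; the scratch directory and the shortened list of three indices are only there for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question (quoted heredoc: the shell expands nothing).
cat > sourcefile <<'EOF'
1  $element_(1)
10 $element_(2)
20 $element_(1)
30 $element_(3)
40 $element_(1)
50 $element_(2)
60 $element_(3)
70 $element_(1)
EOF

# Three elements are enough for the sample data.
for i in 1 2 3; do
    tac sourcefile | grep -m 1 "\$element_($i)"
done > resultfile

cat resultfile
# 70 $element_(1)
# 50 $element_(2)
# 60 $element_(3)
```

The result is ordered by element index, not by position in the input, because the loop visits the indices in ascending order.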

I've also moved the output redirection outside the loop and fixed the quoting in your pattern. The whole pattern is now in double quotes; the first $ has to be escaped so the shell doesn't try to expand a variable $element_ , and the parentheses must not be preceded by a backslash, because grep's default basic regular expressions treat \( and \) as a capture group. In your attempt the backslashes were correct: the pattern was unquoted, so they were needed to protect the parentheses from the shell, which removed them before grep saw the pattern. Quoting the whole pattern makes them unnecessary.

It's usually easier to single-quote the pattern so we don't have to care about shell expansion, but in this case we want $i to actually expand.

Your attempt also had a syntax error: the ; (or a newline) was missing between the closing brace and do .
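
To make the quoting rules concrete, the following sketch prints the string that grep would receive under each quoting style. The variable names good, literal and bad are made up for this illustration.

```shell
i=2
unset element_   # make sure $element_ really is unset, as in the question

# Double quotes with the first $ escaped: $i expands, $element_ stays literal.
good="\$element_($i)"

# Single quotes: nothing expands, so $i is passed through unchanged.
literal='$element_($i)'

# Unquoted, as in the question: $element_ expands to nothing,
# and the shell turns \( and \) into plain ( and ).
bad=$element_\($i\)

printf '%s\n' "$good" "$literal" "$bad"
# $element_(2)
# $element_($i)
# (2)
```

Only the first form searches for the full element name with the current index. If the pattern should be taken as a fixed string rather than a regular expression, grep's -F option does that.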

Output sorted by order of appearance in input file

If the lines have to be in the same order as in the input file, we can prepend line numbers ( nl ), sort by them at the end ( sort -n ), and then remove them again with cut :

for i in {1..8000}; do
    nl sourcefile | tac | grep -m 1 "\$element_($i)"
done | sort -n | cut -f 2- > resultfile
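
The sketch below splits that pipeline into two stages on the question's sample data, so the intermediate line numbers are visible. It assumes GNU nl and tac; the file name numbered is invented for the demonstration, and cut -f 2- is used so that lines containing tabs of their own would stay whole.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sourcefile <<'EOF'
1  $element_(1)
10 $element_(2)
20 $element_(1)
30 $element_(3)
40 $element_(1)
50 $element_(2)
60 $element_(3)
70 $element_(1)
EOF

# Stage 1: last match per element, still carrying the line number from nl.
for i in 1 2 3; do
    nl sourcefile | tac | grep -m 1 "\$element_($i)"
done > numbered

# Stage 2: restore input order, then drop the number column
# (nl separates the number from the text with a tab).
sort -n numbered | cut -f 2- > resultfile

cat numbered     # line numbers 8, 6, 7, in element-index order
cat resultfile
# 50 $element_(2)
# 60 $element_(3)
# 70 $element_(1)
```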

Stop after first unsuccessful search

If we know that the element indices are contiguous and we can stop as soon as we don't find an element, we can tweak the loop as follows (still assuming we want to keep elements in order of appearance in the input file):

i=0
while true; do
    ((++i))
    nl sourcefile | tac | grep -m 1 "\$element_($i)" || break
done | sort -n | cut -f 2- > resultfile

This uses an increasing counter instead of a predetermined sequence. If the exit status of the pipeline is non-zero, i.e., grep couldn't find the element, we exit the loop.
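
All of the loops above reread the file once per element. For a big file with an unknown number of elements, a single pass may be worth considering. The following is a sketch of a different technique, not part of the answer above: it assumes the element name is always the second whitespace-separated field, as in the sample, and that GNU tac is available.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sourcefile <<'EOF'
1  $element_(1)
10 $element_(2)
20 $element_(1)
30 $element_(3)
40 $element_(1)
50 $element_(2)
60 $element_(3)
70 $element_(1)
EOF

# Read the file backwards, keep a line only the first time its second
# field is seen (that is its last occurrence in the original order),
# then flip the surviving lines back into input order.
tac sourcefile | awk '!seen[$2]++' | tac > resultfile

cat resultfile
# 50 $element_(2)
# 60 $element_(3)
# 70 $element_(1)
```

This needs neither an upper bound such as 8000 nor contiguous indices, at the cost of awk keeping one array entry per distinct element in memory.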
