grep/pcregrep/sed/awk the data after the last match to the end of a file

I need to grab the content from after the last match of ENTRY to the end of the file, and I can't seem to do it. It can span multiple lines, and the data can include any character up to the end of the file, including (, \n and ).

I've tried:

tail -1 file # doesn’t work due to it not consistently being one line
grep "^(.*"  # only grabs one line
pcregrep -M  '\n(.*' file # I think a variation of this is the solution, but I’ve had no luck so far.  

The file, which keeps growing, looks like this:

TOP OF FILE                
%
ENTRY
(S®s
√6ûíπ‹ôTìßÅDPˆ¬k·Ù"=ÓxF)*†‰ú˚ÃQ´¿J‘\˜©ŒG»‡∫QÆ’<πsµ-ù±ñ∞NäAOilWçk
N+P}V<ôÒ∏≠µW*`Hß”;–GØ»14∏åR"ºã
FD‘mÍõ?*ÊÎÉC)(S®s
√6ûíπ‹ôTìßÅDPˆ¬k·Ù"=ÓxF)*†‰ú˚ÃQ´¿J‘\˜©ŒG»‡∫QÆ’<πsµ-ù±ñ∞NäAOilWçk
N+P}V<ôÒ∏≠µW*`Hß”;–GØ»14∏åR"ºã
FD‘mÍõ?*ÊÎÉC)eq  
{
DATA
}
ENTRY
(A® S\kÉflã1»Âbπ¯Ú∞⁄äπHZ@F◊§•Ã*‹¡‹…ÿPkJòÑíòú˛¶à˛¨¢v|u«Ùbó–Ö¶¢∂5ıÜ@¨•˘®@W´≥‡*`H∑”ı–Só¬<˙ìEçöf∞Gg±:œe™flflå)A®  S\kÉflã1»Âbπ¯Ú∞⁄äπHZ@F◊§•Ã*‹¡‹…ÿPkJòÑíòú˛¶à˛¨¢v|u«Ùbó–Ö¶¢∂5ıÜ@¨•˘®@W´≥‡*`H∑”ı–Só¬<˙ìEçöf∞Gg±:œe™flflå)eq  
{
DATA
}if
ENTRY
(ÌSYõ˛9°\K¬∞≈fl|”/í÷L
Ö˙h/ÜÇi"û£fi±€ÀNéÓ›bÏÿmâ[≈4J’XPü´Z
oÜlø∫…qìõ¢,ßü©cÓ{—˜e&ÚÀÓHÏÜ‚m(Œ∆⁄ˆQ˝òêpoÉÄÂ(S‘E ⁄ !ŸQ§ô6ÉH
$ awk '/^[(]/{s="";} {s=s"\n"$0;} END{print substr(s,2);}' file
(ÌSYõ˛9°\K¬∞≈fl|”/í÷L
Ö˙h/ÜÇi"û£fi±€ÀNéÓ›bÏÿmâ[≈4J’XPü´Z
oÜlø∫…qìõ¢,ßü©cÓ{—˜e&ÚÀÓHÏÜ‚m(Œ∆⁄ˆQ˝òêpoÉÄÂ(S‘E ⁄ !ŸQ§ô6ÉH

How it works

awk implicitly loops through files line by line. This script stores whatever we want to print in the variable s.

  • /^[(]/{s="";}

    Every time we find a line that starts with (, we set s to an empty string.

    The purpose of this is to discard everything before the last occurrence of a line starting with (.

  • {s=s"\n"$0;}

    We append a newline and the current line to the end of s.

  • END{print substr(s,2);}

    After we reach the end of the file, we print s (omitting the first character, which is a surplus newline).
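The walkthrough above keys on lines that start with (. As a minimal sketch of the same accumulate-and-reset idea keyed on the ENTRY marker itself, the variant below resets s on every ENTRY line and skips that line, so only the text after the last ENTRY is kept. The sample file contents and the exact-match pattern /^ENTRY$/ are assumptions made for illustration; they are not part of the original answer.

```shell
# Build a small, made-up sample file in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'TOP' 'ENTRY' '(old' 'more old' 'ENTRY' '(new' 'tail' > "$tmp"

# On each ENTRY line: clear s and skip the line itself.
# On every other line: append a newline plus the line to s.
# At the end: print s without its leading surplus newline.
result=$(awk '/^ENTRY$/ {s = ""; next}
              {s = s "\n" $0}
              END {print substr(s, 2)}' "$tmp")

printf '%s\n' "$result"
# prints:
# (new
# tail

rm -f "$tmp"
```

Like the original script, this holds everything since the most recent match in memory, which should be fine unless a single entry is very large.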

Interesting problem. I think you can do it with just sed. When you find a match, reset the hold space by copying the matching line over it. Append every other line to the hold space. On the last line, print the hold space.

sed -n -e '/ENTRY/,$ { /ENTRY/ { h; n; }; H; $ { x; p; } }'

Don't print by default. From the first entry to the end of the file:

  • If it is an entry line, copy the line over the hold space and move on to the next line.
  • Otherwise, append the line to the hold space.
  • If it is the last line, swap the hold space and pattern space, and print the pattern space (what was in the hold space).
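The same steps can be written as a multi-line script with one comment per step, which may be easier to follow than the one-liner. This is only a sketch: the sample input is made up, and it assumes a sed that accepts indented # comment lines inside a script (GNU sed does).

```shell
# Made-up sample input with two ENTRY markers.
tmp=$(mktemp)
printf '%s\n' 'TOP OF FILE' 'ENTRY' 'old' 'ENTRY' 'new' 'EOF' > "$tmp"

result=$(sed -n '
/ENTRY/,$ {
    # An ENTRY line: overwrite the hold space with it, then read the next line.
    /ENTRY/ {
        h
        n
    }
    # Append the current line to the hold space.
    H
    # On the last line: swap the hold space in and print it.
    $ {
        x
        p
    }
}' "$tmp")

printf '%s\n' "$result"
# prints:
# ENTRY
# new
# EOF

rm -f "$tmp"
```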

You might worry about what happens if the last line in the file is an ENTRY line. In that case the n command has no next line to read, so sed exits at that point and, because of -n, prints nothing.

Given a data file:

TOP OF FILE
not wanted
ENTRY
could be wanted
ENTRY
but it wasn't
and this isn't
because
ENTRY
this is here
EOF

The output is:

ENTRY
this is here
EOF

If you don't want ENTRY to appear, modify the script slightly:

sed -n -e '/ENTRY/,$ { /ENTRY/ { s/.*//; h; n; }; H; $ { x; s/^\n//; p; } }'
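As a quick check, the modified script can be run against the sample data file shown earlier. This sketch recreates that file in a temporary location (the temporary path and the variable names are illustrative) and assumes GNU sed for the one-line { ...; } syntax.

```shell
# Recreate the sample data file from the answer.
tmp=$(mktemp)
cat > "$tmp" <<'EOF_DATA'
TOP OF FILE
not wanted
ENTRY
could be wanted
ENTRY
but it wasn't
and this isn't
because
ENTRY
this is here
EOF
EOF_DATA

# Blank out each ENTRY line before storing it, then strip the
# leading newline that the blanked line leaves behind.
result=$(sed -n -e '/ENTRY/,$ { /ENTRY/ { s/.*//; h; n; }; H; $ { x; s/^\n//; p; } }' "$tmp")

printf '%s\n' "$result"
# prints:
# this is here
# EOF

rm -f "$tmp"
```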

Using tac you could do it:

tac file | sed -e '/ENTRY/,$d' | tac

This prints the file with its lines reversed, then uses sed to delete everything from what is now the first occurrence of ENTRY to what is now the end of the file, then reverses the lines again to restore the original order.

As Jonathan Leffler pointed out, there is a faster way to do this, though probably not by much, because tac still has a lot to do and the pipeline carries the overhead of three processes instead of one. The sed step can be made more efficient by quitting as soon as it finds the ENTRY line, instead of processing the rest of the file just to delete those lines:

tac file | sed -e '/ENTRY/q' | tac

though his answer is often going to be better still. The command above includes the ENTRY line in its output. If you don't want that, you could also do

tac file | sed -n '/ENTRY/q;p' | tac

which prints nothing by default and quits as soon as it finds the ENTRY line, but uses the p command to print each line until it gets there.
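To make the difference between the two quitting variants concrete, here is a small sketch that runs both on a made-up sample file. It assumes tac is available (it ships with GNU coreutils); the file contents and variable names are illustrative only.

```shell
# Made-up sample input with two ENTRY markers.
tmp=$(mktemp)
printf '%s\n' 'TOP' 'ENTRY' 'old' 'ENTRY' 'new' 'EOF' > "$tmp"

# q prints the matching line before quitting, so ENTRY is kept.
with_entry=$(tac "$tmp" | sed -e '/ENTRY/q' | tac)

# With -n, q prints nothing; p prints only the lines seen before the match.
without_entry=$(tac "$tmp" | sed -n '/ENTRY/q;p' | tac)

printf '%s\n' "$with_entry"
# prints:
# ENTRY
# new
# EOF

printf '%s\n' "$without_entry"
# prints:
# new
# EOF

rm -f "$tmp"
```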

This should work too (at least with gawk):

awk -vRS="ENTRY" 'END{print $0}'

Set the record separator to your pattern and print the last record.
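A slightly more defensive sketch of the same record-separator idea follows. It differs from the command above in two ways, both of which are assumptions on my part rather than part of the answer: RS is set to ENTRY plus the following newline so the last record does not begin with a leftover newline, and the record is saved in a variable (here called last) instead of relying on $0 inside END. POSIX leaves a multi-character RS unspecified, so this assumes an awk such as gawk or mawk that treats it as a regular expression.

```shell
# Made-up sample input with two ENTRY markers.
tmp=$(mktemp)
printf '%s\n' 'TOP' 'ENTRY' 'old' 'ENTRY' 'new' 'EOF' > "$tmp"

# Each record is the text between two "ENTRY\n" separators;
# remember every record and print only the final one.
result=$(awk -v RS='ENTRY\n' '{last = $0} END {printf "%s", last}' "$tmp")

printf '%s\n' "$result"

rm -f "$tmp"
```

Because only one record is kept at a time, memory use is bounded by the largest entry rather than by the whole file.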

Loading the file into memory:

 sed -e 'H;$!d' -e 'x;s/.*ENTRY[[:blank:]]*\n//' YourFile
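A short sketch of how this command behaves on a made-up sample file: H appends every line to the hold space, $!d discards the pattern space on all but the last line, and on the last line x swaps the accumulated text in so that the greedy .*ENTRY can strip everything up to and including the last ENTRY line. It assumes GNU sed, and the whole file is held in memory, so it may not suit very large files.

```shell
# Made-up sample input with two ENTRY markers.
tmp=$(mktemp)
printf '%s\n' 'TOP' 'ENTRY' 'old' 'ENTRY' 'new' 'EOF' > "$tmp"

# Collect the whole file in the hold space, then on the last line
# delete everything through the final "ENTRY" line and print the rest.
result=$(sed -e 'H;$!d' -e 'x;s/.*ENTRY[[:blank:]]*\n//' "$tmp")

printf '%s\n' "$result"
# prints:
# new
# EOF

rm -f "$tmp"
```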
