
Get the last word in a body of text

Given a body of text that can span a varying number of lines, I need a command-line solution that searches many files for the same pattern and gets the last word in the body.

A file can include formats such as these, where the word I want can be named anything:

call function1(input1,  
               input2,    #comment  
               input3)    #comment  
               returning randomname1,    
             randomname2,  
                 success3

call function1(input1,
               input2,    
               input3)    
               returning randomname3, 
randomname2, 
randomname3


call function1(input1,
               input2,    
               input3)   
               returning anothername3, 
randomname2, anothername3

I need to print out the results as:

success3
randomname3
anothername3

I also need the filename and line information for each result.

I've tried:

pcregrep -M 'function1.*(\s*.*){6}(\w+)$' filename.txt

which is too greedy, and I still need to print just the specific captured group rather than the whole match. The words function1 and returning in my sample code will always have these names and can be hard-coded in my expression.

Last word of code blocks

Split the file into blocks using awk's record separator RS. A record is defined as a block of text; records are separated by double newlines.

A record consists of fields; any two consecutive fields are separated by white space or a single newline.

Now all we have to do is print the last field of each record, which gives the following code:

awk 'BEGIN{ FS="[\n\t ]"; RS="\n\n"} { print $NF }' file

Explanation:

  • FS is the field separator; it is set to match a newline, a tab, or a space: [\n\t ]
  • RS is the record separator; it is set to a double newline: \n\n
  • print $NF prints the field whose index is NF, a variable containing the number of fields in the record. Hence this prints the last field.

Note: To capture every paragraph, the file should end in a double newline. This can be achieved by preprocessing the file with: $ echo -e '\n\n' >> file
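One caveat with a bracket expression as FS: unlike the default separator, it does not collapse runs of whitespace, so consecutive or trailing blanks produce empty fields and $NF can come out empty. A minimal sketch on a made-up one-line input (the text in line and the variable names strict and plain are purely illustrative):

```shell
# Illustrative input: two spaces between the words and one trailing space.
line='a  b '

# Bracket-expression FS: every single blank separates, so empty fields appear.
strict=$(printf '%s\n' "$line" | awk -F'[ \t]' '{ print NF ":" $NF }')

# Default FS: runs of blanks collapse and leading/trailing blanks are ignored.
plain=$(printf '%s\n' "$line" | awk '{ print NF ":" $NF }')

echo "bracket FS: $strict"   # 4 fields, the last one is empty
echo "default FS: $plain"    # 2 fields, the last one is b
```

If the input may contain trailing blanks, as the question's sample appears to, keeping the default FS is probably the safer choice.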

Alternate solution based on comments

A simpler and more elegant solution is as follows:

awk -v RS='' '{ print $NF }' file
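Paragraph mode can also be combined with the question's fixed keywords as a filter, with FILENAME labelling each result. A sketch, assuming blocks are separated by blank lines; the temporary file and its contents are made up for illustration:

```shell
# Hypothetical sample file with two blocks; only the first has both keywords.
sample=$(mktemp)
printf '%s\n' \
  'call function1(input1,' \
  '               input2)' \
  '               returning randomname1,' \
  '  success3' \
  '' \
  'call other(input1)' \
  '  ignored' > "$sample"

# RS='' reads one blank-line-separated block per record; newlines then count
# as field separators too, so $NF is the last word of the block.
words=$(awk -v RS='' '/function1/ && /returning/ { print $NF }' "$sample")
echo "$words"

# Same filter with the file name in front; several files can be passed at once.
awk -v RS='' '/function1/ && /returning/ { print FILENAME ": " $NF }' "$sample"

rm -f "$sample"
```

In this mode NR counts paragraphs rather than lines, so reporting line numbers needs a line-by-line approach instead.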

How about the following awk solution:

awk 'NF == 0 {if(last) print last; last=""} NF > 0 {last=$NF} END {print last}' file

$NF is the last "word" on a line, where NF stands for the number of fields. The last variable always holds the last word of the most recent non-empty line; when an empty line is encountered, marking the end of a paragraph, that word is printed.

New version that also requires the paragraph to match function1:

awk 'NF == 0 {if(last && hasF) print last; last=hasF=""}
  NF > 0 {last=$NF; if(/function1/)hasF=1}
  END {if(hasF) print last}' filename.txt

This will produce the output you show from the input file you posted:

$ awk -v RS= '{print $NF}' file
success3
randomname3
anothername3

If you want to print FILENAME and the line number as you mention, then this may be what you want:

$ cat tst.awk
NF { nr=NR; last=$NF; next }
{ prt() }
END { prt() }
function prt() { if (nr) print FILENAME, nr, last; nr=0 }

$ awk -f tst.awk file
file 6 success3
file 13 randomname3
file 20 anothername3
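Since the goal is to search many files, two details may matter when several files are passed at once: NR keeps counting across files, and a block that runs to the end of one file is not closed until a blank line turns up in the next file. A sketch that uses FNR, remembers the file name per block, and also requires both fixed keywords; the variable names file, hasF, hasR and the two sample files are made up for illustration:

```shell
# Two hypothetical input files; a.txt does not end with a blank line.
dir=$(mktemp -d)
printf '%s\n' 'call function1(a)' '  returning x,' 'first1' > "$dir/a.txt"
printf '%s\n' 'call function1(b)' '  returning y,' '  z,' 'second2' '' \
              'call other(c)' 'skipped' > "$dir/b.txt"

result=$(cd "$dir" && awk '
  FNR == 1       { prt() }      # new file starts: close any block still open
  /function1/    { hasF = 1 }   # remember that the block has the first keyword
  /returning/    { hasR = 1 }   # ... and the second keyword
  NF             { file = FILENAME; nr = FNR; last = $NF; next }
                 { prt() }      # blank line: end of block
  END            { prt() }
  function prt() {
    if (nr && hasF && hasR) print file, nr, last
    nr = hasF = hasR = 0
  }' a.txt b.txt)

echo "$result"
rm -rf "$dir"
```

With these sample files the sketch reports line 3 of a.txt and line 4 of b.txt, and skips the block in b.txt that lacks the keywords.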

If that doesn't do what you want, edit your question to provide clearer, more truly representative and accurate sample input and expected output.

This is the perl version of Shellfish's awk solution (plus the keywords):

perl -00 -nE '/function1/ and /returning/ and say ((split)[-1])' file

or, with one regex:

perl -00 -nE '/^(?=.*function1)(?=.*returning).*?(\S+)\s*$/s and say $1' file

But the key is the -00 option, which reads the file one paragraph at a time.
