
awk output with spaces in first column

I am trying to use awk to split columns and print a sentence, but the first column contains spaces.

My beginner's code sample:

$ awk '/Linux/ { print "The filename","\""$1"\"","is located in",$2 }' test.txt
The filename "The" is located in test
The filename "Some" is located in file
The filename "File" is located in name
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label" is located in test

From the file test.txt:

Filename                               Folder         Type
-------------------------------------- -------------- ------
The test file                          /test/folder   Linux
Some file                              /              Linux
File name                              /Temp          Linux
Something_here                         /ABC           Linux
Another_test                           /DEFG          Linux
Label test                             /HIJK          Linux 

What I want to achieve (including the quotes):

The filename "Default file" is located in / 
The filename "The test file" is located in /test/folder

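The problem is that when I use a space or "/" as the separator, I cannot get the whole line when printing.

I suggest a sed solution based on a regular expression with back-references, preceded by a grep command that eliminates the header lines of the source file: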

$ cat test.txt | grep -E 'Linux[ ]*$' | sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%'
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK

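A good reference on regular expressions (regex) is in the Linux manual.

Detailed description, as requested in the comments:

  • grep with the -E option accepts extended regular expressions (see the reference above). Here it is used to keep only the lines that contain the word "Linux" followed by optional spaces at the end of the line.
  • The output of grep is piped into the input of sed.
  • sed, like grep, is given the -E option so that it accepts extended regular expressions. The s command replaces the characters matched by the regular expression (the part between the first two % characters = "(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$") with other characters (the part between the second and third % characters = "The filename "\1\2" is located in \4").
  • The replacement part uses back-references: '\' followed by a non-zero decimal digit n is replaced by the n-th parenthesized subexpression of the regular expression. Here, \1 is replaced by the string matching the first "(.+)", which is the filename up to its last character, and \2 is replaced by the following "([^ ])", which is the last character of the filename (a trick to keep the spaces that follow the name out of the match)...

This is not a rigorous explanation, but it at least gives some pointers for going further.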
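
To see what each parenthesized group captures, the same expression can print the groups between brackets instead of building the sentence. This is only an illustrative sketch: the sample row is rebuilt with printf (padded to the column widths of test.txt), and the variable names and bracketed output format are made up for the demonstration.

```shell
# One sample row, padded like the columns of test.txt (38 and 14 characters).
row=$(printf '%-38s %-14s %s' 'The test file' '/test/folder' 'Linux')

# Same regular expression as above, but the replacement shows groups 1, 2 and 4.
groups=$(printf '%s\n' "$row" |
  sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%1=[\1] 2=[\2] 4=[\4]%')

printf '%s\n' "$groups"
```

Group 1 holds the name without its last character, group 2 holds that last character, and group 4 holds the folder together with part of the padding that follows it.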

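Another solution is to pass several commands on the sed command line. You can then add a command that deletes the 2 header lines, which removes the need for the cat and grep pipeline. Here '1,2d' means "delete lines 1 and 2":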

$ sed -E '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' test.txt
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK

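Note: according to the manual, the -E option switches to extended regular expressions. GNU sed has supported it for years, and it is now included in POSIX. On older systems where -E is not supported, -r can be used instead: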

$ sed -r '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' test.txt
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK
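
The sentences printed by these sed commands keep some blanks after the folder, because (/.+) is greedy and also matches spaces. A possible variation, which is not part of the original answer, restricts that group to non-space characters. The sketch below assumes folder names never contain spaces and works on a temporary copy of a few rows so that test.txt is left alone; the variable names are illustrative.

```shell
# Build a small sample file with the same layout as test.txt.
tmp=$(mktemp)
{
  printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
  printf '%s\n' '-------------------------------------- -------------- ------'
  printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
  printf '%-38s %-14s %s\n' 'Some file' '/' 'Linux'
} > "$tmp"

# (/[^ ]*) stops at the first space, so no padding ends up in \4.
trimmed=$(sed -E '1,2d;s%(.+)([^ ])([ ]+)(/[^ ]*)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' "$tmp")

printf '%s\n' "$trimmed"
rm -f "$tmp"
```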

GNU awk accepts a regular expression as the field separator, so you only need to require at least two spaces between your columns.

awk '/Linux/ { print "The file \""$1"\" is in "$2"." }' FS="   *" test.txt

It also offers fixed-width fields (see info gawk fieldwidths), and you can set the widths dynamically from the lengths of the dashed lines.
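
As a rough sketch of the "widths from the dashed line" idea, the following measures each run of dashes on line 2 and then cuts the columns with substr(). It deliberately uses substr() rather than gawk's FIELDWIDTHS so that it should run in any POSIX awk; with gawk, the measured widths could be fed to FIELDWIDTHS instead. It assumes the dash runs are separated by exactly one space and that the data rows are aligned with them. The sample file, the ruler variable and the array names start and width are made up for the illustration.

```shell
# Sample file; the ruler string is cut to each column width by %.Ns.
tmp=$(mktemp)
ruler='--------------------------------------------------'
{
  printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
  printf '%.38s %.14s %.6s\n' "$ruler" "$ruler" "$ruler"
  printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
  printf '%-38s %-14s %s\n' 'Label test' '/HIJK' 'Linux'
} > "$tmp"

fixed=$(awk '
NR == 2 {                               # the dashed ruler line
    n = split($0, dash, " ")            # one run of dashes per column
    pos = 1
    for (i = 1; i <= n; i++) {
        start[i] = pos                  # first character of column i
        width[i] = length(dash[i])      # number of dashes in that run
        pos += width[i] + 1             # +1 skips the space between runs
    }
    next
}
NR > 2 {
    name = substr($0, start[1], width[1])
    dir  = substr($0, start[2], width[2])
    sub(/ +$/, "", name)                # drop the column padding
    sub(/ +$/, "", dir)
    printf "The filename \"%s\" is located in %s\n", name, dir
}' "$tmp")

printf '%s\n' "$fixed"
rm -f "$tmp"
```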

If you have GNU AWK, this should do the trick:

awk 'match($0, /([^\/]+)([^ ]+) *Linux/, arr) { sub(/ +$/, "", arr[1]); printf("The filename \"%s\" is located in %s\n", arr[1], arr[2]) }' test.txt

Explanation:


# match and store groups in 'arr'
#  - arr[1]: everything up until the first slash (including a lot of whitespace)
#  - arr[2]: first slash until space
#  - rest: also ensure there's 'Linux' after that
match($0, /([^\/]+)([^ ]+) *Linux/, arr) {

  # trim whitespace from the right hand side of the filename
  sub(/ +$/, "", arr[1]);

  # print
  printf("The filename \"%s\" is located in %s\n", arr[1], arr[2])
}

Note that other versions of AWK also have a less powerful version of match; the same result can be achieved with them, but you have to write more code.
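
For illustration, here is one way the two-argument match() might be used for the same job. Without the array argument, match() only sets RSTART and RLENGTH, so the pieces are cut out with substr(). This is a sketch rather than the answer's own code: it assumes the name never contains a space directly followed by a slash and that folder names contain no spaces. The sample file and variable names are hypothetical.

```shell
# Small sample with the same layout as test.txt.
tmp=$(mktemp)
{
  printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
  printf '%s\n' '-------------------------------------- -------------- ------'
  printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
  printf '%-38s %-14s %s\n' 'Something_here' '/ABC' 'Linux'
} > "$tmp"

posix=$(awk '
# Keep rows ending in Linux; then locate the padding followed by the folder.
/Linux[ ]*$/ && match($0, /[ ]+\/[^ ]*/) {
    name = substr($0, 1, RSTART - 1)    # everything before the padding
    dir  = substr($0, RSTART, RLENGTH)  # padding plus folder
    sub(/^ +/, "", dir)                 # strip the leading padding
    printf "The filename \"%s\" is located in %s\n", name, dir
}' "$tmp")

printf '%s\n' "$posix"
rm -f "$tmp"
```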


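Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.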

 