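awk output when the first column contains spaces

Question

I am trying to use awk to split columns and print a sentence, but the first column contains spaces.

My beginner code example: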

$ awk '/Linux/ { print "The filename","\""$1"\"","is located in",$2 }' test.txt
The filename "The" is located in test
The filename "Some" is located in file
The filename "File" is located in name
The filename "Something_here" is located in /ABC
The filename "Another_test" is located in /DEFG
The filename "Label" is located in test

From the file test.txt:

Filename                               Folder         Type
-------------------------------------- -------------- ------
The test file                          /test/folder   Linux
Some file                              /              Linux
File name                              /Temp          Linux
Something_here                         /ABC           Linux
Another_test                           /DEFG          Linux
Label test                             /HIJK          Linux

What I want to achieve (including the quotes):

The filename "Default file" is located in / 
The filename "The test file" is located in /test/folder

The problem is that when I use a space or "/" as the delimiter, I cannot get the whole column value when printing.

Answer 1

I suggest sed with a regular expression and backreferences, plus a preceding grep command to eliminate the header lines of the source file:

$ cat test.txt | grep -E 'Linux[ ]*$' | sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%'
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK

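A good reference for regular expressions (regex) is the Linux manual.

A detailed description, as requested in the comments:

grep with the -E option accepts extended regular expressions (see the reference documentation above). Here it filters the lines that contain the word "Linux" followed by optional spaces at the end of the line.
The output of grep is piped into the input of sed.
sed, like grep, accepts extended regular expressions through the -E option. The s command replaces the characters matching the regular expression (the part between the first two % characters = "(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$") with other characters (the part between the second and third % characters = "The filename "\1\2" is located in \4").
The second part uses backreferences: '\' followed by a nonzero decimal digit n is replaced by the text matched by the n-th parenthesized subexpression of the regular expression. Here, \1 is replaced by the string matching the first "(.+)", which is the file name (all but its last character), and \2 is replaced by the text matching the following "([^ ])", which is the last character of the file name (a trick to keep the spaces that follow out of the name)...

This is not a rigorous explanation, but it at least provides some input to go further.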
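To see the backreferences in isolation, here is a minimal sketch that feeds a single line (copied from test.txt) to the same substitution. It assumes a sed that accepts -E; the variable names line and result are only illustrative. The square brackets in the final printf make visible any trailing spaces that the folder group keeps.

```shell
# One data line from test.txt, fed to the same substitution as above.
line='The test file                          /test/folder   Linux'

# \1\2 = file name without its padding, \4 = folder (may keep trailing spaces).
result=$(printf '%s\n' "$line" |
  sed -E 's%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%')

# Brackets show where the captured text really ends.
printf '[%s]\n' "$result"
```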

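Another solution is to pass multiple commands on the sed command line. You can then add a command that deletes the first 2 header lines, which removes the need for the pipeline with cat and grep. Here "1,2d" means "delete lines 1 and 2":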

$ sed -E '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' test.txt
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK

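Note: according to the manual, the -E option switches to extended regular expressions. GNU sed has supported it for years, and it is now included in POSIX. On older systems where -E is not supported, -r can be used instead: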

$ sed -r '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%' test.txt
The filename "The test file" is located in /test/folder  
The filename "Some file" is located in /             
The filename "File name" is located in /Temp         
The filename "Something_here" is located in /ABC          
The filename "Another_test" is located in /DEFG         
The filename "Label test" is located in /HIJK
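Since several commands can be chained with ";", one more command could tidy up the trailing spaces that remain after the folder in the outputs above. The following is only a sketch under assumptions: the sample file is recreated in a temporary location (the names sample and trimmed are illustrative), and the extra command s/ +$// is an addition, not part of the original answer.

```shell
# Recreate a small part of test.txt in a temporary file (illustrative path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Filename                               Folder         Type
-------------------------------------- -------------- ------
The test file                          /test/folder   Linux
Some file                              /              Linux
EOF

# 1,2d        drop the two header lines
# s%...%...%  build the sentence (same substitution as above)
# s/ +$//     strip the trailing spaces kept by the (/.+) group
trimmed=$(sed -E '1,2d;s%(.+)([^ ])([ ]+)(/.+)[ ]+Linux[ ]*$%The filename "\1\2" is located in \4%;s/ +$//' "$sample")

printf '%s\n' "$trimmed"
rm -f "$sample"
```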

Answer 2

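GNU awk accepts a regular expression as the field separator, so you only need to treat a run of multiple spaces as the separator between your columns.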

awk '/Linux/ { print "The file \""$1"\" is in "$2"." }' FS="   *" test.txt

It also offers fixed-width fields (see info gawk fieldwidths), and you can use the lengths of the dashed line to set them dynamically.
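The idea of reading the column widths from the dashed line can be sketched as follows. Note that this sketch does not use gawk's FIELDWIDTHS; it swaps in plain substr() so that it also runs on other awk implementations. The sample is generated with printf, and the widths 38, 14 and 6 are assumptions meant to mirror test.txt; the helper dash and the names sample and fixed are hypothetical.

```shell
# Helper (hypothetical): print N dashes.
dash() { printf "%${1}s" '' | tr ' ' '-'; }

# Build a fixed-width sample; widths 38/14/6 are assumed to mirror test.txt.
sample=$(mktemp)
{
  printf '%-38s %-14s %s\n' 'Filename' 'Folder' 'Type'
  printf '%s %s %s\n' "$(dash 38)" "$(dash 14)" "$(dash 6)"
  printf '%-38s %-14s %s\n' 'The test file' '/test/folder' 'Linux'
  printf '%-38s %-14s %s\n' 'Some file' '/' 'Linux'
} > "$sample"

fixed=$(awk '
NR == 2 {
    # Each run of dashes gives one column width; columns are one space apart.
    n = split($0, dashes, " ")
    pos = 1
    for (i = 1; i <= n; i++) {
        start[i] = pos
        width[i] = length(dashes[i])
        pos += width[i] + 1
    }
    next
}
NR > 2 && /Linux/ {
    name   = substr($0, start[1], width[1])
    folder = substr($0, start[2], width[2])
    sub(/ +$/, "", name)       # remove the padding on the right
    sub(/ +$/, "", folder)
    printf "The filename \"%s\" is located in %s\n", name, folder
}' "$sample")

printf '%s\n' "$fixed"
rm -f "$sample"
```

With gawk, the loop could instead assemble a FIELDWIDTHS string from the same lengths, after which $1 and $2 would hold the padded columns.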

Answer 3

If you have GNU AWK, this should do the trick:

awk 'match($0, /([^\/]+)([^ ]+) *Linux/, arr) { sub(/ +$/, "", arr[1]); printf("The filename \"%s\" is located in %s\n", arr[1], arr[2]) }' test.txt

Explanation:


# match and store groups in 'arr'
#  - arr[1]: everything up until the first slash (including a lot of whitespace)
#  - arr[2]: first slash until space
#  - rest: also ensure there's 'Linux' after that
match($0, /([^\/]+)([^ ]+) *Linux/, arr) {

  # trim whitespace from the right hand side of the filename
  sub(/ +$/, "", arr[1]);

  # print
  printf("The filename \"%s\" is located in %s\n", arr[1], arr[2])
}

Note that other versions of AWK also have a less powerful version of match; you can achieve the same result with them, but you have to write more code.
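As a rough illustration of what "more code" could look like, here is a sketch that relies only on the two-argument match() together with RSTART and RLENGTH, which standard awk provides. It is not the answer author's code; the regular expression is a simplified stand-in, it assumes folder names contain no spaces, and the names sample and portable are illustrative.

```shell
# Recreate a small part of test.txt in a temporary file (illustrative path).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Filename                               Folder         Type
-------------------------------------- -------------- ------
The test file                          /test/folder   Linux
Some file                              /              Linux
EOF

portable=$(awk '
match($0, /\/[^ ]* +Linux/) {
    # RSTART is where the folder begins; everything before it is the padded name.
    name = substr($0, 1, RSTART - 1)
    sub(/ +$/, "", name)

    # Within the matched text, the folder runs up to the first space.
    folder = substr($0, RSTART, RLENGTH)
    sub(/ .*$/, "", folder)

    printf "The filename \"%s\" is located in %s\n", name, folder
}' "$sample")

printf '%s\n' "$portable"
rm -f "$sample"
```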

3 answers

Answer 1
0 2020-10-22 10:38:45

Answer 2
0 2020-10-22 14:41:35

Answer 3
0 accepted 2020-10-23 08:09:51
