
Keep the delimiter '\t' after running an awk command to remove the last column

I found the following command to remove the last column from a file:

awk 'NF{NF-=1};1' <in >out

The command is copied from here: https://unix.stackexchange.com/questions/234432/how-to-delete-the-last-column-of-a-file-in-linux?newreg=b1ebf81f0ea5458eafc3370a6739b1a9

Here is the problem. The file was originally separated by '\t', but after this command the delimiter is no longer '\t'. Does anyone know the reason, and how to keep the delimiter?

You have to define the output field separator:

awk 'BEGIN{FS=OFS="\t"}NF{NF-=1};1' input > output

Remark: redefining the variable NF is undefined behaviour according to POSIX, but GNU awk and a few other versions of awk allow it.

The following will work with any awk:

awk 'BEGIN{FS="\t"}{sub(FS "[^"FS"]*$","")}1' input > output
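As a rough illustration of the sub() approach, the sketch below runs it on a few made-up records (the sample data and the variable names input and result are assumptions, not part of the original answer). It also shows one difference worth knowing: a line without any tab is left unchanged, because sub() finds nothing to remove.

```shell
# Made-up sample input: two tab-separated records and one single-field line.
input=$(printf 'a\tb\tc\n1\t2\t3\nsolo\n')

# Delete the last tab and everything after it on each line.
result=$(printf '%s\n' "$input" | awk 'BEGIN{FS="\t"}{sub(FS "[^"FS"]*$","")}1')

# Print the result with tabs shown as "|" so the delimiter is visible.
printf '%s\n' "$result" | tr '\t' '|'
```

The remaining fields are still separated by tabs, since $0 is edited as a string and never rebuilt from its fields.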

awk knows two concepts very well:

  • records: a file is split into records, where each record is separated from the next by the record separator RS. By default this is the <newline> character, and thus records are lines.
  • fields: a record is split into fields, where each field is separated from the next by the field separator FS. By default, this is any sequence of blanks (spaces and tabs).

Obviously, if you can define how the input is built up by defining its record separator RS and field separator FS, you can also tell awk how the output is built up. Hence, you can define the output record separator ORS, which is appended after each printed record when you use the print statement. Next to ORS, you can define the output field separator OFS, which tells awk how fields are joined in the output. Each comma in a print statement is normally replaced by OFS, e.g.:

print field1, field2, field3

will print

field1 OFS field2 OFS field3 ORS
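To make this concrete, here is a minimal sketch that sets OFS and ORS to visible characters chosen purely for illustration ("-" and ";" are assumptions, as is the variable name out) and prints three string literals:

```shell
# OFS replaces each comma in print; ORS is appended after the record.
out=$(awk 'BEGIN { OFS = "-"; ORS = ";\n"; print "field1", "field2", "field3" }')

printf '%s\n' "$out"
```

The three values come out joined by "-" and terminated by ";" followed by a newline.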

The complete record $0 is also rebuilt as a string joined with OFS whenever you change a field or remove some fields. Since OFS defaults to a single space, this is why the tabs disappear after NF is modified unless OFS is set.
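The rebuild can be observed without touching NF at all. The sketch below (the sample line and the variable names line, default_ofs and tab_ofs are assumptions for illustration) forces a rebuild with the assignment $1 = $1, once with the default OFS and once with OFS set to a tab:

```shell
# One made-up tab-separated record.
line=$(printf 'a\tb\tc')

# Assigning to a field rebuilds $0 with OFS, which defaults to one space.
default_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\t" } { $1 = $1 } 1')

# With OFS set to a tab as well, the rebuilt record keeps its tabs.
tab_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "\t" } { $1 = $1 } 1')

# Show tabs as "|" so the two results can be told apart.
printf '%s\n' "$default_ofs" | tr '\t' '|'
printf '%s\n' "$tab_ofs" | tr '\t' '|'
```

The first result is joined by spaces and the second by tabs, which mirrors what happens to the file in the question.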

Another solution might be to use rev and cut:

rev input | cut -f2- | rev > output

awk '{sub(/\t[^\t]*$/,"")}1' file

The above will work with any awk.

Here are a few alternative solutions, which should give you something to choose from:

perl -pe 's/\t[^\t]*$//' file
sed -e $'s/\t[^\t]*$//' file  # Bash C-style $'string'
