Using awk to print a new column without apostrophes or spaces

I'm processing a text file and adding a column composed of certain components of other columns. A new requirement came in to remove spaces and apostrophes from that column, and I'm not sure of the most efficient way to accomplish this.

The file's content can be created by the following script:

content=(
  john    smith          thomas       blank    123    123456    10  
  jane    smith          elizabeth    blank    456    456123    12  
  erin    "o'brien"      margaret     blank    789    789123    9  
  juan    "de la cruz"   carlos       blank    1011   378943    4
)
# put this into a tab-separated file, with the syntactic (double) quotes above removed
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "${content[@]}" >infile

This is what I have now, but it fails to remove spaces and apostrophes:

awk -F "\t" '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$6 tolower(substr($2,0,3)); }' infile > outfile

This next attempt throws the error "sub third parameter is not a changeable object", which makes sense, I guess, since I'm trying to process output instead of input:

awk -F "\t" '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$6 sub("'\''", "",tolower(substr($2,0,3))); }' infile > outfile

Is there a way I can print a combination of column 6 and part of column 2 in lower case, while removing spaces and apostrophes from what goes into the new column? Worst case, I can create a new file with my first command and process that output with a second awk command, but I'd like to do it in one pass if possible.

The second approach was close, but for order of operations:

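awk -F "\t" '
  BEGIN { OFS="\t"; }
  {
    var=$2;
    # gsub (not sub) so that every apostrophe and space is removed
    gsub("['\''[:space:]]", "", var);
    # string positions start at 1; lower-case the result as requested
    var=tolower(substr(var, 1, 3));
    print $1,$2,$3,$5,$6,$7,$6 var;
  }
' infile > outfile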
  • Assigning the contents you want to modify to a variable lets them be modified in place.
  • Characters you want to remove should be removed before taking the substring; otherwise the removal shortens your 3-character substring.
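To make the second point concrete, here is a small sketch that runs both orders on the single value o'brien. The variable names clean_first and cut_first are made up for this illustration, and the apostrophe is written as \047 so the script can stay inside single quotes.

```shell
name="o'brien"
result=$(printf '%s\n' "$name" | awk '{
    # Clean first, then truncate: all three characters survive.
    clean_first = $0
    gsub(/[\047[:space:]]/, "", clean_first)
    clean_first = substr(clean_first, 1, 3)

    # Truncate first, then clean: the apostrophe uses up one of the three slots.
    cut_first = substr($0, 1, 3)
    gsub(/[\047[:space:]]/, "", cut_first)

    print clean_first, cut_first
}')
echo "$result"   # prints: obr ob
```

Cleaning first yields the full three characters, while truncating first leaves only two.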

It's a guess since you didn't provide the expected output, but is this what you're trying to do?

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
    abbr = $2
    gsub(/[\047[:space:]]/,"",abbr)
    abbr = tolower(substr(abbr,1,3))
    print $1,$2,$3,$5,$6,$7,$6 abbr
}

$ awk -f tst.awk infile
john    smith   thomas  123     123456  10      123456smi
jane    smith   elizabeth       456     456123  12      456123smi
erin    o'brien margaret        789     789123  9       789123obr
juan    de la cruz      carlos  1011    378943  4       378943del

Note that the way to represent a ' in a '-enclosed awk script is with the octal escape \047. That keeps working if/when you move your script to a file, unlike "'\''", which only works from the command line. Also, strings, fields, and arrays populated by functions such as split() start at 1 in awk, not 0, so your substr(..,0,3) is wrong: your awk is treating the invalid start position of 0 as if you had used the first valid start position, which is 1.
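As a rough illustration of the portability point, the sketch below runs the same program text once inline and once from a file via -f. The variable names and the temporary file are only for this example.

```shell
# The same program text, using \047 for the apostrophe.
prog='{ gsub(/\047/, ""); print }'

# Run it inline from the command line.
inline=$(printf "o'brien\n" | awk "$prog")

# Save the identical text to a file and run it with -f.
tmp=$(mktemp)
printf '%s\n' "$prog" > "$tmp"
fromfile=$(printf "o'brien\n" | awk -f "$tmp")
rm -f "$tmp"

echo "$inline $fromfile"   # prints: obrien obrien
```

Because \047 is interpreted by awk rather than by the shell, the script needs no changes when it moves between the two settings.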

The "sub third parameter is not a changeable object" error you were getting is because sub() modifies the object you pass as its 3rd argument, but you're calling it with a string value (the output of tolower(substr(...))) rather than a variable, and you can't modify a literal string. Try sub(/o/,"","foo") and you'll get the same error, whereas var="foo"; sub(/o/,"",var) is valid since you can modify the content of variables.
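A related detail worth seeing in action: sub() returns the number of substitutions it made, not the modified string, so concatenating its return value onto $6 would not give the intended text even if the call were accepted. A minimal sketch, with n and var as illustrative names:

```shell
result=$(awk 'BEGIN {
    var = "foo"
    n = sub(/o/, "", var)   # var is a variable, so sub() can edit it in place
    print n, var            # n is the substitution count, var is the edited string
}')
echo "$result"   # prints: 1 fo
```

This is why the working versions call sub() or gsub() as a separate statement and then print the variable.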

 