I'm processing a text file and adding a column composed of certain components of other columns. A new requirement was added to remove spaces and apostrophes, and I'm not sure of the most efficient way to accomplish this task.
The file's content can be created by the following script:
content=(
john smith thomas blank 123 123456 10
jane smith elizabeth blank 456 456123 12
erin "o'brien" margaret blank 789 789123 9
juan "de la cruz" carlos blank 1011 378943 4
)
# put this into a tab-separated file, with the syntactic (double) quotes above removed
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "${content[@]}" >infile
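A value such as "de la cruz" contains spaces, so this file only splits into the intended 7 columns when awk uses a tab as its field separator (hence the -F "\t" in the commands that follow). A minimal sketch to check that, assuming a POSIX shell with mktemp; it rebuilds the same rows in a temporary file, and count_fields is a hypothetical helper name:

```shell
#!/bin/sh
# count_fields is a hypothetical helper: prints NF for every line of file $2,
# space-separated, using $1 as awk's field separator.
count_fields() {
  awk -F "$1" '{ printf "%s%d", (NR == 1 ? "" : " "), NF } END { print "" }' "$2"
}

# Rebuild the sample rows as a tab-separated temporary file.
tmp=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  john smith thomas blank 123 123456 10 \
  jane smith elizabeth blank 456 456123 12 \
  erin "o'brien" margaret blank 789 789123 9 \
  juan "de la cruz" carlos blank 1011 378943 4 >"$tmp"

count_fields '\t' "$tmp"   # tab separator: 7 7 7 7
count_fields ' ' "$tmp"    # default splitting on blanks: 7 7 7 9
rm -f "$tmp"
```

With the default separator, the "de la cruz" row splits into 9 fields, so every column after it would shift.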
This is what I have now, but it fails to remove spaces and apostrophes:
awk -F "\t" '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$6 tolower(substr($2,0,3)); }' infile > outfile
The following attempt throws the error "sub third parameter is not a changeable object", which I guess makes sense since I'm trying to process output instead of input:
awk -F "\t" '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$6 sub("'\''", "",tolower(substr($2,0,3))); }' infile > outfile
Is there a way I can print a combination of column 6 and part of column 2 in lower case, while removing spaces and apostrophes from the value in the new column? Worst case, I can create a new file with my first command and process that output with a second awk command, but I'd like to do it in one pass if possible.
The second approach was close, except for the order of operations:
awk -F "\t" '
BEGIN { OFS="\t"; }
{
    var=$2;
    gsub("['\''[:space:]]", "", var);    # gsub, not sub: remove every match
    var=tolower(substr(var, 1, 3));      # awk strings start at position 1
    print $1,$2,$3,$5,$6,$7,$6 var;
}
' infile > outfile
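One detail to watch when cleaning the column: sub() replaces only the first match, while gsub() replaces every match, so a value with more than one space or apostrophe keeps the extras after sub(). A small sketch on "de la cruz" (first_only and every_match are hypothetical helper names):

```shell
#!/bin/sh
# first_only / every_match are hypothetical helpers for this sketch.
# sub() with two arguments edits $0 and stops after the first match.
first_only()  { printf '%s\n' "$1" | awk '{ sub(/[[:space:]]/, ""); print }'; }
# gsub() edits $0 and replaces every match.
every_match() { printf '%s\n' "$1" | awk '{ gsub(/[[:space:]]/, ""); print }'; }

first_only "de la cruz"    # prints: dela cruz
every_match "de la cruz"   # prints: delacruz
```

On this sample data, truncating to three characters happens to hide the difference ("del" either way), but that won't hold for every name.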
It's a guess, since you didn't provide the expected output, but is this what you're trying to do?
$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
abbr = $2
gsub(/[\047[:space:]]/,"",abbr)
abbr = tolower(substr(abbr,1,3))
print $1,$2,$3,$5,$6,$7,$6 abbr
}
$ awk -f tst.awk infile
john smith thomas 123 123456 10 123456smi
jane smith elizabeth 456 456123 12 456123smi
erin o'brien margaret 789 789123 9 789123obr
juan de la cruz carlos 1011 378943 4 378943del
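To reproduce the run above without creating infile by hand, here is a self-contained sketch, assuming a POSIX shell with mktemp and cut. It inlines the same tst.awk logic; make_sample and add_key are hypothetical helper names:

```shell
#!/bin/sh
# make_sample and add_key are hypothetical helper names for this sketch.

# Write the sample rows from the question as a tab-separated file ($1 = path).
make_sample() {
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    john smith thomas blank 123 123456 10 \
    jane smith elizabeth blank 456 456123 12 \
    erin "o'brien" margaret blank 789 789123 9 \
    juan "de la cruz" carlos blank 1011 378943 4 >"$1"
}

# Same logic as tst.awk, inlined on the command line ($1 = input path).
add_key() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    abbr = $2
    gsub(/[\047[:space:]]/, "", abbr)    # \047 is the octal escape for an apostrophe
    abbr = tolower(substr(abbr, 1, 3))
    print $1, $2, $3, $5, $6, $7, $6 abbr
  }' "$1"
}

tmp=$(mktemp)
make_sample "$tmp"
add_key "$tmp" | cut -f7   # show only the new 7th column
rm -f "$tmp"
```

The last column should read 123456smi, 456123smi, 789123obr and 378943del, matching the output shown above.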
Note that the way to represent a ' in a '-enclosed awk script is with the octal escape \047, which will continue to work if/when you move your script to a file, unlike "'\''", which only works from the command line.
Also note that strings, arrays, and fields in awk start at 1, not 0, so your substr(..,0,3) is wrong. Some awks treat the invalid start position 0 as if you had used the first valid start position, 1, but that behavior isn't guaranteed across awk implementations, so use substr(..,1,3).
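A short sketch of both points, assuming a POSIX shell with mktemp: the '\'' shell trick and the \047 escape remove an apostrophe equally well inline, but only \047 can be copied unchanged into a script file, and the first three characters of a string are substr(s, 1, 3). The strip_quote_* function names are hypothetical:

```shell
#!/bin/sh
# strip_quote_inline / strip_quote_octal / strip_quote_file are hypothetical names.

# '\'' closes the shell quote, adds an escaped apostrophe, and reopens the quote.
strip_quote_inline() { printf '%s\n' "$1" | awk '{ gsub(/'\''/, ""); print }'; }

# \047 is decoded by awk itself, so the shell never sees an apostrophe.
strip_quote_octal() { printf '%s\n' "$1" | awk '{ gsub(/\047/, ""); print }'; }

# The same \047 program works unchanged when read from a script file.
strip_quote_file() {
  prog=$(mktemp)
  printf '%s\n' '{ gsub(/\047/, ""); print }' >"$prog"
  printf '%s\n' "$1" | awk -f "$prog"
  rm -f "$prog"
}

strip_quote_inline "o'brien"   # prints: obrien
strip_quote_octal "o'brien"    # prints: obrien
strip_quote_file "o'brien"     # prints: obrien

# substr() positions start at 1, so the first three characters are substr(s, 1, 3).
awk 'BEGIN { print substr("smith", 1, 3) }'   # prints: smi
```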
The "sub third parameter is not a changeable object" error you were getting is because sub() modifies the object you pass as its 3rd argument, but you're passing it a literal string (the output of tolower(substr(...))), and you can't modify a literal string. Try sub(/o/,"","foo") and you'll get the same error, whereas var="foo"; sub(/o/,"",var) is valid since you can modify the content of variables.
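A related detail: sub() and gsub() change their 3rd argument in place and return the number of replacements made, not the edited string. So even if the call in the second command had been accepted, $6 sub(...) would have appended a count rather than the cleaned text. A minimal sketch (demo_in_place is a hypothetical name):

```shell
#!/bin/sh
# demo_in_place is a hypothetical helper: gsub() edits var in place and
# returns how many replacements it made.
demo_in_place() {
  awk 'BEGIN {
    var = "foo"
    n = gsub(/o/, "", var)   # var is a variable, so it can be changed
    print var, n
  }'
}

demo_in_place   # prints: f 2
```

That is why the working versions assign the field to a variable, clean the variable, and only then concatenate it.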