I have a text file named file that contains the following:
Australia AU 10
New Zealand NZ 1
...
If I use the following command to extract the country names from the first column:
awk '{print $1}' file
I get the following:
Australia
New
...
Only the first word of each country name is output.
How can I get the entire country name?
Try this:
$ awk '{print substr($0,1,15)}' file
Australia
New Zealand
To get rid of the last two columns:
awk 'NF>2 && NF-=2' file
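NF>2 is a guard that selects only records with more than 2 fields. NF-=2 then drops the last two fields; because the resulting NF is nonzero, the record is printed, with the remaining fields joined by single spaces. If your data is consistent, you can drop the guard and simply use: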
awk 'NF-=2' file
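Not every awk implementation rebuilds the record when NF is decreased, so here is a minimal sketch of a more conservative variant that strips the last two fields with sub() instead. The helper name drop_last_two and the sample lines are made up for illustration, and the input is assumed to be separated by spaces or tabs.

```shell
# Hypothetical helper: remove the last two whitespace-separated fields
# from each line without assigning to NF.
drop_last_two() {
  # Match: blanks, a field, blanks, a field, optional trailing blanks, end of line.
  awk '{ sub(/[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]*$/, ""); print }'
}

# Made-up sample input in the same shape as the question's file.
printf '%s\n' 'Australia              AU 10' 'New Zealand            NZ 1' | drop_last_two
```

Unlike the NF-based version, this leaves lines with two or fewer fields unchanged rather than skipping them, and it keeps the original spacing inside the remaining text.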
To complement Raymond Hettinger's helpful POSIX-compliant answer:
It looks like your country-name column is 23 characters wide.
In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:
# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia
New Zealand
Caveat: GNU cut is not UTF-8 aware, so if the input is UTF-8-encoded and contains non-ASCII characters, the above will not work correctly.
To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:
# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand
FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.
sub(" +$", "", $1) then removes trailing whitespace from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($1) with the empty string.
However, your Linux distro may come with Mawk rather than GNU Awk; use awk -W version to determine which one it is.
For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:
# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
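If you need this for more than one file, the same substr/sub approach can be wrapped in a small function that takes the column width as an argument. This is only a sketch: the name first_column is hypothetical, the width of 23 is the guess from above, and the sample rows are generated with printf rather than taken from your file.

```shell
# Hypothetical helper: print the first WIDTH characters of each input line,
# with trailing spaces and tabs removed.
first_column() {
  awk -v w="$1" '{ c = substr($0, 1, w); sub(/[ \t]+$/, "", c); print c }'
}

# Made-up fixed-width sample: a 23-character name column, then code and number.
printf '%-23s%-3s%s\n' 'Australia' 'AU' '10' 'New Zealand' 'NZ' '1' | first_column 23
```

Passing the width with -v keeps the awk program itself unchanged when the column layout differs.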
This doesn't apply when the values themselves contain spaces, but often they don't:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz etc...
In these cases it's really easy to get, say, the IMAGE column by using tr to squeeze each run of spaces into a single space:
$ docker ps | tr --squeeze-repeats ' '
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz
Now you can pipe this (without the pesky header row) to cut:
$ docker ps | tr --squeeze-repeats ' ' | tail -n +2 | cut -d ' ' -f 2
bar
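Since awk already splits on runs of whitespace by default, the tr, tail, and cut steps can also be folded into a single awk call. A rough sketch follows, with a hypothetical helper name and made-up rows standing in for real docker ps output. As before, it assumes none of the values in the columns up to the one you want contain spaces.

```shell
# Hypothetical helper: print the second whitespace-separated column,
# skipping the header row (NR > 1).
second_column() {
  awk 'NR > 1 { print $2 }'
}

# Made-up sample standing in for `docker ps` output.
printf '%s\n' \
  'CONTAINER ID   IMAGE     COMMAND' \
  '1a2b3c         nginx     run' \
  '4d5e6f         redis     serve' | second_column
```

On a real system this would be used as docker ps | second_column. If you prefer to keep the tr pipeline, note that --squeeze-repeats is the GNU long form of the portable -s option.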