
Extract values from a fixed-width column

I have a text file named file that contains the following:

Australia              AU 10
New Zealand            NZ  1
...

If I use the following command to extract the country names from the first column:

awk '{print $1}' file

I get the following:

Australia
New
...

Only the first word of each country name is output.

How can I get the entire country name?

Try this:

$ awk '{print substr($0,1,15)}' file
Australia
New Zealand

To get rid of the last two columns:

awk 'NF>2 && NF-=2' file

NF>2 is a guard that keeps only records with more than two fields. If your data is consistent, you can drop the guard and simply use:

awk 'NF-=2' file
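The GNU Awk manual cautions that some awk versions don't rebuild $0 when NF is decremented, and rebuilding the record also collapses every run of whitespace to a single space. As a rough sketch of an alternative, the helper below (strip_last_two is a made-up name, not something from the answers here) removes the last two fields with a regular expression instead:

```shell
# Hypothetical helper: strip the last two whitespace-separated fields
# without assigning to NF, leaving the rest of the line untouched.
strip_last_two() {
    awk '{ sub(/[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]*$/, ""); print }' "$@"
}

# Sample rows rebuilt from the question (23-character name column).
printf '%-23s%s %2s\n' 'Australia' 'AU' '10' 'New Zealand' 'NZ' '1' |
    strip_last_two
# prints:
# Australia
# New Zealand
```

Unlike the NF>2 guard, this prints lines with fewer than three fields unchanged rather than dropping them.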

To complement Raymond Hettinger's helpful POSIX-compliant answer:

It looks like your country-name column is 23 characters wide.

In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:

# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia              
New Zealand            

Caveat: GNU cut is not UTF-8 aware (it counts bytes, not characters), so if the input is UTF-8-encoded and contains non-ASCII characters, the above will not work correctly.


To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:

# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand

  • FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.

  • sub(" +$", "", $1) then removes trailing whitespace from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($1) with the empty string.

However, your Linux distro may come with Mawk rather than GNU Awk; use awk -W version to determine which one it is.


For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:

# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
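If the column width differs from file to file, the same substr-and-trim idea can be wrapped in a small function. This is only a sketch: first_column and its width argument are names invented here for illustration.

```shell
# Hypothetical helper: print the first WIDTH characters of each input line,
# with trailing spaces removed. Reads standard input.
first_column() {
    awk -v w="$1" '{ c = substr($0, 1, w); sub(/ +$/, "", c); print c }'
}

# Sample rows rebuilt from the question (23-character name column).
printf '%-23s%s %2s\n' 'Australia' 'AU' '10' 'New Zealand' 'NZ' '1' |
    first_column 23
# prints:
# Australia
# New Zealand
```

Passing the width with -v keeps the awk program itself fixed, so only the argument changes between files.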

This doesn't help when the values themselves contain spaces, as the country names above do, but often they don't:

$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
foo            bar       baz       etc...

In these cases it's really easy to get, say, the IMAGE column by first using tr to squeeze each run of spaces into a single space (--squeeze-repeats is the GNU long form of the POSIX -s option):

$ docker ps | tr --squeeze-repeats ' '
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz etc...

Now you can pipe this (without the pesky header row) to cut:

$ docker ps | tr --squeeze-repeats ' ' | tail -n +2 | cut -d ' ' -f 2
bar
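Since awk already splits on runs of whitespace by default, the squeeze, header-skip and field-pick steps can probably be folded into one awk call. A minimal sketch, with a made-up helper name (column_no_header) and the placeholder rows from above standing in for real docker ps output:

```shell
# Hypothetical helper: print field N of every row except the header line.
column_no_header() {
    awk -v n="$1" 'NR > 1 { print $n }'
}

# Stand-in for `docker ps` output, using the placeholder rows shown earlier.
printf '%s\n' \
    'CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES' \
    'foo            bar       baz       etc...' |
    column_no_header 2
# prints:
# bar
```

The same caveat applies as with the tr pipeline: it only works when no column value contains a space.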
