
Extract values from a fixed-width column

I have a text file named file that contains the following:

Australia              AU 10
New Zealand            NZ  1
...

If I use the following command to extract the country names from the first column:

awk '{print $1}' file

I get the following:

Australia
New
...

Only the first word of each country name is output.

How can I get the entire country name?

Try this:

$ awk '{print substr($0,1,15)}' file
Australia
New Zealand

To get rid of the last two columns:

awk 'NF>2 && NF-=2' file

NF>2 is a guard that only lets through records with more than 2 fields. If your data is consistent, you can drop it and simply use:

awk 'NF-=2' file
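
As far as I know, GNU Awk and Mawk rebuild the record when NF is decremented, but I would not rely on every awk doing so. Here is a minimal sketch that avoids assigning to NF by joining all but the last two fields explicitly. The helper name drop_last_two is made up for illustration, and the sample lines are piped in with printf rather than read from file:

```shell
# drop_last_two: hypothetical helper, not part of the original answer.
# Prints every field except the last two, joined by single spaces.
# Lines with two or fewer fields are skipped, like the NF>2 guard.
drop_last_two() {
  awk 'NF > 2 {
    out = $1
    for (i = 2; i <= NF - 2; i++) out = out " " $i
    print out
  }' "$@"
}

# Sample data matching the layout in the question.
printf '%-23s%2s%3s\n' 'Australia' AU 10 'New Zealand' NZ 1 | drop_last_two
```

Like NF-=2, this collapses any run of spaces inside a name to a single space, so it only suits data where that is acceptable.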

To complement Raymond Hettinger's helpful POSIX-compliant answer:

It looks like your country-name column is 23 characters wide.

In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:

# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia              
New Zealand            

Caveat: GNU cut is not UTF-8 aware, so if the input is UTF-8-encoded and contains non-ASCII characters, the above will not work correctly.


To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:

# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand
  • FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.

  • sub(" +$", "", $1) then removes trailing spaces from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($) with the empty string.

However, your Linux distro may come with Mawk rather than GNU Awk; use awk -W version to determine which one it is.


For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:

# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
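
If you want all three columns rather than just the name, the same substr approach extends naturally. This is only a sketch: the helper name split_fixed is made up, and the widths 23, 2 and 3 are my reading of the two sample lines, so adjust them to your real layout:

```shell
# split_fixed: hypothetical helper that cuts each line at assumed
# column positions (1-23 name, 24-25 code, 26-28 count), trims the
# padding, and prints the pieces separated by "|".
split_fixed() {
  awk '
    # Remove leading and trailing spaces from s.
    function trim(s) { gsub(/^ +| +$/, "", s); return s }
    {
      name  = trim(substr($0, 1, 23))
      code  = trim(substr($0, 24, 2))
      count = trim(substr($0, 26, 3))
      print name "|" code "|" count
    }' "$@"
}

# Sample data built with printf so the column widths are exact.
printf '%-23s%2s%3s\n' 'Australia' AU 10 'New Zealand' NZ 1 | split_fixed
```

With the sample input this prints Australia|AU|10 and New Zealand|NZ|1. It uses only POSIX awk features, so it should behave the same under Mawk and GNU Awk.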

This doesn't apply when the values themselves contain spaces, but often they don't:

$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
foo            bar       baz       etc...

In these cases it's really easy to get, say, the IMAGE column by using tr to squeeze each run of spaces into a single space:

$ docker ps | tr --squeeze-repeats ' '
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz

Now you can pipe this (without the pesky header row) to cut:

$ docker ps | tr --squeeze-repeats ' ' | tail -n +2 | cut -d ' ' -f 2
foo
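
When a value does contain spaces, squeezing no longer works, because cut then sees extra fields. One possible workaround, sketched here under the assumption that every value starts at the same offset as its header and never overflows into the next column, is to take the offsets from the header line itself. The helper name column_by_header and the sample rows are invented for illustration:

```shell
# column_by_header NAME: hypothetical helper that prints the column
# whose header text is NAME, using offsets measured on the header line.
column_by_header() {
  awk -v name="$1" '
    NR == 1 {
      start = index($0, name)            # 1-based offset of the header
      if (start == 0) exit 1             # header not found
      rest = substr($0, start + length(name))
      # Width = header text plus the padding before the next header.
      if (match(rest, /^ +[^ ]/)) width = length(name) + RLENGTH - 1
      else width = 0                     # last column: runs to end of line
      next
    }
    {
      v = width ? substr($0, start, width) : substr($0, start)
      sub(/ +$/, "", v)                  # trim the padding
      print v
    }'
}

# Made-up table in the same shape as the docker ps output above.
printf '%-15s%-10s%-12s%s\n' \
  'CONTAINER ID' IMAGE COMMAND NAMES \
  foo bar 'sleep 60' web | column_by_header COMMAND
```

Here the COMMAND value sleep 60 comes back intact, embedded space included. Pass the full header text (for example CONTAINER ID, not just CONTAINER), since the width is measured up to the next word on the header line.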
