Extract values from a fixed-width column
I have a text file named file that contains the following:
Australia AU 10
New Zealand NZ 1
...
If I use the following command to extract the country names from the first column:
awk '{print $1}' file
I get the following:
Australia
New
...
Only the first word of each country name is output.
How can I get the entire country name?
Try this:
$ awk '{print substr($0,1,15)}' file
Australia
New Zealand
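The command above hard-codes the column width as 15. A minimal sketch that passes the width in with -v instead; the function name first_column and the variable w are illustrative, and the width you pass must match your file's actual layout. Like the original command, it keeps any padding spaces at the end of each name.

```shell
# first_column WIDTH: print the first WIDTH characters of each line on stdin.
# WIDTH is an assumption about the file's layout; adjust it to match.
first_column() {
  awk -v w="$1" '{ print substr($0, 1, w) }'
}

# Usage: first_column 15 < file
```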
To get rid of the last two columns:
awk 'NF>2 && NF-=2' file
NF>2 is a guard that keeps only records with more than two fields. If your data is consistent, you can drop it and simply use:
awk 'NF-=2' file
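Decrementing NF works because awk rebuilds $0 from the remaining fields joined with OFS (a single space by default), so the padding between columns disappears as well. If you would rather not rely on assigning to NF, a minimal sketch that joins fields 1 through NF-2 explicitly; the function name drop_last_two is illustrative:

```shell
# drop_last_two: print each stdin line without its last two
# whitespace-separated fields. Lines with two or fewer fields are
# skipped, mirroring the NF>2 guard above.
drop_last_two() {
  awk 'NF > 2 {
    out = $1
    for (i = 2; i <= NF - 2; i++)
      out = out " " $i
    print out
  }'
}

# Usage: drop_last_two < file
```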
To complement Raymond Hettinger's helpful, POSIX-compliant answer:
It looks like your country-name column is 23 characters wide.
In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:
# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia
New Zealand
Caveat: GNU cut is not UTF-8 aware (its -c option counts bytes, not characters), so if the input is UTF-8-encoded and contains non-ASCII characters, the command above will not work correctly.
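If you want to stick with cut but still drop the padding, one option is to pipe its output through sed, which is also POSIX. A minimal sketch; the function name name_column is illustrative and the width of 23 is the estimate from above, so adjust it to your data. The same byte-counting caveat applies.

```shell
# name_column: print characters 1-23 of each stdin line, then strip any
# trailing blanks (spaces or tabs) left over from the column padding.
name_column() {
  cut -c 1-23 | sed 's/[[:blank:]]*$//'
}

# Usage: name_column < file
```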
To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:
# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand
FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.
sub(" +$", "", $1) then removes trailing spaces from $1 by replacing any nonempty run of spaces (" +") at the end of the field (the $ anchor) with the empty string.
However, your Linux distro may come with Mawk rather than GNU Awk; use awk -W version to determine which one it is.
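To make that check usable from a script, here is a small heuristic sketch. The function name awk_flavor is illustrative, and it assumes the first line of the version banner starts with "GNU Awk" for GNU Awk and with "mawk" for mawk; any other awk, including one that rejects -W version, is reported as other.

```shell
# awk_flavor: print gawk, mawk, or other for the awk found first on PATH.
# Heuristic only: it inspects the first line of the -W version banner.
awk_flavor() {
  # stdin comes from /dev/null in case an awk does not recognize
  # -W version and tries to read input instead.
  banner=$(awk -W version 2>/dev/null </dev/null | head -n 1)
  case $banner in
    'GNU Awk'*) echo gawk ;;
    mawk*) echo mawk ;;
    *) echo other ;;
  esac
}

# Usage: [ "$(awk_flavor)" = gawk ] && echo "FIELDWIDTHS is available"
```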
For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:
# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
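The same substr-and-trim idea extends to several columns, giving a rough portable stand-in for FIELDWIDTHS. A minimal sketch; the function name split_fixed, the "|" output separator, and any widths you pass (such as "23 4 3") are hypothetical and have to match your file's real layout. It trims both leading and trailing blanks so that right-aligned numbers come out clean.

```shell
# split_fixed "W1 W2 ...": slice each stdin line into consecutive
# fixed-width fields, trim surrounding blanks, and print them joined by "|".
split_fixed() {
  awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
      pos = 1
      out = ""
      for (i = 1; i <= n; i++) {
        f = substr($0, pos, w[i])
        sub(/^[ \t]+/, "", f)   # drop leading padding
        sub(/[ \t]+$/, "", f)   # drop trailing padding
        out = (i == 1) ? f : out "|" f
        pos += w[i]
      }
      print out
    }'
}

# Usage: split_fixed "23 4 3" < file
```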
The following doesn't apply when your column values contain spaces, but often they don't:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz etc...
In these cases it's really easy to get, say, the IMAGE column by using tr to squeeze runs of spaces into one:
$ docker ps | tr --squeeze-repeats ' '
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
foo bar baz
Now you can drop the pesky header row with tail and pipe the rest to cut:
$ docker ps | tr --squeeze-repeats ' ' | tail -n +2 | cut -d ' ' -f 2
bar
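Because awk's default field splitting already treats any run of blanks as a single separator, the tr, tail, and cut steps can be folded into one awk call. A minimal sketch with the illustrative function name second_column; it shares the same limitation, since a column whose values contain spaces (for example docker ps's CREATED column) shifts every field after it. When the tool supports it, asking for the column directly, as with docker ps --format '{{.Image}}', avoids parsing altogether.

```shell
# second_column: print the second blank-separated field of every stdin
# line except the first (header) line.
second_column() {
  awk 'NR > 1 { print $2 }'
}

# Usage: docker ps | second_column
```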