
Is it possible to use two different Field Separators in awk and store values from both in variables?

My general question is: is it possible to give awk a field separator, store one of the tokens in a variable, then give awk another field separator and store one of the tokens in a second variable, then print both variables? It seems like the variables store a reference to the $nth token, not the value itself.

The specific example I had in mind more or less follows this form: {Animal}, {species} class

Cat, Felis catus MAMMAL
Dog, Canis lupus familiaris MAMMAL
Peregrine Falcon, Falco peregrinus AVIAN
...

and you want it to output something like:

Cat MAMMAL
Dog MAMMAL
Peregrine Falcon AVIAN
...

That is, each output line should fit the form: {Animal} class

where anything enclosed in {} may contain any number of spaces.

My original idea was something like this:

cat test.txt | awk '{FS=","}; {animal=$1}; {FS=" "}; {class=$NF}; {print animal, class}' > animals.txt

I expect the variable "animal" to store what's to the left of the comma, and "class" to hold the class of that animal (MAMMAL, etc.). But what ends up happening is that only the last-used field separator is applied, so this breaks for names that contain spaces, like Peregrine Falcon.

So the output looks something like:

Cat, MAMMAL
Dog, MAMMAL
Peregrine AVIAN
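
A note on why the attempt behaves this way: awk variables hold a copy of the value, not a reference to the field. The likelier cause is timing: a record is split with the FS that was in effect when it was read, so assigning FS inside a rule only affects later records. The sketch below is a minimal demonstration, assuming a current gawk or mawk; the function name `first_field` is made up for illustration.

```shell
# Hypothetical helper: assigns FS inside the main rule, then prints $1.
first_field() {
  awk '{ FS = ","; print $1 }'
}

# The same line is fed twice, so the only difference between the two
# records is which FS was active when each one was read.
printf '%s\n' \
  'Peregrine Falcon, Falco peregrinus AVIAN' \
  'Peregrine Falcon, Falco peregrinus AVIAN' | first_field
```

With gawk or mawk I would expect the first line of output to be `Peregrine` (split on whitespace) and the second to be `Peregrine Falcon` (split on the comma), which matches the symptom described in the question.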

One way using awk:

awk -F, '{ n = split($2,array," "); printf "%s, %s\n", $1, array[n] }' file.txt

Results:

Cat, MAMMAL
Dog, MAMMAL
Peregrine Falcon, AVIAN
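
If the output should match the question exactly (no comma) and the values should live in named variables, the same idea can be spelled out as below. This is only a sketch: it assumes each line contains exactly one comma, and the function name `animal_class` and the array name `parts` are invented for the example.

```shell
# Hypothetical wrapper around the -F, plus split() approach.
animal_class() {
  awk -F, '{
    animal = $1                  # everything left of the comma
    n = split($2, parts, " ")    # split the rest on runs of whitespace
    class = parts[n]             # the last word is the class
    print animal, class
  }' "$@"
}

# Sample input taken from the question.
printf '%s\n' \
  'Cat, Felis catus MAMMAL' \
  'Dog, Canis lupus familiaris MAMMAL' \
  'Peregrine Falcon, Falco peregrinus AVIAN' | animal_class
```

Because FS is set with -F before any input is read, every record is split on the comma, and split() handles the second separator without touching FS.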

You can always call split() inside your awk script. You can also manipulate fields, which causes the entire line to be re-parsed. For example, this produces the results in your question:

awk '{cl=$NF; split($0,a,", "); printf("%s, %s\n", a[1], cl)}' test.txt
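
The remark about re-parsing the line can be made concrete: assigning to $0 makes awk split the current record again with whatever FS is set at that moment. The following sketch uses that to switch separators mid-record, which is close to what the question originally tried. The function name `animal_class_resplit` is hypothetical, and the code assumes each line has a single comma.

```shell
# Hypothetical helper: uses two field separators on the same record.
animal_class_resplit() {
  awk '{
    class = $NF     # default FS: last whitespace-separated word
    FS = ","        # switch to the comma separator ...
    $0 = $0         # ... and re-split the current record with it
    animal = $1     # now $1 is everything before the comma
    FS = " "        # restore the default for the next record
    print animal, class
  }' "$@"
}

# Sample input taken from the question.
printf '%s\n' \
  'Cat, Felis catus MAMMAL' \
  'Dog, Canis lupus familiaris MAMMAL' \
  'Peregrine Falcon, Falco peregrinus AVIAN' | animal_class_resplit
```

Restoring FS at the end of the rule matters: without it, the next record would be read with the comma separator and $NF would no longer be the class.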

The field separator for awk can be any regular expression, but in this case it might be easier to use the record separator: setting it to [,\n] makes the records alternate between the two pieces you want (a regular-expression RS works in gawk and mawk, but POSIX leaves a multi-character RS unspecified):

awk -v RS='[,\n]' 'NR % 2 { printf("%s, ", $0) } NR % 2 == 0 { print $NF }'

So odd-numbered records are output in their entirety, and for even-numbered records only the last field is printed.

paste -d, <(cut -d, -f1 input.txt) <(awk '{print $NF}' input.txt)
  • cut extracts the first column
  • awk gets the last column
  • paste joins them together

Output:

Cat,MAMMAL
Dog,MAMMAL
Peregrine Falcon,AVIAN
