How to display file columns containing a specific word using awk

I would like to print all columns that contain a specific word, for example "watermelon". I was thinking about combining these two commands, because each one works on its own (one does something for every column in the file, and the other checks whether a column contains a specific word):

awk '{for(i=1;i<=NF-1;i++) printf $i" "; print $i}' a.csv
awk -F"," '{if ($2 == " watermelon") print $2}' a.csv

But when I try to put them together, my code doesn't work:

#!/bin/bash 
awk '{for(i=1;i<=NF-1;i++) 
         awk -F"," '{if ($i == " watermelon") 
              print $i}' a.csv    
        }' a.csv

For example, this is my file a.csv:

lp, type, name, number, letter
1, fruit, watermelon, 6, a
2, fruit, apple, 7, b
3, vegetable, onion, 8, c
4, vegetable, broccoli, 6, b
5, fruit, orange, 5, c

And this is the result I would like to get when searching for the word watermelon:

name
watermelon
apple
onion
broccoli
orange

Here's one that processes the data twice: the first pass finds the columns containing the word, the second pass prints only those columns:

$ awk -F', ' '                          # remember to set OFS if you need one
NR==FNR {                               # on the first run
    for(i=1;i<=NF;i++)                  # find 
        if($i=="watermelon")            # watermelon fields
            a[i]                        # and mark them
    next
}
FNR==1 {                                # in case there were no such fields
    found=0                             # test whether
    for(i in a)                         # any field was marked
        found=1
    if(!found)                          # and if none was,
        exit                            # exit; otherwise continue below
}
{                                       # on the second run
    for(i=1;i<=NF;i++)                 
        if(i in a)b=b (b==""?"":OFS) $i # buffer those fields for output
    print b                             # and output
    b=""                                # clean that buffer for next record
}' file file

Output:

name
watermelon
apple
onion
broccoli
orange
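
As a sketch only, the two-pass idea above can be wrapped in a reusable shell function that takes the search word as a parameter. The name print_matching_cols is hypothetical (not from either answer), and it assumes fields are separated by ", " as in a.csv. It sets OFS so that several matching columns stay comma-separated, and it tests for "no match" with a flag rather than next, so the header line of the second pass still reaches the printing block:

```shell
#!/bin/sh
# Hypothetical helper: print_matching_cols WORD FILE
# Pass 1 marks every column in which some field equals WORD exactly;
# pass 2 prints only the marked columns, joined by OFS.
# Assumes ", " separates fields, as in the question's a.csv.
print_matching_cols() {
    awk -F', ' -v OFS=', ' -v tgt="$1" '
    NR == FNR {                     # first pass: mark matching columns
        for (i = 1; i <= NF; i++)
            if ($i == tgt)
                hit[i]
        next
    }
    FNR == 1 {                      # first record of the second pass
        found = 0
        for (i in hit)
            found = 1
        if (!found)                 # no column matched: print nothing
            exit
    }
    {                               # second pass: build the output line
        out = ""
        for (i = 1; i <= NF; i++)
            if (i in hit)
                out = out (out == "" ? "" : OFS) $i
        print out
    }' "$2" "$2"
}
```

With the question's a.csv, print_matching_cols watermelon a.csv should print the name column shown in the expected result; if no field matches, it prints nothing.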

$ cat tst.awk
BEGIN { FS=OFS=", " }
NR==FNR {
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        if ( $inFldNr == tgt ) {
            hits[inFldNr]
        }
    }
    next
}
FNR==1 {
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        if ( inFldNr in hits ) {
            out2in[++numOutFlds] = inFldNr
        }
    }
}
{
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2in[outFldNr]
        printf "%s%s", $inFldNr, (outFldNr<numOutFlds ? OFS : ORS)
    }
}

$ awk -v tgt='watermelon' -f tst.awk file file
name
watermelon
apple
onion
broccoli
orange

The main difference between the above and @JamesBrown's approach is that, in the second pass over the file, my script loops only over the fields to be output, while James' loops over all input fields. His version will therefore be slower in what is presumably the normal case, where not all input fields have to be output.

Regarding printf $i in your code, by the way: never do that. Always use printf "%s", $i for any input data instead, as the former will fail when your input contains printf formatting characters such as %s.
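
A minimal sketch of that fix applied to the question's first command (print_fields is a hypothetical wrapper name). Each field is passed as an argument to the constant format "%s ", so a field such as 50%s is printed literally. The unsafe form would hand that field to printf as its format string, which, depending on the awk implementation, is an error or prints something other than the data:

```shell
#!/bin/sh
# Safe rewrite of: awk '{for(i=1;i<=NF-1;i++) printf $i" "; print $i}'
# The data is never used as the format string; "%s " is the only format.
print_fields() {
    awk '{ for (i = 1; i < NF; i++) printf "%s ", $i; print $NF }' "$@"
}

# Hypothetical input line containing a printf conversion sequence.
printf '%s\n' 'save 50%s now' | print_fields
```

Like the original command, this splits on the default whitespace field separator, so runs of spaces in the input collapse to single spaces in the output.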

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM