
awk reparse piped input without getline

I wrote an awk script that parses piped input and turns it into a well-spaced table. To achieve this I needed to parse the input stream twice: first to determine the actual width of each column, and then to print the table itself.

#!/bin/gawk -f
# with changes from ooga

BEGIN {
    FS=" "
    # Read the temp file name printed by mktemp into buffer
    "mktemp" | getline buffer
    close("mktemp")
    # Initialize Vars
}

{
    # Count Columns...
}

END{
    close(buffer)

    while((getline < buffer) > 0){
        # Print formatted table
    }
}

So this works, but it uses getline, and all the manuals point out that there are very few cases where you really need getline. The only other option I found was using files instead of pipes.

Is there another option in gawk that will parse piped input twice?

Just store the input in an array and then print that. You didn't post any sample input and expected output, so there's nothing we can test against, but something like this might be what you want:

awk '
{
    line[NR] = $0
    curLength = length($0)
    if (curLength > maxLength)
        maxLength = curLength
}
END {
    for (i=1; i<=NR; i++) {
        printf "| %*s |\n", maxLength, line[i]
    }
}
'

There's not really a better way to do it (EDIT: actually, it's probably better to use an array, as Ed Morton has said; see his post and my alternate example at the end of this post), but it's not a very "awkish" program since it doesn't use the pattern{action} paradigm. The only advantage of awk for this program is the automatic field-splitting.

Some tips:

  • FS defaults to a single space (which has the special meaning that fields are separated by runs of whitespace and that leading and trailing whitespace is ignored), so there's no need to explicitly set it to a space.

  • |& opens a coprocess, but you only need a regular pipe, so just use |.

  • You should explicitly close the pipe.

  • The function seems an unnecessary complication.

  • You should delete the temporary file after you're finished with it.

This yields:

#!/bin/gawk -f

BEGIN {
    "mktemp" | getline tmpfile
    close("mktemp")
}

{
    # process and save piped data to tmpfile
}

END {
    close(tmpfile)
    while((getline < tmpfile) > 0) {
        # process data from tmpfile
    }
    system("rm " tmpfile)
}
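As a rough illustration, the two placeholder comments in the skeleton above could be filled in along the following lines. This is only a sketch under some assumptions: the wrapper name `columnize_tmp` is hypothetical, `mktemp` is assumed to be on the PATH, and the column-width logic is borrowed from the array example rather than from the asker's original script.

```shell
#!/bin/sh
# Hypothetical wrapper; the awk body fills in the temp-file skeleton.
columnize_tmp() {
    awk '
    BEGIN {
        "mktemp" | getline tmpfile   # read the scratch file name from mktemp
        close("mktemp")              # close the pipe explicitly
    }
    {
        print > tmpfile              # pass 1: save the raw line
        if (NF > nf) nf = NF         # remember the widest row
        for (i = 1; i <= NF; ++i)
            if (length($i) > flen[i]) flen[i] = length($i)
    }
    END {
        close(tmpfile)               # flush the writes before re-reading
        while ((getline line < tmpfile) > 0) {
            split(line, fields)      # pass 2: re-split and pad each column
            for (f = 1; f <= nf; ++f)
                printf "| %-*s ", flen[f], fields[f]
            print "|"
        }
        close(tmpfile)
        system("rm -f -- \"" tmpfile "\"")   # remove the scratch file
    }'
}

printf 'one two three\nfour five six\nseven eight nine\n' | columnize_tmp
```

Note that this still relies on getline for the second pass, which is the part the question wanted to avoid.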

Here's an example of using an array instead of a temporary file:

#!/bin/awk -f

{
    line[NR] = $0
    if (NF > nf)
        nf = NF;
    for (i=1; i<=NF; ++i)
        if (length($i) > flen[i])
            flen[i] = length($i)
}

END {
    for (r=1; r<=NR; ++r) {
        # Split each saved line once, not once per column
        split(line[r], fields)
        for (f=1; f<=nf; ++f) {
            printf("| %-*s ", flen[f], fields[f])
        }
        print "|"
    }
}

Output:

$ cat file
one two three
four five six
seven eight nine
$ cat file | ./columnize.awk
| one   | two   | three |
| four  | five  | six   |
| seven | eight | nine  |
$
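For comparison, the file-based alternative mentioned in the question might look like the sketch below. It assumes the data is in a regular file (it cannot work on a pipe, since the input is opened twice), and the helper name `columnize_file` is hypothetical. Naming the file twice on the command line makes awk read it twice, and `NR == FNR` is true only during the first read.

```shell
#!/bin/sh
# Hypothetical helper: two passes over a regular file named twice.
columnize_file() {
    awk '
    NR == FNR {                      # pass 1: first copy of the file
        if (NF > nf) nf = NF
        for (i = 1; i <= NF; ++i)
            if (length($i) > flen[i]) flen[i] = length($i)
        next
    }
    {                                # pass 2: widths are known, print the row
        for (f = 1; f <= nf; ++f)
            printf "| %-*s ", flen[f], $f
        print "|"
    }' "$1" "$1"
}

# Usage with a throwaway sample file
sample=$(mktemp)
printf 'one two three\nfour five six\nseven eight nine\n' > "$sample"
columnize_file "$sample"
rm -f "$sample"
```

This avoids both getline and the in-memory array, at the cost of requiring a seekable input file.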
