
AWK - Parsing SQL output

I have SQL output like the one below, produced by a custom tool. I would appreciate any help in figuring out what I am doing wrong.

column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10            |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
 cec75                   | 1234     | 007    |         |    2810 |         | SOME_TEXT       |         |         | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)

I am trying to pipe this output and extract the columns I need, in my case column1, column2, and column7. I have tried piping it like this, but it only prints column1:

tool check | awk '{print $1, $2}'

column1 |
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+------------------------------- 
cec75 |
(1 row) 

It would be nice to get output like this:

cec75,1234,SOME_TEXT

My file contents:


                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 6601c | 2396     | 123         |             |               9350 |                       | SOME_TEXT |               |                | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)


                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 cec75 | 1567     | 007        |             |               2810 |                       | SOME_TEXT |               |                | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)

You need to set the correct FS and somehow filter out the undesired (junk) lines. I would do it the following way. Let file.txt contain:

column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10            |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
 cec75                   | 1234     | 007    |         |    2810 |         | SOME_TEXT       |         |         | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)

then

awk 'BEGIN{FS="[[:space:]]+\\|[[:space:]]+";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' file.txt

output:

cec75,1234,2020-12-07 20:28:46.865+00

Explanation: I set the field separator (FS) to: one or more whitespace characters, a literal |, then one or more whitespace characters. Depending on your data you might elect to match zero or more rather than one or more; to do so, replace + with *. For every line which is not the first one (this filters out the header) and has at least 2 fields (this filters out the line of -s and +s and the (1 row) line), I print the content of the 1st column, a comma, the content of the 2nd column, a comma, and the content of the 7th column.
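Note that with the one-or-more pattern, an empty column (two pipes with only spaces between them) gets swallowed into a single separator, which is why the output above shows column10's timestamp in place of $7 rather than SOME_TEXT. A small runnable sketch of the zero-or-more variant mentioned above, fed a shortened, hypothetical stand-in for the question's row (D1 and D2 stand in for the two timestamp columns):

```shell
# With FS matching *zero* or more spaces around the pipe, empty columns
# survive as empty fields, so $7 really is column7 (SOME_TEXT) instead
# of a later column shifted left. sub() strips the leading blank that
# psql-style output puts before the first column.
printf ' cec75 | 1234 | 007 |  | 2810 |  | SOME_TEXT |  |  | D1 | D2\n' |
awk 'BEGIN{FS="[[:space:]]*\\|[[:space:]]*"; OFS=","}
     {sub(/^[[:space:]]+/, ""); print $1, $2, $7}'
# prints: cec75,1234,SOME_TEXT
```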

EDIT: Since the OP added an edited set of samples, adding this solution now. It assumes that you want to print the lines that come right after lines starting with ---.

awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}' Input_file

OR

your_command | 
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}'

Description:

Command-line switches...

  • The delimiter is | surrounded by spaces. (Note that we need a couple of \ characters to escape the | when the delimiter regex is fed in from the command line.)
  • In addition to the input delimiter (input field separator), the output delimiter (output field separator) can also be set using a command-line switch.

The awk script...

  • If a header is encountered, or a ( is seen on a line, it's not a valid line; just ignore it.
  • If the line has any alphanumeric characters, it's a valid line to operate on; so we strip the leading spaces off the line and then print the columns we want.
tool check | awk -F' *\\| *' -v OFS=, '/column|\(/ { next } /[[:alnum:]]/ { sub(/^ +/, ""); print $1, $2, $7 }'

Examining the data more closely... It looks as though the date-stamp (which always has a : in it) is present on all valid records. If so, the script can be reduced to something much simpler.

tool check | awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }'
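To sanity-check that last one-liner without the tool itself, here is a self-contained run against a trimmed copy of one record from the file above; the (1 row) trailer is included to show it gets filtered out, since its 10th field has no ::

```shell
# Feed one data row plus the "(1 row)" trailer; only the row whose
# 10th field contains a ":" (the timestamp) survives the filter.
printf '%s\n' \
  ' cec75 | 1567 | 007 |  | 2810 |  | SOME_TEXT |  |  | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00' \
  '(1 row)' |
awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }'
# prints: cec75,1567,SOME_TEXT
```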


 