AWK - Parsing SQL output
I have SQL output like the one below from a custom tool. I would appreciate any help in finding out what I am doing incorrectly.
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
cec75 | 1234 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
I am trying to pipe this output and extract only the columns I need, in my case column1, column2, and column7. I have tried piping it like this, but it just prints column1:
tool check | awk '{print $1, $2}'
column1 |
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
cec75 |
(1 row)
It would be nice to have something like this:
ce7c5,1234,SOME_TEXT
My file contents:
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
6601c | 2396 | 123 | | 9350 | | SOME_TEXT | | | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
cec75 | 1567 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
You need to set the correct FS and somehow filter out the undesired (junk) lines. I would do it the following way. Let the content of file.txt be:
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
cec75 | 1234 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
then
awk 'BEGIN{FS="[[:space:]]+\\|[[:space:]]+";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' file.txt
output:
cec75,1234,2020-12-07 20:28:46.865+00
Explanation: I set the field separator (FS) to: one or more :space:, a literal |, one or more :space:, where :space: means any whitespace character. Depending on your data you might elect to use zero or more rather than one or more; to do so, replace + with *. For every line which is not the first one (this filters out the header) and which has at least 2 fields (this filters out the line made of - and + characters and the (1 row) line), I print the content of the 1st column, followed by a comma, followed by the content of the 2nd column, followed by a comma, followed by the content of the 7th column.
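To make the +-versus-* point above concrete: with * instead of +, runs like | | no longer collapse into a single separator, so the empty fields survive and $7 lines up with column7 (SOME_TEXT), which is the output the question asked for. A self-contained sketch, with the sample row inlined via a here-document in place of file.txt:

```shell
# Same program, but with * instead of + in FS: empty fields are
# preserved, so $7 really is column7 (SOME_TEXT) instead of a timestamp.
awk 'BEGIN{FS="[[:space:]]*\\|[[:space:]]*";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' <<'EOF'
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------+-------------
cec75 | 1234 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
EOF
# prints: cec75,1234,SOME_TEXT
```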
EDIT: Since the OP added an edited set of samples, adding this solution now. This assumes that you want to print the lines that come right after lines starting with ---.
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}' Input_file
OR
your_command |
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}'
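Run against the two-result-set file from the question (inlined here as a here-document standing in for Input_file; note that this variant sets no OFS, so the fields come out separated by the default single space):

```shell
# A /^---/ line arms the found flag; the very next line is printed
# as data and the flag is cleared again.
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}' <<'EOF'
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------
6601c | 2396 | 123 | | 9350 | | SOME_TEXT | | | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------
cec75 | 1567 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
EOF
# prints:
# 6601c 2396 SOME_TEXT
# cec75 1567 SOME_TEXT
```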
Description:
Command line switches... -F' *\| *' sets the field separator to a | surrounded by (optional) spaces; the \'s are there to escape the | if we feed the regex for the delimiter in from the command line. -v OFS=, sets the output field separator to a comma.
The awk script... Lines containing column are headers, and if a ( is seen on a line, it's not a valid line; so, just ignore both. Any remaining line containing an alphanumeric character is a data row: strip its leading spaces and print fields 1, 2 and 7.
tool check | awk -F' *\\| *' -v OFS=, '/column|\(/ { next } /[[:alnum:]]/ { sub(/^ +/, ""); print $1, $2, $7 }'
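A quick way to sanity-check this filter is to run it over the question's sample file, with a here-document standing in for the tool check pipeline (a sketch, not the real tool):

```shell
# Headers contain "column" and row counts contain "(", so both are
# skipped; the dashes line has no alphanumerics, so it is skipped too.
awk -F' *\\| *' -v OFS=, '/column|\(/ { next } /[[:alnum:]]/ { sub(/^ +/, ""); print $1, $2, $7 }' <<'EOF'
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------
6601c | 2396 | 123 | | 9350 | | SOME_TEXT | | | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
cec75 | 1567 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
EOF
# prints:
# 6601c,2396,SOME_TEXT
# cec75,1567,SOME_TEXT
```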
Examining the data more closely... It looks as though the date-stamp (which always has a : in it) might be present on all valid records... If so, the script can be reduced to something much simpler:
tool check | awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }'
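For example, against the sample data from the question (here-document standing in for tool check; only genuine data rows carry a timestamp, and hence a :, in field 10):

```shell
# Headers have $10 == "column10" (no colon) and the dashes / "(1 row)"
# lines have fewer than 10 fields, so only real data rows print.
awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }' <<'EOF'
column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11
--------------------------------------+----------+-------------
6601c | 2396 | 123 | | 9350 | | SOME_TEXT | | | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
cec75 | 1567 | 007 | | 2810 | | SOME_TEXT | | | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
EOF
# prints:
# 6601c,2396,SOME_TEXT
# cec75,1567,SOME_TEXT
```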