
Normalize columns into rows from pipe-delimited text stream

I am looking for a concise command-line tool or script that takes the output of another command and transforms the text into a normalized data set for import into a database.

My input stream currently looks like this:

timestamp|identifier|column1|column2|...|column n

(representing n observations taken at the same time for the same identifier, i.e. the same person)

I want to grab the first two fields and prepend them to each of column1 through column n, producing output like this:

timestamp|identifier|column1
timestamp|identifier|column2
timestamp|identifier|column3
...
timestamp|identifier|column n

sed? awk? perl? Or would it be better to load this data into a database table as-is and then use some kind of transform script or stored procedure? I believe I've done this before in SQL Server using PIVOT.

This should do it:

$ awk 'BEGIN{FS=OFS="|"} {for (i=3; i<=NF; i++) print $1, $2, $i}' file
timestamp|identifier|column1
timestamp|identifier|column2
timestamp|identifier|...
timestamp|identifier|column n

Explanation 说明

  • BEGIN{FS=OFS="|"} sets both the input and the output field separator to |.
  • for (i=3; i<=NF; i++) print $1, $2, $i loops over every field from the 3rd onward, printing the 1st column, the 2nd column, and the current field.
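To see the loop act on more than one input row, here is a minimal sketch that wraps the same awk program in a shell function. The function names normalize_rows and sample_input and the sample values are made up for illustration; only the awk program itself comes from the answer above.

```shell
# Hypothetical sample data: two observations for alice, three for bob.
sample_input() {
  printf '%s\n' \
    '2024-01-01T00:00|alice|10|20' \
    '2024-01-01T00:05|bob|7|8|9'
}

# Same awk program as above; the wrapper name is an assumption.
# With no file arguments, awk reads standard input.
normalize_rows() {
  awk 'BEGIN { FS = OFS = "|" }
       { for (i = 3; i <= NF; i++) print $1, $2, $i }' "$@"
}

# Each input row expands into one output row per observation.
sample_input | normalize_rows
```

Because the function reads standard input when given no file names, it can sit directly at the end of a pipeline, e.g. some_command | normalize_rows, where some_command stands in for whatever produces the pipe-delimited stream.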

Alternatively, with Perl:

perl -F'\|' -lane 'print join "|", @F[0,1],$_ for @F[2..$#F]' file

Output:

timestamp|identifier|column1
timestamp|identifier|column2
timestamp|identifier|column n

Explanation:

-F'\|' sets the delimiter for the implicit split; the | must be escaped because the delimiter is treated as a regex

-l chomps the trailing newline from each input line and adds one back when printing

-a auto-splits each line into the @F array

-n adds an implicit while(<>) loop

Or, letting perl speak for itself:

perl -MO=Deparse -F'\|' -lane 'print join "|", @F[0,1],$_ for @F[2..$#F]'
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our(@F) = split(/\|/, $_, 0);
    print join('|', @F[0, 1], $_) foreach (@F[2 .. $#F]);
}
