How to find and print specific character in bash
I have a file like this:
AA,A=14,B=356,C=845,D=4516
BB,A=65,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,F=225
I never know whether a row is missing the A, B, C, or D value. But I need to transform the file into this:
AA,A=14,B=356,C=845,D=4516,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,-,-,F=225
So if any value is missing, just print a - mark in its place. My plan is to have the same number of columns in every row, to make parsing easy. I would prefer an awk solution. Thank you for any advice or help.
My first try was:
awk '{gsub(/[,]/, "\t")}; BEGIN{ FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = "-" }; {print $0}'
But then I noticed that some values are missing.
EDIT:
From my header I know that the possible values are A, B, C, D, E, F...
$ cat file.txt
AA,A=14,B=356,C=845,D=4516
BB,A=65,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,F=225
$ perl -F, -le '@k=(A..F);
$op[0]=$F[0]; @op[1..6]=("-")x6;
$j=0; for($i=1;$i<=$#F;){ if($F[$i] =~ m/$k[$j++]=/){$op[$j]=$F[$i]; $i++} }
print join(",",@op)
' file.txt
AA,A=14,B=356,C=845,D=4516,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,-,-,F=225
-F, splits each input line on , and saves the fields to the @F array.
-l removes the newline from each input line and adds a newline to the output.
@k=(A..F); initializes the @k array with A, B, and so on up to F.
$op[0]=$F[0]; @op[1..6]=("-")x6; initializes the @op array with the first element of @F, and the remaining six elements as -.
The for loop iterates over the @F array; if an element matches the @k array element at the corresponding index followed by =, the corresponding @op element is changed.
print join(",",@op) prints the @op array with , as the separator.
Perl to the rescue!
You haven't specified how to obtain the header information, so in the following script, the @header array is populated directly.
The %to_idx hash maps the column names to their indices (A => 0, B => 1, etc.).
Each line is split into fields, each field is compared to the expected one ($next), and dashes are printed if needed. The same happens for missing trailing fields.
#!/usr/bin/perl
use warnings;
use strict;

my @header = qw( A B C D E F );
my %to_idx = map +($header[$_] => $_), 0 .. $#header;

open my $IN, '<', shift or die $!;
while (<$IN>) {
    chomp;
    my @fields = split /,/;
    print shift @fields;
    my $next = 0;
    for my $field (@fields) {
        my ($name, $value) = split /=/, $field;
        print ',-' x ($to_idx{$name} - $next);
        print ",$name=$value";
        $next = $to_idx{$name} + 1;
    }
    print ',-' x (1 + $#header - $next);  # Missing trailing fields.
    print "\n"
}
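Since the question asks for awk, the same index-map idea could be sketched in awk roughly as follows. This is only a sketch under assumptions: the function name fill_gaps and the hdr variable are made up for illustration, the header list A..F is hard-coded as in the Perl script, and the fields in each row are assumed to appear in header order with no unknown names.

```shell
#!/bin/sh
# Sketch: awk port of the index-map approach. fill_gaps and hdr are
# hypothetical names; the header list A..F is an assumption.
fill_gaps() {
    awk -F, -v OFS=, -v hdr="A,B,C,D,E,F" '
    BEGIN {
        n = split(hdr, h, ",")          # h[1]="A" ... h[n]="F"
        for (i = 1; i <= n; i++)
            idx[h[i]] = i               # column name -> 1-based position
    }
    {
        out = $1                        # row label (AA, BB, ...)
        next_pos = 1                    # next header position we expect
        for (f = 2; f <= NF; f++) {
            split($f, kv, "=")          # kv[1] is the name, kv[2] the value
            pos = idx[kv[1]]
            for (; next_pos < pos; next_pos++)
                out = out OFS "-"       # pad the columns that were skipped
            out = out OFS $f
            next_pos = pos + 1
        }
        for (; next_pos <= n; next_pos++)
            out = out OFS "-"           # pad missing trailing columns
        print out
    }' "$@"
}

# Demo on the sample rows from the question (reads stdin when no file is given).
printf '%s\n' 'AA,A=14,B=356,C=845,D=4516' \
              'BB,A=65,C=255,D=841,E=5133,F=1428' \
              'CC,A=88,B=54,C=549,F=225' | fill_gaps
```

It can also be called with a file name, for example fill_gaps file.txt. Like the Perl version, it pads by position, so a row whose fields are out of header order would not be handled correctly.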
@(do
   (defstruct fill-missing nil
     strings
     (hash (hash :equal-based))

     (:postinit (self)
       (each ((s self.strings))
         (set [self.hash s] "-")))

     (:method add (self str val)
       (set [self.hash str] `@str=@val`))

     (:method print (self stream)
       (put-string `@{(mapcar self.hash self.strings) ","}` stream))))
@(repeat)
@  (bind fm @(new fill-missing strings '#"A B C D E F"))
@{label},@(coll)@{sym /[^,=]+/}=@{val /[^,]+/}@(do fm.(add sym val))@(end)
@  (do (put-line `@label,@fm`))
@(end)
Run:
$ txr missing.txr data
AA,A=14,B=356,C=845,D=4516,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,-,-,F=225
BEGIN {
    PROCINFO["sorted_in"]="@ind_str_asc"  # order for for(i in a)
    for(i=65;i<=90;i++)                   # create the whole alphabet to array a[]
        a[sprintf("%c", i)]               # you could read the header and use that as well
}
{
    split($0,b,",")                       # split record by ","
    printf "%s", b[1]                     # printf first element (AA, BB...)
    delete b[1]                           # get rid of it
    for(i in b)
        b[substr(b[i],1,1)]=b[i]          # take the first letter to use as index (A=12)
    for(i in a)                           # go thru alphabet and printf from b[]
        printf "%s%s", OFS, (i in b?b[i]:"-")
    print ""
}
awk -v OFS=\, -f parsing.awk tbparsed.txt
AA,A=14,B=356,C=845,D=4516,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
CC,A=88,B=54,C=549,-,-,F=225,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
It prints "-" for each letter not found in the record, which is why every output row above has 26 columns after the label. If the data had a header, you could split it into a 2-D array b[NR] and change the for(i in a) to for(i in b[1]) ... printf ... b[NR][b[1][i]] ... (PROCINFO["sorted_in"] and arrays of arrays are GNU awk features). If you don't need the static first column, remove the first printf and the delete.
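A header-driven variant of this lookup idea might look like the sketch below. It is an illustration under assumptions, not the answer's own code: the function name fill_by_header is made up, the header is passed as a comma-separated first argument instead of being read from the data, and the key is taken as everything before the first = sign. It avoids PROCINFO and arrays of arrays, so it should run in any POSIX awk.

```shell
#!/bin/sh
# Sketch: keyed lookup driven by an explicit header list.
# Usage (hypothetical): fill_by_header A,B,C,D,E,F [file...]
fill_by_header() {
    hdr=$1
    shift
    awk -F, -v OFS=, -v hdr="$hdr" '
    BEGIN { n = split(hdr, h, ",") }    # h[1..n] holds the column names in order
    {
        split("", seen)                 # portable way to empty the array
        for (f = 2; f <= NF; f++) {
            name = $f
            sub(/=.*/, "", name)        # keep only the text before "="
            seen[name] = $f             # remember the whole NAME=VALUE field
        }
        out = $1                        # row label (AA, BB, ...)
        for (i = 1; i <= n; i++)
            out = out OFS ((h[i] in seen) ? seen[h[i]] : "-")
        print out
    }' "$@"
}

# Demo on the sample rows from the question.
printf '%s\n' 'AA,A=14,B=356,C=845,D=4516' \
              'BB,A=65,C=255,D=841,E=5133,F=1428' \
              'CC,A=88,B=54,C=549,F=225' | fill_by_header A,B,C,D,E,F
```

Because each field is looked up by name, the order of the fields within a row does not matter, and names longer than one letter work too. Fields whose names are not in the header list are silently dropped.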