How to find and print specific characters in bash

Question

I have a file like this:

AA,A=14,B=356,C=845,D=4516
BB,A=65,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,F=225

I never know in advance whether a row is missing the A, B, C, or D value. I need to transform the file into this:

AA,A=14,B=356,C=845,D=4516,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,-,-,F=225

So if any value is missing, just print a - mark in its place. My goal is to have the same number of columns in every row, to make parsing easier. I would prefer an awk solution. Thank you for any advice or help.

My first try was:

awk '{gsub(/[,]/, "\t")}; BEGIN{ FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = "-" }; {print $0}'

But then I noticed that some values are missing.

EDIT:

From my header I know that the values are A, B, C, D, E, F...

Answer 1

$ cat file.txt
AA,A=14,B=356,C=845,D=4516
BB,A=65,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,F=225

$ perl -F, -le '@k=(A..F);
   $op[0]=$F[0]; @op[1..6]=("-")x6;
   $j=0; for($i=1;$i<=$#F;){ if($F[$i] =~ m/$k[$j++]=/){$op[$j]=$F[$i]; $i++} }
   print join(",",@op)
   ' file.txt
AA,A=14,B=356,C=845,D=4516,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,-,-,F=225

-F, splits each input line on , and saves the fields to the @F array
-l removes the newline from each input line and adds a newline to the output
@k=(A..F); initializes the @k array with A, B, and so on up to F
$op[0]=$F[0]; @op[1..6]=("-")x6; initializes the @op array with the first element of @F, followed by six elements set to -
The for loop iterates over the @F array; if the current element matches the @k element at the corresponding index followed by =, the matching @op element is replaced with that field
print join(",",@op) prints the @op array with , as the separator
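Since the question asks for awk, here is a rough awk sketch of the same walk, wrapped in a shell function so it can be pasted into a terminal. It is not the answerer's code. The function name fill_in_order and the keys variable are made up for illustration, and, like the Perl one-liner, it assumes the fields in each row appear in the same order as the key list.

```shell
# fill_in_order: walk the expected keys once and consume an input field
# only when it starts with the current key.
# Assumption: the key list A..F comes from the question's EDIT, and the
# fields of every row are already in that order.
fill_in_order() {
  awk -F, -v OFS=, -v keys="A,B,C,D,E,F" '
    BEGIN { n = split(keys, k, ",") }        # k[1]="A" ... k[n]="F"
    {
      out = $1                               # row label (AA, BB, ...)
      i = 2                                  # next unconsumed input field
      for (j = 1; j <= n; j++) {
        if (i <= NF && index($i, k[j] "=") == 1) {
          out = out OFS $i                   # field starts with "KEY=": keep it
          i++
        } else {
          out = out OFS "-"                  # key absent from this row
        }
      }
      print out
    }' "$@"
}

# Run it on the sample data from the question.
fill_in_order <<'EOF'
AA,A=14,B=356,C=845,D=4516
BB,A=65,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,F=225
EOF
```

On the sample rows this should give the three lines requested in the question. With a real file, fill_in_order file.txt works as well, and a different key list can be set by editing the keys value.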

Answer 2

Perl to the rescue!

You haven't specified how to obtain the header information, so in the following script the @header array is populated directly.

The %to_idx hash maps the column names to their indices (A => 0, B => 1 etc.).

Each line is split into fields, each field is compared to the expected one ($next), and dashes are printed where fields are missing. The same happens for missing trailing fields.

#!/usr/bin/perl
use warnings;
use strict;

my @header = qw( A B C D E F );

my %to_idx = map +($header[$_] => $_), 0 .. $#header;

open my $IN, '<', shift or die $!;
while (<$IN>) {
    chomp;
    my @fields = split /,/;
    print shift @fields;
    my $next = 0;
    for my $field (@fields) {
        my ($name, $value) = split /=/, $field;
        print ',-' x ($to_idx{$name} - $next);
        print ",$name=$value";
        $next = $to_idx{$name} + 1;
    }
    print ',-' x (1 + $#header - $next);  # Missing trailing fields.
    print "\n"
}

Answer 3

Solution in TXR:

@(do
   (defstruct fill-missing nil
     strings
     (hash (hash :equal-based))

     (:postinit (self)
       (each ((s self.strings))
         (set [self.hash s] "-")))

     (:method add (self str val)
       (set [self.hash str] `@str=@val`))

     (:method print (self stream)
       (put-string `@{(mapcar self.hash self.strings) ","}` stream))))
@(repeat)
@  (bind fm @(new fill-missing strings '#"A B C D E F"))
@{label},@(coll)@{sym /[^,=]+/}=@{val /[^,]+/}@(do fm.(add sym val))@(end)
@  (do (put-line `@label,@fm`))
@(end)

Run:

$ txr missing.txr data
AA,A=14,B=356,C=845,D=4516,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428
CC,A=88,B=54,C=549,-,-,F=225

Answer 4

BEGIN {                                  
    PROCINFO["sorted_in"]="@ind_str_asc" # order for for(i in a)
    for(i=65;i<=90;i++)                  # create the whole alphabet to array a[]
        a[sprintf("%c", i)]              # you could read the header and use that as well
}
{
    split($0,b,",")                      # split record by ","
    printf "%s", b[1]                    # printf first element (AA, BB...)
    delete b[1]                          # get rid of it
    for(i in b) 
        b[substr(b[i],1,1)]=b[i]         # take the first letter to use as index (A=12)
    for(i in a)                          # go thru alphabet and printf from b[]
        printf "%s%s", OFS, (i in b?b[i]:"-"); print ""
}

awk -v OFS=\, -f parsing.awk tbparsed.txt
AA,A=14,B=356,C=845,D=4516,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
BB,A=65,-,C=255,D=841,E=5133,F=1428,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
CC,A=88,B=54,C=549,-,-,F=225,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-

It prints "-" for each letter not found in the record. If the data had a header, you could split into a 2-D array b[NR] and change for(i in a) to for(i in b[1]) ... printf ... b[NR][b[1][i]] ... and, if you don't need the static first column, remove the first printf and the delete. Note that PROCINFO["sorted_in"] and arrays of arrays are GNU awk features.
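The header idea can also be sketched without 2-D arrays, in plain POSIX awk. This is only an illustration: the question never shows the header, so the layout ID,A,B,C,D,E,F used here is hypothetical, as is the function name fill_from_header. Fields are looked up by name, so their order within a row does not matter.

```shell
# fill_from_header: take the column names from the first input line,
# then emit one column per name for every following row.
# Assumption: the first line is a header such as "ID,A,B,C,D,E,F"
# (hypothetical layout, not shown in the question).
fill_from_header() {
  awk -F, -v OFS=, '
    NR == 1 {                                # header line
      ncol = NF
      for (j = 2; j <= NF; j++) col[j] = $j  # remember names of columns 2..NF
      print                                  # pass the header through unchanged
      next
    }
    {
      split("", seen)                        # portable way to clear the lookup table
      for (i = 2; i <= NF; i++) {
        name = $i
        sub(/=.*/, "", name)                 # "B=356" -> "B"
        seen[name] = $i
      }
      out = $1                               # row label
      for (j = 2; j <= ncol; j++)
        out = out OFS ((col[j] in seen) ? seen[col[j]] : "-")
      print out
    }' "$@"
}

# Run it on a small sample with the assumed header line.
fill_from_header <<'EOF'
ID,A,B,C,D,E,F
AA,A=14,B=356,C=845,D=4516
CC,F=225,A=88
EOF
```

With this input the second row should come out as AA,A=14,B=356,C=845,D=4516,-,- and the third as CC,A=88,-,-,-,-,F=225, which also shows that out-of-order fields are placed under the right column.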

4 answers

Answer 1: 3, accepted, 2016-09-05 15:15:40
Answer 2: 2, 2016-09-05 15:15:50
Answer 3: 1, 2016-09-06 06:38:02
Answer 4: 1, 2016-09-06 13:08:21
