awk: add a column if the third column equals a string

Question

I have a VCF file (tab delimited) where some "RPB" values are missing from the 2nd column, which shifts the rest of the line one column to the left.

I have the following:

1   AF1=23  AC1=23
2   RPB=123 AF1=23  AC1=23
3   AF1=23  AC1=23

I need the following:

1   NULL    AF1=23  AC1=23
2   RPB=123 AF1=23  AC1=23
3   NULL    AF1=23  AC1=23

I tried this, but it failed miserably:

awk 'if($2="AF1%" {print $1,"\t"NULL"\t", print$2, print$3}' input.vcf > output.vcf

I have to import this VCF into MySQL, so the tab delimiters have to be preserved. Any ideas?

Answer 1

$ awk 'NF<4{sub(/\t/,"&NULL&")}1' file
1       NULL    AF1=23  AC1=23
2       RPB=123 AF1=23  AC1=23
3       NULL    AF1=23  AC1=23

By the way, you weren't too far off a functional solution with your attempt:

awk 'if($2="AF1%" {print $1,"\t"NULL"\t", print$2, print$3}' input.vcf

This minimally altered version would have produced the output you want:

awk '{if($2~/^AF1/) print $1 "\tNULL\t" $2 "\t" $3; else print}' input.vcf

but as you can see, that's not a very idiomatic approach.
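
To spell out what changed: in awk, `=` assigns while `~` tests a regex match, `%` is not a wildcard, and an unquoted NULL is just an unset variable that expands to the empty string. A minimal sketch that runs the corrected command against the sample rows from the question (the shell variable names `input` and `fixed` are only for illustration, and the rows are assumed to be tab delimited):

```shell
# Rebuild the three sample rows from the question, tab delimited.
input=$(printf '1\tAF1=23\tAC1=23\n2\tRPB=123\tAF1=23\tAC1=23\n3\tAF1=23\tAC1=23\n')

# $2 ~ /^AF1/ is a regex match anchored at the start of field 2.
# "\tNULL\t" is a quoted string; a bare NULL would print nothing.
fixed=$(printf '%s\n' "$input" |
  awk '{ if ($2 ~ /^AF1/) print $1 "\tNULL\t" $2 "\t" $3; else print }')

printf '%s\n' "$fixed"
```

Rows whose second field already starts with RPB fall through to the plain `print` and are passed along unchanged.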

Answer 2

This awk one-liner should help:

kent$  awk -F'\t' -v OFS='\t' '!($2~/^RPB=/){$2="NULL\t"$2}7' file
1       NULL    AF1=23  AC1=23
2       RPB=123 AF1=23  AC1=23
3       NULL    AF1=23  AC1=23

Answer 3

IMHO you shouldn't use a regex. Try this:

#!/bin/bash
cat input.vcf |\
perl -ane '
    BEGIN{$c=0;$max_fields=0}
    $c2=0;
    foreach(@F){
        $a[$c][$c2]=$_;
        if( $c2  > $max_fields ) {
            $max_fields=$c2; 
        }
        $c2++
    }
    $c++;
    END{
        foreach $i (@a){
            while (@$i < $max_fields + 1 ){
                unshift (@$i,"NULL");   
            }  
        }
        foreach $i (@a){
            foreach $x (@$i){
                print $x,"\t";
            }
            print "\n";
        }
    }'

Output:

bash test.sh 
NULL    AF1=23  AC1=23  
RPB=123 AF1=23  AC1=23  
NULL    AF1=23  AC1=23

Explanation:

The code above builds a 2D array (rows x fields).
It also tracks max_fields, the highest field index seen in any row, so the widest row has max_fields + 1 fields.
For each row, if the number of fields is less than max_fields + 1, then "NULL" is inserted at the beginning of the row until the row is as wide as the widest one.
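
The same pad-to-the-widest-row idea can be sketched in awk. This is only an illustration, not the answer's code: the sample rows without a leading index column are an assumption inferred from the output shown above, and the names `rows`, `padded`, `line`, `nf`, `max` and `pad` are made up for the example. Unlike the Perl script, it does not emit a trailing tab.

```shell
# Sample rows without a leading index column (assumed from the output above).
rows=$(printf 'AF1=23\tAC1=23\nRPB=123\tAF1=23\tAC1=23\nAF1=23\tAC1=23\n')

padded=$(printf '%s\n' "$rows" | awk -F'\t' -v OFS='\t' '
  {
    line[NR] = $0                 # remember the raw row
    nf[NR] = NF                   # and how many fields it had
    if (NF > max) max = NF        # track the widest row
  }
  END {
    for (i = 1; i <= NR; i++) {
      pad = ""
      for (j = nf[i]; j < max; j++) pad = pad "NULL" OFS   # one NULL per missing field
      print pad line[i]
    }
  }')

printf '%s\n' "$padded"
```

Like the Perl version, this pads at the start of the row, so it only gives the desired result when the missing field is the first one.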

Answer 4

Based on a tab-delimited input file:

awk -v OFS="\t" 'NF==3{$1=$1 OFS "NULL"} 1' input.vcf

It could be altered to the following if the input file isn't tab delimited:

awk -v OFS="\t" '{$1=$1 (NF==3 ? OFS "NULL" : "")} 1' input.vcf

In either case, when NF==3 the first field is reassigned to include the missing data. In the first example only the altered lines are rebuilt with the output delimiter, which is enough when the input is already tab delimited. When the data isn't tab delimited, every line has to be rebuilt with OFS, which is why the second example assigns to $1 on every line before the trailing 1 prints the record.
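
The difference between the two forms shows up on space-delimited input. A small sketch, assuming a space-delimited variant of the sample rows (the variable names `spaced`, `partial` and `full` are hypothetical):

```shell
# Space-delimited variant of the sample rows (an assumption for illustration).
spaced=$(printf '1 AF1=23 AC1=23\n2 RPB=123 AF1=23 AC1=23\n')

# First form: only the 3-field row is rebuilt with OFS; row 2 keeps its spaces.
partial=$(printf '%s\n' "$spaced" |
  awk -v OFS="\t" 'NF==3{$1=$1 OFS "NULL"} 1')

# Second form: every row assigns to $1, so every row is rebuilt with tabs.
full=$(printf '%s\n' "$spaced" |
  awk -v OFS="\t" '{$1=$1 (NF==3 ? OFS "NULL" : "")} 1')

printf '%s\n---\n%s\n' "$partial" "$full"
```

In `partial` the second row still contains spaces, while in `full` both rows come out tab delimited.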

The beauty of Ed's answer (the sub() one-liner) for tab-delimited input is that the record is never rebuilt with OFS: sub() edits $0 in place and replaces only the first delimiter, so the remaining delimiters are left exactly as they were.
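
A short sketch of that substitution on a single row, assuming tab-delimited input (the names `row` and `patched` are made up for the example). In the replacement string, `&` stands for the text that the regex matched, here the first tab:

```shell
# One short row from the question, tab delimited.
row=$(printf '1\tAF1=23\tAC1=23')

# "&NULL&" expands to <tab>NULL<tab>, so the first tab becomes tab + NULL + tab.
patched=$(printf '%s\n' "$row" | awk 'NF<4{sub(/\t/,"&NULL&")}1')

printf '%s\n' "$patched"
```

No OFS setting is needed because no field is ever assigned; the only bytes that change are the ones inserted around the first tab.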

4 answers

Answer 1: 2, accepted, 2014-06-22 14:00:40
Answer 2: 1, 2014-06-21 23:03:24
Answer 3: 0, 2014-06-22 12:18:17
Answer 4: 0, 2014-06-23 05:21:58
