简体   繁体   中英

Using linux to find and replace the a value in a column in a tab delimited file

I have this data below for example:

Name  Chrom  Position
rs1    1     1234  
rs2    1     1789   
rs3    1     1289   
1      1     1269   
2      1     1897

I would like to find all the lines in the first column that DO NOT contain the string "rs" and replace with rs'chrom''position' . Name 1 would look like rs11269 .

It would end up looking like this:

Name  Chrom  Position   
rs1     1     1234  
rs2     1     1789   
rs3     1     1289   
rs11269 1     1269   
rs11897 1     1897

I know with awk I can do

awk '!/rs/{print $1}' file

and it will find all the lines in the first column that do not contain the string "rs" but how do I take this further now to replace it with my own rs ID built from chrom and position? Would I use gsub or something else? This doesn't have to be in shell commands but also Perl is another option to use as well. Thank you for your help.

You can use this command:

$ awk 'BEGIN{FS=OFS="\t"}NR>1&&!($1~/rs/){$1="rs"$2$3}1' file
Name     Chrom  Position
rs1      1      1234
rs2      1      1789
rs3      1      1289
rs11269  1      1269
rs11897  1      1897

With BEGIN{FS=OFS="\\t"} we set the input and output field separators to the tab character, with NR>1&&!($1~/rs/) we filter the lines that are not the first (header) and that do not contain the string "rs" in the first field, and with {$1="rs"$2$3} we change the value of the first field to the desired value. The final 1 is the true condition, so that all lines are printed.

Solution in perl:

#!/usr/bin/perl 

use strict; 
use warnings; 

use Data::Dumper;

my $row;

#   Open file to read.
open (my $FH, '<', 'test') or die ($!);

# Write new updated file.
open (my $OUT, '>', 'updated_test') or die ($!);


while (my $line = <$FH>){
    chomp($line);
    my @val = split( "\t" , $line );

    if ( $val[0] !~  m/rs/ ) { 
        my $row = "rs" . $val[0] . $val[1] . $val[2] . "\t" . $val[1] . "\t" . $val[2];
        print $OUT $row."\n";
    } 
    else{
        print $OUT $line."\n";
    }
}

close $FH;  
close $OUT; 

OUTPUT:

Name  Chrom  Position   
rs1     1     1234  
rs2     1     1789   
rs3     1     1289   
rs11269 1     1269   
rs11897 1     1897

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM