
remove lines from text file that contain specific text

I'm trying to remove lines that contain 0/0 or ./. in column 71 "FORMAT.1.GT" from a tab delimited text file.
I've tried the following code but it doesn't work. What is the correct way of accomplishing this? Thank you

my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6"; 

Since you need the exact position and know the string lengths, substr can find it:

perl -ne 'print if not substr($_, 70, 3) =~ m{(?:0/0|\./\.)}' filename

This prints a line only when the three-character string starting at the 71st character (offset 70) matches neither 0/0 nor ./.

The {} delimiters around the regex allow us to use / and | inside without escaping. The ?: is there so that the () are used only for grouping, and not capturing. It will work fine also without ?: which is there only for efficiency's sake.

perl -F'\t' -ane 'print unless $F[70] =~ m|([0.])/\1|' myfile > newfile

You can call a one-liner, as borodin and zdim suggested. Which one is right for you is still unclear, because you don't say whether "71st column" means the 71st tab-separated field of a line or the 71st character of that line. Consider:

12345\t6789

Now what is the 2nd column? Is it the character 2 or the field 6789? Borodin's answer assumes it's 6789, while zdim's assumes it's 2. Each shows a solution for one of the two cases, but both are stand-alone programs meant to be run from the command line.
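To make the two readings concrete, here is a minimal sketch that applies both to the example line above. It uses cut, which is not part of either answer, purely for illustration: cut -c counts characters and cut -f counts tab-separated fields.

```shell
# One line with two tab-separated fields: "12345" and "6789".
line=$(printf '12345\t6789')

# Character-based reading: the 2nd character of the line.
char_col=$(printf '%s\n' "$line" | cut -c2)

# Field-based reading: the 2nd tab-separated field (cut splits on tabs by default).
field_col=$(printf '%s\n' "$line" | cut -f2)

echo "character 2: $char_col"
echo "field 2:     $field_col"
```

The same distinction decides whether offset 70 in substr or index 70 in the split fields is the right test for your file.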

If you want to integrate that into your Perl script you could do it like this:

Replace this line:

my $cmd6 = `fgrep -v "0/0" | fgrep -v "./." $Variantlinestsvfile > $MDLtsvfile`; print "$cmd6"; 

with this snippet:

open( my $fh_in, '<', $Variantlinestsvfile ) or die "cannot open $Variantlinestsvfile: $!\n";
open( my $fh_out, '>', $MDLtsvfile ) or die "cannot open $MDLtsvfile: $!\n";
while( my $line = <$fh_in> ) {

    # character-based:
    print $fh_out $line unless (substr($line, 70, 3) =~ m{(?:0/0|\./\.)});

    # tab/field-based:
    my @fields = split(/\t/, $line);
    print $fh_out $line unless ($fields[70] =~ m|([0.])/\1|);
}
close($fh_in);
close($fh_out);

Use either the character-based line or the tab/field-based lines, not both. With both in place, a line that passes both tests is written twice.

Borodin and zdim condensed this snippet to a one-liner, but there is no reason to shell out to it from inside a Perl script.

Try this:

awk -F'\t' '$71 != "./." && $71 != "0/0"' old_file.txt > new_file.txt

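The problem with your command is that you are trying to capture the output of a command that produces no output: all the matching lines are redirected to a file, so the backticks return an empty string. In addition, the first fgrep is given no file name, so it reads standard input, while the second fgrep reads $Variantlinestsvfile directly and ignores the pipe.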
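For comparison, a sketch of how that pipeline would have to be arranged to do anything useful. The file names and sample rows are hypothetical, and grep -F is used as the POSIX spelling of fgrep. Note that it still drops a line if 0/0 or ./. appears anywhere in it, not only in column 71.

```shell
# Hypothetical sample data: three tab-separated rows.
in_file=$(mktemp)
out_file=$(mktemp)
printf 'chr1\t100\t0/1\nchr1\t200\t0/0\nchr1\t300\t./.\n' > "$in_file"

# The first grep reads the input file, the second reads the pipe,
# and only the end of the pipeline is redirected to the output file.
grep -F -v "0/0" "$in_file" | grep -F -v "./." > "$out_file"

# Nothing is left on stdout to capture, so inspect the file instead.
kept=$(cat "$out_file")
echo "$kept"
rm -f "$in_file" "$out_file"
```

Because the result goes to the file, a variable assigned from backticks around this pipeline would still be empty.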

Anyway, calling grep from Perl is just wacky. Reading the file in Perl itself is the way to go.

If you do want a single shell command,

grep -Ev $'^([^\t]*\t){70}(\./\.|0/0)\t' file

would do what you are asking more precisely and elegantly. But you can use that regex straight off in your Perl program just as well.
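A small sketch of that regex on made-up data, so the effect is easy to check by eye. The sample rows and the variable names col, skip and tab are assumptions for illustration. It departs from the command above in two ways: the tab is produced with printf instead of bash's $'...' quoting, so it also runs in plain sh, and the pattern ends in (tab or end of line) so that it still works when the tested column is the last one.

```shell
# Hypothetical sample: four tab-separated columns, genotype in column 3.
sample=$(mktemp)
printf 's1\tA\t0/1\tx\ns2\tB\t0/0\tx\ns3\tC\t./.\tx\ns4\t0/0\t1/1\tx\n' > "$sample"

col=3                      # 1-based column to test (71 in the question)
skip=$((col - 1))          # number of leading fields to step over
tab=$(printf '\t')         # a literal tab, portable to plain sh

# Drop rows whose col-th field is exactly 0/0 or ./.
filtered=$(grep -Ev "^([^${tab}]*${tab}){${skip}}(\./\.|0/0)(${tab}|\$)" "$sample")
echo "$filtered"
rm -f "$sample"
```

Row s4 is kept even though it contains 0/0, because the match is anchored to the chosen column; a plain fgrep -v "0/0" would have removed it.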
