Prevent "foo" from matching "foo-bar" with grep -w

Question

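I am using grep in my Perl script and I am trying to grep for the exact keyword I provide. The problem is that -w treats the "-" symbol as a word separator, so a keyword also matches at the start of a hyphenated name.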

Example: suppose I have these two records:

A1BG    0.0767377011073753
A1BG-AS1    0.233775553296782

If I run grep -w "A1BG", it returns both of them, but I only want the exact match.

Any suggestions? Thanks in advance.

P.S.

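Here is my entire code. The input file has two tab-separated columns. I want to keep a single value for each gene: if a gene has more than one record, I compute the average.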

#!/usr/bin/perl
use strict;
use warnings;

#Find the average fc between common genes
sub avg {
    my $total;
    $total += $_ foreach @_;
    return $total / @_;
}

my @mykeys = `cat G13_T.txt| awk '{print \$1}'| sort -u`;
foreach (@mykeys)
{
    my @TSS = ();

    my $op1 = 0;

    my $key = $_;
    chomp($key);
    #print "$key\n";
    my $command = "cat G13_T.txt|grep -E '([[:space:]]|^)$key([[:space:]]|\$)'";
    #my $command = "cat Unique_Genes/G13_T.txt|grep -w $key";
    my @belongs= `$command`;
    chomp(@belongs);
    my $count = scalar(@belongs);
    if ($count == 1) {
            print "$belongs[0]\n";
    }
    else {
            for (my $i = 0; $i < $count; $i++) {
                    my @token = split('\t', $belongs[$i]);
                    my $lfc = $token[1];
                    push (@TSS, $lfc);
            }
            $op1 = avg(@TSS);
            print $key ."\t". $op1. "\n";
    }
}

Answer 1

You can use a POSIX ERE regex with grep like this:

grep -E '([[:space:]]|^)A1BG([[:space:]]|$)' file

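To return only the matches (not the whole matching lines); note that the output then also includes any whitespace character matched on either side: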

grep -Eo '([[:space:]]|^)A1BG([[:space:]]|$)' file

Details

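([[:space:]]|^) - Group 1: a whitespace character or the start of the line
A1BG - the literal keyword
([[:space:]]|$) - Group 2: a whitespace character or the end of the line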
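For context, grep -w counts letters, digits and the underscore as word characters, so "-" is a word boundary; Perl's \b uses the same idea of a word character on ASCII data, which is why both records match. The whitespace-or-anchor pattern can also be applied inside the Perl script itself instead of calling grep. A minimal sketch; the helper name exact_field_lines is hypothetical, and \Q...\E is used so that regex metacharacters in the key are matched literally:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines in which $key appears as a
# whole whitespace-delimited token (a Perl version of the ERE above).
sub exact_field_lines {
    my ($key, @lines) = @_;
    # \Q...\E escapes regex metacharacters in $key, so it matches literally.
    return grep { /(?:^|\s)\Q$key\E(?:\s|$)/ } @lines;
}

# The two sample records from the question, tab-separated.
my @records = ("A1BG\t0.0767377011073753", "A1BG-AS1\t0.233775553296782");

# \b has the same notion of a word character as grep -w, so "-" is a
# boundary and both records match.
my @word_matches = grep { /\bA1BG\b/ } @records;

# The whitespace/anchor version only matches the exact gene name.
my @exact_matches = exact_field_lines('A1BG', @records);

print scalar(@word_matches),  " record(s) match /\\bA1BG\\b/\n";
print scalar(@exact_matches), " record(s) match exactly\n";
```

With the sample data this reports 2 records for the \b version and 1 for the exact version.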

Answer 2

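If I understood the clarification in the comments correctly, the goal is to find the average of the second-column values for each unique name in the first column. In that case there is no need for external tools at all.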

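Read the file line by line and add up the values for each name. Uniqueness of names is guaranteed by using a hash with the names as keys. At the same time, keep track of how many values each name has.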

use warnings;
use strict;
use feature 'say';

my $file = shift // die "Usage: $0 filename\n";

open my $fh, '<', $file or die "Can't open $file: $!";

my %results;

while (<$fh>) {
    #my ($name, $value) = split /\t/;
    my ($name, $value) = split /\s+/;  # used for easier testing

    $results{$name}{value} += $value;
    ++$results{$name}{count};
}

foreach my $name (sort keys %results) { 
    $results{$name}{value} /= $results{$name}{count} 
        if $results{$name}{count} > 1;

    say "$name => $results{$name}{value}";
}

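After the file has been processed, each accumulated sum is divided by its count and overwritten with the result, which turns it into the average (/= divides and assigns). This is done only if the count is greater than 1, as a small efficiency measure.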
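To see the divide-in-place step on concrete numbers, the same logic can be wrapped in a sub that takes a list of lines instead of a filehandle. The sub name average_by_name and the sample values are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the accumulate-then-divide logic above,
# taking lines instead of a filehandle so it is easy to try out.
sub average_by_name {
    my %results;
    for my $line (@_) {
        my ($name, $value) = split /\s+/, $line;
        $results{$name}{value} += $value;   # running sum
        ++$results{$name}{count};           # number of values seen
    }
    for my $name (keys %results) {
        # Divide-and-assign turns the sum into the average.
        $results{$name}{value} /= $results{$name}{count}
            if $results{$name}{count} > 1;
    }
    # Return a plain name => average hash reference.
    my %avg = map { ($_ => $results{$_}{value}) } keys %results;
    return \%avg;
}

my $avg = average_by_name("A1BG 1", "A1BG 3", "A1BG-AS1 0.5");
print "$_ => $avg->{$_}\n" for sort keys %$avg;
```

This prints "A1BG => 2" (the sum 4 divided by 2) and "A1BG-AS1 => 0.5" (a single value, left as is).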

If it would be useful to know all the values found for each name, store them in an arrayref for each key instead of adding them up:

while (<$fh>) {
    #my ($name, $value) = split /\t/;
    my ($name, $value) = split /\s+/;  # used for easier testing

    push @{$results{$name}}, $value;
}

Now we don't need a count, since it is given by the number of elements in the array(ref):

use List::Util qw(sum);

foreach my $name (sort keys %results) {
    say "$name => ", sum(@{$results{$name}}) / @{$results{$name}};
}

Note that a hash built this way needs memory comparable to (possibly exceeding) the size of the file, since all the values are stored.
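If memory is a concern and the input can be sorted by the first column beforehand (for example with sort -k1,1), all lines for one name are adjacent, so a running sum and count for the current name are enough. This is a different technique from the hash-based code and depends entirely on that sorting assumption; the sub name stream_averages is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sub: assumes lines for the same name are adjacent
# (input sorted by the first column). Only the current name's sum and
# count are kept in memory; $emit is called once per finished name.
sub stream_averages {
    my ($fh, $emit) = @_;
    my ($current, $sum, $count);
    while (my $line = <$fh>) {
        # split ' ' works like awk: leading whitespace is ignored.
        my ($name, $value) = split ' ', $line;
        next unless defined $name;                  # skip blank lines
        if (defined $current && $name ne $current) {
            $emit->($current, $sum / $count);       # previous name is done
            ($sum, $count) = (0, 0);
        }
        $current = $name;
        $sum += $value;
        $count++;
    }
    $emit->($current, $sum / $count) if defined $current;  # last name
}

# Demo on an in-memory "file"; a real script would open the sorted file.
my $data = "A1BG 1\nA1BG 3\nA1BG-AS1 0.5\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";
stream_averages($fh, sub { print "$_[0]\t$_[1]\n" });
```

Unlike the hash versions, this prints results in input order and keeps no per-name state once a name is finished.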

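This was tested with the two-line sample data shown above, repeated and modified in a file. The code does not validate the input in any way; it assumes that the second field is always a number.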
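If the input might contain malformed lines, one option is to skip any line whose second field is not numeric, using looks_like_number from the core Scalar::Util module. A sketch of such a filter; the sub name parse_record and the warning text are illustrative, not part of the answer above:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: split a line into (name, value), or warn and
# return an empty list if the second field is missing or not numeric.
sub parse_record {
    my ($line) = @_;
    my ($name, $value) = split ' ', $line;
    unless (defined $value && looks_like_number($value)) {
        warn "Skipping malformed line: $line";
        return;
    }
    return ($name, $value);
}

my %results;
for my $line ("A1BG 0.5\n", "A1BG n/a\n", "A1BG-AS1 0.25\n") {
    # A list assignment in boolean context is false when nothing was returned.
    my ($name, $value) = parse_record($line) or next;
    $results{$name}{value} += $value;
    ++$results{$name}{count};
}
print "$_ => $results{$_}{value}\n" for sort keys %results;
```

Here the "n/a" line is reported on STDERR and skipped, so A1BG keeps a single value of 0.5.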

Note that there is no reason to shell out of the program and use external commands for this.

2 solutions

Solution 1
3 2019-03-28 17:50:40

Solution 2
3 accepted 2019-03-28 20:29:22
