Capture information about both the file and the regexp found

I have a directory full of files that contain numbers I want to capture. I also want to know which numbers come from which files. Right now I'm just running grep, which outputs something like:

grep ./* -e 'expression'
./file1: expression numberA
./file1: expression numberB
./file1: expression numberA
./file2: expression numberC numberD
...

What I want is to extract a piece of each filename (in this example, 1 for file1), along with all of the numbers that appear after my expression.

While I'd prefer to do everything in bash, any solution is welcome.

EDIT: To be clear, I want the following output:

file1:
numberA
numberB
file2:
numberC
numberD
...

I've also edited the earlier portion (./file1: expression numberA). Sorry for not being clear before.

Try this:

grep -e 'expression' * | perl -pe 's/^(.*?)(\d+)(:.*)$/$1$2$3 $2/'

This prints every input line. If the filename portion of a line ends with a number, that number is appended to the line.

$1, $2 and $3 are backreferences to the three subexpressions (the parenthesized parts of the regular expression).

The command-line switch -e tells the Perl interpreter to execute the given expression. -p loops over the input and prints $_ after each iteration.
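To make the three capture groups concrete, here is a minimal sketch of the same substitution written for sed instead of Perl. It is a swapped-in equivalent, not the answer's own command: POSIX extended regexes have no non-greedy `.*?`, so the first group is restricted to colon-free text instead. The function name `append_file_number` and the sample lines are made up for illustration.

```shell
# Hypothetical helper: append the trailing number of the "file:" prefix
# to each line of grep output read from stdin.
#   \1 = filename text before the number (no colons, ends in a non-digit)
#   \2 = the trailing number of the filename
#   \3 = the colon and everything after it
append_file_number() {
    sed -E 's/^([^:]*[^0-9:])([0-9]+)(:.*)$/\1\2\3 \2/'
}

# Sample input standing in for real grep output (assumed format "file: text").
printf '%s\n' './file1: expression 42' './notes: expression 7' | append_file_number
```

Lines whose filename does not end in a number pass through unchanged, which matches the behavior described for the Perl one-liner.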

However, since you also want all the numbers that follow your match, you probably need something more complex:

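grep -e 'expression' * | perl -ne '
  chomp;
  # Split on the first colon only, so colons in the matched text survive.
  my ($file, $rest) = split /:/, $_, 2;
  $file =~ s/.*?(\d+)$/$1/;          # keep the whole trailing number of the filename
  $rest =~ s/.*expression(.*)/$1/;   # keep what follows the expression
  $rest =~ s/\D+/ /g;                # collapse runs of non-digits into one space
  print "$_ $file $rest\n";
'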

-n does the same as -p, but without implicitly printing $_.

Edit: After reading your updated requirements, I think you may be better off with an all-Perl solution.

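#!/usr/bin/env perl

use strict;
use warnings;

foreach my $file (@ARGV) {
  # Three-argument open with a lexical handle; report why it failed.
  open my $fh, '<', $file or die "Can't open file $file: $!";
  my $first = 1;
  while (<$fh>) {
    if (m/expression(.*)/) {
      my $values = $1;
      if ($first) {
        print "$file:\n";   # print the header once, on the first match
        $first = 0;
      }
      $values =~ s/(^\s+|\s+$)//g;   # trim leading and trailing whitespace
      $values =~ s/\s+/\n/g;         # one value per line
      print "$values\n";
    }
  }
  close $fh;
}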
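Since the question asks for a shell-only approach where possible, the same per-file loop can be sketched with grep, sed and awk. This is only an illustration under some assumptions: the function name `extract_numbers` is hypothetical, the pattern is spliced into a sed expression (so it must not contain `/` or other characters special to sed), and repeated values are dropped per file, as in the desired output shown in the question.

```shell
# Hypothetical helper: extract_numbers PATTERN FILE...
# Prints "name:" once per file that matches, then each distinct
# whitespace-separated value that follows PATTERN, one per line.
extract_numbers() {
    pattern=$1
    shift
    for file in "$@"; do
        # With a single file argument grep prints no "file:" prefix.
        matches=$(grep -e "$pattern" -- "$file" |
            sed "s/.*$pattern//" |
            awk '{ for (i = 1; i <= NF; i++) if (!seen[$i]++) print $i }')
        # Skip files without any match so no empty header is printed.
        [ -n "$matches" ] || continue
        printf '%s:\n%s\n' "${file##*/}" "$matches"
    done
}
```

It would be called as `extract_numbers expression ./*`. Starting one grep per file is slower than a single pass, which is a reasonable trade for a small directory.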

If you just want to see the file number and the numbers, you could use something like:

find . -exec sh -c "echo -n \;{}::;grep -e 'expression' {}" \; | perl -pe 's/^.*\D(\d+)::/File $1:\n/' | perl -pe 's/\D*(\d+)$/$1/'

Note: this breaks if your expression contains ::number (:: is used as a delimiter and can be changed). It prints the last file name if no matches are found.

This would produce:

File 2:
878
File 3:
199
File 4:
123
234
9
0

Example file2:

foo 123
bar 123
expression 878
lorem ipsum

If you just want the number pairs (file number + number), then you could try:

grep ./* -e 'expression' | perl -pe 's/^.*?(\d+):.*?(\d+)$/$1 $2/'

Output:

2 878
3 199
4 123
4 234
4 9
4 0
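The one-liner above keeps only the last number on each line. If a matching line can hold several numbers, a sketch like the following emits one pair per number. It assumes the input is grep output of the form `file:text`; the function name `file_number_pairs` is made up for illustration.

```shell
# Hypothetical helper: read grep output ("file:text") on stdin and print
# "<trailing number of the filename> <number>" for every number in the text.
file_number_pairs() {
    awk '{
        colon = index($0, ":")
        if (colon == 0) next                # no "file:" prefix, skip the line
        name = substr($0, 1, colon - 1)
        rest = substr($0, colon + 1)
        sub(/.*[^0-9]/, "", name)           # keep only the trailing digits
        if (name == "") next                # filename does not end in a number
        n = split(rest, words, /[ \t]+/)
        for (i = 1; i <= n; i++)
            if (words[i] ~ /^[0-9]+$/) print name, words[i]
    }'
}
```

It would be used as `grep -e 'expression' ./* | file_number_pairs`. Only purely numeric words count as numbers here, so a value such as `12kg` is ignored.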

As mentioned in my comment, your question is somewhat unclear about exactly what you want. Providing some examples would help.

Thanks to the two who answered. With the information you both gave me, I was able to figure out a perfect solution:

grep -i expression ./* | perl -pe 's/^.*?(\d+):\D*/$1 /' | sort -u

This gives as output:

1 numberA
1 numberB
2 numberC numberD

I don't think grep is needed in this case. You need awk or Perl to accumulate the data anyway, and either can search the files for the expression itself. Here is an awk example:

awk '/expression/ {f[FILENAME]; for(i=2;i<=NF;++i) v[FILENAME,$i]} 
END {for(i in f) {print i":"; for(j in v) if(sub("^"i SUBSEP,"",j))print j}}' ./*

Output:

file1:
numberB
numberA
file2:
numberD
numberC
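The numbers above come out in arbitrary order because awk does not define the traversal order of `for (x in array)`. If the original order matters, a streaming variant can print each value the first time it is seen. This is a sketch under the same assumption as the answer (the expression is the first field, so values start at field 2); the function name `numbers_by_file` is hypothetical.

```shell
# Hypothetical helper: numbers_by_file FILE...
# Prints "name:" at the first match in each file, then every distinct
# value after the expression, in order of first appearance.
numbers_by_file() {
    awk '
    /expression/ {
        if (!(FILENAME in header_done)) {
            name = FILENAME
            sub(/.*\//, "", name)           # strip the directory part
            print name ":"
            header_done[FILENAME] = 1
        }
        for (i = 2; i <= NF; i++)
            if (!seen[FILENAME, $i]++) print $i
    }' "$@"
}
```

It would be called as `numbers_by_file ./*`. Because awk reads the files one after another, the output stays grouped by file without an END block.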
