Using scalars in regex search

I wrote this code:

my $id = shift;
my $file = shift;
unless(open (INFO, $file)) { print "cant open file\n"; return 0; }
#this is how i do it - i didn't copy the code directly last time:
while(my $line = <info>)
{
    if($line =~ /d\s+S+\s\Q$id\disk\d+s\d+/g)
    {
        print "yay i found it";
        close(INFO);
        return 1;
    }
}
close(INFO);
return 0;

An example of a line that should match:

2:     Apple_HFS 0x123456789ABC   999.9 GB   disk2s2

(As you can see, $id is "0x123456789ABC".)

My question: it doesn't work. It opens the file and reads the lines, but the matching fails. What am I missing here? I guess my regex is wrong, but I couldn't fix it.

I tried Google and (of course) Stack Overflow (How to evaluate a word saved in a scalar via regular expression in Perl?, Detect exact string value of scalar in regex matching, Use variable as RegEx pattern) but with no luck. I'm sure I'm missing some basics, but this isn't my first regex, just the first to have a scalar in it.

Thank you

The immediate problem is that $file is the name of the file. You open it but never actually read anything from it.

Here are some further comments on your code

  • It is common practice, and much tidier, to collect the parameters of a subroutine like this

     my ($id, $file) = @_;

    This also has the advantage of copying the values, so that the actual parameters in the call are in less danger of being modified

  • You should use the three-parameter form of open and lexical file handles, like this

     open my $fh, '<', $file

    In particular, the file is left open when the subroutine exits in your case because you have chosen a global file handle. Lexical handles are closed implicitly when they go out of scope

  • You should use the built-in variable $! in the open error message to give information on why it failed

  • An error is generally indicated by a bare return, which returns undef or an empty list, depending on context. return 0 in list context results in the list (0), which produces a true value if it is assigned to an array

  • Unless you really need to be able to access all of a file at once, it is generally best to use a while loop to read and process it line by line

  • The /g regex match modifier is for finding all occurrences of a pattern in a string. It is unnecessary and wasteful if all you want to do is check whether the pattern appears anywhere in the string
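The point about a bare return is easy to check for yourself. Here is a minimal sketch; the two subroutine names are made up for this illustration and do not appear in your code.

```perl
use strict;
use warnings;

# Hypothetical subs, named only for this illustration.
sub fail_with_zero { return 0 }   # in list context this returns the list (0)
sub fail_bare      { return }     # in list context this returns an empty list

my @zero = fail_with_zero();
my @bare = fail_bare();

# An array in boolean context is true when it holds at least one element.
print "return 0 looks like success\n" if @zero;
print "bare return looks like failure\n" unless @bare;

# In scalar context both results are false, so simple callers see no difference.
print "both are false in scalar context\n"
  if !fail_with_zero() && !fail_bare();
```

All three messages are printed, which shows that the difference only appears when the caller collects the result in a list.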

Also, your regex has a lot of problems. If I add the /x modifier then I can add spaces to show you more clearly what you have written

/ d \s+ S+ \s \Q$id \d isk \d+ s \d+ /x

which matches

  • a single d character
  • one or more whitespace characters
  • one or more S characters
  • a single whitespace character
  • the value of $id, quoted by \Q. The \Q isn't terminated, so the rest of the pattern is matched literally. If you had \Q$id\E then the rest of the pattern would match
  • a single digit
  • the string isk
  • one or more digits
  • a single s character
  • one or more digits

which doesn't come close to matching the record format that you show. It's important to remember that there is no need for your pattern to match all of the string, so you may want something like just /\b\Q$id\E\b/, which checks that your ID is somewhere in the string with word boundaries at each end. I don't see a string like 0x123456789ABC appearing elsewhere in the line and giving a false positive
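As a quick check of the shorter pattern, here is a sketch that tries it against the sample line from the question. The second line, with a longer ID, is invented here only to show what the word boundaries do.

```perl
use strict;
use warnings;

my $id   = '0x123456789ABC';
my $line = '2:     Apple_HFS 0x123456789ABC   999.9 GB   disk2s2';

# \Q...\E quotes the contents of $id; \b anchors it at word boundaries.
my $found = $line =~ /\b\Q$id\E\b/ ? 1 : 0;
print "sample line matches: $found\n";

# Invented line: the ID is only a prefix of a longer value, so the trailing
# \b fails between 'C' and 'D' and there is no match.
my $other = '2:     Apple_HFS 0x123456789ABCDEF   999.9 GB   disk2s2';
my $false_positive = $other =~ /\b\Q$id\E\b/ ? 1 : 0;
print "longer ID matches: $false_positive\n";
```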

I think the best solution is to split each record on whitespace and check whether the third field matches the ID passed in

Your subroutine should look like this

sub routine {
  my ($id, $file) = @_;

  open my $fh, '<', $file or do {
    warn "Unable to open '$file' for input: $!";
    return;
  };

  while (my $line = <$fh>) {
    my @fields = split ' ', $line;
    if ($fields[2] eq $id) {
      print "Yay! I found it!\n";
      return 1;
    }
  }

  return;
}
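To try this out without a real disk listing, something like the following sketch should do. It repeats the subroutine so that it runs on its own, minus the print and with one change that is my own assumption rather than part of the code above: a guard that skips lines with fewer than three fields, which avoids an "uninitialized value" warning. The temporary file and its header line are invented test data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

sub routine {
  my ($id, $file) = @_;

  open my $fh, '<', $file or do {
    warn "Unable to open '$file' for input: $!";
    return;
  };

  while (my $line = <$fh>) {
    my @fields = split ' ', $line;
    next unless defined $fields[2];   # assumption: skip short lines
    return 1 if $fields[2] eq $id;
  }

  return;
}

# Invented test data: a short header line plus the sample line from the question.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "#: TYPE\n";
print {$out} "2:     Apple_HFS 0x123456789ABC   999.9 GB   disk2s2\n";
close $out or die "Unable to close '$path': $!";

my $hit  = routine('0x123456789ABC', $path) ? 1 : 0;
my $miss = routine('0xDEADBEEF', $path) ? 1 : 0;
print "hit=$hit miss=$miss\n";
```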

Instead of

my @lines = split(/\n/, $file);

try

my @lines = <INFO>;

or even better,

# Declare $INFO outside any unless/if condition so it is still in scope for the loop
open(my $INFO, "<", $file) or do { print "can't open file: $!\n"; return 0; };
while (my $line = <$INFO>)
{
  # ..
}

Also, you forgot to end the quoting of the string with \E, i.e. \Q$string\E

if($line =~ /d\s+S+\s\Q$id\Edisk\d+s\d+/g)
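To see what the missing \E does, here is a small sketch. The test strings are made up for the illustration. Without \E, the \d+ that follows $id is quoted as well, so it only matches the literal characters \d+.

```perl
use strict;
use warnings;

my $id = '0x123456789ABC';

my $unterminated = qr/\Q$id\d+/;     # \d+ is quoted too: literal backslash, d, plus
my $terminated   = qr/\Q$id\E\d+/;   # quoting stops at \E, so \d+ means "digits"

my $with_digits  = '0x123456789ABC42';      # made-up test string
my $with_literal = '0x123456789ABC\d+';     # made-up test string

my @results = (
  $with_digits  =~ $unterminated ? 1 : 0,   # digits do not match the literal \d+
  $with_literal =~ $unterminated ? 1 : 0,   # the literal text does
  $with_digits  =~ $terminated   ? 1 : 0,   # with \E the digits match
);
print "@results\n";
```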

I think the regex is incorrect. I'm not really sure what you're trying to match, so I've made an attempt based on the example:

\d+:.*S\s+\Q$id\E.+disk\d+s\d+

This will match:

\d+: one or more digits followed by a colon

.*S\s+ everything up to the 'S' in 'Apple_HFS' and the whitespace after it

\Q$id\E the id string you're looking for

.+ everything up to 'disk'

disk\d+s\d+ diskXXXsXXX

It works in this snippet:

$id = "0x123456789ABC";
$line = "2:     Apple_HFS 0x123456789ABC   999.9 GB   disk2s2";

if($line =~ /\d+:.*S\s+\Q$id\E.+disk\d+s\d+/g)
{
        print "yay i found it";
}
