How to match multiple lines in Perl

Question

Let's say I have a netlist file formatted like so for each module:

module module_name1(in1, in2,
    in3, in4, in5,
    out1, out2, out3
    out4, out5);

There are many of these throughout the netlist. I want to be able to grab the module name and the list of ports. Here is what I have so far:

use strict;
use warnings;

my $input_file = $ARGV[0];
open (my $INFILE, $input_file) or die "$input_file cannot be opened.\n";

my $outfile = "verilog.port.txt";
open (my $OUTFILE, '>', $outfile) or die "\nUnable to create $outfile\n";

my ($module_name,$port_list);

while (<>) {
  if ($_ =~ /module (\w+)\((.+)\)/m) {
    $module_name = $1;
    $port_list = $2;
    print $OUTFILE "Module Name: $module_name Port list: $port_list\n"
  }
}
close $INFILE;

close $OUTFILE;

This only works if the module is declared on a single line. For example, given:

module module_name2(in1, in2, out1, out2);

I get something like:

Module Name: module_name2 Port list: in1, in2, out1, out2

However, if the module is declared over multiple lines, as in my first example, my regular expression does not pick it up. So I was wondering whether there is a way to match across multiple lines using Perl.

Answer 1

You are reading the file line by line. You need to read it either by paragraph (chunks separated by an empty line) or, if there is no such separation, as one entire file; otherwise $_ contains only one line and the pattern will not match.

Also, the /m flag is not what you are looking for (/m makes ^ and $ match at the beginning and end of each line). You need /s, which makes . match newlines as well (see the perlreref documentation page; the perlop page is a bit confusing).
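To make the difference between the flags concrete, here is a minimal sketch that runs the question's pattern against a two-line declaration with no flag, with /m, and with /s. The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text spanning two lines.
my $text = "module demo(a,\n    b);\n";

# Without /s, "." stops at the newline, so the pattern cannot span lines.
my $plain  = $text =~ /module (\w+)\((.+)\)/  ? 1 : 0;

# /m only changes what ^ and $ mean; "." still does not match "\n".
my $with_m = $text =~ /module (\w+)\((.+)\)/m ? 1 : 0;

# With /s, "." matches newlines too, so the whole port list can be captured.
my $with_s = $text =~ /module (\w+)\((.+)\)/s ? 1 : 0;

print "no flag: $plain, /m: $with_m, /s: $with_s\n";
```

Only the /s variant should report a match here, which is why changing the flag alone is not enough unless $_ also holds more than one line.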

By paragraph, this one-liner should do the trick:

$ perl -l -00 -ne 'if ( /module (\w+)\((.+)\)/s) { @ports = split(/\s*,\s*/,$2); print "Module name: $1 Ports: " . join(", ", @ports)}' <<'EOF'
> module module_name1(in1, in2,
>     in3, in4, in5,
>     out1, out2, out3,
>     out4, out5);
>
>
> module module_name2(in21, in22,
>     in23, in24, in25,
>     out21, out22, out23,
>     out24, out25);
> EOF
Module name: module_name1 Ports: in1, in2, in3, in4, in5, out1, out2, out3, out4, out5
Module name: module_name2 Ports: in21, in22, in23, in24, in25, out21, out22, out23, out24, out25

You can use -MO=Deparse to see the entire code:

perl -MO=Deparse -l -00 -ne 'if ( /module (\w+)\((.+)\)/s) { @ports = split(/\s*,\s*/,$2); print "Module name: $1 Ports: " . join(", ", @ports)}'
BEGIN { $/ = ""; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    if (/module (\w+)\((.+)\)/s) {
        @ports = split(/\s*,\s*/, $2, 0);
        print "Module name: $1 Ports: " . join(', ', @ports);
    }
}

If you don't have empty lines separating the modules, you will need to read the entire file at once (slurp):

perl -l -0777 -ne 'while (/module (\w+)\((.+?)\);/sg) { @ports = split(/\s*,\s*/,$2); print "Module name: $1 Ports: " . join(", ", @ports)}' <<'EOF'
> module module_name1(in1, in2,
>     in3, in4, in5,
>     out1, out2, out3,
>     out4, out5);
> module module_name2(in21, in22,
>     in23, in24, in25,
>     out21, out22, out23,
>     out24, out25);
> EOF
Module name: module_name1 Ports: in1, in2, in3, in4, in5, out1, out2, out3, out4, out5
Module name: module_name2 Ports: in21, in22, in23, in24, in25, out21, out22, out23, out24, out25

Again, you can use -MO=Deparse to see what is happening:

perl -MO=Deparse -l -0777 -ne 'while (/module (\w+)\((.+?)\);/sg) { @ports = split(/\s*,\s*/,$2); print "Module name: $1 Ports: " . join(", ", @ports)}'
BEGIN { $/ = undef; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    while (/module (\w+)\((.+?)\);/gs) {
        @ports = split(/\s*,\s*/, $2, 0);
        print "Module name: $1 Ports: " . join(', ', @ports);
    }
}

The key element in these approaches is the -0 flag: in the -00 form it sets $/ to the empty string, enabling paragraph mode, and in the -0777 form it sets $/ to undef, enabling slurp mode (reading the entire file). See also $RS in the perlvar manual.
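Inside a script, the same three reading modes can be selected by assigning to $/ directly. The following is a rough sketch, assuming an in-memory filehandle as a stand-in for the netlist file; the helper name read_records and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: two modules separated by a blank line.
my $text = "module m1(a,\n    b);\n\nmodule m2(c,\n    d);\n";

# Read all records from an in-memory filehandle using the given separator.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;              # restored automatically on return
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @lines      = read_records("\n");    # default: one record per line
my @paragraphs = read_records("");      # like -00: paragraph mode
my @slurped    = read_records(undef);   # like -0777: whole file in one record

printf "lines: %d, paragraphs: %d, slurped: %d\n",
    scalar @lines, scalar @paragraphs, scalar @slurped;
```

Using local keeps the change to $/ confined to the helper, so the rest of the program still reads line by line.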

An important caveat: -l sets the $\ variable to the current value of $/ (which by default is "\n"), so in this case it has to be used before -0 on the command line if you want the output to be separated by newlines.

For a more elegant approach, you can use the following script:

#!/bin/perl

use warnings;
use strict;

use File::Slurp;
use Data::Dumper;

my $data = read_file($ARGV[0]);

my %modules = $data =~ /module (\w+)\((.+?)\);/sg;

$modules{$_} = [split(/\s*,\s*/, $modules{$_})] for keys(%modules);

print Dumper(\%modules);

This would give you a data structure containing all the information needed. See https://ideone.com/BuuR8I for a live demo.
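File::Slurp is a CPAN module, so it may not be installed everywhere. As a sketch of the same idea using only core Perl, the file can be slurped with a localized $/ and the resulting hash of arrays walked directly. The helper name parse_modules is hypothetical, and the pattern is slightly loosened to tolerate optional whitespace, which is an assumption about the input rather than something the original script does.

```perl
use strict;
use warnings;

# Hypothetical helper: parse every "module name(ports);" declaration in a
# string and return a hash mapping module name => array ref of port names.
sub parse_modules {
    my ($data) = @_;
    my %modules;
    while ($data =~ /module\s+(\w+)\s*\((.+?)\)\s*;/sg) {
        my ($name, $ports) = ($1, $2);
        $ports =~ s/^\s+|\s+$//g;                  # trim surrounding whitespace
        $modules{$name} = [ split /\s*,\s*/, $ports ];
    }
    return %modules;
}

# Slurp the file named on the command line, if one was given.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };             # undef $/ reads everything
    close $fh;

    my %modules = parse_modules($data);
    for my $name (sort keys %modules) {
        print "Module name: $name Ports: ", join(", ", @{ $modules{$name} }), "\n";
    }
}
```

Keeping the parsing in a sub that takes a string makes it easy to try on small inline samples before pointing it at a real netlist.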

Answer 2

See the following code snippet for one of many possible solutions.

NOTE: the OP's posted data block is missing a comma (,) after out3.

#!/usr/bin/perl 
#
# vim: ai:ts=4:sw=4
#

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $debug = 0;          # debug flag

my $data = do { local $/; <DATA> };

$data =~ s/[ \n]+/ /g;

my @lines = split ';', $data;

say Dumper(\@lines) if $debug;

for (@lines) {
    next unless /module\s+(.*)?\((.*)\)/;
    say "Module: $1 -- Ports: $2";
}


__DATA__
module module_name1(in1, in2,
    in3, in4, in5,
    out1, out2, out3,
    out4, out5);


module module_name2(in21, in22,
    in23, in24, in25,
    out21, out22, out23,
    out24, out25);

Output

Module: module_name1 -- Ports: in1, in2, in3, in4, in5, out1, out2, out3, out4, out5
Module: module_name2 -- Ports: in21, in22, in23, in24, in25, out21, out22, out23, out24, out25

Answer 3

I have to disagree that reading line-by-line is 'inappropriate' when Perl has the .. range operator.

Take the OP's code and modify it as follows:

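while (<>) {
    # True from a line matching /module/ through the next line matching /\)/.
    if (/module/ .. /\)/) {
        $module_name = $1 if /module\s+(\w+)/;
        my $done = /\)/;              # is this the last line of the declaration?
        s/.*\(//; s/\).*//;           # drop everything outside the parentheses
        s/^\s+//; s/,\s+/, /g;        # normalize indentation and separators
        chomp;
        $port_list .= $_;
        if ($done) {
            print $OUTFILE "Module Name: $module_name Port list: $port_list\n";
            $port_list = '';          # start fresh for the next module
        }
    }
}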

In other words, from a line matching /module/ to a line matching /\)/, accumulate the port list.
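For anyone who wants to try the range operator in isolation, here is a self-contained sketch that works on a list of lines instead of a filehandle. The helper name collect_modules and the sample lines are hypothetical, and the start pattern is anchored as /^\s*module\b/ (my assumption) so that an endmodule line would not open a new range.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [name, [ports...]] pairs found in the
# given lines, using the range (flip-flop) operator one line at a time.
sub collect_modules {
    my @lines = @_;                       # copy, because the loop edits $_ in place
    my ($name, @ports, @found);
    for (@lines) {
        # True from a line starting with "module" through the next line with ")".
        next unless /^\s*module\b/ .. /\)/;
        $name = $1 if /module\s+(\w+)/;
        my $done = /\)/;                  # last line of this declaration?
        s/.*\(//;                         # drop "module name(" on the first line
        s/\).*//s;                        # drop ");" and the trailing newline
        s/^\s+|\s+$//g;                   # trim indentation and line endings
        push @ports, grep { length } split /\s*,\s*/, $_;
        if ($done) {
            push @found, [ $name, [@ports] ];
            @ports = ();                  # start fresh for the next module
        }
    }
    return @found;
}

my @sample = ("module m1(a, b,\n", "    c);\n", "wire w;\n", "module m2(d);\n");
for my $module (collect_modules(@sample)) {
    print "Module Name: $module->[0] Port list: ", join(", ", @{ $module->[1] }), "\n";
}
```

Collecting the ports into an array rather than a string sidesteps the spacing cleanup and leaves the list ready for further processing.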

3 solutions

Solution 1
3 2020-02-21 06:14:17

Solution 2
0 2020-02-21 04:50:44

Solution 3
0 2020-02-22 05:39:14
