How to parse lshw output with a Perl regular expression?

Question

I am trying to parse lshw output into a hash with the code below, which works so far.

use strict;
use warnings;

my (%lshw,$key,$value);
while (<>){
  s/#.*//;                # no comments
  s/^\s+//;               # no leading whites
  s/\s+$//;               # no trailing whites
  next unless length;     # anything left?
  if (/(?<key>.*?):\s+(?<value>.*)/x){
    $lshw{$+{key}} = $+{value};
  }
}

# remove white spaces in hash keys
for $key (keys %lshw){
  $value = delete $lshw{$key};
  for ($key){
    s/\s+//g;
  }
  $lshw{$key} = $value;
}

my $logname   = $lshw{'logicalname'};
print "Logical name\t $logname\n";

But I struggle when I reach a line with configuration values like:

clock: 33Mhz
width: 32 bits
capacity: 1Gbit/s
configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s

I was trying a hash-of-hashes (HoH) approach, but could not find a way to split the key/value pairs, because some values contain multiple words, such as port=twisted pair. The key is always a single word.

Can anyone please give me a hint on how to solve this?

(Thanks to simbabque for the strict/warnings hint.)

Answer 1

What you need is to capture every character after an equals sign, stopping at any character that is directly followed by the pattern somekeyname=. Each value then ends just before the space that precedes the next key.

#!/usr/bin/env perl

use strict;
use warnings;

my $s = q{configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s};

my ($key, $rest) = split /:\s*/, $s, 2;

my %params = ($rest =~ / (\w+) = ((?:. (?! \w+ = ))+) /gx);

use YAML::XS;
print Dump \%params;

Output:

---
autonegotiation: on
broadcast: yes
driver: igb
driverversion: 5.3.0-k
duplex: full
firmware: 1.63, 0x800009fa
ip: '[REMOVED]'
latency: '0'
link: yes
multicast: yes
port: twisted pair
speed: 1Gbit/s

In addition, your initial loop can be improved:

while (<>) {
    next if /^#/; # skip comments
    /\S/ or next; # skip blank lines
    s/^\s+//;
    s/\s+\z//;
    # ...
}
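
To connect this back to the hash-of-hashes idea from the question, here is a minimal sketch that combines the line loop with the configuration regex. The helper name parse_lshw and the sample input lines (including the value eth0) are assumptions made for illustration and are not part of the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: parse lshw-style "key: value" lines into a hash.
# The value of "configuration" is stored as a nested hash reference.
sub parse_lshw {
    my (@lines) = @_;
    my %lshw;
    for my $line (@lines) {
        next unless $line =~ /\S/;     # skip blank lines
        next if $line =~ /^\s*#/;      # skip comment lines
        # Split at the first colon and trim surrounding whitespace.
        my ($key, $value) = $line =~ /^\s*(.*?):\s*(.*?)\s*$/ or next;
        $key =~ s/\s+//g;              # "logical name" becomes "logicalname"
        if ($key eq 'configuration') {
            # Same pattern as above: a value ends before the next "word=".
            my %params = $value =~ / (\w+) = ((?:. (?! \w+ = ))+) /gx;
            $lshw{$key} = \%params;
        }
        else {
            $lshw{$key} = $value;
        }
    }
    return \%lshw;
}

# Assumed sample input, shortened from the question.
my $lshw = parse_lshw(
    'logical name: eth0',
    'width: 32 bits',
    'configuration: driver=igb port=twisted pair speed=1Gbit/s',
);

print "Logical name\t $lshw->{logicalname}\n";
print "Port\t\t $lshw->{configuration}{port}\n";
```

Plain values stay as strings, so existing lookups such as $lshw->{logicalname} keep working, while configuration entries are reached with one extra level, for example $lshw->{configuration}{port}.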

Answer 2

You simply need to split the config string like so:

use strict;
use warnings 'all';
use feature 'say';

my $s = 'configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s';

say for split /\s+(?=[^\s=]+=)/, $s;

Output:

configuration:
autonegotiation=on
broadcast=yes
driver=igb
driverversion=5.3.0-k
duplex=full
firmware=1.63, 0x800009fa
ip=[REMOVED]
latency=0
link=yes
multicast=yes
port=twisted pair
speed=1Gbit/s

You now have a list of key=value strings, split correctly at each key name. This should be simple to process.
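
As one possible way to do that processing, the sketch below splits each piece once more at its first equals sign and collects the result in a hash. The variable names $label, @pairs and %config are chosen here for illustration, and the sample string is a shortened version of the one above.

```perl
use strict;
use warnings;

my $s = 'configuration: autonegotiation=on firmware=1.63, 0x800009fa port=twisted pair speed=1Gbit/s';

# Split before each "word=" token, as in the answer above.
# The first piece is the "configuration:" label itself.
my ($label, @pairs) = split /\s+(?=[^\s=]+=)/, $s;

# Split each "key=value" piece at the first "=" only (limit 2),
# so a value that itself contains "=" would stay intact.
my %config = map { split /=/, $_, 2 } @pairs;

for my $key (sort keys %config) {
    print "$key => $config{$key}\n";
}
```

The limit of 2 also keeps an empty value as an empty string, so the key/value list passed to the hash stays balanced.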

Answer 3

Borodin's approach is the way to go.

Just in case you want to parse it with a regexp, the following works and also separates the keys from their values.

#!/usr/bin/env perl

use warnings FATAL => 'all';
use strict;
my $s = 'configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s';

while ($s =~ m/(?<key>[A-Za-z0-9]+)=(?<value>([\/\[\]A-Za-z0-9., -]+)(?= [a-z]+)|([\/\[\]A-Za-z0-9., -]+))/g) {
    print "$+{key} >>  $+{value}\n";
    $s =~ s/$+{key}//;
}

Output

autonegotiation >>  on
broadcast >>  yes
driver >>  igb
driverversion >>  5.3.0-k
duplex >>  full
firmware >>  1.63, 0x800009fa
ip >>  [REMOVED]
latency >>  0
link >>  yes
multicast >>  yes
port >>  twisted pair
speed >>  1Gbit/s

Pros

key/value separation

Cons

expensive positive lookahead in the regular expression

Refactoring proposal

get rid of the complex character class [\/\[\]A-Za-z0-9., -] in the regex
get rid of the replacement of found patterns inside the loop
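
One way those two refactorings might look is sketched below. It is an assumption about how the proposal could be carried out, not the answer author's own code: the character class is replaced by a lazy .*? that stops before the next key or at the end of the string, and the s/// inside the loop is dropped because the /g modifier already advances the match position. It still relies on a lookahead.

```perl
use strict;
use warnings;

my $s = 'configuration: autonegotiation=on firmware=1.63, 0x800009fa port=twisted pair speed=1Gbit/s';

my %config;
# Lazily take the value up to the next " key=" or the end of the string.
# With /g in scalar context each iteration resumes where the last match ended.
while ($s =~ /(?<key>\w+)=(?<value>.*?)(?=\s+\w+=|\s*\z)/g) {
    $config{ $+{key} } = $+{value};
}

print "$_ >>  $config{$_}\n" for sort keys %config;
```

Because $s is no longer modified inside the loop, the original string remains available afterwards and each part of the input is scanned only once.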
