How can I parse the output of lshw using Perl regexes?

I am trying to parse lshw output into a hash with this code, which works so far:

use strict;
use warnings;

my (%lshw, $key, $value);
while (<>) {
    s/#.*//;                # no comments
    s/^\s+//;               # no leading whites
    s/\s+$//;               # no trailing whites
    next unless length;     # anything left?
    if (/(?<key>.*?):\s+(?<value>.*)/x) {
        $lshw{$+{key}} = $+{value};
    }
}

# remove white spaces in hash keys
for $key (keys %lshw) {
    $value = delete $lshw{$key};
    for ($key) {
        s/\s+//g;
    }
    $lshw{$key} = $value;
}

my $logname = $lshw{'logicalname'};
print "Logical name\t $logname\n";

But I struggle when I come to lines with configurations like:

clock: 33Mhz
width: 32 bits
capacity: 1Gbit/s
configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s

I was trying a hash-of-hashes (HoH) approach, but could not find a way to split the key/value pairs, because the line contains multi-word values like port=twisted pair. The key is always a single word.

Can anyone please give me a hint on how to solve this?

(Thanks to simbabque for the strict/warnings hint.)

What you need is to capture, after each equals sign, every character that is not immediately followed by the pattern somekeyname=.

#!/usr/bin/env perl

use strict;
use warnings;
use YAML::XS;

my $s = q{configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s};

# Separate the leading "configuration" label from the key=value list.
my ($key, $rest) = split /:\s*/, $s, 2;

# Key: a word before "=". Value: one or more characters, none of which
# is immediately followed by the next "word=".
my %params = ($rest =~ / (\w+) = ((?:. (?! \w+ = ))+) /gx);

print Dump \%params;

Output:

---
autonegotiation: on
broadcast: yes
driver: igb
driverversion: 5.3.0-k
duplex: full
firmware: 1.63, 0x800009fa
ip: '[REMOVED]'
latency: '0'
link: yes
multicast: yes
port: twisted pair
speed: 1Gbit/s

In addition, your initial loop can be improved:

while (<>) {
    next if /^#/; # skip comments
    /\S/ or next; # skip blank lines
    s/^\s+//;
    s/\s+\z//;
    # ...
}
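
To connect this back to the hash-of-hashes idea from the question, here is a minimal sketch that combines the loop with the key=value regex. The sample lines in @lines are an assumption standing in for real lshw output, and treating only the configuration key as a nested hash is a choice made for this illustration, not something lshw dictates.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for real lshw output.
my @lines = (
    '       clock: 33Mhz',
    '       width: 32 bits',
    '       configuration: driver=igb firmware=1.63, 0x800009fa port=twisted pair speed=1Gbit/s',
);

my %lshw;
for (@lines) {
    next if /^#/;    # skip comments
    /\S/ or next;    # skip blank lines

    # Split "key: value", trimming surrounding whitespace.
    my ($key, $value) = /^\s*(.*?):\s*(.*?)\s*\z/ or next;
    $key =~ s/\s+//g;    # drop spaces inside keys

    # A key=value list becomes a nested hash; anything else stays a string.
    $lshw{$key} = $key eq 'configuration'
        ? +{ $value =~ / (\w+) = ((?:. (?! \w+ = ))+) /gx }
        : $value;
}

print "port: $lshw{configuration}{port}\n";
```

With this layout, plain values are read as $lshw{clock} and configuration entries as $lshw{configuration}{port}.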

You simply need to split the config string like so:

use strict;
use warnings 'all';
use feature 'say';

my $s = 'configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s';

say for split /\s+(?=[^\s=]+=)/, $s;

Output:

configuration:
autonegotiation=on
broadcast=yes
driver=igb
driverversion=5.3.0-k
duplex=full
firmware=1.63, 0x800009fa
ip=[REMOVED]
latency=0
link=yes
multicast=yes
port=twisted pair
speed=1Gbit/s

You now have a list of key=value strings, each split exactly where a new key name begins. This should be simple to process.
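
One possible way to process that list, sketched here under the assumption that the leading "configuration:" label should be dropped first, is to split each chunk once more at its first equals sign and collect the pairs in a hash. The name %config is only illustrative.

```perl
use strict;
use warnings;

my $s = 'configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s';

# Remove the "configuration:" label so that only key=value chunks remain.
(my $rest = $s) =~ s/^[^:]+:\s*//;

# Split into chunks, then split each chunk at its first "=" only,
# so values that contain spaces or commas stay intact.
my %config = map { split /=/, $_, 2 } split /\s+(?=[^\s=]+=)/, $rest;

printf "%-16s %s\n", $_, $config{$_} for sort keys %config;
```

The limit of 2 on the inner split matters only if a value ever contains an equals sign itself; without it such a value would be broken apart.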

Borodin's approach is the way to go.

In case you want to parse it with a single regex instead, the following works and separates the keys from their values.

#!/usr/bin/env perl

use warnings FATAL => 'all';
use strict;
my $s = 'configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s';

while ($s =~ m/(?<key>[A-Za-z0-9]+)=(?<value>([\/\[\]A-Za-z0-9., -]+)(?= [a-z]+)|([\/\[\]A-Za-z0-9., -]+))/g) {
    print "$+{key} >>  $+{value}\n";
    $s =~ s/$+{key}//;
}

Output

autonegotiation >>  on
broadcast >>  yes
driver >>  igb
driverversion >>  5.3.0-k
duplex >>  full
firmware >>  1.63, 0x800009fa
ip >>  [REMOVED]
latency >>  0
link >>  yes
multicast >>  yes
port >>  twisted pair
speed >>  1Gbit/s

Pros

  • key/value separation

Cons

  • expensive positive lookahead in the regular expression

Refactoring proposal

  • get rid of the complex character group [\/\[\]A-Za-z0-9., -] in the regex
  • get rid of the replacement of found patterns inside the loop
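
A rough sketch of what those two refactorings could look like: the value is matched lazily with .+? instead of an explicit character group, and the string is left unmodified so that the /g match simply continues where it stopped. This version still relies on a lookahead, so it does not address the cost listed under Cons, and collecting the pairs in %params is an addition made for this example.

```perl
use strict;
use warnings;

my $s = 'configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.3.0-k duplex=full firmware=1.63, 0x800009fa ip=[REMOVED] latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s';

my %params;

# The lazy value stops right before the next " key=" or at the end of the string.
while ($s =~ / (?<key>\w+) = (?<value>.+?) (?= \s+ \w+ = | \s* \z ) /gx) {
    $params{ $+{key} } = $+{value};
}

print "$_ >>  $params{$_}\n" for sort keys %params;
```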
