
Perl: match symbols in a file, print data from a previous line

I have a small data set in an XML format:

    <symbolgroupdef id="bin_11-QQQQ">
      <symbol>QQQ</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
      <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
      <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
      <symbol>AAPL</symbol>
      <symbol>IBM</symbol>
    </symbolgroupdef>

I want to print the symbolgroupdef line and the symbol line wherever a given symbol appears. The symbol may appear under several symbolgroupdef groups.

Here is the code I have written so far:

#!/usr/bin/perl
use warnings;
use strict;

# symbol to look for, taken from the command line (not used yet)
my $symbol = $ARGV[0];
my $sym_file = "/data/xmlconfig/config.xml";
open my $sym_fh, '<', $sym_file or die $!;
while (my $line = <$sym_fh>) {
    # so far this only finds and prints the symbolgroupdef lines
    if ($line =~ /<symbolgroupdef id=".*">/) {
        print $line;
        sleep 1;
    }
}

Basically, what I want is something that will find the symbolgroupdef id line, look for the specified symbol under it, and, if it finds it, print the symbolgroupdef id line and the symbol under it. The symbol will be given on the command line and read from $ARGV[0].

In the above case, with AAPL as the symbol, the following lines should be printed:

<symbolgroupdef id="bin_6-AAPL">
<symbol>AAPL</symbol>
<symbolgroupdef id="bin_7">
<symbol>AAPL</symbol>

I don't have any modules on this machine, and can't install any on this machine. Please forgive me for parsing XML without a module.

Here's a solution based on the idea of keeping a record of the id attribute of the most recent <symbolgroupdef> element. It stores the id in $sgline, although you can store the whole line if you want. When a line turns up with the correct value in the symbol element, you can print out $sgline.

#!/usr/bin/perl
use warnings; 
use strict;

# the symbol to search for, given on the command line
my $id = $ARGV[0];
die "Usage: $0 SYMBOL\n" unless defined $id;

# uncomment these to use your file
#my $sym_file = "/data/xmlconfig/config.xml";
#open my $sym_fh, '<', $sym_file or die $!;

my $sgline = '';

# change DATA to $sym_fh to use your file
while (<DATA>) {
    # match the symbolgroupdef element
    if (m#<symbolgroupdef id="(.+?)">#) {
        $sgline = $1; # or store the whole line using $sgline = $_;
    }
    # match the symbol element holding the requested symbol;
    # \Q...\E keeps any regex metacharacters in the argument literal
    elsif (m#<symbol>\Q$id\E</symbol>#) {
        print "$sgline\n";
    }
}


__DATA__
    <symbolgroupdef id="bin_11-QQQQ"> 
      <symbol>QQQ</symbol> 
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
      <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
      <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
      <symbol>AAPL</symbol>
      <symbol>IBM</symbol>
    </symbolgroupdef>

Output (when run with AAPL as the argument):

bin_6-AAPL
bin_7
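The question asks for the whole symbolgroupdef line plus the matching symbol line, not just the id. The following is a minimal sketch of that variant using core Perl only. It is not part of the original answer: the helper name matching_lines is made up for illustration, the sample data is read through an in-memory filehandle in place of the real file, the default of AAPL is only there for the demo, and the code assumes one element per line, as in the sample.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the lines to
# print for every group that contains the wanted symbol.
sub matching_lines {
    my ($fh, $symbol) = @_;
    my ($group_line, $printed, @out) = ('', 0);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim indentation and trailing whitespace
        if ($line =~ /^<symbolgroupdef\s+id="[^"]*">$/) {
            # remember the most recent group line; nothing printed for it yet
            ($group_line, $printed) = ($line, 0);
        }
        elsif ($line =~ m{^<symbol>\Q$symbol\E</symbol>$}) {
            # emit the group line once, then the matching symbol line
            push @out, $group_line unless $printed++;
            push @out, $line;
        }
    }
    return @out;
}

# Sample data from the question. To read the real file instead, use
# open(my $fh, '<', $sym_file) and pass that handle to matching_lines.
my $sample = <<'XML';
    <symbolgroupdef id="bin_11-QQQQ">
      <symbol>QQQ</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
      <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
      <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
      <symbol>AAPL</symbol>
      <symbol>IBM</symbol>
    </symbolgroupdef>
XML

# Assumption for the demo: fall back to AAPL when no argument is given.
my $symbol = defined $ARGV[0] ? $ARGV[0] : 'AAPL';
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
print "$_\n" for matching_lines($fh, $symbol);
close $fh;
```

Run with AAPL, this should print the four lines listed in the question. Anchoring the pattern on the full <symbol>...</symbol> element means that QQQ does not also match a symbol such as QQQQ.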

Don't use a regex to parse XML. Instead, use an actual XML parser.

I'd recommend using XML::LibXML:

use strict;
use warnings;

use XML::LibXML;

my $xml = XML::LibXML->load_xml(IO => \*DATA);

for my $group ($xml->findnodes(q{//symbolgroupdef/symbol[text()='BIDU']/..})) {
    print $group->getAttribute('id'), "\n";
}

__DATA__
<root>
    <symbolgroupdef id="bin_11-QQQQ"> 
        <symbol>QQQ</symbol> 
    </symbolgroupdef>
    <symbolgroupdef id="bin_6-AAPL">
        <symbol>AAPL</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7-BIDU">
        <symbol>BIDU</symbol>
    </symbolgroupdef>
    <symbolgroupdef id="bin_7">
        <symbol>AAPL</symbol>
        <symbol>IBM</symbol>
    </symbolgroupdef>
</root>

Outputs:

bin_7-BIDU
