Parsing XML file more than once and merging the results together

I have a subroutine that parses two XML files: one holds the original log data, the other a set of filters. I want to remove every message from log.xml that does not match at least one of the filters.

Here is an example of my log file:

<log>
  <message>
    <type>warning</type>
    <from>cody</from>
    <content>cant use XML::Merge</content>
  </message>
  <message>
    <type>error</type>
    <from>cody</from>
    <content>some text here</content>
  </message>
  <message>
    <type>warning</type>
    <from>charlie</from>
    <content>ruff</content>
  </message>
  <message>
    <type>error</type>
    <from>cody</from>
    <content>an error</content>
  </message>
</log>

with a filter.xml that looks like:

<filters>
  <filter>
    <type>warning</type>
    <content>XML::Merge</content>
  </filter>
  <filter>
    <type>error</type>
  </filter>
</filters>

The result should retain every warning whose content contains "XML::Merge", plus ALL errors. My attempt makes a first pass with the first filter, which removes every message that does not match it, so no errors are left in the resulting XML file. The pass for the next filter then removes the messages that the first filter was supposed to keep. Here is my code, which works well if filter.xml contains only one filter:

sub include {
  my $filterParser = XML::LibXML->new->parse_file($filterXML);
  my $logParser = XML::LibXML->new->parse_file($xml);

  foreach my $filter ( $filterParser->findnodes('/filters/filter') ) {
    foreach my $msg ( $logParser->findnodes('/log/message') ) {
        foreach my $msgNode ($msg->childNodes) {
            foreach my $filterNode ($filter->childNodes) {
                if ($msgNode->localName eq $filterNode->localName) {
                    my $m = $msgNode->textContent;
                    my $f = $filterNode->textContent;
                    if (index($m, $f) == -1) {
                        $msg->parentNode->removeChild($msg);
                    }
                }   
            }
        }
    }
  } 
  $logParser->toFile($xml);
}

I understand why it outputs an empty document when there is more than one filter, but I need help saving the result of the first pass somewhere, then making a pass over the original XML with the second filter, and so on until no filters are left, and finally merging everything into one XML file without duplicate messages.

I probably titled this question poorly, but hopefully this question and answer will help someone else some day. Anyway, I accomplished my goal with some brute force. I ended up making a pass for each filter and adding the nodes I want to keep to a list (I needed a flag because some filters have more than one criterion). After all filters have been run against all messages, I loop through log.xml and look for each message in my list. If a message from log.xml does not match any node in the list, I remove it from the tree.

use XML::LibXML;

# $filterXML and $xml are file paths defined elsewhere in the script.
sub include {
  my $filterParser = XML::LibXML->new->parse_file($filterXML);
  my $logParser = XML::LibXML->new->parse_file($xml);

  my @nodes;    # messages matched by at least one filter

  # '/log/message' follows the sample log above; adjust the root name to your file.
  foreach my $msg ( $logParser->findnodes('/log/message') ) {
    foreach my $filter ( $filterParser->findnodes('/filters/filter') ) {
        my $remove = 0;
        # '*' selects element children only, skipping whitespace text nodes
        foreach my $filterNode ( $filter->findnodes('*') ) {
            my $matched = 0;
            foreach my $msgNode ( $msg->findnodes('*') ) {
                next unless $msgNode->localName eq $filterNode->localName;
                my $m = $msgNode->textContent;
                my $f = $filterNode->textContent;
                $matched = 1 if index($m, $f) != -1;
            }
            # a single failed criterion disqualifies this filter
            $remove = 1 unless $matched;
        }
        # mark for keeping; no need to try the remaining filters
        if ( !$remove ) { push( @nodes, $msg ); last; }
    }
  }

  foreach my $msg ( $logParser->findnodes('/log/message') ) {
    my $remove = 1;
    foreach my $node (@nodes) {
        if ( $msg->isSameNode($node) ) {
            $remove = 0;
        }
    }
    if ($remove) { $msg->parentNode->removeChild($msg); }
  }
  $logParser->toFile($xml);
}
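
For anyone landing here later: the keep list and the second loop can probably be avoided by deciding, per message, whether any filter matches and removing the message right away if none does. Below is a minimal sketch of that idea, not a drop-in replacement. The helper names matches_filter and filter_log are made up for illustration, the XML is inlined with load_xml instead of being read from files, and it assumes the /log/message and /filters/filter layout from the samples above, with substring matching as in the original code.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: true if every criterion element of $filter occurs
# (as a substring) in a same-named child element of $msg.
sub matches_filter {
    my ($msg, $filter) = @_;
    for my $crit ( $filter->findnodes('*') ) {
        my $name = $crit->localName;
        my $want = $crit->textContent;
        my $ok = grep { index($_->textContent, $want) != -1 }
                 $msg->findnodes("*[local-name()='$name']");
        return 0 unless $ok;
    }
    return 1;
}

# Hypothetical helper: drops every message that no filter matches.
sub filter_log {
    my ($logDoc, $filterDoc) = @_;
    my @filters = $filterDoc->findnodes('/filters/filter');
    for my $msg ( $logDoc->findnodes('/log/message') ) {
        next if grep { matches_filter($msg, $_) } @filters;
        $msg->unbindNode;    # detach the message from the tree
    }
    return $logDoc;
}

my $log = XML::LibXML->load_xml(string => <<'XML');
<log>
  <message><type>warning</type><from>cody</from><content>cant use XML::Merge</content></message>
  <message><type>error</type><from>cody</from><content>some text here</content></message>
  <message><type>warning</type><from>charlie</from><content>ruff</content></message>
  <message><type>error</type><from>cody</from><content>an error</content></message>
</log>
XML

my $filters = XML::LibXML->load_xml(string => <<'XML');
<filters>
  <filter><type>warning</type><content>XML::Merge</content></filter>
  <filter><type>error</type></filter>
</filters>
XML

filter_log($log, $filters);
print $log->toString;
```

With the sample data, only charlie's "ruff" warning should be dropped, since it is the one message that neither filter matches. Because each message is visited once and kept as soon as any filter matches, no merging or de-duplication step is needed.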

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 