Parsing XML file more than once and merging the results together

I have a subroutine that parses two XML files: one is the original log data, the other is a filter. I want to remove everything from log.xml that does not match at least one of the filters.

Here is an example of my log file:

<log>
  <message>
    <type>warning</type>
    <from>cody</from>
    <content>cant use XML::Merge</content>
  </message>
  <message>
    <type>error</type>
    <from>cody</from>
    <content>some text here</content>
  </message>
  <message>
    <type>warning</type>
    <from>charlie</from>
    <content>ruff</content>
  </message>
  <message>
    <type>error</type>
    <from>cody</from>
    <content>an error</content>
  </message>
</log>

with a filter.xml that looks like:

<filters>
  <filter>
    <type>warning</type>
    <content>XML::Merge</content>
  </filter>
  <filter>
    <type>error</type>
  </filter>
</filters>

This should retain all warnings whose content contains "XML::Merge", as well as ALL errors. My attempt makes a first pass with the first filter, which chops every other message node, so no errors are left in the resulting XML file. The next filter then chops the messages that the first filter was supposed to retain. Here is my code, which works well if filter.xml contains only one filter.

sub include {
  my $filterParser = XML::LibXML->new->parse_file($filterXML);
  my $logParser = XML::LibXML->new->parse_file($xml);

  foreach my $filter ( $filterParser->findnodes('/filters/filter') ) {
    foreach my $msg ( $logParser->findnodes('/log/message') ) {
        foreach my $msgNode ($msg->childNodes) {
            foreach my $filterNode ($filter->childNodes) {
                if ($msgNode->localName eq $filterNode->localName) {
                    my $m = $msgNode->textContent;
                    my $f = $filterNode->textContent;
                    if (index($m, $f) == -1) {
                        $msg->parentNode->removeChild($msg);
                    }
                }   
            }
        }
    }
  } 
  $logParser->toFile($xml);
}

I understand why it outputs a blank document with more than one filter, but I need help saving the result of the first pass somewhere, then making a pass over the original XML with the second filter, and so on until no filters are left, and finally merging everything into one XML without duplicate messages.

I think I probably titled this question poorly, but hopefully this question and answer will help someone else some day. Anyway, I've accomplished my goal with some brute force. I ended up doing a pass for each filter and adding the nodes I want to keep to a list (I needed a flag because some filters have more than one criterion). After all filters have been processed against all messages, I loop through log.xml and look for each message in my list. If a node from log.xml doesn't match any node in the list, I remove it from the tree.

use XML::LibXML;

# $filterXML and $xml hold the file paths and are set elsewhere in the script.
sub include {
  my $filterParser = XML::LibXML->new->parse_file($filterXML);
  my $logParser = XML::LibXML->new->parse_file($xml);

  my @nodes;   # messages matched by at least one filter

  # '/TdsMainLog/message' is the path in my real log; the sample above would need '/log/message'.
  foreach my $msg ( $logParser->findnodes('/TdsMainLog/message') ) {
    foreach my $filter ( $filterParser->findnodes('/filters/filter') ) {
        my $remove = 0;   # flag: set as soon as one criterion of this filter fails
        # findnodes('*') returns element children only, skipping whitespace text nodes
        foreach my $filterNode ( $filter->findnodes('*') ) {
            my $f = $filterNode->textContent;
            my $matched = 0;
            foreach my $msgNode ( $msg->findnodes('*') ) {
                next unless $msgNode->localName eq $filterNode->localName;
                $matched = 1 if index($msgNode->textContent, $f) != -1;
            }
            $remove = 1 unless $matched;   # every criterion of a filter must match
        }
        if (!$remove) { push (@nodes, $msg); last; }   # one matching filter is enough
    }
  }

  # Second pass: drop every message that was not saved in the list.
  foreach my $msg ( $logParser->findnodes('/TdsMainLog/message') ) {
    my $remove = 1;
    foreach my $node (@nodes) {
        if ($msg->isSameNode($node)) {
            $remove = 0;
        }
    }
    if ($remove) { $msg->parentNode->removeChild($msg); }
  }
  $logParser->toFile($xml);
}
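
As a possible alternative to keeping a list of nodes, the filters could be translated into a single XPath predicate, so the log is only walked once and nothing needs merging. The following is a minimal sketch, not a tested drop-in replacement: the helper names xpath_literal and filters_to_xpath are made up for illustration, the documents are inlined as strings instead of being read from log.xml and filter.xml, and it assumes the element names carry no namespaces. Criteria inside one filter are ANDed and the filters themselves are ORed, which is how I read the intended behaviour.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: quote a string as an XPath 1.0 literal.
# XPath 1.0 has no escape syntax, so a string that contains both
# quote characters is assembled with concat().
sub xpath_literal {
    my ($s) = @_;
    return "'$s'"   if index($s, "'") == -1;
    return "\"$s\"" if index($s, '"') == -1;
    return 'concat(' . join(q{,"'",}, map { "'$_'" } split /'/, $s, -1) . ')';
}

# Hypothetical helper: build one predicate from all <filter> elements.
# A filter without any criteria is ignored.
sub filters_to_xpath {
    my ($filter_doc) = @_;
    my @alternatives;
    for my $filter ( $filter_doc->findnodes('/filters/filter') ) {
        my @tests = map {
            sprintf '%s[contains(., %s)]',
                $_->localname, xpath_literal( $_->textContent )
        } $filter->findnodes('*');    # element children only
        push @alternatives, '(' . join( ' and ', @tests ) . ')' if @tests;
    }
    # With no usable filter nothing is kept, as in the list-based version.
    return @alternatives ? join( ' or ', @alternatives ) : 'false()';
}

my $log_doc = XML::LibXML->load_xml( string => <<'XML' );
<log>
  <message><type>warning</type><from>cody</from><content>cant use XML::Merge</content></message>
  <message><type>error</type><from>cody</from><content>some text here</content></message>
  <message><type>warning</type><from>charlie</from><content>ruff</content></message>
  <message><type>error</type><from>cody</from><content>an error</content></message>
</log>
XML

my $filter_doc = XML::LibXML->load_xml( string => <<'XML' );
<filters>
  <filter><type>warning</type><content>XML::Merge</content></filter>
  <filter><type>error</type></filter>
</filters>
XML

my $keep = filters_to_xpath($filter_doc);

# Remove every message that matches none of the filters.
for my $msg ( $log_doc->findnodes("/log/message[not($keep)]") ) {
    $msg->parentNode->removeChild($msg);
}

print $log_doc->toString;
```

With the sample data this should leave the "XML::Merge" warning and both errors and drop only the "ruff" warning. Because the original document is never modified until the final removal loop, there is no intermediate result to save and no duplicate messages to merge.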
