
Comparing two sets of XML data without loading all the comparison data into memory

I have two XML files that are being parsed for information, and I'm trying to think of a way to determine which elements from one XML file are missing from the other. Currently the results for both XML files are loaded into two separate arrays, but this is not good because it's a lot of data to hold on to.

I need a way to work out what is missing from one file without keeping all the data in memory, since the XML files in question can be very large.

Here is an example of the XML. Just imagine the other file is missing one of the weaknesses. I'm already using the SAX parser to get the actual data.

 <weaknesses>
   <weakness status="new" severity="low" id="14876">
     <cwe id="133" href="http://cwevis.org">Title1</cwe>
     <tool code="STRING" category="PERFORMANCE" name="aaa"/>
     <rule name="Method invokes inefficient new String(String) constructor"/>
     <locations>
       <location path="Catcher.java" type="file">
         <line end="93" start="93"/>
         <description>stuff</description>
       </location>
     </locations>
   </weakness>

   <weakness status="new" severity="low" id="14877">
     <cwe id="138" href="http://cwevis.org">Title2</cwe>
     <tool code="PARAMETER" category="SECURITY" name="bbb"/>
     <rule name="Servlet parameters unsafe"/>
     <locations>
       <location path="Catcher.java" type="file"/>
     </locations>
   </weakness>

   <weakness status="new" severity="low" id="14878">
     <cwe id="500" href="http://cwevis.org">Title3</cwe>
     <tool code="FINAL" category="asd" name="vvv"/>
     <rule name="Field isn't final and can't be protected from malicious code"/>
     <locations>
       <location path="Course.java" type="file">
         <line end="56" start="56"/>
         <description>stuff </description>
       </location>
     </locations>
   </weakness>
 </weaknesses>

Note: I'm programming this in Java, and you should assume that the elements are not sorted. Two ideas come to mind. The easy way is to load both sets and compare one against the other, which doesn't solve the memory problem. The other is to keep parsing the XML over and over without storing anything, but that is very inefficient in processing time.

Let's say you compare XML file A against file B. First, while parsing file A, fill a set X with all of A's elements. Then, while parsing file B, try to remove each element you find from set X. If the removal returns true (the element was in X and has now been removed), move on. If it returns false (the element was not in X), save it in a second set Y. When you finish parsing file B, set X contains all elements that are in A but not in B, and set Y contains all elements that are in B but not in A.
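
The two-set approach can be sketched with the SAX parser the question already uses. This is only a minimal sketch under assumptions: the class name `WeaknessDiff` and its methods are made up for illustration, and it uses the `id` attribute of each `weakness` element as the comparison key, which only works if ids are stable across the two files and unique within each file.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the original post.
final class WeaknessDiff {
    final Set<String> onlyInA = new HashSet<>(); // "set X" in the answer
    final Set<String> onlyInB = new HashSet<>(); // "set Y" in the answer

    // Streams one document and hands the id of every <weakness> to the sink.
    // Assumption: the id attribute is a usable key for comparison.
    static void forEachWeaknessId(InputSource source, Consumer<String> sink) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("weakness".equals(qName)) {
                    sink.accept(attrs.getValue("id"));
                }
            }
        });
    }

    static WeaknessDiff compare(InputSource a, InputSource b) throws Exception {
        WeaknessDiff diff = new WeaknessDiff();
        // Pass 1: fill X with every key found in A.
        forEachWeaknessId(a, diff.onlyInA::add);
        // Pass 2: remove each key of B from X; keys that were not in X go to Y.
        forEachWeaknessId(b, id -> {
            if (!diff.onlyInA.remove(id)) {
                diff.onlyInB.add(id);
            }
        });
        return diff;
    }

    public static void main(String[] args) throws Exception {
        String a = "<weaknesses><weakness id=\"14876\"/><weakness id=\"14877\"/>"
                 + "<weakness id=\"14878\"/></weaknesses>";
        String b = "<weaknesses><weakness id=\"14876\"/><weakness id=\"14878\"/></weaknesses>";
        WeaknessDiff diff = compare(new InputSource(new StringReader(a)),
                                    new InputSource(new StringReader(b)));
        System.out.println("only in A: " + diff.onlyInA); // prints "only in A: [14877]"
        System.out.println("only in B: " + diff.onlyInB); // prints "only in B: []"
    }
}
```

Note that this still keeps one key per element of file A in memory until file B has been read. The saving is that file B is never held at all, and that only the keys are stored rather than the full parsed elements.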

This requires an entity class that models the weakness element and implements equals (and hashCode, if X is a hash-based set) so that the remove call works. It could also implement the Comparable interface, since a sorted collection may be a better fit for some aspects of this problem.
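
Such an entity class might look like the following. This is a sketch, not the answerer's code: which fields define "the same weakness" is an assumption here (CWE id, rule name, file path and start line), and the field names are invented for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical entity class; the choice of identity fields is an assumption.
final class Weakness implements Comparable<Weakness> {
    private final String cweId;
    private final String ruleName;
    private final String path;
    private final int line;

    Weakness(String cweId, String ruleName, String path, int line) {
        this.cweId = Objects.requireNonNull(cweId);
        this.ruleName = Objects.requireNonNull(ruleName);
        this.path = Objects.requireNonNull(path);
        this.line = line;
    }

    // Two weaknesses are equal when all identity fields match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Weakness)) return false;
        Weakness w = (Weakness) o;
        return line == w.line
                && cweId.equals(w.cweId)
                && ruleName.equals(w.ruleName)
                && path.equals(w.path);
    }

    // Must agree with equals, otherwise HashSet.remove cannot find the element.
    @Override
    public int hashCode() {
        return Objects.hash(cweId, ruleName, path, line);
    }

    // Ordering consistent with equals, so the class also works in a TreeSet.
    @Override
    public int compareTo(Weakness other) {
        int c = cweId.compareTo(other.cweId);
        if (c != 0) return c;
        c = ruleName.compareTo(other.ruleName);
        if (c != 0) return c;
        c = path.compareTo(other.path);
        if (c != 0) return c;
        return Integer.compare(line, other.line);
    }

    public static void main(String[] args) {
        String rule = "Method invokes inefficient new String(String) constructor";
        Set<Weakness> fromA = new HashSet<>();
        fromA.add(new Weakness("133", rule, "Catcher.java", 93));
        // A separately constructed but equal object, as it would arrive from file B.
        Weakness fromB = new Weakness("133", rule, "Catcher.java", 93);
        System.out.println(fromA.remove(fromB)); // prints "true"
        System.out.println(fromA.isEmpty());     // prints "true"
    }
}
```

Leaving the `id` attribute out of equals is deliberate in this sketch: if the two files come from separate scans, the ids may not match even when the finding is the same.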
