
How to split a large XML file into small chunks (small XML files) using StAX in Java

I have an XML file with 5 Suppllier elements, and I want to split this large file into files of 2 Suppllier elements each. The starting tag is <tab:Suppllier> and the ending tag is </tab:Suppllier>; there are 5 of these in my file.

I read in a few articles that the StAX parser is the way to go when the file is large. My file is larger than 6 GB, so how can I split my sample file into multiple files?

Here is my sample XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Shop xmlns="http://www.shpAddress.com">
  <tab:Product xmlns:tab="http://www.productName.com">
    <tab:Suppllier>
      <col:Items xmlns:col="http://www.Items.com">
        <col:Delivery>
          <Prize xsi:nil="true" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>              
          <LastModifiedDate xmlns="">2020-01-28</LastModifiedDate>
        </col:Delivery>
      </col:Items>
     </tab:Suppllier>
     <tab:Suppllier>
      <col:Items xmlns:col="http://www.Items.com">
        <col:Delivery>
          <Prize xsi:nil="true" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>          
          <LastModifiedDate xmlns="">2021-02-28</LastModifiedDate>
        </col:Delivery>
      </col:Items>
     </tab:Suppllier>
     <tab:Suppllier>
      <col:Items xmlns:col="http://www.Items.com">
        <col:Delivery>
          <Prize xsi:nil="true" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>          
          <LastModifiedDate xmlns="">2022-02-28</LastModifiedDate>
        </col:Delivery>
      </col:Items>
     </tab:Suppllier>
     <tab:Suppllier>
      <col:Items xmlns:col="http://www.Items.com">
        <col:Delivery>
          <Prize xsi:nil="true" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>          
          <LastModifiedDate xmlns="">2023-03-28</LastModifiedDate>
        </col:Delivery>
      </col:Items>
     </tab:Suppllier>
     <tab:Suppllier>
      <col:Items xmlns:col="http://www.Items.com">
        <col:Delivery>
          <Prize xsi:nil="true" xmlns="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>          
          <LastModifiedDate xmlns="">2024-04-28+05:30</LastModifiedDate>
        </col:Delivery>
      </col:Items>
     </tab:Suppllier>
  </tab:Product>
</Shop>

Expect a fair amount of boilerplate code with StAX. It should be possible to roughly halve it by pairing SAXParser (for reading) with XMLOutputFactory (for writing).
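For reference, here is a minimal sketch of the pure StAX route. Note that it uses the event API (XMLEventReader and XMLEventWriter) on both sides rather than the SAXParser pairing suggested above. The class name SupplierSplitter, the perFile parameter and the suppliers-N.xml file names are made up for illustration. It assumes the suppliers are the only children of a single Shop/Product wrapper, as in the sample, and it replays the wrapper start tags (with their namespace declarations) at the top of every chunk.

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SupplierSplitter {
    // Record element; namespace URI and spelling are taken from the sample file.
    static final QName SUPPLIER = new QName("http://www.productName.com", "Suppllier");

    private final XMLEventFactory events = XMLEventFactory.newInstance();
    private final List<StartElement> wrappers = new ArrayList<>(); // open ancestors, outermost first
    private XMLEventWriter writer; // null while no chunk file is open
    private OutputStream out;

    /** Hypothetical entry point: writes suppliers-N.xml files holding at most perFile suppliers each. */
    public static List<Path> split(Path input, Path outDir, int perFile)
            throws IOException, XMLStreamException {
        return new SupplierSplitter().run(input, outDir, perFile);
    }

    private List<Path> run(Path input, Path outDir, int perFile)
            throws IOException, XMLStreamException {
        List<Path> chunks = new ArrayList<>();
        int depth = 0;   // nesting depth inside the current supplier, 0 = outside
        int inChunk = 0; // suppliers written to the open chunk so far
        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                if (depth > 0) { // inside a supplier: copy every event as-is
                    writer.add(e);
                    if (e.isStartElement()) {
                        depth++;
                    } else if (e.isEndElement() && --depth == 0 && ++inChunk == perFile) {
                        closeChunk();
                        inChunk = 0;
                    }
                } else if (e.isStartElement() && SUPPLIER.equals(e.asStartElement().getName())) {
                    if (writer == null) {
                        Path file = outDir.resolve("suppliers-" + (chunks.size() + 1) + ".xml");
                        chunks.add(file);
                        openChunk(file);
                    }
                    writer.add(e);
                    depth = 1;
                } else if (e.isStartElement()) {
                    wrappers.add(e.asStartElement()); // Shop, tab:Product
                } else if (e.isEndElement()) { // a wrapper closes: flush a partly filled chunk
                    if (writer != null) {
                        closeChunk();
                        inChunk = 0;
                    }
                    wrappers.remove(wrappers.size() - 1);
                }
                // whitespace and comments between suppliers are dropped
            }
            reader.close();
        } finally {
            if (out != null) out.close();
        }
        return chunks;
    }

    private void openChunk(Path file) throws IOException, XMLStreamException {
        out = new BufferedOutputStream(Files.newOutputStream(file));
        writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        writer.add(events.createStartDocument("UTF-8", "1.0"));
        for (StartElement w : wrappers) writer.add(w); // replay wrappers and their xmlns declarations
    }

    private void closeChunk() throws IOException, XMLStreamException {
        for (int i = wrappers.size() - 1; i >= 0; i--) {
            QName n = wrappers.get(i).getName();
            writer.add(events.createEndElement(n.getPrefix(), n.getNamespaceURI(), n.getLocalPart()));
        }
        writer.add(events.createEndDocument());
        writer.close(); // does not close the underlying stream
        out.close();
        writer = null;
        out = null;
    }
}
```

Calling SupplierSplitter.split(Paths.get("shop.xml"), Paths.get("out"), 2) on the sample should give three files holding 2, 2 and 1 suppliers. Because events are streamed one at a time, memory use should not grow with the size of the input, which is the reason to prefer this over DOM for a file of several GB.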

If the format of the XML is quite stable, there is another approach that looks much simpler. A trivial regexp can pick out the text between the <tab:Suppllier> and </tab:Suppllier> tags. The other stable parts (at the very beginning and end of the XML) can be hard-coded and written to each new partial file.
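A rough sketch of that regexp idea, assuming the header and footer are exactly as in the sample and that the supplier tags never carry attributes or nest. The class name RegexSupplierSplitter, the perFile parameter and the output file names are invented for the example. It relies on java.util.Scanner.findWithinHorizon so the input is read incrementally instead of being loaded into one huge string.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class RegexSupplierSplitter {
    // Reluctant match from an opening supplier tag to the nearest closing one.
    static final Pattern SUPPLIER =
            Pattern.compile("<tab:Suppllier>.*?</tab:Suppllier>", Pattern.DOTALL);

    // Hard-coded stable parts, copied from the sample file (an assumption about the real data).
    static final String HEADER =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<Shop xmlns=\"http://www.shpAddress.com\">\n"
          + "  <tab:Product xmlns:tab=\"http://www.productName.com\">\n";
    static final String FOOTER = "  </tab:Product>\n</Shop>\n";

    /** Hypothetical entry point: writes suppliers-N.xml files holding at most perFile suppliers each. */
    public static List<Path> split(Path input, Path outDir, int perFile) throws IOException {
        List<Path> chunks = new ArrayList<>();
        BufferedWriter out = null;
        int inChunk = 0;
        try (Scanner scanner = new Scanner(input, "UTF-8")) {
            String supplier;
            // Horizon 0 means "search without bound"; each call returns the next match.
            while ((supplier = scanner.findWithinHorizon(SUPPLIER, 0)) != null) {
                if (out == null) {
                    Path file = outDir.resolve("suppliers-" + (chunks.size() + 1) + ".xml");
                    chunks.add(file);
                    out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
                    out.write(HEADER);
                }
                out.write("    " + supplier + "\n");
                if (++inChunk == perFile) {
                    out.write(FOOTER);
                    out.close();
                    out = null;
                    inChunk = 0;
                }
            }
            if (out != null) out.write(FOOTER); // last, partly filled chunk
            // Scanner swallows read errors, so surface them explicitly.
            if (scanner.ioException() != null) throw scanner.ioException();
        } finally {
            if (out != null) out.close();
        }
        return chunks;
    }
}
```

This is shorter than the StAX version, but it is only as reliable as the assumptions about the format: a supplier tag with an attribute, a different prefix, or a CDATA section containing the closing tag would break it.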

