
Reading a huge text file and appending using StringBuilder in Java

I have a huge XML file (3-4 GB, about 360,000 lines of records). I have to read each line and append it to a StringBuilder; once the file has been read, the content is processed further. However, the data cannot be held in memory because it exceeds the StringBuilder's capacity. How can I split the records and reset the builder before the buffer limit is reached? Please suggest an approach.

    try {
        File file = new File("test.txt");
        FileReader fileReader = new FileReader(file);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        StringBuilder stringBuilder = new StringBuilder();
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            if (line.startsWith("<customer>")) {
                stringBuilder.append(line);
            }
            count++;
        }
        fileReader.close();
        System.out.println(stringBuilder.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }

EDIT: Asker tried with StAX

    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = null;
        try {
            xmlEvent = xmlEventReader.nextEvent();
        } catch (Exception e) {
            e.printStackTrace();
        }
        if (xmlEvent.isStartElement()) {
            StartElement elem = (StartElement) xmlEvent;
            // getLocalPart() returns the bare element name, without angle brackets
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (customerRecord) {
                    insideChildRecord = true;
                }
                customerRecord = true;
            }
        }
        if (customerRecord) {
            xmlEventWriter.add(xmlEvent);
        }
        if (xmlEvent.isEndElement()) {
            EndElement elem = (EndElement) xmlEvent;
            // getLocalPart() returns the bare element name, without angle brackets
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (insideChildRecord) {
                    insideChildRecord = false;
                } else {
                    customerRecord = false;
                    xmlEventWriter.flush();
                    String cmlChunk = stringWriter.toString();

It looks like you are parsing an XML file (because I see you checking for "<customer>").

It would be better to use a parsing library for this than low-level streams. Since the file is quite large, I suggest using either SAX or StAX: https://docs.oracle.com/javase/tutorial/jaxp/stax/index.html

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream(fileName));
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        // parse the XML events one by one
    }

You will have to do all the 'further processing' immediately on the XML events, since you cannot store the data in memory.
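SAX, the other option mentioned above, follows the same idea: the parser pushes events to a handler, and each record is handled as soon as its closing tag arrives. The following is only a minimal sketch, assuming records shaped like `<customer number="42"><name>John Doe</name></customer>`; the class `SaxCustomerExample` and the method `forEachCustomer` are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxCustomerExample {

    /**
     * Streams the document and calls onCustomer once per <customer> element,
     * passing its "number" attribute and the text of its <name> child.
     * The element and attribute names are assumptions about the file layout.
     */
    static void forEachCustomer(InputStream in, BiConsumer<String, String> onCustomer)
            throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            private final StringBuilder text = new StringBuilder(); // text of the current element only
            private String number;
            private String name;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so qName holds the tag name.
                if (qName.equals("customer")) {
                    number = attrs.getValue("number");
                    name = null;
                }
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("name")) {
                    name = text.toString().trim();
                } else if (qName.equals("customer")) {
                    // The record is complete: process it now, nothing is kept afterwards.
                    onCustomer.accept(number, name);
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        String xml = "<customers><customer number=\"42\"><name>John Doe</name></customer></customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        forEachCustomer(in, (number, name) -> System.out.println(number + " " + name));
    }
}
```

For the real file, a `FileInputStream` would replace the in-memory stream. Only the text of one element is buffered at a time, so memory use should not grow with the file size.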

Maybe this will make it clearer how to use StAX:

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream("huge-file.xml"));

    // this variable is re-used to store the current customer
    Customer customer = null;

    while (xmlEventReader.hasNext()) {

        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        if (xmlEvent.isStartElement()) {

            StartElement startElement = xmlEvent.asStartElement();

            if (startElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // start populating a new customer
                customer = new Customer();

                // read an attribute for example <customer number="42">
                Attribute attribute = startElement.getAttributeByName(new QName("number"));
                if (attribute != null) {
                    customer.setNumber(attribute.getValue());
                }
            }

            // read a nested element for example:
            // <customer>
            //    <name>John Doe</name>
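            if (customer != null && startElement.getName().getLocalPart().equals("name")) {
                xmlEvent = xmlEventReader.nextEvent();
                // an empty <name/> yields an end element here, not characters
                if (xmlEvent.isCharacters()) {
                    customer.setName(xmlEvent.asCharacters().getData());
                }
            }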
        }

        if (xmlEvent.isEndElement()) {
            EndElement endElement = xmlEvent.asEndElement();
            if(endElement.getName().getLocalPart().equalsIgnoreCase("customer")){
                // all data for the current Customer has been read
                // do something with the customer, like logging it or storing it in a database
                // after this the customer variable will be re-assigned to the next customer
            }
        }
    }
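If the raw XML of each record is needed as a string (which is what the StAX attempt in the question appears to aim for), the events of one record can be copied into a small per-record buffer and handed off when the record closes. This is a sketch under assumptions, not a definitive implementation: the element name `customer` is taken from the example above, and `CustomerChunker` and `forEachCustomerXml` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class CustomerChunker {

    /** Calls onChunk with the serialized XML of each top-level <customer> element. */
    static void forEachCustomerXml(InputStream in, Consumer<String> onChunk)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        StringWriter buffer = null;   // holds exactly one record at a time
        XMLEventWriter writer = null;
        int depth = 0;                // nesting depth inside the current record, 0 = outside

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                if (depth > 0) {
                    depth++; // nested element, possibly another <customer>
                } else if (event.asStartElement().getName().getLocalPart().equals("customer")) {
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                    depth = 1;
                }
            }
            if (depth > 0) {
                writer.add(event); // copy every event that belongs to the record
            }
            if (event.isEndElement() && depth > 0 && --depth == 0) {
                writer.flush();
                writer.close();
                onChunk.accept(buffer.toString()); // process the chunk, then let it go
            }
        }
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<customers><customer number=\"42\"><name>John Doe</name></customer></customers>";
        forEachCustomerXml(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                System.out::println);
    }
}
```

Counting the nesting depth replaces the two boolean flags from the question and also covers records nested more than one level deep. A fresh `StringWriter` per record means the buffer never grows beyond a single record.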
