Reading a huge text file and appending using StringBuilder in Java

There is a huge XML file (3-4 GB, 360,000 lines of records), and I have to read each line and append it to a StringBuilder. Once it has been read, it will be processed further. But the data cannot be held in memory, because the StringBuilder buffer grows beyond the available size. How can I split the records and reset the builder before the buffer size is exceeded? Kindly suggest.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Reads test.txt line by line and collects every line that starts with "<customer>".
    // The whole result is still kept in one StringBuilder, which is what runs out of memory.
    try (BufferedReader bufferedReader = new BufferedReader(new FileReader("test.txt"))) {
        StringBuilder stringBuilder = new StringBuilder();
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            if (line.startsWith("<customer>")) {
                stringBuilder.append(line);
            }
            count++;
        }
        System.out.println(stringBuilder.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }
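
If you want to stay with line-by-line reading, the key step is to process each record as soon as its closing tag has been read and then clear the builder with `setLength(0)`, so only one record is in memory at a time. A minimal sketch, assuming every `<customer>` element starts on its own line and that its `</customer>` tag is the last thing on a line (a real XML file does not guarantee this layout); `CustomerRecordSplitter`, `splitRecords` and the handler callback are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.function.Consumer;

// Sketch only: assumes each <customer> element starts on its own line and that
// its closing </customer> tag is the last thing on a line.
class CustomerRecordSplitter {

    // Hands every complete <customer>...</customer> record to the handler and then
    // clears the builder, so at most one record is held in memory.
    static int splitRecords(BufferedReader reader, Consumer<String> handler) throws IOException {
        StringBuilder record = new StringBuilder();
        boolean insideRecord = false;
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            // matches "<customer>" or "<customer " (with attributes), but not "<customers>"
            if (trimmed.startsWith("<customer>") || trimmed.startsWith("<customer ")) {
                insideRecord = true;
            }
            if (insideRecord) {
                record.append(line).append('\n');
                if (trimmed.endsWith("</customer>")) {
                    handler.accept(record.toString()); // process this record right away
                    record.setLength(0);               // reuse the builder for the next record
                    insideRecord = false;
                    count++;
                }
            }
        }
        return count;
    }
}
```

With the file from the question this would be called inside `try (BufferedReader reader = new BufferedReader(new FileReader("test.txt")))`, doing the further processing in the lambda passed as `handler`.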

EDIT: The asker also tried StAX:

    // Assumes xmlEventReader, xmlEventWriter (writing into stringWriter) and the
    // boolean flags customerRecord and insideChildRecord are declared earlier.
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent;
        try {
            xmlEvent = xmlEventReader.nextEvent();
        } catch (XMLStreamException e) {
            e.printStackTrace();
            break; // stop instead of continuing with a null event
        }
        if (xmlEvent.isStartElement()) {
            StartElement elem = (StartElement) xmlEvent;
            // getLocalPart() returns the bare element name, without angle brackets
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (customerRecord) {
                    insideChildRecord = true;
                }
                customerRecord = true;
            }
        }
        if (customerRecord) {
            xmlEventWriter.add(xmlEvent);
        }
        if (xmlEvent.isEndElement()) {
            EndElement elem = (EndElement) xmlEvent;
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (insideChildRecord) {
                    insideChildRecord = false;
                } else {
                    customerRecord = false;
                    xmlEventWriter.flush();
                    String cmlChunk = stringWriter.toString();
                    // ... (the rest of the original snippet is cut off)
                }
            }
        }
    }
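
If `stringWriter` is never cleared between records, it grows with every `Customer` and ends up holding the whole file, just like the StringBuilder. A minimal sketch of one alternative: give each top-level `Customer` its own `StringWriter` and `XMLEventWriter`, pass the finished chunk to a callback, then drop it. `CustomerChunker`, `chunkCustomers` and the `Consumer<String>` handler are hypothetical names; the element name `Customer` is taken from the snippet above.

```java
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Sketch: copies each top-level <Customer> element (nested Customers included)
// into its own string, so only one record is buffered at a time.
class CustomerChunker {

    static int chunkCustomers(XMLEventReader reader, Consumer<String> handler) throws XMLStreamException {
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        StringWriter stringWriter = null;
        XMLEventWriter writer = null;
        int depth = 0; // how many <Customer> elements are currently open
        int count = 0;
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (isCustomer(event, true) && depth++ == 0) {
                // a new top-level record starts: give it a fresh, empty buffer
                stringWriter = new StringWriter();
                writer = outputFactory.createXMLEventWriter(stringWriter);
            }
            if (writer != null) {
                writer.add(event); // copy the event into the current chunk
            }
            if (isCustomer(event, false) && --depth == 0) {
                writer.flush();
                writer.close();
                handler.accept(stringWriter.toString());
                writer = null; // drop the chunk so it can be garbage-collected
                stringWriter = null;
                count++;
            }
        }
        return count;
    }

    // true if the event is a <Customer> start tag (start == true) or a </Customer> end tag
    private static boolean isCustomer(XMLEvent event, boolean start) {
        if (start) {
            return event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals("Customer");
        }
        return event.isEndElement()
                && event.asEndElement().getName().getLocalPart().equals("Customer");
    }
}
```

Nested `<Customer>` elements stay inside their parent's chunk, which is what the `customerRecord`/`insideChildRecord` flags aim for; a depth counter also copes with more than one level of nesting.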

It looks like you are parsing an XML file (because I see you checking for `<customer>`).

It would be better to use a parsing library for this than low-level streams. Since the file is quite large, I suggest using either SAX or StAX: https://docs.oracle.com/javase/tutorial/jaxp/stax/index.html

    import java.io.FileInputStream;
    import javax.xml.stream.*;
    import javax.xml.stream.events.XMLEvent;

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream(fileName));
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        // parse the XML events one by one; only the current event is in memory
    }

You will have to do all the "further processing" immediately on the XML events, since you cannot store all of the data in memory.
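
The same rule applies to SAX, the other option mentioned above: the parser pushes callbacks to a handler, and each record has to be dealt with inside those callbacks. A minimal sketch, assuming records look like `<customer><name>John Doe</name></customer>` and that only the name is needed; `CustomerSaxHandler`, `parseCustomers` and the name-only callback are hypothetical.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// SAX handler: the parser calls these methods while it reads; only the
// current <customer>'s data is kept between callbacks.
class CustomerSaxHandler extends DefaultHandler {
    private final Consumer<String> onCustomer; // receives each customer's name
    private final StringBuilder text = new StringBuilder();
    private String currentName;

    CustomerSaxHandler(Consumer<String> onCustomer) {
        this.onCustomer = onCustomer;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // collect text for the element that starts here
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // qName is used because a non-namespace-aware parser may leave localName empty
        if (qName.equals("name")) {
            currentName = text.toString();
        } else if (qName.equals("customer")) {
            onCustomer.accept(currentName); // process the finished record immediately
            currentName = null;
        }
        text.setLength(0);
    }

    // Parses the whole stream, e.g. new FileInputStream("huge-file.xml").
    static void parseCustomers(InputStream in, Consumer<String> onCustomer) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new CustomerSaxHandler(onCustomer));
    }
}
```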

Maybe this will make it clearer how to use StAX:

    import java.io.FileInputStream;
    import javax.xml.namespace.QName;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.events.Attribute;
    import javax.xml.stream.events.EndElement;
    import javax.xml.stream.events.StartElement;
    import javax.xml.stream.events.XMLEvent;

    // Customer is your own data class. The enclosing method must handle
    // FileNotFoundException and XMLStreamException.
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream("huge-file.xml"));

    // this variable is re-used to store the current customer
    Customer customer = null;

    while (xmlEventReader.hasNext()) {

        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        if (xmlEvent.isStartElement()) {

            StartElement startElement = xmlEvent.asStartElement();

            if (startElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // start populating a new customer
                customer = new Customer();

                // read an attribute, for example <customer number="42">
                Attribute attribute = startElement.getAttributeByName(new QName("number"));
                if (attribute != null) {
                    customer.setNumber(attribute.getValue());
                }
            }

            // read a nested element, for example:
            // <customer>
            //    <name>John Doe</name>
            if (customer != null && startElement.getName().getLocalPart().equals("name")) {
                // getElementText() reads all text up to </name>, even when the parser
                // reports it as several Characters events, and handles an empty <name/>
                customer.setName(xmlEventReader.getElementText());
            }
        }

        if (xmlEvent.isEndElement()) {
            EndElement endElement = xmlEvent.asEndElement();
            if (endElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // all data for the current Customer has been read
                // do something with the customer, like logging it or storing it in a database
                // after this the customer variable will be re-assigned to the next customer
            }
        }
    }
    xmlEventReader.close();
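
The loop above can also be wrapped into a self-contained method that hands each finished customer to a callback, which is one way to do the "further processing" without collecting the records. A sketch under these assumptions: `Customer` is a minimal hypothetical class with only `number` and `name` fields, and `CustomerStreamer`/`streamCustomers` are illustrative names.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Minimal hypothetical data class; the answer does not show its Customer class.
class Customer {
    String number;
    String name;
    void setNumber(String number) { this.number = number; }
    void setName(String name) { this.name = name; }
}

class CustomerStreamer {

    // Streams <customer> records and passes each finished one to the handler.
    // Only the customer currently being read is held in memory.
    static int streamCustomers(InputStream in, Consumer<Customer> handler) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        Customer customer = null;
        int count = 0;
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    String local = start.getName().getLocalPart();
                    if (local.equals("customer")) {
                        customer = new Customer();
                        Attribute number = start.getAttributeByName(new QName("number"));
                        if (number != null) {
                            customer.setNumber(number.getValue());
                        }
                    } else if (local.equals("name") && customer != null) {
                        customer.setName(reader.getElementText()); // consumes up to </name>
                    }
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals("customer")) {
                    handler.accept(customer); // e.g. log it or insert it into a database
                    customer = null;
                    count++;
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close the input stream
        }
        return count;
    }
}
```

For the real file, pass `new FileInputStream("huge-file.xml")` (closed by the caller, for example with try-with-resources) and do the logging or database insert inside the handler.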

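Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.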

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM