Reading a huge text file and appending using StringBuilder in Java

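I have a huge XML file (3-4 GB, about 360,000 records). I have to read it line by line and append each line to a StringBuilder; once it has been read, it will be processed further. However, the content cannot be held in memory because it exceeds the StringBuilder's capacity. How can I split the records and stop before the buffer limit is exceeded? Please advise.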

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader bufferedReader = new BufferedReader(new FileReader("test.txt"))) {
        StringBuilder stringBuilder = new StringBuilder();
        String line;
        int count = 0;
        while ((line = bufferedReader.readLine()) != null) {
            // keep only the lines that open a customer record
            if (line.startsWith("<customer>")) {
                stringBuilder.append(line);
            }
            count++;
        }
        System.out.println(stringBuilder.toString());
    } catch (IOException e) {
        e.printStackTrace();
    }

Edit: the asker's attempt with StAX:

    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent;
        try {
            xmlEvent = xmlEventReader.nextEvent();
        } catch (XMLStreamException e) {
            e.printStackTrace();
            break; // stop here; continuing would use an event that was never read
        }
        if (xmlEvent.isStartElement()) {
            StartElement elem = xmlEvent.asStartElement();
            // getLocalPart() returns the bare element name, without angle brackets
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (customerRecord) {
                    insideChildRecord = true;
                }
                customerRecord = true;
            }
        }
        if (customerRecord) {
            xmlEventWriter.add(xmlEvent);
        }
        if (xmlEvent.isEndElement()) {
            EndElement elem = xmlEvent.asEndElement();
            if (elem.getName().getLocalPart().equals("Customer")) {
                if (insideChildRecord) {
                    insideChildRecord = false;
                } else {
                    customerRecord = false;
                    xmlEventWriter.flush();
                    String cmlChunk = stringWriter.toString();
                    // the asker's snippet is cut off at this point in the source
                }
            }
        }
    }

It looks like you are parsing an XML file (I can see that you are checking for "<customer>").

For that, it is better to use a parsing library than low-level streams. Because the file is large, I suggest using SAX or StAX: https://docs.oracle.com/javase/tutorial/jaxp/stax/index.html

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream(fileName));
    while (xmlEventReader.hasNext()) {
        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        // parse the XML events one by one
    }

Because you cannot hold the data in memory, you have to do all of the "further processing" immediately, as the XML events arrive.
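As one possible illustration of processing each record as soon as it is complete, here is a minimal sketch that serializes every customer element into its own small string and hands it to a callback, so only one record is in memory at a time. The class name `CustomerChunker`, the method `forEachCustomer`, and the lowercase element name `customer` are assumptions made for this example; they do not come from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper for this example; not part of any library.
class CustomerChunker {

    /** Passes each top-level customer element, as an XML string, to the handler; returns the record count. */
    static int forEachCustomer(InputStream in, Consumer<String> handler) {
        int count = 0;
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null; // non-null only while inside a customer record
            int depth = 0;                // element nesting depth inside the current record
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (writer == null && event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals("customer")) {
                    // a new record starts: use a fresh, small buffer for it
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                }
                if (writer == null) {
                    continue; // outside any customer record, nothing to keep
                }
                if (event.isStartElement()) {
                    depth++;
                }
                writer.add(event);
                if (event.isEndElement() && --depth == 0) {
                    // the record is complete: process it now, then drop the buffer
                    writer.flush();
                    handler.accept(buffer.toString());
                    writer.close();
                    writer = null;
                    count++;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML stream", e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<customers><customer number=\"42\"><name>John Doe</name></customer></customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        int total = forEachCustomer(in, chunk -> System.out.println(chunk));
        System.out.println("records: " + total);
    }
}
```

For the real file, the `ByteArrayInputStream` would be replaced by a `FileInputStream`, and the callback would do the actual per-record work. Counting nesting depth instead of using boolean flags is a design choice that also copes with customer elements nested inside a customer record.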

Perhaps this makes it clearer how to use StAX:

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(new FileInputStream("huge-file.xml"));

    // this variable is re-used to store the current customer
    Customer customer = null;

    while (xmlEventReader.hasNext()) {

        XMLEvent xmlEvent = xmlEventReader.nextEvent();
        if (xmlEvent.isStartElement()) {

            StartElement startElement = xmlEvent.asStartElement();

            if (startElement.getName().getLocalPart().equalsIgnoreCase("customer")) {
                // start populating a new customer
                customer = new Customer();

                // read an attribute for example <customer number="42">
                Attribute attribute = startElement.getAttributeByName(new QName("number"));
                if (attribute != null) {
                    customer.setNumber(attribute.getValue());
                }
            }

            // read a nested element for example:
            // <customer>
            //    <name>John Doe</name>
            if (customer != null && startElement.getName().getLocalPart().equals("name")) {
                // getElementText() reads the text up to the matching end tag
                // and also copes with an empty <name/> element
                customer.setName(xmlEventReader.getElementText());
            }
        }

        if (xmlEvent.isEndElement()) {
            EndElement endElement = xmlEvent.asEndElement();
            if(endElement.getName().getLocalPart().equalsIgnoreCase("customer")){
                // all data for the current Customer has been read
                // do something with the customer, like logging it or storing it in a database
                // after this the customer variable will be re-assigned to the next customer
            }
        }
    }
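The answer also mentions SAX as an alternative to StAX. The following is a minimal sketch of the same idea with the JDK's SAX parser, assuming the record layout from the comments above (a `name` element inside each `customer`). The class `SaxCustomerNames` and the method `forEachCustomerName` are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for this example; not part of any library.
class SaxCustomerNames {

    /** Calls the handler with the text of every name element found inside a customer; returns how many were seen. */
    static int forEachCustomerName(InputStream in, Consumer<String> handler) {
        int[] count = {0};
        DefaultHandler saxHandler = new DefaultHandler() {
            private boolean inCustomer;
            private StringBuilder text; // non-null only while inside a name element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals("customer")) {
                    inCustomer = true;
                } else if (inCustomer && qName.equals("name")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several pieces, so append them
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (text != null && qName.equals("name")) {
                    handler.accept(text.toString());
                    text = null; // this StringBuilder only ever holds one name
                    count[0]++;
                } else if (qName.equals("customer")) {
                    inCustomer = false;
                }
            }
        };
        try {
            // the default factory is not namespace-aware, so qName is the plain tag name
            SAXParserFactory.newInstance().newSAXParser().parse(in, saxHandler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML stream", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<customers><customer number=\"42\"><name>John Doe</name></customer></customers>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        int total = forEachCustomerName(in, name -> System.out.println(name));
        System.out.println("names: " + total);
    }
}
```

With SAX the parser pushes events into callbacks, whereas with StAX the code pulls events in a loop. Both read the file as a stream, so memory use depends on the size of one record rather than on the size of the whole file.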
