
When writing a huge amount of data, parts of it get lost; when all the data is present, the write process is very slow

I have a problem with BufferedWriter when writing a large number of strings to a file.

Situation: I have to read a large text file (>100k lines), modify each line (remove whitespace, check for optional commands, etc.), and write the modified content to a new file.

I have tried two ways of writing to the file, and each gives only one of the following two results:

  1. The write process is horribly slow, but all lines are processed.
  2. Several chunks of lines get swallowed during the writing process, leaving an incomplete modified result.

Approaches and results:

  1. Horribly slow but complete
// read file content and put it in List<String> fileContent
for (String line : fileContent)
{
  // opens, appends to, and closes the file once per line
  try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true))))
  {
    writer.write(modifyFileContent(line));
  }
}

I already know that opening a file to write a single line and closing it again right away performs terribly. Modifying a file with around 4M lines takes around 4 hours, which is not desirable. At least it works...

  2. Faster, but incomplete write
// read file content and put it in List<String> fileContent
// This is placed in a try/catch block, I'm omitting it here for brevity
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true)));
for (String line : fileContent)
{
  writer.write(modifyFileContent(line));
}
writer.close();

This works faster, but I get the following content in the result file (I append the line number from the original file for debugging purposes):

...
Very long line with interesting content // line nb 567
Very long line with interesting content // line nb 568
Very long line with interesting content // line nb 569
Very long line wi
Very long line with interesting content // line nb 834
Very long line with interesting content // line nb 835
Very long line with interesting content // line nb 836
...

When printing these strings to the console, I see no gaps in the line numbering! So it seems there is a buffering issue somewhere...
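To make the suspected buffering behaviour concrete: a BufferedWriter keeps written characters in memory until its buffer fills up or until flush() or close() is called, so a file inspected (or a process terminated) before that point can end in the middle of a line. The following is only a minimal sketch of that mechanism; the class name BufferDemo and the method lengthsAroundFlush are made up for illustration, and it is not a diagnosis of the missing lines above.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class BufferDemo {

    // Hypothetical helper: writes text through a BufferedWriter and returns
    // the file length observed {before flush(), after flush()}.
    static long[] lengthsAroundFlush(File target, String text) throws IOException {
        long before;
        long after;
        // try-with-resources closes (and thereby flushes) the writer in every case
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target), StandardCharsets.UTF_8))) {
            writer.write(text);
            before = target.length(); // short text is still held in the in-memory buffer
            writer.flush();
            after = target.length();  // the text has now been handed to the file
        }
        return new long[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("buffer-demo", ".txt");
        target.deleteOnExit();
        long[] lengths = lengthsAroundFlush(target, "Very long line with interesting content");
        System.out.println("before flush: " + lengths[0] + ", after flush: " + lengths[1]);
    }
}
```

If a writer is never closed, or a second writer is opened on the same file while the first still holds buffered data, the tail of the buffer may never reach the file in the expected place.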

Other approaches: I also tried the NIO version, newBufferedWriter, which also omitted several lines.

Question: What am I missing here? Is there a way to get good write performance together with correctness? The input files are usually several hundred MB in size with millions of lines... Any hints are much appreciated :)
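For reference, the usual shape for this kind of job is to open the reader and the writer exactly once and stream the file line by line, so that nothing has to be collected in a List first. The following is a minimal sketch under stated assumptions: the input is UTF-8, and the hypothetical modify method (which only trims whitespace here) stands in for modifyFileContent. The class name StreamingRewrite is likewise made up.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamingRewrite {

    // Hypothetical stand-in for the per-line modification; it only trims whitespace.
    static String modify(String line) {
        return line.trim();
    }

    // Copies source to target line by line, opening each file exactly once.
    // Returns the number of lines written.
    static long rewrite(Path source, Path target) throws IOException {
        long count = 0;
        // try-with-resources flushes and closes the writer even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(modify(line));
                writer.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // usage: java StreamingRewrite <source file> <target file>
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        System.out.println(rewrite(source, target) + " lines written");
    }
}
```

Files.newBufferedWriter with no open options creates the target or truncates an existing one, unlike the append mode used in the snippets above. This sketch does not by itself explain the missing lines; if more than one writer or thread touches the same target file, output can still be lost or interleaved.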

[edit]

Thanks to Sir Lopez I found a working solution. I had never stumbled upon RandomAccessFile before...

Now, with this information, I guess I ran into a race condition or something else thread-related... As I started working with threads only recently, I guess this could have been expected...

To give the proper picture, I made a minimal example that shows the context in which my problem originally occurred. Any feedback is welcome :)

package minex;

import java.awt.EventQueue;
import java.awt.event.ActionEvent;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.OutputStreamWriter;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.swing.GroupLayout;
import static javax.swing.GroupLayout.Alignment.BASELINE;
import static javax.swing.GroupLayout.Alignment.LEADING;
import javax.swing.JButton;
import javax.swing.JFileChooser;
import javax.swing.JFrame;
import javax.swing.JProgressBar;
import javax.swing.SwingWorker;
import javax.swing.UIManager;
import javax.swing.WindowConstants;

/**
 * Read a file line by line, modify its content and write it to another file.
 * @author demo
 */
public class gui extends JFrame {

  /**
   * Background task, so the gui isn't blocked and the progress bar can be updated.
   */
  class fileConversionWorker extends SwingWorker<Integer, Double>
  {
    private final File file;
    
    public fileConversionWorker(File file)
    {
      this.file = file;
    }
 
    /**
     * Count the lines in the provided file. Needed to set the boundary
     * settings for the progress bar.
     * @param aFile File to read.
     * @return Number of lines present in aFile
     * @throws IOException 
     * @see quick and dirty taken from https://stackoverflow.com/a/1277955
     */
    private int countLines(File aFile) throws IOException {
      LineNumberReader reader = null;
      try {
        reader = new LineNumberReader(new FileReader(aFile));
        while ((reader.readLine()) != null);
        return reader.getLineNumber();
      } catch (Exception ex) {
        return -1;
      } finally {
        if (reader != null)
          reader.close();
      }
    }
    
    /**
     * Reads a file line by line, modifies the line
     * content and writes it to a different file immediately.
     * @return 
     */
    @Override
    public Integer doInBackground()
    {
      int totalLines = 0;
      try {
        // Indicate, that something is happening
        barProgress.setIndeterminate(true);
        totalLines = countLines(file);
        barProgress.setIndeterminate(false);
      } catch (IOException ex) {
        Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
      }
      
      // only proceed, when we at least have 1 line to manipulate.
      if (totalLines > 0)
      {
        BufferedReader br = null;
        BufferedWriter writer = null;
        try {
          barProgress.setMaximum(totalLines);
          br = new BufferedReader(new FileReader(file));
          String filename =  file.getAbsolutePath() + ".mod";
          long lineNb = 0;
          
          writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, true)));
          
          String line;
          // Read original file, modify line and immediately write to new file
          while ((line = br.readLine()) != null)
          {
            writer.write(line + " // " + lineNb);
            writer.newLine();

            // cast before dividing; integer division would always publish 0 here
            publish((double) lineNb / totalLines);
            lineNb++;
          }
        } catch (FileNotFoundException ex) {
          Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
        } catch ( IOException ex) {
          Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
        }
        finally {
          // Tidying up
          try {
            if (br != null)
              br.close();
            if (writer != null)
              writer.close();
          } catch (IOException ex) {
            Logger.getLogger(gui.class.getName()).log(Level.SEVERE, null, ex);
          }
        }
      }
      return 0;
    }
    
    /**
     * Re-enable the load button once the worker has finished.
     */
    @Override 
    public void done()
    {
      butLoadFile.setEnabled(true); 
    }
    
    /**
     * Update the progress bar.
     * @param aDoubles
     */
    @Override
    protected void process(java.util.List<Double> aDoubles) {    
      int amount = barProgress.getMaximum() - barProgress.getMinimum();
      barProgress.setValue( ( int ) (barProgress.getMinimum() + ( amount * aDoubles.get( aDoubles.size() - 1 ))) );
    }
  }
  
  /**
   * Start the gui.
   */
  public static void main(String[] args)
  {
    EventQueue.invokeLater(() -> {
      new gui().setVisible(true);
    });
  }
  
  /**
   * Initialize all things needed.
   */
  public gui()
  {
    initComponents();
  }
  
  /**
   * Load a file and immediately begin processing it.
   * @param evt 
   */
  private void butLoadFileActionListener(ActionEvent evt)
  {
    javax.swing.JFileChooser fc = new javax.swing.JFileChooser("/home/demo/fileFolder");
    int returnVal = fc.showOpenDialog(gui.this);
    
    if (returnVal == JFileChooser.APPROVE_OPTION) {
      File file = fc.getSelectedFile();
      butLoadFile.setEnabled(false);
      fileConversionWorker worker = new fileConversionWorker(file);
      worker.execute();
    }
  }
  
  /**
   * Paint the canvas.
   */
  private void initComponents()
  {
    setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);
    setResizable(false);
    setTitle("Min Example");
    
    butLoadFile = new JButton("Load file");
    butLoadFile.addActionListener((ActionEvent evt) -> {
      butLoadFileActionListener(evt);
    });
    
    barProgress = new JProgressBar();
    barProgress.setStringPainted(true);
    barProgress.setMinimum(0);
    
    javax.swing.GroupLayout layout = new GroupLayout(getContentPane());
    getContentPane().setLayout(layout);
    
    layout.setHorizontalGroup(
    layout.createParallelGroup(LEADING)
            .addComponent(butLoadFile, GroupLayout.PREFERRED_SIZE, 200, GroupLayout.PREFERRED_SIZE)
            .addComponent(barProgress, GroupLayout.PREFERRED_SIZE, 200, GroupLayout.PREFERRED_SIZE)
    );

    layout.setVerticalGroup(
    layout.createParallelGroup(BASELINE)
            .addGroup(layout.createSequentialGroup()
            .addComponent(butLoadFile, GroupLayout.PREFERRED_SIZE, 20, GroupLayout.PREFERRED_SIZE)
            .addComponent(barProgress, GroupLayout.PREFERRED_SIZE, 20, GroupLayout.PREFERRED_SIZE)            
            )
    );
    
    pack();
  }
  
  private JButton butLoadFile;        /** Button to load a file. */
  private JProgressBar barProgress;   /** Progress bar to visualize progress. */  
}

[/edit]

