Writing huge ArrayList to file in a human readable format

I have a program that processes a large amount of data from a sensor system. I'm currently looking into writing the program's output to a text file so that I can check whether the data is processed properly.

Right now I am writing a few identifiers before the ArrayList and then writing the ArrayList to the file using ArrayList.toString().

lineToWrite = identifier1 + ";" + identifier2 + ";" + sensorLine.toString();

The output file contains 21 lines in total, and the ArrayLists range from 100 to 400,000 items. Because toString() puts each list on a single line, none of the file editors I usually use can open the file so that I can inspect it.

I thought of doing a small amount of processing on the items in the ArrayList:

String lineToWrite = "";

String arrayListString = "\n";
for(String s : sensorLine){
    arrayListString += "\t" + s + "\n";
}

lineToWrite = identifier1 + ";" + identifier2 + ";" + arrayListString;

but this seems to take forever for the larger ArrayLists. Does anyone have a better/faster approach for doing this, or know of a good file viewing program?

I have tried the following editors, which have the following problems:

  • Notepad++ -> Slow to open and laggy once fully opened
  • Sublime Text 3 -> Very slow to open!

As a small side note on the sensor data: I have 2.3 million sensor inputs in total.

EDIT1:

To extend the question, I should add that the part that proved to be a problem is turning the enormous array into a single string. The program iterates very slowly over the array because it grows arrayListString on every pass, which I guess takes up a lot of memory and processing power.

EDIT2:

As for the writing method itself, I am using a BufferedWriter, shown here with placeholders for the actual method variables:

output = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename, toAppend), "UTF-8"));

And for the actual writing I am using:

output.append(line);
output.flush();

The problem is that you're assembling a very large string in memory and then writing it all at once, with lots of string manipulation to boot (leading to a memory allocation for each intermediate string).
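To make the cost concrete, here is a rough model rather than a benchmark. It counts how many characters get copied when a string grows by repeated concatenation, compared with writing each piece once. The class name ConcatCost and the assumed average item length of 10 characters are illustrative, and the model ignores the extra temporary buffers a real JVM may allocate.

```java
// Simplified cost model: every "result += piece" creates a new String,
// so all characters accumulated so far are copied again on each pass.
class ConcatCost {

    // Total characters copied when growing one string item by item.
    static long copiedByConcatenation(int items, int charsPerItem) {
        long copied = 0;
        long length = 0;
        for (int i = 0; i < items; i++) {
            length += charsPerItem; // length of the new, longer string
            copied += length;       // the whole new string is written out
        }
        return copied;
    }

    // Total characters copied when each piece is written exactly once.
    static long copiedByStreaming(int items, int charsPerItem) {
        return (long) items * charsPerItem;
    }

    public static void main(String[] args) {
        int items = 400_000;   // largest list size mentioned in the question
        int charsPerItem = 10; // assumed average item length
        System.out.println("concatenation: "
            + copiedByConcatenation(items, charsPerItem));
        System.out.println("streaming:     "
            + copiedByStreaming(items, charsPerItem));
    }
}
```

Under this model the concatenation cost grows with the square of the number of items, while streaming grows linearly, which is consistent with the small lists being fine and the large ones taking forever.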

Instead, look into using a stream. With a Writer, you can iterate over the array and append to the file as you go, which will be much faster.

Here's a good tutorial on the basics: http://www.tutorialspoint.com/java/java_files_io.htm
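As a minimal sketch of the Writer approach, assuming the record layout from the question (two identifiers, then one tab-indented item per line): the class and method names below are made up for illustration, and Files.newBufferedWriter is used in place of the question's BufferedWriter/OutputStreamWriter/FileOutputStream chain. It appends in UTF-8 and never builds the whole record in memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class StreamingWriter {

    // Appends one record to the file, writing each item as it is visited.
    static void appendRecord(Path file, String identifier1, String identifier2,
                             List<String> sensorLine) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(
                file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(identifier1);
            out.write(';');
            out.write(identifier2);
            out.write(';');
            out.write('\n');
            for (String s : sensorLine) {
                // Separate write calls avoid creating a temporary string per item.
                out.write('\t');
                out.write(s);
                out.write('\n');
            }
        } // try-with-resources flushes and closes the writer
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sensor", ".txt");
        appendRecord(file, "i1", "i2", List.of("0.1", "0.2", "0.3"));
        System.out.print(Files.readString(file));
    }
}
```

The BufferedWriter already batches the small writes into larger blocks, so there is no need to call flush() after every item.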

As to the editor issue, most editors either load the entire file into memory or load it in chunks of lines or bytes. If you have huge lines, you may want to revisit your format.

I think you will have to split your data into chunks and load them into the editor when needed. Here is a good answer: How to read Text File of about 2 GB?
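One way to look at a chunk without opening the whole file is to read only a range of lines. The sketch below assumes the file has one item per line; the class LineSlice and its method are hypothetical names, not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LineSlice {

    // Returns up to 'count' lines, starting at the zero-based line 'firstLine'.
    // Files.lines reads lazily, so the rest of the file is not kept in memory.
    static List<String> readLines(Path file, long firstLine, long count)
            throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.skip(firstLine).limit(count).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small sample file with the lines "0" to "9".
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            sample.add(String.valueOf(i));
        }
        Path file = Files.createTempFile("slice", ".txt");
        Files.write(file, sample);

        for (String line : readLines(file, 3, 2)) {
            System.out.println(line);
        }
    }
}
```

The skipped lines are still read from disk, so this is only a convenience for inspection, not a fast random-access method.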

Dump the data into a database.

Then you can do interesting things like selecting items 1000 to 1100, searching for values, or computing avg/min/max, all from a database client like Toad.

The SQL query language should not be a problem, and neither should finding a client.

Java has embedded, standalone databases; H2 might suffice.

For some odd reason, nearly all text editors are horribly slow when you have long lines. Often you can easily edit a file with a million lines, but will encounter problems if the file contains a single line with 100,000 characters.
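If a one-item-per-line layout is too sparse, a middle ground is to cap the number of items per line. This is only a sketch: the class LineWrapper, the ';' separator, and the choice of items per line are assumptions, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

class LineWrapper {

    // Groups the items into lines holding at most itemsPerLine values each.
    static List<String> wrap(List<String> items, int itemsPerLine) {
        if (itemsPerLine <= 0) {
            throw new IllegalArgumentException("itemsPerLine must be positive");
        }
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < items.size(); start += itemsPerLine) {
            int end = Math.min(start + itemsPerLine, items.size());
            // String.join builds one short line, never the whole output.
            lines.add(String.join(";", items.subList(start, end)));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e");
        for (String line : wrap(items, 2)) {
            System.out.println(line);
        }
    }
}
```

Each returned line can then be written with a BufferedWriter, which keeps every line short enough for an ordinary editor.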

Regarding the performance of writing a file, there are several trade-offs.

It is generally beneficial for performance to write larger blocks of data. That is: when you want to write 1000 bytes, you should write these 1000 bytes at once, and not one by one. But in this case, you are attempting to build a really huge block of data by assembling a huge string. This may backfire and decrease performance, because assembling this string may be expensive due to the many string concatenations.

As Taylor pointed out in his answer, writing the file line by line is likely a reasonable trade-off here: the chunks are then still large enough to compensate for the overhead of the write operation in general, and still small enough to avoid string concatenation overhead.

As an example: the time for writing 1 million lines with a BufferedWriter should hardly be measurable:

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ArrayListToFile
{
    public static void main(String[] args) throws IOException
    {
        List<String> sensorLine = new ArrayList<String>();
        int size = 1000000;
        Random random = new Random(0);
        for (int i=0; i<size; i++)
        {
            sensorLine.add(String.valueOf(random.nextDouble()));
        }

        write("out.txt", sensorLine);
    }

    private static void write(String fileName, Iterable<?> elements)
        throws IOException
    {
        try (BufferedWriter bw = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(fileName))))
        {
            String identifier1 = "i1";
            String identifier2 = "i2";

            bw.write(identifier1 + ";" + identifier2 + ";\n");

            for (Object s : elements)
            {
                bw.write("\t" + s + "\n");
            }
        }
    }
}

In the end I found a solution.

I used a StringBuilder to get around the problem of building a huge string to write to the file. The approach is as follows:

StringBuilder sb = new StringBuilder();
for(String s : arrayList){
    // chained appends avoid creating a temporary string per item
    sb.append("\t").append(s).append("\n");
}

String line = identifier1 + ";" + identifier2 + ";" + sb.toString();

As for the editor, Sublime Text 3 didn't seem to mind too much as long as the lines weren't 400,000 characters long.
