
How costly (time-wise) are read and write operations on a CSV file in Java?

I am writing a piece of software that includes a read/write component, and I am wondering how costly these operations are on a CSV file, because I have to read from and write to CSV files at the end of every cycle. Are there other file formats that take less time?

Read and write operations depend on the file system, hardware, software configuration, memory setup, and the size of the file to read — but not on the format. A separate concern is the cost of parsing the file, which should be relatively low for CSV, since the format is very simple.
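To illustrate how simple CSV handling can be, here is a minimal sketch using only the standard library (file name and field values are made up for illustration). It assumes the simplest case — no quoted fields, embedded commas, or newlines — for which a real CSV library would be needed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CsvRoundTrip {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cycle", ".csv");
        // Write a header row plus two data rows.
        Files.write(file, List.of("id,name,score", "1,alice,9.5", "2,bob,7.25"));

        // Read the rows back and split each into fields.
        for (String line : Files.readAllLines(file)) {
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            System.out.println(fields.length + " fields: " + String.join(" | ", fields));
        }
        Files.delete(file);
    }
}
```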

The point is that CSV is a good format for tabular data but not for nested data. If your data contains a lot of nested information, you can either split it into separate CSV files or accept some information redundancy that will penalize your performance. Other formats, however, may introduce other kinds of redundancy.

And do not optimize prematurely. If you read and write the file very frequently, it will almost certainly be kept cached in RAM. JSON or a zipped file might be smaller and faster to read from disk, but would have a higher parsing cost and could end up being slower overall. Parsing time also depends on the library implementation (Gson vs. Jackson) and its version.
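The only way to settle such a trade-off is to measure it. As a rough sketch (row contents and counts are arbitrary), the standard library's `GZIPOutputStream`/`GZIPInputStream` make it easy to time a compressed round trip against your actual data:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCost {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("data", ".csv.gz");
        String row = "1,alice,9.5\n";

        // Time writing 100k compressed rows.
        long t0 = System.nanoTime();
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)))) {
            for (int i = 0; i < 100_000; i++) w.write(row);
        }
        long writeNs = System.nanoTime() - t0;

        // Time reading (and decompressing) them back.
        t0 = System.nanoTime();
        long lines;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file))))) {
            lines = r.lines().count();
        }
        long readNs = System.nanoTime() - t0;

        System.out.printf("size=%d bytes, write=%d ms, read=%d ms, lines=%d%n",
                Files.size(file), writeNs / 1_000_000, readNs / 1_000_000, lines);
        Files.delete(file);
    }
}
```

Note that timings from a one-shot run like this are only indicative; for a serious comparison you would use a benchmarking harness such as JMH.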

It would help to know the reasons behind your problem in order to give a better answer.

The cost of reading/writing a CSV file, and whether it is suitable for your application, depends on the details of your use case. Specifically, if you are simply reading from the beginning of the file and appending to the end, then the CSV format is likely to work fine. However, if you need to access particular records in the middle of your file, then you probably want to choose another format.
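The append-at-the-end-of-every-cycle case is cheap with the standard library's `StandardOpenOption.APPEND`; a hypothetical sketch (the helper name and row contents are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class CsvAppend {
    // Append one comma-joined row to the file, creating it if needed.
    static void appendRow(Path file, String... fields) throws IOException {
        Files.write(file, List.of(String.join(",", fields)),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("cycles", ".csv");
        for (int cycle = 1; cycle <= 3; cycle++) {
            appendRow(log, String.valueOf(cycle), "ok");
        }
        System.out.println(Files.readAllLines(log)); // [1,ok, 2,ok, 3,ok]
        Files.delete(log);
    }
}
```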

The main issue with a CSV file is that it is a poor choice for random access: each record (row) has a variable size, so you cannot simply seek to a particular record offset in the file and instead need to read every row (well, you could still jump and sample, but you cannot seek directly by record index). Other formats with fixed-size records would allow you to seek directly to a particular record in the file, making it possible to update an entry in the middle of the file without re-reading and re-writing the entire file.
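A minimal sketch of the fixed-size-record idea, using `RandomAccessFile` from the standard library (the 16-byte record layout of a `long` id plus a `double` score is an invented example):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class FixedRecords {
    static final int RECORD_SIZE = 16; // 8-byte long id + 8-byte double score

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("records", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Write three records sequentially.
            for (int i = 0; i < 3; i++) {
                raf.writeLong(i);
                raf.writeDouble(i * 1.5);
            }
            // Jump straight to record #2 and update it in place.
            raf.seek(2L * RECORD_SIZE);
            raf.writeLong(2);
            raf.writeDouble(99.0);

            // Read record #2 back without scanning the rest of the file.
            raf.seek(2L * RECORD_SIZE);
            System.out.println("id=" + raf.readLong() + " score=" + raf.readDouble());
        }
        Files.delete(file);
    }
}
```

Because every record occupies exactly `RECORD_SIZE` bytes, record *n* always lives at byte offset `n * RECORD_SIZE` — precisely the property variable-length CSV rows lack.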
