Parsing CSV files to arrays from very large sources in Java
I have a parser that works fine on smaller files of approx. 60000 lines or less, but I have to parse a CSV file with over 10 million lines and this method just isn't working: it hangs for 10 seconds every 100 thousand lines. I assume it's the split method. Is there a faster way to parse data from a CSV to a string array?
Code in question:
String[][] events = new String[rows][columns];
Scanner sc = new Scanner(csvFileName);
int j = 0;
while (sc.hasNext()){
events[j] = sc.nextLine().split(",");
j++;
}
Your code won't parse CSV files reliably. What if you had ',' or a line separator in a value? It is also very slow.
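To make the reliability point concrete, here is a minimal sketch using only the JDK. The class `QuotedFieldDemo` and the helper `quoteAwareSplit` are hypothetical names made up for this illustration, and the helper assumes it is handed one complete record with RFC 4180 style double-quote escaping. It is not a replacement for a real parser, which also has to find record boundaries while reading.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFieldDemo {

    // The approach from the question: split on every comma, quoted or not.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // Hypothetical helper for illustration only: splits ONE complete record,
    // treating commas and line breaks inside double quotes as field content.
    static String[] quoteAwareSplit(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote = literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // includes ',' and '\n'
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Three logical fields: the second holds a comma, the third a line break.
        String record = "1,\"Smith, John\",\"line one\nline two\"";
        System.out.println(naiveSplit(record).length);      // prints 4
        System.out.println(quoteAwareSplit(record).length); // prints 3
    }
}
```

Note that reading with `nextLine()` as in the question would also cut this record in two at the embedded line break before `split` even runs.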
Get uniVocity-parsers to parse your files. It is 3 times faster than Apache Commons CSV, has many more features, and we use it to process files with billions of rows.
To parse all rows into a list of String arrays:
CsvParserSettings settings = new CsvParserSettings(); //lots of options here, check the documentation
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader(new File("path/to/input.csv")));
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
As a rule of thumb, using libraries is usually more efficient than in-house development. There are several libraries for reading and parsing CSV files. One of the more popular ones is Apache Commons CSV.
You might want to try a library I've just released: sesseltjonna-csv
It dynamically generates a CSV parser and data binding at runtime using ASM for improved performance.