How to convert a big CSV file into a JSON array quickly in Java
I want to convert a big CSV file (around 20,000 to 50,000 records) into a JSON array, but the conversion takes nearly 1 minute. Is there any way to do it in less than 5 seconds?
ResourceBundle rb = ResourceBundle.getBundle("settings");
String path = rb.getString("fileandfolder.Path");
System.out.println(path + "ssdd");
String csvPath = request.getParameter("DP") != null
        ? request.getParameter("DP").toString() : "";
String orname = path + csvPath;
File file = new File(orname);
FileReader fin = new FileReader(file); //Read file one by one
BufferedReader bi = new BufferedReader(fin);
int res;
String csv = "";
while ((res = fin.read()) != -1) {
    csv = csv + ((char) res); //Converted int to char and stored in csv
}
long start3 = System.nanoTime();
JSONArray array = CDL.toJSONArray(csv);
String Csvs = array.toString();
long time3 = System.nanoTime() - start3;
System.out.printf(
        "Took %.3f seconds to convert to a %d MB file, rate: %.1f MB/s%n",
        time3 / 1e9, file.length() >> 20, file.length() * 1000.0 / time3);
Try this:
StringBuilder sb = new StringBuilder();
while ((res = fin.read()) != -1) {
    sb.append((char) res); //Converted int to char and appended to the builder
}
String csv = sb.toString();
Concatenating strings with + is slow; you should use StringBuilder or StringBuffer.
There are two glaring performance problems in your code, both of them in this snippet:
while ((res = fin.read()) != -1) {
    csv = csv + ((char) res);
}
First problem: fin is an unbuffered FileReader, so each read() call can go all the way down to a system call. A syscall costs hundreds or even thousands of instructions, and you are doing that for each and every character in the input file.
Remedy: Read from bi rather than from fin. (That's what you created it for ... presumably.)
Second problem: each time you execute csv = csv + ((char) res); you are creating a new String that is one character longer than the previous one. If you have N characters in your input file, you end up copying roughly N^2 / 2 characters (that is, on the order of N^2) to build the string.
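To put a number on that, here is a small sketch that counts the characters copied by the concatenation loop, assuming the i-th concatenation copies i characters (the old string plus the new character). The class and method names are made up for illustration.

```java
public class ConcatCostSketch {
    // Total characters copied when a string is grown one character at a
    // time up to length n: 1 + 2 + ... + n = n * (n + 1) / 2.
    static long copiedChars(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        // A 1,000,000-character file (about 1 MB of ASCII text).
        System.out.println(copiedChars(1_000_000)); // prints 500000500000
    }
}
```

A StringBuilder, by contrast, copies each character only a small constant number of times on average, because its internal array grows geometrically.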
Remedy: Instead of concatenating Strings, use a StringBuilder ... like this:
StringBuilder sb = new StringBuilder();
....
sb.append((char) res);
....
String csv = sb.toString();
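Putting both remedies together, a minimal sketch might look like the following. The names BufferedReadSketch and readAll are made up for illustration, and the loop reads in 8 KB chunks rather than one character at a time, which is a small change from the original loop rather than something the question's code does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BufferedReadSketch {
    // Reads everything from the given reader into one String, using a
    // BufferedReader for the I/O and a StringBuilder for the accumulation.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[8192];
        try (BufferedReader br = new BufferedReader(in)) {
            int n;
            while ((n = br.read(chunk)) != -1) {
                sb.append(chunk, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for new FileReader(file) here.
        System.out.print(readAll(new StringReader("a,b\n1,2\n")));
    }
}
```

In the question's code, the call would be something like readAll(new FileReader(file)); the try-with-resources block also closes the reader, which the original snippet never does.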
At this point, it is not clear to me whether there is also a performance problem in converting the csv string to JSON, i.e. in this snippet:
JSONArray array = CDL.toJSONArray(csv);
String Csvs = array.toString();
Unfortunately, we don't know which JSONArray and CDL classes you are actually using. Hence, it is difficult to say why they are slow, or whether there is a faster way to do the conversion. (But I suspect that the biggest performance problems are in the earlier snippet.)
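If the conversion step itself does turn out to be slow, one option is to build the JSON text directly while reading the file line by line. The sketch below is not the CDL API and is not a general CSV parser: it assumes a header row, comma-separated fields with no quoting or embedded commas, and it emits every value as a JSON string. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class SimpleCsvToJson {
    // Appends s as a JSON string literal, escaping what JSON requires.
    static void appendJsonString(StringBuilder out, String s) {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
                    else out.append(c);
            }
        }
        out.append('"');
    }

    // Converts simple CSV (header row, no quoted fields) to a JSON array
    // of objects, one object per data row.
    static String convert(Reader in) throws IOException {
        StringBuilder out = new StringBuilder("[");
        try (BufferedReader br = new BufferedReader(in)) {
            String header = br.readLine();
            if (header == null) return "[]";
            String[] names = header.split(",", -1);
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) continue; // skip blank lines
                String[] cells = line.split(",", -1);
                if (!first) out.append(',');
                first = false;
                out.append('{');
                int n = Math.min(names.length, cells.length);
                for (int i = 0; i < n; i++) {
                    if (i > 0) out.append(',');
                    appendJsonString(out, names[i]);
                    out.append(':');
                    appendJsonString(out, cells[i]);
                }
                out.append('}');
            }
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(convert(new StringReader("name,age\nAnn,30\nBob,25\n")));
        // prints [{"name":"Ann","age":"30"},{"name":"Bob","age":"25"}]
    }
}
```

For real-world CSV with quoted fields, a proper CSV parser is the safer choice; measure the existing conversion first before replacing it.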
This csv = csv + ((char) res) is very slow: you are reading one char at a time, then allocating a new string that holds the old string plus the new char.
To load all text from a file into a string, do this:
static String readFile(String path, Charset encoding)
        throws IOException
{
    byte[] encoded = Files.readAllBytes(Paths.get(path));
    return new String(encoded, encoding);
}
(From https://stackoverflow.com/a/326440/360211; note that there is a cleaner way if you are using Java 7.)
Use it like this instead of the loop:
String csv = readFile(orname, StandardCharsets.UTF_8);
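On Java 11 or later, Files.readString does the same thing in a single call. The sketch below writes a small temporary file so that it can run on its own; the class name, method name, and sample data are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWholeFileSketch {
    // Reads the whole file into a String, decoding it as UTF-8.
    static String readCsv(Path csvFile) throws IOException {
        return Files.readString(csvFile, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real CSV path (orname).
        Path tmp = Files.createTempFile("sample", ".csv");
        try {
            Files.writeString(tmp, "name,age\nAnn,30\n", StandardCharsets.UTF_8);
            String csv = readCsv(tmp);
            System.out.println(csv.length() + " chars read");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

One difference worth knowing: Files.readString throws an exception on bytes that are not valid in the given charset, whereas new String(bytes, charset) silently substitutes a replacement character.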