I need to read ints from a large CSV file and then compute specific sums with them. Currently I have this algorithm:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Vector;

String csvFile = "D:/input.csv";
String line;
String cvsSplitBy = ";";
Vector<int[]> converted = new Vector<int[]>();
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        String[] a = line.split(cvsSplitBy, -1);
        int[] b = new int[a.length];
        for (int n = 0; n < a.length; n++) {
            b[n] = Integer.parseInt(a[n]);
        }
        converted.add(b);
    }
} catch (IOException e) {
    e.printStackTrace();
}
int x = 7;
int y = 5;
for (int m = 0; m < converted.size(); m++) {
    int[] row = converted.get(m);
    // Sum x columns starting at 0, y, 2y, ... (here 0-6, 5-11, 10-16, ...)
    for (int start = 0; start + x <= row.length; start += y) {
        int sum = 0;
        for (int o = start; o < start + x; o++) {
            sum += row[o];
        }
        System.out.print(sum + " ");
    }
    System.out.println();
}
What I'm trying to do is get the sum of the first x members of a CSV row, and then the sum of x members every +y columns. In this case, with x = 7 and y = 5: the sum of the first 7 columns (0-6), then the sum of the next 7 columns starting 5 columns later (5-11), then (10-16), and so on, written out for every row. In the end I want to collect the line number with the greatest (sum of 0-6), the greatest (sum of 5-11), ..., so the final result should be, for example, 5, 9, 13, 155, ..., meaning line 5 had the greatest sum of 0-6, line 9 the greatest sum of 5-11, and so on.

As you can see, this is quite inefficient. First I read the whole CSV into String[], then convert it to int[] and save it to a Vector. Then I wrote a rather inefficient loop to do the work. I need this to run as fast as possible, as I will be using a very large CSV with lots of different x and y. What I was thinking about, but don't know how to do, is:
How can I do this as fast as possible? Thank you
As the sums are per line, you do not need to read everything into memory first.
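import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// There is no catch here, so the enclosing method must declare throws IOException.
Path csvFile = Paths.get("D:/input.csv");
try (BufferedReader br = Files.newBufferedReader(csvFile, StandardCharsets.ISO_8859_1)) {
    String line;
    while ((line = br.readLine()) != null) {
        int[] b = lineToInts(line);
        int n = b.length;
        // Sum while reading:
        int sum = 0;
        for (int i = 0; i < 7; ++i) {
            sum += b[i];
        }
        System.out.print(sum + " ");
        sum = 0;
        for (int i = n - 5; i < n; ++i) {
            sum += b[i];
        }
        System.out.print(sum + " ");
        System.out.println();
    }
}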
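The loop above sums the first 7 and the last 5 columns of each line. For the overlapping windows described in the question (0-6, 5-11, 10-16, ...) with many different x and y, one option is a prefix-sum array per line: it is built once, and then every window sum is a single subtraction, so trying another x and y does not re-add the columns. A minimal sketch, assuming y > 0; the class and method names (WindowSums, prefixSums, windowSums) are hypothetical:

```java
import java.util.Arrays;

public class WindowSums {

    // prefix[i] holds the sum of row[0..i-1]; prefix[0] == 0.
    // long avoids int overflow when many columns are added together.
    static long[] prefixSums(int[] row) {
        long[] prefix = new long[row.length + 1];
        for (int i = 0; i < row.length; i++) {
            prefix[i + 1] = prefix[i] + row[i];
        }
        return prefix;
    }

    // Sums of the windows [0, x), [y, y + x), [2y, 2y + x), ...
    // Only windows that fit completely inside the row are returned.
    static long[] windowSums(long[] prefix, int x, int y) {
        int columns = prefix.length - 1;
        int count = columns < x ? 0 : (columns - x) / y + 1;
        long[] sums = new long[count];
        for (int w = 0; w < count; w++) {
            int start = w * y;
            sums[w] = prefix[start + x] - prefix[start];
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] row = new int[20];
        for (int i = 0; i < row.length; i++) {
            row[i] = i; // column i holds the value i
        }
        long[] prefix = prefixSums(row);
        // x = 7, y = 5: windows 0-6, 5-11, 10-16
        System.out.println(Arrays.toString(windowSums(prefix, 7, 5)));
    }
}
```

With columns holding 0..19 this prints [21, 56, 91]. The same prefix array can be reused for any other x and y on that line.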
private static int[] lineToInts(String line) {
    // Using split is slow, one could optimize the implementation.
    String[] a = line.split(";", -1);
    int[] b = new int[a.length];
    for (int n = 0; n < a.length; n++) {
        b[n] = Integer.parseInt(a[n]);
    }
    return b;
}
A faster version:
private static int[] lineToInts(String line) {
    // First pass: count the separators to size the array exactly.
    int semicolons = 0;
    for (int i = 0; (i = line.indexOf(';', i)) != -1; ++i) {
        ++semicolons;
    }
    int[] b = new int[semicolons + 1];
    // Second pass: parse the text between consecutive separators.
    int pos = 0;
    for (int i = 0; i < b.length; ++i) {
        int pos2 = line.indexOf(';', pos);
        if (pos2 < 0) {
            pos2 = line.length();
        }
        b[i] = Integer.parseInt(line.substring(pos, pos2));
        pos = pos2 + 1;
    }
    return b;
}
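If profiling still shows parsing as the bottleneck, one could go further and skip substring and Integer.parseInt altogether by accumulating digits while scanning the line once, which avoids creating a String per field. A minimal sketch, assuming every field is a plain decimal int with an optional leading '-', without spaces or quotes; parseLine and the reusable buffer are hypothetical names, and there is no validation:

```java
public class FastParse {

    // Parses "12;-3;45" into out[0..], returning the number of fields.
    // Assumes: fields are decimal ints with an optional leading '-',
    // no whitespace or quotes, and out is large enough for all fields.
    // Reusing out across lines avoids allocating an array per line.
    static int parseLine(String line, int[] out) {
        int count = 0;
        int value = 0;
        boolean negative = false;
        for (int i = 0, len = line.length(); i < len; i++) {
            char c = line.charAt(i);
            if (c == ';') {
                out[count++] = negative ? -value : value;
                value = 0;
                negative = false;
            } else if (c == '-') {
                negative = true;
            } else {
                value = value * 10 + (c - '0');
            }
        }
        out[count++] = negative ? -value : value; // last field has no trailing ';'
        return count;
    }

    public static void main(String[] args) {
        int[] buffer = new int[16];
        int n = parseLine("12;-3;45;0", buffer);
        for (int i = 0; i < n; i++) {
            System.out.print(buffer[i] + " ");
        }
        System.out.println();
    }
}
```

This prints the four values 12 -3 45 0. Unlike Integer.parseInt, an empty field silently becomes 0 and malformed input is not detected, so this trades safety for speed.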
As an aside: Vector is a legacy class; better to use List and ArrayList.
List<int[]> converted = new ArrayList<>(10_000);
Above, the optional initial-capacity argument is given: ten thousand.
The somewhat unusual try-with-resources syntax, try (BufferedReader br = ...) {, ensures that br is always closed automatically, even on an exception or a return.
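To see that behaviour in isolation, here is a small self-contained demo in which a hypothetical TrackedResource class stands in for the BufferedReader and only records whether close() was called:

```java
public class TryWithResourcesDemo {

    // Hypothetical resource that only records whether close() was called.
    static class TrackedResource implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // The body throws; the resource is closed before the catch block runs.
    static TrackedResource useAndFail() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource res = r) {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            System.out.println("closed before catch: " + r.closed);
        }
        return r;
    }

    // The body returns early; close() still runs before the method returns.
    static TrackedResource useAndReturn() {
        TrackedResource r = new TrackedResource();
        try (TrackedResource res = r) {
            return r;
        }
    }

    public static void main(String[] args) {
        System.out.println(useAndFail().closed);
        System.out.println(useAndReturn().closed);
    }
}
```

All three printed values are true.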
Parallelism (after the question was reformatted)
You could read all lines at once:
List<String> lines = Files.readAllLines(csvFile, StandardCharsets.ISO_8859_1);
Then play with parallel streams, like:
OptionalInt max = lines.parallelStream()
        .mapToInt(line -> {
            int[] b = lineToInts(line);
            ...
            return sum;
        }).max();
or:
IntStream.range(0, lines.size()).parallel()
        .mapToObj(i -> {
            String line = lines.get(i);
            ...
            return new int[] { i, sum5, sum7 };
        });
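Putting the pieces together for the final result asked for in the question (for each window, the line with the greatest sum), a minimal sketch, assuming all lines are already in a List<String>, every line has the same number of columns, and x and y are fixed per run. Each line is mapped to its window sums in parallel; a sequential pass then picks, per window, the 0-based line index with the largest sum (the first line wins on ties). The names BestLines, windowSums and bestLinePerWindow are hypothetical, and split/parseInt is used only for brevity:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class BestLines {

    // Window sums [0, x), [y, y + x), ... for one semicolon-separated line.
    static long[] windowSums(String line, int x, int y) {
        String[] fields = line.split(";", -1);
        long[] prefix = new long[fields.length + 1];
        for (int i = 0; i < fields.length; i++) {
            prefix[i + 1] = prefix[i] + Integer.parseInt(fields[i]);
        }
        int count = fields.length < x ? 0 : (fields.length - x) / y + 1;
        long[] sums = new long[count];
        for (int w = 0; w < count; w++) {
            sums[w] = prefix[w * y + x] - prefix[w * y];
        }
        return sums;
    }

    // For each window, the 0-based index of the line with the greatest sum.
    static int[] bestLinePerWindow(List<String> lines, int x, int y) {
        // Lines are independent, so parsing and summing run in parallel;
        // collecting from an ordered stream keeps the original line order.
        List<long[]> sums = lines.parallelStream()
                .map(line -> windowSums(line, x, y))
                .collect(Collectors.toList());
        int windows = sums.isEmpty() ? 0 : sums.get(0).length;
        int[] best = new int[windows];
        for (int w = 0; w < windows; w++) {
            for (int m = 1; m < sums.size(); m++) {
                if (sums.get(m)[w] > sums.get(best[w])[w]) {
                    best[w] = m;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1;1;1;1;1;1;1;1;1;1;1;1",
                "9;9;0;0;0;0;0;0;0;0;0;0",
                "0;0;0;0;0;5;5;5;5;5;5;5");
        // x = 7, y = 5 gives two windows per line: columns 0-6 and 5-11
        System.out.println(Arrays.toString(bestLinePerWindow(lines, 7, 5)));
    }
}
```

Here line 1 has the greatest sum of columns 0-6 (18) and line 2 the greatest sum of columns 5-11 (35), so this prints [1, 2].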
You could probably compute some of your sums while reading the input. It might also be feasible to use HashMaps of type <Integer, Integer>.