Java: read CSV + specific sum of subarray - most efficient way

I need to read ints from a large CSV file and then compute specific sums over them. Currently I have this algorithm:

String csvFile = "D:/input.csv";
String line = "";
String cvsSplitBy = ";";
Vector<int[]> converted = new Vector<int[]>();

try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {

   while ((line = br.readLine()) != null) {
       // Split the row into fields, then convert every field to an int.
       String[] a = line.split(cvsSplitBy, -1);
       int[] b = new int[a.length];
       for (int n = 0; n < a.length; n++) {
          b[n] = Integer.parseInt(a[n]);
       }
       converted.add(b);
   }
} catch (IOException e) {
   e.printStackTrace();
}

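int x = 7; // window width
int y = 5; // distance between window starts

for (int m = 0; m < converted.size(); m++) {
  int[] row = converted.get(m);

  // Sum of the first x members (columns 0-6).
  int sum = 0;
  for (int n = 0; n < x; n++) {
      sum = sum + row[n];
  }
  System.out.print(sum + " ");

  // Following windows of x members, each starting y columns later
  // (columns 5-11, 10-16, ...); n is the exclusive end of the window.
  for (int n = x + y; n <= row.length; n = n + y) {
      sum = 0;
      for (int o = n - x; o < n; o++) {
         sum = sum + row[o];
      }
      System.out.print(sum + " ");
  }
  System.out.println();
}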

What I am trying to do is get the sum of the first x members of a CSV row, and then the sum of x members starting every y columns. In this case x = 7 and y = 5, so that is the sum of columns 0-6, then the sum of columns 5-11, then 10-16, and so on, written out for every row. In the end I want to collect, for each of these windows, the number of the line with the greatest sum. The final result should look like 5,9,13,155..., which would mean that line 5 had the greatest sum of 0-6, line 9 the greatest sum of 5-11, and so on.

As you can see, this is quite inefficient. First I read the whole CSV into String[], then convert it to int[] and save it in a Vector. Then I run a rather inefficient loop to do the work. I need this to run as fast as possible, as I will be using very large CSV files with a lot of different x and y values. What I was thinking about, but don't know how to do, is:

  1. do these sums in the reading loop
  2. do the sum differently, not always looping over x members backward (either save the last sum and then subtract the old members and add the new ones, or some other faster way to do a subarray sum)
  3. use IntStream and parallelism (parallel might be tricky, as in the end I am looking for the max)
  4. use a different input format than CSV?
  5. all of the above?

How can I do this as fast as possible? Thank you.

As the sums are per line, you do not need to read everything into memory first.

Path csvFile = Paths.get("D:/input.csv");
try (BufferedReader br = Files.newBufferedReader(csvFile, StandardCharsets.ISO_8859_1)) {

     String line;
     while ((line = br.readLine()) != null) {
         int[] b = lineToInts(line);
         int n = b.length; 

         // Sum while reading:
         int sum = 0;
         for (int i = 0; i < 7; ++i) {
             sum += b[i];
         }
         System.out.print(sum + " ");

         sum = 0;
         for (int i = n - 5; i < n; ++i) {
             sum += b[i];
         }
         System.out.print(sum + " ");

         System.out.println();
     }
}

private static int[] lineToInts(String line) {
     // Using split is slow, one could optimize the implementation.
     String[] a = line.split(";", -1);
     int[] b = new int[a.length]; 
     for (int n = 0; n < a.length; n++) {
         b[n] = Integer.parseInt(a[n]);
     }
     return b;
}

A faster version:

private static int[] lineToInts(String line) {
    int semicolons = 0;
    for (int i = 0; (i = line.indexOf(';', i)) != -1; ++i) {
        ++semicolons;
    }
    int[] b = new int[semicolons + 1];
    int pos = 0;
    for (int i = 0; i < b.length; ++i) {
        int pos2 = line.indexOf(';', pos);
        if (pos2 < 0) {
            pos2 = line.length();
        }
        b[i] = Integer.parseInt(line.substring(pos, pos2));
        pos = pos2 + 1;
    }
    return b;
}
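
If `substring` and `Integer.parseInt` still show up in a profile, one further option is to accumulate the digits directly while scanning the line once. The following is only a sketch under stated assumptions: the class and method names (`FastIntParser`, `parseInts`) are made up for illustration, fields are assumed to be plain decimal numbers with an optional leading minus sign, and overflow is not checked. Whether it is actually faster on a given file would have to be measured.

```java
import java.util.Arrays;

final class FastIntParser {
    /**
     * Parses "12;-3;45" into {12, -3, 45} without creating substrings.
     * Assumption: plain decimal fields, optional leading '-', no whitespace.
     * Empty fields and other characters throw NumberFormatException;
     * overflow is NOT detected and would silently wrap around.
     */
    static int[] parseInts(String line) {
        // First pass: count the fields so the array can be sized exactly.
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ';') {
                fields++;
            }
        }
        int[] out = new int[fields];
        int idx = 0;
        int value = 0;
        int digits = 0;
        boolean negative = false;
        // Second pass: build each number digit by digit.
        for (int i = 0; i <= line.length(); i++) {
            // Treat the end of the line as a final separator.
            char c = i < line.length() ? line.charAt(i) : ';';
            if (c == ';') {
                if (digits == 0) {
                    throw new NumberFormatException("empty field " + idx);
                }
                out[idx++] = negative ? -value : value;
                value = 0;
                digits = 0;
                negative = false;
            } else if (c == '-' && digits == 0 && !negative) {
                negative = true;
            } else if (c >= '0' && c <= '9') {
                value = value * 10 + (c - '0');
                digits++;
            } else {
                throw new NumberFormatException("unexpected '" + c + "' at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseInts("12;-3;45"))); // prints [12, -3, 45]
    }
}
```

It could be swapped in for `lineToInts` above as long as the input really matches those assumptions.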

As an aside: Vector is old; it is better to use List and ArrayList.

List<int[]> converted = new ArrayList<>(10_000);

Above, the optional initial-capacity argument is given: ten thousand.

The weird try-with-resources syntax try (BufferedReader br = ...) { ensures that br is always closed automatically, even on an exception or a return.
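
For the sums themselves, the question's idea of not re-looping over x members for every window can be sketched with prefix sums: build the running totals of a row once, and every window sum becomes a single subtraction. This is a swapped-in technique, not something taken from the code above, and the names `WindowSums`, `prefixSums` and `windowSums` are hypothetical. It assumes windows of width x starting at columns 0, y, 2y, ... and ignores a trailing window that does not fit completely. `long` is used so large rows cannot overflow an `int`.

```java
import java.util.Arrays;

final class WindowSums {
    /** prefix[i] is the sum of row[0..i-1], so prefix has row.length + 1 entries. */
    static long[] prefixSums(int[] row) {
        long[] prefix = new long[row.length + 1];
        for (int i = 0; i < row.length; i++) {
            prefix[i + 1] = prefix[i] + row[i];
        }
        return prefix;
    }

    /** Sums of the windows [k*y, k*y + x) that fit completely into the row. */
    static long[] windowSums(long[] prefix, int x, int y) {
        int n = prefix.length - 1;
        int count = n < x ? 0 : (n - x) / y + 1;
        long[] sums = new long[count];
        for (int k = 0; k < count; k++) {
            int start = k * y;
            sums[k] = prefix[start + x] - prefix[start];
        }
        return sums;
    }

    public static void main(String[] args) {
        int[] row = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        long[] prefix = prefixSums(row);
        // The same prefix array serves any combination of x and y.
        System.out.println(Arrays.toString(windowSums(prefix, 7, 5))); // prints [28, 63]
        System.out.println(Arrays.toString(windowSums(prefix, 3, 3))); // prints [6, 15, 24, 33]
    }
}
```

Because the prefix array does not depend on x or y, it only has to be built once per row when many different x and y values are tried, which is the case described in the question.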


Parallelism (added after the question was reformatted)

You could read all lines:

List<String> lines = Files.readAllLines(csvFile, StandardCharsets.ISO_8859_1);

And then play with parallel streams like:

OptionalInt max = lines.parallelStream()
    .mapToInt(line -> {
        int[] b = lineToInts(line);
        ...
        return sum;
    }).max();

or:

IntStream.range(0, lines.size()).parallel()
    .mapToObj(i -> {
        String line = lines.get(i);
        ...
        return new int[] { i, sum5, sum7 };
    }); 
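
To make the second variant concrete, here is one way it might be completed for the "line with the greatest sum per window" goal. It is a sketch, not a tuned implementation: the names `ParallelBestLines`, `sumsOfLine` and `bestLines` are invented for illustration, parsing uses plain `split` for brevity, and all window sums are kept in memory before the maxima are picked sequentially. Line indices are 0-based here, and `List.of` needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

final class ParallelBestLines {
    /** Window sums of one line; the windows are [k*y, k*y + x). */
    static long[] sumsOfLine(String line, int x, int y) {
        String[] parts = line.split(";", -1);
        int n = parts.length;
        long[] prefix = new long[n + 1];
        for (int i = 0; i < n; i++) {
            prefix[i + 1] = prefix[i] + Integer.parseInt(parts[i]);
        }
        int count = n < x ? 0 : (n - x) / y + 1;
        long[] sums = new long[count];
        for (int k = 0; k < count; k++) {
            sums[k] = prefix[k * y + x] - prefix[k * y];
        }
        return sums;
    }

    /** For each window, the 0-based index of the line with the greatest sum (first line wins ties). */
    static int[] bestLines(List<String> lines, int x, int y) {
        // Parsing and summing are independent per line, so this part runs in parallel.
        // toArray keeps the encounter order, so all[i] belongs to line i.
        long[][] all = IntStream.range(0, lines.size()).parallel()
                .mapToObj(i -> sumsOfLine(lines.get(i), x, y))
                .toArray(long[][]::new);
        int windows = Arrays.stream(all).mapToInt(s -> s.length).max().orElse(0);
        int[] best = new int[windows];
        Arrays.fill(best, -1);
        // Picking the maxima is cheap and done sequentially.
        for (int w = 0; w < windows; w++) {
            for (int i = 0; i < all.length; i++) {
                if (w < all[i].length && (best[w] < 0 || all[i][w] > all[best[w]][w])) {
                    best[w] = i;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1;1;1;1", "5;0;0;9", "0;9;0;0");
        System.out.println(Arrays.toString(bestLines(lines, 2, 2))); // prints [2, 1]
    }
}
```

Keeping the maximum search outside the parallel part sidesteps the concern from the question that parallelism and "looking for max" are hard to combine, at the cost of holding every line's sums in memory.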

You could probably try to compute some of your sums while reading the input. It might also be feasible to use a HashMap<Integer, Integer>.
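
One possible reading of that suggestion, sketched under assumptions: while each line is read, its window sums are compared against the best seen so far, and a `HashMap<Integer, Integer>` maps each window index to the 1-based number of the line currently holding the maximum. The class name `StreamingBestLines`, the method `bestLines` and the second map for the best sums are all hypothetical additions; the demo reads from a `StringReader` instead of a file so it runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

final class StreamingBestLines {
    /** Returns window index -> 1-based number of the line with the greatest sum for that window. */
    static Map<Integer, Integer> bestLines(Reader in, int x, int y) throws IOException {
        Map<Integer, Long> bestSum = new HashMap<>();
        Map<Integer, Integer> bestLine = new HashMap<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            int lineNo = 0;
            while ((line = br.readLine()) != null) {
                lineNo++;
                String[] parts = line.split(";", -1);
                int[] row = new int[parts.length];
                for (int i = 0; i < parts.length; i++) {
                    row[i] = Integer.parseInt(parts[i]);
                }
                // Windows of width x starting at 0, y, 2y, ...; only complete windows count.
                for (int w = 0, start = 0; start + x <= row.length; w++, start += y) {
                    long sum = 0;
                    for (int i = start; i < start + x; i++) {
                        sum += row[i];
                    }
                    Long old = bestSum.get(w);
                    if (old == null || sum > old) {
                        bestSum.put(w, sum);
                        bestLine.put(w, lineNo);
                    }
                }
            }
        }
        return bestLine;
    }

    public static void main(String[] args) throws IOException {
        Reader demo = new StringReader("1;1;1;1\n5;0;0;9\n0;9;0;0");
        System.out.println(bestLines(demo, 2, 2));
    }
}
```

With the three demo lines and x = y = 2, window 0 maps to line 3 and window 1 to line 2. Only one row is held in memory at a time, so the file size does not matter for memory use.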
