How do I accelerate reading large files in GNU Octave?

I'm importing a large CSV file into GNU Octave, doing some simple data manipulation, and creating some plots. The file has about 6.5 million rows. I expected reading the file to take about two to three hours, because in my experience that's roughly how long it takes to create a file of this size. When it didn't finish, I added a status counter and found that the read was slowing down as it went: after 12 hours it had only reached line 1.5 million and was moving at a crawl. According to Resource Monitor, though, there were no memory issues. Is there a more efficient way to read the file than the code I have below? Do I need to do something special to allocate memory to the process so it doesn't slow down? This is the loop that reads the CSV. It's a while loop that scans the file one line at a time, extracts the columns I need, and ends when it runs out of lines to read:

% Process File
  F = 1;
  while 1
    % Status counter
    printf ("Status: %d \r", F);
    fflush (stdout);
    F = F + 1;
    % Read the next unread line
    line = fgetl (fileID);
    % Exit the loop at end of file (fgetl returns -1, not a string)
    if ~ischar (line)
      break;
    endif
    % Parse the line into a numeric row vector
    Bank = textscan (line, '%f', 'Delimiter', ',');
    Bank = cell2mat (Bank);
    Bank = transpose (Bank);
    % Append the wanted columns to Output
    Output = [Output; Bank(1, 1:9), Bank(1, 13:14), Bank(1, 20:21)];
  endwhile

This is the slow part:

Output = [Output; Bank(1, 1:9), Bank(1, 13:14), Bank(1, 20:21)];

Each time this line runs, Octave creates a brand-new matrix, copies all of Output plus the new row into it, and rebinds Output to the result. As Output grows, every copy gets more expensive: appending row k costs work proportional to k, so reading n rows this way costs on the order of n² operations in total. That quadratic growth is why the loop slows to a crawl as it gets deeper into the file, even though memory usage stays fine.

What you need to do is preallocate the output array. Always preallocate!
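Since the row count isn't known in advance here, one common pattern is to preallocate a block of rows and double the capacity whenever it fills up, then trim the excess afterwards. Below is a minimal sketch of the same loop written that way. The file name 'data.csv', the initial blockSize, and the fopen/fclose scaffolding are assumptions added to make the example self-contained; the 13-column width matches the columns kept in your loop (1-9, 13-14, 20-21):

  fileID = fopen ('data.csv', 'r');    % 'data.csv' is a placeholder name

  blockSize = 100000;                  % initial capacity (placeholder value)
  Output = zeros (blockSize, 13);      % 13 columns are kept from each row
  nRows = 0;

  while 1
    line = fgetl (fileID);
    if ~ischar (line)                  % fgetl returns -1 at end of file
      break;
    endif
    Bank = transpose (cell2mat (textscan (line, '%f', 'Delimiter', ',')));
    nRows = nRows + 1;
    if nRows > rows (Output)           % capacity full: double it
      Output = [Output; zeros(rows (Output), 13)];
    endif
    Output(nRows, :) = [Bank(1:9), Bank(13:14), Bank(20:21)];
  endwhile

  Output = Output(1:nRows, :);         % trim the unused preallocated rows
  fclose (fileID);

With doubling, each row is copied only a constant number of times on average, so the total work grows linearly with the number of rows instead of quadratically.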
