How do I accelerate reading large files in GNU Octave?

I'm importing a large CSV file into GNU Octave, doing some simple data manipulation, and creating some plots. The file has about 6.5 million rows. I expected reading the file to take about two to three hours, because in my experience that's roughly how long it takes to create a file of this size. When it didn't finish, I added a status counter and found that the read was slowing down as it went: after 12 hours it had only reached line 1.5 million and was moving at a crawl. According to Resource Monitor, though, there were no memory issues. Is there a more efficient way to read the file than the code I have below? Do I need to do something special to allocate memory to the process so it doesn't slow down? This is the loop that reads the CSV. It's a while loop that scans the file one line at a time, extracts the columns I need, and ends when it runs out of lines to read:

% Process File
  F = 1;
  while 1
    % Status counter
    printf ("Status: %d \r", F);
    fflush (stdout);
    F = F + 1;
    % Read the next unread line
    line = fgetl (fileID);
    % Exit the loop at end of file (fgetl returns -1, not a string)
    if ~ischar (line)
      break;
    endif
    % Parse the line into a numeric row vector
    Bank = textscan (line, '%f', 'Delimiter', ',');
    Bank = cell2mat (Bank);
    Bank = transpose (Bank);
    % Append the wanted columns to Output
    Output = [Output; Bank(1, 1:9), Bank(1, 13:14), Bank(1, 20:21)];
  endwhile

This is the slow part:

Output = [Output; Bank(1, 1:9), Bank(1, 13:14), Bank(1, 20:21)];

Each time this line runs, Octave creates a brand-new matrix, copies all of Output plus the new row into it, and rebinds Output to the result. As Output grows, every copy gets more expensive: appending row k costs work proportional to k, so reading n rows this way costs on the order of n² operations in total. That quadratic growth is why the loop slows to a crawl as it gets deeper into the file, even though memory usage stays fine.

What you need to do is preallocate the output array. Always preallocate!
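Since the row count isn't known in advance here, one common pattern is to preallocate a block of rows and double the capacity whenever it fills up, then trim the excess afterwards. Below is a minimal sketch of the same loop written that way. The file name 'data.csv', the initial blockSize, and the fopen/fclose scaffolding are assumptions added to make the example self-contained; the 13-column width matches the columns kept in your loop (1-9, 13-14, 20-21):

  fileID = fopen ('data.csv', 'r');    % 'data.csv' is a placeholder name

  blockSize = 100000;                  % initial capacity (placeholder value)
  Output = zeros (blockSize, 13);      % 13 columns are kept from each row
  nRows = 0;

  while 1
    line = fgetl (fileID);
    if ~ischar (line)                  % fgetl returns -1 at end of file
      break;
    endif
    Bank = transpose (cell2mat (textscan (line, '%f', 'Delimiter', ',')));
    nRows = nRows + 1;
    if nRows > rows (Output)           % capacity full: double it
      Output = [Output; zeros(rows (Output), 13)];
    endif
    Output(nRows, :) = [Bank(1:9), Bank(13:14), Bank(20:21)];
  endwhile

  Output = Output(1:nRows, :);         % trim the unused preallocated rows
  fclose (fileID);

With doubling, each row is copied only a constant number of times on average, so the total work grows linearly with the number of rows instead of quadratically.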
