Combining and reading data from Excel (.xlsx) into Matlab

There are two parts of my query:

1) I have multiple .xlsx files stored in a folder, a total of one year's worth (~365 .xlsx files). They are named according to date: 'A_ddmmmyyyy.xlsx' (e.g. A_01Jan2016.xlsx). Each .xlsx has 5 columns of data: Date, Quantity, Latitude, Longitude, Measurement. The problem is that each .xlsx file contains about 400,000 rows of data, and although I have scripts in Excel to merge them, the inherent row limit in Excel prevents me from merging all the data together.

(i) Is there a way to read the data from each .xlsx file into MATLAB in turn, specifying the variable name (i.e. Date, Quantity, etc.) for each column (variable) within MATLAB (there are no column headings in the .xlsx files)?

(ii) How can I merge the data for each column from each .xlsx together?

Thank you Jefferson

Let's take this in parts.

First, I do not recommend joining the data from all your files into single columns. There is no need to hold all of this information together; you can work through the files separately, for example with a datastore.

Working in MATLAB in my directory:

>> pwd

ans =

/home/anquegi/learn/matlab/stackoverflow

I have a folder containing a subfolder with two sample Excel files:

>> ls 
20_hz.jpg  big_data_store_analysis.m  excel_files  octave-workspace  sample-file.log
40_hz.jpg  chirp_signals.m        NewCode.m    sample.csv

>> ls excel_files/
A_01Jan2016.xlsx  A_02Jan2016.xlsx

the content of each file is :

Date    Quantity    Latitude    Longitude   Measurement 
1   1   1   1   1 
2   2   2   2   2 
3   3   3   3   3 
4   4   4   4   4 
5   5   5   5   5 
6   6   6   6   6 
7   7   7   7   7 
8   8   8   8   8 
9   9   9   9   9
10  10  10  10  10 
11  11  11  11  11 
12  12  12  12  12 
13  13  13  13  13
14  14  14  14  14 
15  15  15  15  15 
16  16  16  16  16 
17  17  17  17  17
18  18  18  18  18 
19  19  19  19  19 
20  20  20  20  20 
21  21  21  21  21
22  22  22  22  22

This is only to show how it works.

Reading the data:

>> ssds = spreadsheetDatastore('./excel_files')

ssds = 

  SpreadsheetDatastore with properties:

                      Files: {
                             '/home/anquegi/learn/matlab/stackoverflow/excel_files/A_01Jan2016.xlsx';
                             '/home/anquegi/learn/matlab/stackoverflow/excel_files/A_02Jan2016.xlsx'
                             }
                     Sheets: ''
                      Range: ''

  Sheet Format Properties:
             NumHeaderLines: 0
          ReadVariableNames: true
              VariableNames: {'Date', 'Quantity', 'Latitude' ... and 2 more}
              VariableTypes: {'double', 'double', 'double' ... and 2 more}

  Properties that control the table returned by preview, read, readall:
      SelectedVariableNames: {'Date', 'Quantity', 'Latitude' ... and 2 more}
      SelectedVariableTypes: {'double', 'double', 'double' ... and 2 more}
                   ReadSize: 'file'

Now all your data is accessible as tables. Let's see a preview:

>> data = preview(ssds)

data = 

    Date    Quantity    Latitude    Longitude    Measurement
    ____    ________    ________    _________    ___________

    1       1           1           1            1          
    2       2           2           2            2          
    3       3           3           3            3          
    4       4           4           4            4          
    5       5           5           5            5          
    6       6           6           6            6          
    7       7           7           7            7          
    8       8           8           8            8    

The preview is a good way to get sample data to work with.

You do not need to merge; you can work through all the elements:

>> ssds.VariableNames

ans = 

    'Date'    'Quantity'    'Latitude'    'Longitude'    'Measurement'

>> ssds.VariableTypes

ans = 

    'double'    'double'    'double'    'double'    'double'
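
The question says the real files have no heading row, unlike the sample files used here. A minimal sketch of how the datastore might be set up in that case, assuming the five columns always appear in the order given in the question. The temporary folder and the sample values are made up for illustration, so the snippet runs on its own:

```matlab
% Write one small headerless sample file (illustrative data only).
folder = tempname;
mkdir(folder);
M = repmat((1:22)', 1, 5);
writetable(array2table(M), fullfile(folder, 'A_01Jan2016.xlsx'), ...
    'WriteVariableNames', false);

% Assumption: every file has the same five columns in this order.
names = {'Date', 'Quantity', 'Latitude', 'Longitude', 'Measurement'};

ssds = spreadsheetDatastore(folder, ...
    'ReadVariableNames', false, ...   % the first row is data, not headings
    'VariableNames', names);          % names to use for the columns

T = preview(ssds);                    % first rows, labelled with the names above
disp(T.Properties.VariableNames)
```

With the names fixed this way, every table returned by read, preview or readall uses the same column labels regardless of which file it came from.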

% Let's get all the Latitude elements whose Date equals 1. In this case the two files are identical, so we will get two elements with value 1.

    >> reset(ssds)
    accum = [];
    while hasdata(ssds)
        T = read(ssds);                               % one table per file (ReadSize is 'file')
        accum = [accum, T.Latitude(T.Date == 1)'];    % append every match, however many there are
    end
    >> accum

    accum =

         1     1

So you need to work with datastores and tables. It is a bit tricky but very useful. You will probably also want to control ReadSize and the other properties of the datastore object. This is a good way of working with large data files in MATLAB.
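
As a sketch of the ReadSize control mentioned above: a numeric ReadSize makes each read return at most that many rows, so a statistic can be accumulated chunk by chunk, while readall returns everything as a single table when it fits in memory. The sample files, the chunk size of 10 and the choice of the mean as the statistic are assumptions for illustration only:

```matlab
% Two small headerless sample files in a temporary folder (illustrative data).
folder = tempname;
mkdir(folder);
M = repmat((1:22)', 1, 5);
names = {'Date', 'Quantity', 'Latitude', 'Longitude', 'Measurement'};
for f = {'A_01Jan2016.xlsx', 'A_02Jan2016.xlsx'}
    writetable(array2table(M), fullfile(folder, f{1}), ...
        'WriteVariableNames', false);
end

ssds = spreadsheetDatastore(folder, ...
    'ReadVariableNames', false, 'VariableNames', names);

% Read only the column that is needed, a few rows at a time.
ssds.SelectedVariableNames = {'Measurement'};
ssds.ReadSize = 10;                  % at most 10 rows per call to read

total = 0;
n = 0;
while hasdata(ssds)
    T = read(ssds);
    total = total + sum(T.Measurement);
    n = n + height(T);
end
meanMeasurement = total / n;         % mean over all files, never holding them all

% If the data does fit in memory, readall merges every file into one table.
reset(ssds);
ssds.SelectedVariableNames = names;
allData = readall(ssds);
```

For the files in the question a much larger ReadSize would be a more sensible choice; the small value here only makes the chunking visible on the sample data.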

For older versions of MATLAB you can use a more traditional approach:

folder = './excel_files';
filetype = '*.xlsx';
d = dir(fullfile(folder, filetype));   % list the matching files
data = cell(1, numel(d));              % one cell per file
for k = 1:numel(d)
  data{k} = xlsread(fullfile(folder, d(k).name));   % numeric content of the first sheet
end

Now the data is stored in the cell array data, one matrix per file:

data

data =

[22x5 double]    [22x5 double]

data{1}

ans =

 1     1     1     1     1
 2     2     2     2     2
 3     3     3     3     3
 4     4     4     4     4
 5     5     5     5     5
 6     6     6     6     6
 7     7     7     7     7
 8     8     8     8     8
 9     9     9     9     9
10    10    10    10    10
11    11    11    11    11
12    12    12    12    12
13    13    13    13    13
14    14    14    14    14
15    15    15    15    15
16    16    16    16    16
17    17    17    17    17
18    18    18    18    18
19    19    19    19    19
20    20    20    20    20
21    21    21    21    21
22    22    22    22    22
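
To merge the per-file matrices (part (ii) of the question), they can be stacked with vertcat and then split into named columns. Since dir typically lists names alphabetically, which for the A_ddmmmyyyy naming scheme is not chronological, the sketch below also recovers each file's date from its name. The small matrices and the files list are stand-ins for the xlsread results and the dir listing, used only so the snippet runs on its own:

```matlab
% Stand-ins for the cell array and file names from the loop above (illustrative).
data  = {repmat((1:3)', 1, 5), repmat((4:5)', 1, 5)};
files = {'A_01Jan2016.xlsx', 'A_02Jan2016.xlsx'};

% Stack the matrices on top of each other: rows add up, columns stay aligned.
allData = vertcat(data{:});

% Recover the date of each file from its name (A_ddmmmyyyy.xlsx).
fileDates = zeros(numel(files), 1);
for k = 1:numel(files)
    [~, base] = fileparts(files{k});                    % e.g. 'A_01Jan2016'
    fileDates(k) = datenum(base(3:end), 'ddmmmyyyy');   % serial date number
end

% Give each column of the merged matrix its own variable.
Date        = allData(:, 1);
Quantity    = allData(:, 2);
Latitude    = allData(:, 3);
Longitude   = allData(:, 4);
Measurement = allData(:, 5);
```

Sorting the files by fileDates before the read loop would put the merged rows in calendar order.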

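But be careful with a lot of large files: this approach holds every file in memory at once. For the files in the question (about 365 files of 400,000 rows and 5 numeric columns each) that is roughly 5.8 GB of doubles.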
