简体   繁体   中英

How to Import multiple CSV files then make a Master Table?

I am a research chemist and have carried out a measurement where I record 'signal intensity' vs 'mass-to-charge ( m/z )' . I have repeated this experiment 15x, by changing a specific parameter (Collision Energy). As a result, I have 15 CSV files and would like to align/join them within the same range of m/z values and same interval values. Due to the instrument thresholding rules, certain m/z values were not recorded, thus I have files that cannot simply be exported into excel and copy/pasted. The data looks a bit like the tables posted below

Dataset 1:  x  |  y          Dataset 2:   x  | y
           ---------                    ---------       
            0.0   5                      0.0   2
            0.5   3                      0.5   6
            2.0   7                      1.0   9
            3.0   1                      2.5   1
                                         3.0   4

Using matlab I started with this code:

%% Create a table for the set m/z range with an interval of 0.1 Da
mzrange = 50:0.1:620;
mzrange = mzrange';
mzrange = array2table(mzrange,'VariableNames',{'XThompsons'}); 

Then I manually imported 1 X/Y CSV (Xtitle=XThompson, Ytitle=YCounts) to align with the specified m/z range.

%% Join/merge the two tables using a common Key variable 'XThompson' (m/z value)
mzspectrum = outerjoin(mzrange,ReserpineCE00,'MergeKeys',true);

% Replace all NaN values with zero
mzspectrum.YCounts(isnan(mzspectrum.YCounts)) = 0;

At this point I am stuck because repeating this process with a separate file will overwrite my YCounts column. The title of the YCounts column doesnt matter to me as I can change it later, however I would like to have the table continue as such:

 XThompson | YCounts_1 | YCounts_2 | YCounts_3 | etc...
--------------------------------------------------------

How can I carry this out in Matlab so that this is at least semi-automated? I've had posted earlier describing a similar scenario but it turned out that it could not carry out what I need. I must admit that my mind is not of a programmer so I have been struggling with this problem quite a bit.

PS- Is this problem best executed in Matlab or Python?

I don't know or use matlab so my answer is pure python based. I think python and matlab should be equally well suited to read csv files and generate a master table.

Please consider this answer more as pointer to how to address the problem in python.

In python one would typically address this problem using the pandas package. This package provides "high-performance, easy-to-use data structures and data analysis tools" and can read natively a large set of file formats including CSV files. A master table from two CSV files "foo.csv" and "bar.csv" could be generated eg as follows:

import pandas as pd
df = pd.read_csv('foo.csv')
df2 = pd.read_csv('bar.cvs')

master_table = pd.concat([df, df2])

Pandas further allows to group and structure the data in many ways. The pandas documentation has very good descriptions of its various features.

One can install pandas with the python package installer pip :

sudo pip install pandas

if on Linux or OSX.

The counts from the different analyses should be named differently, ie, YCounts_1, YCounts_2, and YCounts_3 from analyses 1, 2, and 3, respectively, in the different datasets before joining them. However, the M/Z name (ie, XThompson) should be the same since this is the key that will be used to join the datasets. The code below is for MATLAB.

This step is not needed (just recreates your tables) and I copied dataset2 to create dataset3 for illustration. You could use 'readtable' to import your data ie, imported_data = readtable('filename');

  dataset1 = table([0.0; 0.5; 2.0; 3.0], [5; 3; 7; 1], 'VariableNames', {'XThompson', 'YCounts_1'});
  dataset2 = table([0.0; 0.5; 1.0; 2.5; 3.0], [2; 6; 9; 1; 4], 'VariableNames', {'XThompson', 'YCounts_2'});
  dataset3 = table([0.0; 0.5; 1.0; 2.5; 3.0], [2; 6; 9; 1; 4], 'VariableNames', {'XThompson', 'YCounts_3'});

Merge tables using outerjoin. You could use loop if you have many datasets.

  combined_dataset = outerjoin(dataset1,dataset2, 'MergeKeys', true);

Add dataset3 to the combined_dataset

  combined_dataset = outerjoin(combined_dataset,dataset3, 'MergeKeys', true);

You could export the combined data as Excel Sheet by using writetable

  writetable(combined_dataset, 'joined_icp_ms_data.xlsx');

I managed to create a solution to my problem based on learning through everyone's input and taking an online matlab courses. I am not a natural coder so my script is not as elegant as the geniuses here, but hopefully it is clear enough for other non-programming scientists to use.

Here's the result that works for me:

% Reads a directory containing *.csv files and corrects the x-axis to an evenly spaced (0.1 unit) interval.

% Create a matrix with the input x range then convert it to a table
prompt = 'Input recorded min/max data range separated by space \n(ex. 1 to 100 = 1 100): ';
inputrange = input(prompt,'s');
min_max = str2num(inputrange)
datarange = (min_max(1):0.1:min_max(2))';
datarange = array2table(datarange,'VariableNames',{'XAxis'});

files = dir('*.csv');
for q=1:length(files);

    % Extract each XY pair from the csvread cell and convert it to an array, then back to a table.
    data{q} = csvread(files(q).name,2,1); 
    data1 = data(q);
    data2 = cell2mat(data1);
    data3 = array2table(data2,'VariableNames',{'XAxis','YAxis'});

    % Join the datarange table and the intensity table to obtain an evenly spaced m/z range
    data3 = outerjoin(datarange,data3,'MergeKeys',true);
    data3.YAxis(isnan(data3.YAxis)) = 0;
    data3.XAxis = round(data3.XAxis,1);

    % Remove duplicate values
    data4 = sortrows(data3,[1 -2]);
    [~, idx] = unique(data4.XAxis);
    data4 = data4(idx,:);

    % Save the file as the same name in CSV without underscores or dashes
    filename = files(q).name;
    filename = strrep(filename,'_','');
    filename = strrep(filename,'-','');
    filename = strrep(filename,'.csv','');
    writetable(data4,filename,'FileType','text');
    clear data data1 data2 data3 data4 filename

end
clear

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM