
How to combine two 1D lists into a 2D array?

I am trying to read mzXML files using Pyteomics' mzxml module. The elements that I need to access are numpy.ndarray objects, which I convert to lists. The mzXML files contain several columns whose values are lists. The main objective is to combine two of these lists into a 2D array (side by side, column-wise) so that I can save it as a CSV file.

I tried using np.concatenate((mzplist, mzplist2), axis=1), which raised an error saying that axis 1 is out of bounds for an array of dimension 1. I also tried hstack and column_stack. The closest I got was with column_stack (code below), but the result looked 1D when I opened the resulting CSV files in Excel: each cell contains the m/z value and the intensity value separated by a space.
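The axis error arises because a 1D array only has axis 0, so `np.concatenate(..., axis=1)` has no second axis to join along, and `hstack` on two 1D arrays simply glues them end to end. A minimal sketch, with hypothetical sample values standing in for the spectrum arrays, of why those attempts fail and of three calls that do produce an (n, 2) array:

```python
import numpy as np

# Hypothetical stand-ins for line['m/z array'] and line['intensity array']
mz = np.array([1.0, 3.0, 5.0])
intensity = np.array([2.0, 4.0, 6.0])

# 1D arrays only have axis 0, so axis=1 does not exist
concat_failed = False
try:
    np.concatenate((mz, intensity), axis=1)
except ValueError as err:  # numpy's AxisError subclasses ValueError
    concat_failed = True
    print("concatenate failed:", err)

# hstack on 1D inputs joins them end to end -> still 1D, shape (6,)
h = np.hstack((mz, intensity))

# Each of these pairs the arrays as columns -> shape (3, 2)
a = np.column_stack((mz, intensity))
b = np.stack((mz, intensity), axis=1)
c = np.concatenate((mz[:, None], intensity[:, None]), axis=1)
print(h.shape, a.shape, b.shape, c.shape)
```

`column_stack` is the most direct choice here; the `[:, None]` form shows what `concatenate` needs: each input must already be 2D.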

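import os

import numpy as np
from pyteomics import mzxml

# full_path (input folder) and newfolder (output folder) are defined elsewhere
for filename in os.listdir(full_path):  # listdir already returns bare names
    with mzxml.read(os.path.join(full_path, filename)) as reader:
        for line in reader:  # each item is one scan, as a dict
            mzplist = line['m/z array'].tolist()
            mzplist2 = line['intensity array'].tolist()
            print(type(mzplist))
            # Pair the two 1D lists as columns -> 2D array of shape (n, 2)
            mzplist = np.column_stack([mzplist, mzplist2])
            #mzplist.columns = ['mass', 'Intensity']
            # Note: every scan is saved to the same path, so each call
            # overwrites the file written for the previous scan
            np.savetxt(newfolder + '\\' + filename + '.csv', mzplist)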

Expected result for mzplist:

 Mass       Intensity
  1            2
  3            4
  5            6

Here line['m/z array'].tolist() yields the list [1, 3, 5, ...], and line['intensity array'].tolist() yields the list [2, 4, 6, ...].

Am I missing something?

"each cell of Excel contains m/z value and intensity value separated by a space"

I suspect the source of the problem is this line:

np.savetxt(newfolder + '\\' + filename + '.csv', mzplist)

Because a space is the default delimiter for np.savetxt (as the documentation says), try replacing that line with:

np.savetxt(newfolder + '\\' + filename + '.csv', mzplist, delimiter=',')

and check whether that helps.
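Applied to the loop in the question, a minimal sketch might look like the following. Besides adding `delimiter=','`, it writes one CSV per scan, because the original loop reuses the same output path for every scan and therefore keeps only the last one. The helper name `save_spectra_as_csv` and the `_scanN` file-naming scheme are hypothetical; the spectra are assumed to be dicts with `'m/z array'` and `'intensity array'` keys, as the question's code reads them from `mzxml.read`.

```python
import os
import tempfile

import numpy as np


def save_spectra_as_csv(spectra, out_dir, stem):
    """Write each spectrum to its own comma-separated file (hypothetical helper).

    spectra: iterable of dicts with 'm/z array' and 'intensity array' keys,
             e.g. the scans iterated over in the question's loop.
    Returns the list of paths written.
    """
    paths = []
    for i, spectrum in enumerate(spectra):
        # Two 1D arrays -> one (n, 2) array: m/z in column 0, intensity in column 1
        data = np.column_stack((spectrum['m/z array'],
                                spectrum['intensity array']))
        # A distinct file per scan, so earlier scans are not overwritten
        path = os.path.join(out_dir, f"{stem}_scan{i}.csv")
        # delimiter=',' is the key fix: savetxt's default is a single space
        np.savetxt(path, data, delimiter=',')
        paths.append(path)
    return paths


# Demo with two fake scans written to a temporary directory
demo_dir = tempfile.mkdtemp()
fake_scans = [
    {'m/z array': np.array([1.0, 3.0, 5.0]),
     'intensity array': np.array([2.0, 4.0, 6.0])},
    {'m/z array': np.array([7.0, 9.0]),
     'intensity array': np.array([8.0, 10.0])},
]
written = save_spectra_as_csv(fake_scans, demo_dir, "sample.mzXML")
print(written)
```

With the question's variables, the call would be `save_spectra_as_csv(reader, newfolder, filename)` inside the `with mzxml.read(...) as reader:` block, replacing the inner `for line in reader:` loop.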

With two lists as you describe:

In [39]: alist=[1,3,5,7]; blist=[2,4,6,8]

A natural way to combine them into an array is:

In [40]: arr = np.array((alist, blist))
In [41]: arr
Out[41]: 
array([[1, 3, 5, 7],
       [2, 4, 6, 8]])

The transpose of that array looks like this:

In [42]: arr.T
Out[42]: 
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

We can write it with savetxt as:

In [44]: np.savetxt('foo.txt', arr.T, fmt='%5d')
In [45]: cat foo.txt
    1     2
    3     4
    5     6
    7     8

np.column_stack and np.c_ will produce the same array as arr.T.
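A quick check of that claim, reusing the two lists from the session above:

```python
import numpy as np

alist = [1, 3, 5, 7]
blist = [2, 4, 6, 8]

arr_T = np.array((alist, blist)).T          # build as rows, then transpose -> (4, 2)
stacked = np.column_stack((alist, blist))   # 1D inputs become columns
c_version = np.c_[alist, blist]             # index-trick shorthand for the same

print(np.array_equal(arr_T, stacked), np.array_equal(arr_T, c_version))
```

All three give a (4, 2) array, so any of them can be passed straight to `savetxt`; `column_stack` avoids the extra transpose step.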

You can add a ',' delimiter if that is what your external reader demands.
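For a spreadsheet program, a comma delimiter plus a header row is usually what is wanted. A sketch using the Mass/Intensity columns expected in the question: `header` adds the first line, and `comments=''` stops `savetxt` from prefixing it with its default `'# '`. An in-memory `io.StringIO` buffer stands in for the output file so the result is easy to inspect; a real file path works the same way.

```python
import io

import numpy as np

# Sample values from the question's expected output
mz = [1, 3, 5]
intensity = [2, 4, 6]

data = np.column_stack([mz, intensity])  # shape (3, 2)

buf = io.StringIO()
np.savetxt(buf, data, delimiter=',', fmt='%g',
           header='Mass,Intensity', comments='')
csv_text = buf.getvalue()
print(csv_text)
```

Note that `'%g'` keeps only six significant digits, which is fine for these small integers but can round real m/z values; a fixed format such as `'%.6f'`, or the default `fmt`, preserves more precision.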

Do you know how to view the output of savetxt as plain text? I'm using the bash cat command here.
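If `cat` is not at hand (for example on Windows, which the backslash paths in the question suggest), the same check can be done from Python. A minimal sketch: print the raw text, then read it back with `np.loadtxt` and confirm that the shape is (n, 2). The file name `foo.csv` and its temporary location are just examples.

```python
import os
import tempfile

import numpy as np

alist = [1, 3, 5, 7]
blist = [2, 4, 6, 8]
path = os.path.join(tempfile.mkdtemp(), "foo.csv")  # example location

np.savetxt(path, np.column_stack((alist, blist)), fmt="%d", delimiter=",")

# Equivalent of `cat foo.csv`: show the file exactly as written
with open(path) as fh:
    raw = fh.read()
print(raw)

# Read it back; a correct two-column file gives shape (4, 2)
back = np.loadtxt(path, delimiter=",")
print(back.shape)
```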

When people have problems reading and writing CSV files, we usually ask for samples so we can reproduce the problem. If needed, a sample of the intermediate arrays (such as the output of column_stack) may help as well. Otherwise we are left guessing what the problem is.

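Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM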