
Casting a Python list to a NumPy array gives the wrong shape

I am reading data from a file, like so:

f = open('some/file/path')
data = f.read().split('\n')

This gives me something like data = ['1 a #', '3 e &'] if the original file was

1 a #

3 e &
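For illustration, here is how `split('\n')` handles a file that ends with a newline, simulated with a string literal. The text is an assumed example, not the asker's actual file. A trailing or blank line shows up as an empty string, which later splits into an empty list. `str.splitlines()` does not produce the trailing empty string.

```python
# Assumed file contents: two data lines plus the usual trailing newline.
text = '1 a #\n3 e &\n'

by_newline = text.split('\n')   # keeps an empty string after the final '\n'
by_lines = text.splitlines()    # no trailing empty string

print(by_newline)  # ['1 a #', '3 e &', '']
print(by_lines)    # ['1 a #', '3 e &']
print(''.split())  # [] (an empty line becomes an empty row)
```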

I need it in a form like

[['1','a','#'],['3','e','&']]

so that I can then do a np.swapaxes() on it and turn it into

[['1','3'],['a','e'],['#','&']]

But whenever I do that, the swapaxes call fails, because I don't end up with a NumPy array of the right shape. To turn the strings into lists of strings, I do:

for n in range(len(data)): data[n] = data[n].split()
data = np.array(data)

But when I check the shape:

>>> np.shape(data)
(2,)

So I cannot swap axes. I've tried building the NumPy array in a few different ways, but each one produces an array that doesn't recognize the inner lists as a second dimension.
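A minimal sketch of one way to end up with a 1-D shape like `(2,)`: if the rows produced by `split()` do not all have the same length (the short second line below is made up for illustration), NumPy cannot build a 2-D array. With `dtype=object` it falls back to a 1-D array of Python list objects; NumPy 1.24 and later raise a `ValueError` for ragged input when no `dtype` is given. Checking the row lengths first makes the problem visible.

```python
import numpy as np

# Hypothetical input: the second line is missing one field.
lines = ['1 a #', '3 e']
rows = [line.split() for line in lines]

lengths = [len(row) for row in rows]
print(lengths)  # [3, 2]; unequal lengths mean the data is ragged

# With dtype=object NumPy keeps each row as a Python list inside a 1-D array.
ragged = np.array(rows, dtype=object)
print(ragged.shape)  # (2,)
```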

To turn data = ['1 a #', '3 e &'] into [['1','a','#'],['3','e','&']] you should do:

>>> data2 = []
>>> for line in data:
    data2.append(line.split())


>>> data2
[['1', 'a', '#'], ['3', 'e', '&']]

Split the strings first, then transpose:

import numpy as np
data = ['1 a #', '3 e &']
np.array([x.split() for x in data]).T
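Since the question asks about `np.swapaxes`, note that for a 2-D array, swapping axes 0 and 1 gives the same result as `.T`. A short check:

```python
import numpy as np

data = ['1 a #', '3 e &']
arr = np.array([x.split() for x in data])   # shape (2, 3), dtype '<U1'

swapped = np.swapaxes(arr, 0, 1)            # shape (3, 2)
print(swapped.tolist())                # [['1', '3'], ['a', 'e'], ['#', '&']]
print(np.array_equal(swapped, arr.T))  # True
```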

Your line split looks fine:

In [110]: data = ['1 a #', '3 e &']

In [111]: for n in range(len(data)): data[n] = data[n].split()

In [112]: data
Out[112]: [['1', 'a', '#'], ['3', 'e', '&']]

In [113]: A=np.array(data)

In [114]: A
Out[114]: 
array([['1', 'a', '#'],
       ['3', 'e', '&']], 
      dtype='<U1')

In [115]: A.shape
Out[115]: (2, 3)

In [116]: A.T
Out[116]: 
array([['1', '3'],
       ['a', 'e'],
       ['#', '&']], 
      dtype='<U1')

In [117]: A.T.tolist()
Out[117]: [['1', '3'], ['a', 'e'], ['#', '&']]

I can 'transpose' a list of lists with zip as well:

In [119]: list(zip(*data))
Out[119]: [('1', '3'), ('a', 'e'), ('#', '&')]
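`zip` yields tuples, so a list comprehension is needed if lists are wanted. One caveat: `zip` stops at the shortest row, so ragged input is silently truncated rather than raising an error (the short row below is made up for illustration).

```python
data = [['1', 'a', '#'], ['3', 'e', '&']]

# Transpose, converting each tuple back to a list.
transposed = [list(col) for col in zip(*data)]
print(transposed)  # [['1', '3'], ['a', 'e'], ['#', '&']]

# Ragged rows: zip quietly drops the unmatched '#'.
ragged = [['1', 'a', '#'], ['3', 'e']]
truncated = [list(col) for col in zip(*ragged)]
print(truncated)  # [['1', '3'], ['a', 'e']]
```

On Python 3.10 and later, `zip(*rows, strict=True)` raises a `ValueError` for rows of unequal length instead of truncating.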

The original line splitting can also be done with a list comprehension:

In [120]: [i.split() for i in ['1 a #', '3 e &']]
Out[120]: [['1', 'a', '#'], ['3', 'e', '&']]

You could have combined the file read and splits with something like

[i.strip().split() for i in f.readlines()]

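readlines returns a list of lines, but each line still ends with \n, which strip removes. The other thing to watch out for is blank lines between the data lines: a blank line splits into an empty list, so the rows no longer all have the same length and NumPy cannot build a 2-D array from them.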
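A minimal sketch that combines the read and the split while skipping blank lines, so every row has the same number of fields before it reaches NumPy. `io.StringIO` stands in for the real file here, and `read_rows` is a hypothetical helper name; with an actual file you would pass the object returned by `open('some/file/path')` instead.

```python
import io
import numpy as np

def read_rows(f):
    """Split each non-blank line into whitespace-separated fields."""
    return [line.split() for line in f if line.strip()]

# Stand-in for the file shown in the question, blank line included.
fake_file = io.StringIO('1 a #\n\n3 e &\n')
rows = read_rows(fake_file)
print(rows)  # [['1', 'a', '#'], ['3', 'e', '&']]

arr = np.array(rows)
print(arr.shape)    # (2, 3)
print(arr.T.shape)  # (3, 2)
```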

===================

In case it wasn't clear,

In [122]: data = ['1 a #', '3 e &']

In [123]: np.array(data)
Out[123]: 
array(['1 a #', '3 e &'], 
      dtype='<U5')

produces a 2-element array in which each element is a 5-character string. No amount of reshaping or transposing will convert this into an array of single-character strings. You can only reshape it into other 2-element arrays:

In [124]: _.reshape(2,1)
Out[124]: 
array([['1 a #'],
       ['3 e &']], 
      dtype='<U5')

In [125]: __.reshape(1,2,1)
Out[125]: 
array([[['1 a #'],
        ['3 e &']]], 
      dtype='<U5')
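Reshaping cannot help because `reshape` only rearranges the existing elements: the total size must stay the same, so a 2-element array can never become `(2, 3)`. A quick check:

```python
import numpy as np

arr = np.array(['1 a #', '3 e &'])
print(arr.size)  # 2

try:
    arr.reshape(2, 3)        # would need 6 elements, not 2
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print('ValueError:', exc)
```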

I could view that 2-element '<U5' array (here assigned to A) as a single-character array:

In [128]: A.view('<U1')
Out[128]: 
array(['1', ' ', 'a', ' ', '#', '3', ' ', 'e', ' ', '&'], 
      dtype='<U1')

In [129]: A.view('<U1').reshape(5,2)
Out[129]: 
array([['1', ' '],
       ['a', ' '],
       ['#', '3'],
       [' ', 'e'],
       [' ', '&']], 
      dtype='<U1')

but those space characters get in the way.
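If every string has exactly the same layout (single characters separated by single spaces, as in this data), the spaces can be dropped with a boolean mask before reshaping. This is a sketch that relies on that assumption; it is not a general way to split strings.

```python
import numpy as np

A = np.array(['1 a #', '3 e &'])   # dtype '<U5'
chars = A.view('<U1')               # 10 single-character elements
fields = chars[chars != ' ']        # drop the separating spaces
print(fields.reshape(len(A), -1).tolist())  # [['1', 'a', '#'], ['3', 'e', '&']]
```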

There is also a module, np.char, that applies string functions element-wise to arrays:

np.concatenate(np.char.split(A)).reshape(2,3)
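`np.char.split` returns a 1-D object array whose elements are Python lists, which is why the `np.concatenate(...).reshape(2,3)` step is needed. A small self-contained version, with the transpose added:

```python
import numpy as np

A = np.array(['1 a #', '3 e &'])
parts = np.char.split(A)             # object array; each element is a list
print(parts.dtype, parts.shape)      # object (2,)

B = np.concatenate(parts).reshape(len(A), -1)
print(B.T.tolist())  # [['1', '3'], ['a', 'e'], ['#', '&']]
```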

Read the file (strip() removes the '\n'):

filename = "some/file/path"
data = [i.strip().split(' ') for i in open(filename)]
print(data)

Convert the list to a NumPy array and swap the axes:

import numpy as np
print(np.asarray(data))
print(np.asarray(data).T)
