
NumPy thinks a 2-D array is 1-D

I have a NumPy array that is constructed from a text file. I've been doing things this way for weeks and have never seen this problem before.

print(data)
print(data[:, 1:])

outputs

[['1', '200', '300', '400', '500\n']
 ['3', '500', '400', '200', '1000\n']
 ['14', '900', '200', '300', '100\n'] ...,
 ['999142', '24', '21', '20', '12\n']]
Traceback (most recent call last):
File ...., line ..., in ....
print(data[:, 1:])
IndexError: too many indices

Why is this happening, and how can I fix it?

Edit: Big clue. data.shape is (3313869,), with no second value.

data.ndim is 1.

len(data[1]), however, is 5.

Edit: I am constructing it with

import re
import numpy as np

data = [re.split(' ', line) for line in f]  # f is the already-open input file
f.close()
data = np.array(data)

When I insert

f.close()
print(data[0:10])

it gives

[['1', '200', '300', '400', '500\n'], ['3', .... ]]
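A minimal reproduction of this construction, as a sketch only: io.StringIO stands in for the real file f, and the two sample lines are hypothetical, with the second one ending in a space before the newline. The point is that re.split(' ', ...) then yields rows of different lengths, and NumPy can only turn ragged rows into a 1-D array whose elements are Python lists.

```python
import io
import re

import numpy as np

# Hypothetical stand-in for the real input file: the second line has a
# trailing space just before the newline.
f = io.StringIO("1 200 300 400 500\n3 500 400 200 1000 \n")
rows = [re.split(' ', line) for line in f]
f.close()
print([len(r) for r in rows])  # → [5, 6]

# Ragged rows cannot form a rectangular 2-D array. Older NumPy versions fell
# back to a 1-D object array here; NumPy 1.24 and later raise ValueError
# instead unless dtype=object is passed, so it is passed explicitly.
data = np.array(rows, dtype=object)
print(data.shape, data.ndim)  # → (2,) 1
print(len(data[1]))  # → 6
```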

The problem happens because your code is somehow creating a NumPy array of objects (dtype=object). This is what NumPy produces when the nested lists do not all have the same length. See this question with a similar issue. When it happens, you get something like:

import numpy
a = numpy.array([list1, list2, list3, ... , listn], dtype=object)  # lists of differing lengths

It is a 1-D array, but when you print it, each list inside is printed using its own string representation (its __str__), giving:

[[ 1, 2, 3, 4],
 [ 5, 6, 7, 8]]

which looks like a 2-D array.
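A short sketch of the distinction, with hypothetical list1/list2/list3 values: dtype=object on its own does not make the array 1-D. Equal-length rows still give a real 2-D array; it is the row of a different length that forces NumPy down to a 1-D array of lists.

```python
import numpy as np

list1 = [1, 2, 3, 4]
list2 = [5, 6, 7, 8]
list3 = [9, 10]  # hypothetical short row

# Equal-length rows: a genuine 2-D array, even with dtype=object.
square = np.array([list1, list2], dtype=object)
print(square.shape)  # → (2, 4)

# One short row: the only consistent shape is 1-D, and each element is a list.
ragged = np.array([list1, list2, list3], dtype=object)
print(ragged.shape)  # → (3,)
print(type(ragged[0]).__name__)  # → list
```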

You can reproduce this with:

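import numpy as np

a = ['aaa' for i in range(10)]  # a plain list of 10 strings
b = np.empty(5, dtype=object)   # 1-D array with 5 object slots
b.fill(a)                       # every slot now refers to the same list a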

Let's check b:

b.shape # (5,)
b.ndim  # 1

but print(b) gives (in older NumPy versions; newer ones wrap each element in list(...)):

[['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']
 ['aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa', 'aaa']]

Quite tricky...
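A sketch of how to spot and undo this, reusing the simulated b from above: the dtype is object and the shape has a single entry. Because every row here has the same length, rebuilding from plain nested lists with b.tolist() gives a real 2-D string array that accepts the [:, 1:] slice. This only works once the rows are consistent; it does not fix ragged data.

```python
import numpy as np

a = ['aaa' for i in range(10)]
b = np.empty(5, dtype=object)
b.fill(a)

# dtype=object together with a one-entry shape is the tell-tale sign.
print(b.dtype, b.shape)  # → object (5,)

# All rows have the same length, so converting back to nested lists and
# rebuilding yields a real 2-D array of strings.
c = np.array(b.tolist())
print(c.shape, c.ndim)  # → (5, 10) 2
print(c[:, 1:].shape)  # → (5, 9)
```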

I solved this with

for line in data:
    # report any row that does not have exactly 5 fields
    if len(line) != 5:
        print(len(line))
        print(line)
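The loop above prints the bad rows but not where they are. A variant that also reports each row's index may help when tracing the problem back to the input file; find_ragged_rows is a hypothetical helper name, and the sample rows are made up to mirror the data shown in the question.

```python
def find_ragged_rows(rows, expected=5):
    """Return (index, row) pairs whose length differs from `expected`."""
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Hypothetical sample rows; the second had a trailing space before '\n'.
rows = [
    ['1', '200', '300', '400', '500\n'],
    ['3', '500', '400', '200', '1000', '\n'],
    ['14', '900', '200', '300', '100\n'],
]

for i, row in find_ragged_rows(rows):
    print(i, len(row), row)  # → 1 6 ['3', '500', '400', '200', '1000', '\n']
```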

A few of the lines in my data had a space at the end, just before the newline, so re.split(' ', line) split the last value (e.g. 500) and \n into separate tokens, giving those rows 6 fields instead of 5. This snuck in on Friday, the last time I touched this code: I had added a default option to the Python script that builds the input files for this script, to fill in rows that were missing a particular value, and Vim inserted a space at the line wrap, which happened to fall on the character right before the \n.

[re.split(' ', line.replace('\n', '').rstrip()) for line in f] gives the desired result.
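A small sketch comparing the tokenizations on a hypothetical line with a trailing space. Note that rstrip() with no argument already strips the '\n', so the replace() call is redundant. As an alternative (not what was used above), str.split() with no argument splits on runs of whitespace and ignores leading and trailing whitespace, which also copes with doubled spaces that re.split(' ', ...) turns into empty tokens.

```python
import re

line = "3 500 400 200 1000 \n"  # hypothetical line with a trailing space

# Splitting on single spaces keeps the newline as its own token.
print(re.split(' ', line))  # → ['3', '500', '400', '200', '1000', '\n']

# The fix above: strip the newline and trailing space first.
print(re.split(' ', line.replace('\n', '').rstrip()))  # → ['3', '500', '400', '200', '1000']

# Alternative: str.split() with no argument.
print(line.split())  # → ['3', '500', '400', '200', '1000']

# Doubled spaces: str.split() copes, re.split(' ', ...) leaves an empty token.
print("1  200\n".split())  # → ['1', '200']
print(re.split(' ', "1  200\n"))  # → ['1', '', '200\n']
```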

It is a little strange, I think, that NumPy seems to treat the array as both 1-D and 2-D (allowing me to select data[1] as a row). But I guess that if the rows aren't of consistent length, it sees the data as a 1-D array of lists rather than a 2-D array, and data[1] simply returns one of those lists.
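One way to make that distinction fail loudly instead of surfacing later as an IndexError is to check the row lengths before converting. The sketch below uses a hypothetical helper, to_2d, and assumes at least one row of input; it is an illustration, not part of the original solution.

```python
import numpy as np


def to_2d(rows):
    """Hypothetical helper: build a 2-D array, refusing ragged input.

    Expects at least one row.
    """
    lengths = sorted({len(row) for row in rows})
    if len(lengths) != 1:
        raise ValueError(f"rows have inconsistent lengths: {lengths}")
    return np.array(rows)


good = to_2d([['1', '200'], ['3', '500']])
print(good.shape, good.ndim)  # → (2, 2) 2

try:
    to_2d([['1', '200'], ['3', '500', '\n']])
except ValueError as err:
    print(err)  # → rows have inconsistent lengths: [2, 3]
```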
