
numpy arrays dimension mismatch

I am using numpy and pandas to try to concatenate a number of heterogeneous values into a single array.

np.concatenate((tmp, id, freqs))

Here are the exact values:

tmp = np.array([u'DNMT3A', u'p.M880V', u'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
id = "id_23728"

The dimensions of tmp, 17232, and freqs are as follows:

[in]  tmp.shape
[out] (4,)
[in]  np.array(17232).shape
[out] ()
[in]  freqs.shape
[out] (1,)

I have also tried casting them all as numpy arrays, to no avail.

Note that the variable freqs will frequently have more than one value.

However, with both the np.concatenate and np.append functions I get the following error:

*** ValueError: all the input arrays must have same number of dimensions

These all have the same number of columns (0). Why can't I concatenate them with either of the numpy methods described above?

All I'm looking to obtain is [(tmp), 17232, (freqs)] in a single one-dimensional array, which is to be appended onto the end of a pandas dataframe.

Thanks.

Update

It appears I can concatenate the two existing arrays:

np.concatenate([tmp, freqs],axis=0)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 0.022831050228310501], dtype=object)

However, the integer, even when cast to an array, cannot be used in concatenate.

np.concatenate([tmp, np.array(17571)],axis=0)
*** ValueError: all the input arrays must have same number of dimensions

What does work, however, is nesting append and concatenate:

np.concatenate((np.append(tmp, 17571), freqs),)
array([u'DNMT3A', u'p.M880V', u'chr2', 25457249, 17571,
       0.022831050228310501], dtype=object)

This is kind of messy, though. Does anyone have a better solution for concatenating a number of heterogeneous arrays?

The problem is that id, and later the integer np.array(17571), are not 1D array_like objects: numpy converts each of them to a zero-dimensional array, and concatenate cannot join a 0D array with 1D arrays. See here for how numpy decides whether an object can be converted automatically to a numpy array.

The solution is to make id a 1D array_like, i.e. an element of a list or tuple, so that numpy understands that id belongs to a 1D array_like structure.

It all boils down to

np.concatenate((tmp, (id,), freqs))

or

np.concatenate((tmp, [id], freqs))

To avoid this sort of problem when dealing with input variables in functions using numpy, you can use atleast_1d, as pointed out by @askewchan. See this question/answer for details.

Basically, if you are unsure whether your variable id will be a single str or a list of str in different scenarios, you are better off using

np.concatenate((tmp, np.atleast_1d(id), freqs))

because the two options above will fail if id is already a list/tuple of strings.
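As a rough sketch of that advice, the helper below runs every input through np.atleast_1d before concatenating, so scalars and sequences are treated alike. The name concat_1d, the variable sample_id (used in place of id to avoid shadowing the Python builtin), and the second id list are made up for illustration; the object dtype is an assumption chosen to keep the mixed values intact.

```python
import numpy as np

def concat_1d(*parts):
    """Hypothetical helper: join scalars and 1D sequences into one 1D object array."""
    # np.asarray(..., dtype=object) keeps strings, ints and floats as they are;
    # np.atleast_1d turns a 0D array (from a scalar) into a 1D array of length 1
    # and leaves arrays that are already 1D unchanged.
    arrays = [np.atleast_1d(np.asarray(p, dtype=object)) for p in parts]
    return np.concatenate(arrays)

tmp = np.array(['DNMT3A', 'p.M880V', 'chr2', 25457249], dtype=object)
freqs = np.array([0.022831050228310501], dtype=object)
sample_id = "id_23728"

# A single string is inserted as one element.
single = concat_1d(tmp, sample_id, freqs)
print(single.shape)  # (6,)

# A list of strings (illustrative values) is inserted element by element.
several = concat_1d(tmp, ["id_a", "id_b"], freqs)
print(several.shape)  # (7,)
```

The same call works whether the id arrives as a single string or as a list, which is the case where wrapping it in (id,) or [id] would produce a 2D input and fail.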

EDIT: It may not be obvious why np.array(17571) cannot be concatenated directly. This happens because np.array(17571).shape == (), so it has no dimensions and is not iterable.
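A short check of this, written as a sketch with illustrative variable names (scalar, as_1d, joined), shows the empty shape, the failed iteration, and how promoting the value to 1D makes the concatenation work:

```python
import numpy as np

scalar = np.array(17571)
print(scalar.shape, scalar.ndim)  # () 0

# A 0D array has no axis to iterate over.
try:
    iter(scalar)
    iterable = True
except TypeError:
    iterable = False
print(iterable)  # False

# Promoting it to 1D gives concatenate an axis to join along.
as_1d = np.atleast_1d(scalar)
print(as_1d.shape)  # (1,)

tmp = np.array(['DNMT3A', 'p.M880V', 'chr2', 25457249], dtype=object)
joined = np.concatenate([tmp, as_1d])
print(joined.shape)  # (5,)
```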
