
Python: Combining 2D arrays with one common column that has different values

I want to combine two arrays, each representing a curve whose variable is column 1. However, the column 0 values do not always match:

import numpy as np
arr1 = np.array([(12, 1003), (17, 900), (20, 810)])
arr2 = np.array([(10, 1020), (17, 902), (19, 870), (21, 750)])

I want to combine these into one array in which the column 0 values are merged and both column 1s sit side by side, with gaps wherever an array has no value for the corresponding column 0 value, something like this:

arr3 = np.array([(10, None, 1020), (12, 1003, None), (17, 900, 902), (19, None, 870), (20, 810, None), (21, None, 750)])
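A minimal NumPy-only sketch of building this combined array, assuming the column 0 values are unique within each array. It uses NaN rather than None for the gaps, since a float array cannot hold None (NumPy would fall back to an object array), while NaN works directly with np.isnan and with interpolation:

```python
import numpy as np

arr1 = np.array([(12, 1003), (17, 900), (20, 810)])
arr2 = np.array([(10, 1020), (17, 902), (19, 870), (21, 750)])

# Sorted union of all column 0 values from both arrays
x = np.union1d(arr1[:, 0], arr2[:, 0])

# Float array pre-filled with NaN, so missing entries stay NaN
arr3 = np.full((len(x), 3), np.nan)
arr3[:, 0] = x

# x contains every column 0 value, so searchsorted returns each value's exact row
arr3[np.searchsorted(x, arr1[:, 0]), 1] = arr1[:, 1]
arr3[np.searchsorted(x, arr2[:, 0]), 2] = arr2[:, 1]

print(arr3)
```

The rows come out sorted by column 0 because np.union1d returns sorted unique values.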

The reason is that I want to compute mean values of the second column across both arrays, but their column 0 values are not exactly the same. So the idea is to build this combined array, interpolate to replace all the None values, then take the mean of columns 1 and 2 and store it in an extra column.
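One way to do the interpolate-then-average step is to interpolate both curves onto the sorted union of their column 0 values with np.interp. This is a sketch using linear interpolation; passing left=np.nan and right=np.nan (an assumption about the desired behaviour) leaves points outside a curve's own range as NaN instead of clamping them to that curve's end values:

```python
import numpy as np

arr1 = np.array([(12, 1003), (17, 900), (20, 810)])
arr2 = np.array([(10, 1020), (17, 902), (19, 870), (21, 750)])

# Common grid: every column 0 value that appears in either array (sorted)
x = np.union1d(arr1[:, 0], arr2[:, 0])

# Linear interpolation of each curve onto the common grid.
# np.interp needs increasing xp, which holds for both arrays here.
y1 = np.interp(x, arr1[:, 0], arr1[:, 1], left=np.nan, right=np.nan)
y2 = np.interp(x, arr2[:, 0], arr2[:, 1], left=np.nan, right=np.nan)

# Mean of the two curves; NaN wherever either curve has no value
mean = (y1 + y2) / 2

# Columns: x, curve 1, curve 2, mean
result = np.column_stack([x, y1, y2, mean])
print(result)
```

If you would rather use whichever value exists at the ends, np.nanmean(np.column_stack([y1, y2]), axis=1) ignores the NaNs.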

I have used NumPy for everything else so far, but I got stuck with the np.column_stack function: it needs arrays of the same length, and it cannot align rows based on the values in column 0. Lastly, I do not want to fit a curve to the data, because the actual data is non-linear and possibly not consistent, so a fit will not work; interpolation seems like the most accurate method.

There may already be an answer, but since I don't know how to describe the problem well, I can't find it. Also, I am relatively new to Python, so please don't assume much knowledge on my part.

Thank you.

Will this help?

import pandas
import numpy as np

arr1= np.array([(12,1003),(17,900),(20,810)])
arr2= np.array([(10,1020),(17,902),(19,870),(21,750)])

d1 = pandas.DataFrame(arr1)
d2 = pandas.DataFrame(arr2)

d1.columns = d2.columns = ['t', 'v']
d3 = pandas.DataFrame(np.array(d1.merge(d2, on='t', how='outer')))
print(d3.values)

# use d3.to_numpy() to convert to a NumPy array (as_matrix() was removed in pandas 1.0)

Output:

[[   12.  1003.    nan]
 [   17.   900.   902.]
 [   20.   810.    nan]
 [   10.    nan  1020.]
 [   19.    nan   870.]
 [   21.    nan   750.]]
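A sketch extending the answer above to the full workflow, assuming pandas 0.24 or later (for to_numpy). It sorts the merged rows by 't' (the output above is not in column 0 order), fills the gaps by linear interpolation over the 't' values with interpolate(method='index'), and adds a mean column. The suffixes _1/_2 are illustrative names chosen here:

```python
import numpy as np
import pandas as pd

arr1 = np.array([(12, 1003), (17, 900), (20, 810)])
arr2 = np.array([(10, 1020), (17, 902), (19, 870), (21, 750)])

d1 = pd.DataFrame(arr1, columns=['t', 'v'])
d2 = pd.DataFrame(arr2, columns=['t', 'v'])

# Outer merge on 't', sort by 't', and use 't' as the index
merged = (d1.merge(d2, on='t', how='outer', suffixes=('_1', '_2'))
            .sort_values('t')
            .set_index('t'))

# method='index' interpolates linearly using the 't' values as x positions;
# limit_area='inside' only fills gaps between known values (no extrapolation)
filled = merged.interpolate(method='index', limit_area='inside')

# Extra column with the mean of the two curves (NaN if either is missing)
filled['mean'] = filled[['v_1', 'v_2']].mean(axis=1, skipna=False)

# Columns: t, v_1, v_2, mean
print(filled.reset_index().to_numpy())
```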
