
Function that checks if there is a column in a 2D array in which all values are equal?

I have a 2D array holding 14 days of temperature data, with one reading per hour for each day (a 14x24 matrix, 336 data points). Is there a function or command that checks whether the array contains a column in which all values are equal? Thanks!

An alternative is to use the reduce method of the np.logical_and ufunc, here with the example array from Mark Setchell's answer.

import numpy as np

arr = np.array([[10.92206418,  9.00678018,  5.        ,  6.83022007, 16.18869687],
                [14.98451533,  2.04903653,  5.        , 12.49089931,  7.93300109],
                [ 0.63397121,  5.27492337,  5.        , 10.70274734, 18.68862265],
                [ 7.31692528, 17.98960002,  5.        , 13.94986875,  3.83450356],
                [ 3.20441573, 11.31828108,  5.        , 12.7831887 ,  6.69083798],
                [10.52480423, 14.99047775,  5.        , 12.18751519, 19.43634789],
                [15.95100606, 17.74638291,  5.        ,  8.06684746,  8.06391555],
                [14.91391738, 12.78786562,  5.        ,  7.57760045, 19.73240734],
                [ 2.90594641, 15.00832554,  5.        ,  2.25471882,  2.3352564 ],
                [ 7.05680473, 10.68381728,  5.        ,  8.9835386 ,  5.2305576 ],
                [ 1.32183032,  3.5445554 ,  5.        , 15.68051617, 13.08684098],
                [16.78607292, 12.07334951,  5.        , 16.97163501, 11.05617307],
                [18.75894622, 13.1007517 ,  5.        ,  5.91909606,  1.02953968],
                [14.00847642, 13.69674151,  5.        , 13.49089591,  9.30763748]])

np.logical_and.reduce( arr[1:,:] == arr[:-1,:], axis = 0)                                                                            
# array([False, False,  True, False, False])

Breaking down the steps:

temp = arr[1:,:] == arr[:-1,:]   # compare each row with the row above it
# array([[False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False],
#        [False, False,  True, False, False]])

np.logical_and.reduce( temp, axis = 0 )  # ANDs the comparisons down each column.
# array([False, False,  True, False, False])

With floats, use np.isclose instead of == so that values that are very nearly equal also count as equal.

np.logical_and.reduce( np.isclose(arr[1:,:],arr[:-1,:]), axis = 0)
# array([False, False,  True, False, False])
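
The same idea can be wrapped in a small helper that returns the indices of the constant columns. This is only a minimal sketch: the name constant_columns and the atol parameter are illustrative assumptions, not part of the answer above. It compares every row against the first row rather than against its neighbour, so that small differences cannot accumulate down a column when a tolerance is used.

```python
import numpy as np

def constant_columns(arr, atol=0.0):
    """Return indices of columns whose values all match the first row.

    Hypothetical helper: atol=0.0 demands exact equality, a positive
    atol accepts values within that absolute tolerance.
    """
    arr = np.asarray(arr)
    if atol == 0.0:
        same = arr == arr[0]            # broadcast first row against every row
    else:
        same = np.isclose(arr, arr[0], rtol=0.0, atol=atol)
    return np.flatnonzero(same.all(axis=0))

# 14 days x 24 hours of made-up temperatures, with hour 7 held constant
rng = np.random.default_rng(0)
temps = rng.uniform(0, 20, size=(14, 24))
temps[:, 7] = 5.0
print(constant_columns(temps))  # [7]
```

Passing a small positive atol, for example constant_columns(temps, atol=1e-6), treats readings that differ only by rounding noise as equal.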

You can try this, which compares every pair of rows (days) and reports the pairs that are identical:

temp_date = [
    [13, 19, 10, 18, 14, 12, 20, 12, 19, 17, 11, 12, 20, 11, 15, 19, 15, 11, 13, 19, 15, 12, 13, 13],
    [14, 18, 11, 16, 11, 17, 10, 16, 18, 10, 14, 10, 17, 11, 20, 18, 18, 14, 14, 10, 17, 11, 15, 12],
    [11, 20, 15, 19, 12, 18, 12, 19, 18, 15, 20, 20, 18, 10, 11, 13, 14, 12, 14, 12, 15, 13, 19, 14],
    [19, 11, 12, 19, 20, 14, 13, 16, 20, 20, 11, 18, 12, 19, 13, 14, 13, 11, 17, 20, 18, 14, 11, 18],
    [11, 14, 17, 14, 15, 18, 18, 13, 12, 16, 18, 11, 19, 20, 13, 16, 12, 20, 19, 15, 12, 15, 11, 15],
    [14, 18, 11, 11, 16, 17, 10, 13, 15, 18, 14, 19, 10, 12, 19, 16, 18, 18, 12, 12, 12, 14, 18, 11],   # this
    [10, 17, 15, 15, 18, 20, 16, 15, 19, 12, 19, 10, 16, 18, 12, 14, 14, 17, 12, 13, 13, 18, 11, 10],
    [15, 19, 11, 16, 15, 10, 11, 19, 20, 11, 10, 16, 11, 16, 18, 12, 20, 10, 20, 13, 14, 20, 19, 10],
    [15, 18, 19, 15, 20, 20, 17, 10, 18, 17, 17, 14, 13, 12, 20, 20, 10, 17, 16, 17, 20, 15, 20, 11],
    [17, 10, 19, 11, 19, 17, 19, 16, 13, 13, 10, 17, 12, 14, 19, 10, 13, 20, 19, 11, 12, 16, 16, 11],
    [11, 11, 11, 17, 20, 20, 11, 10, 19, 18, 16, 15, 19, 16, 19, 19, 12, 15, 19, 19, 20, 11, 19, 17],
    [19, 13, 11, 16, 16, 18, 12, 16, 20, 20, 13, 16, 19, 10, 11, 16, 14, 10, 17, 13, 14, 19, 19, 19],
    [14, 18, 11, 11, 16, 17, 10, 13, 15, 18, 14, 19, 10, 12, 19, 16, 18, 18, 12, 12, 12, 14, 18, 11],   # this
    [11, 13, 10, 17, 18, 19, 17, 16, 16, 13, 12, 18, 15, 16, 13, 13, 14, 10, 14, 12, 13, 13, 14, 18],
]

for i in range(14):
    for j in range(i + 1, 14):
        line_a = temp_date[i]
        line_b = temp_date[j]
        if line_a == line_b:
            print('EQUAL!!!  {} and {}'.format(i, j))
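
The loop above looks for identical rows (days). For the column case asked about in the question, a plain-Python sketch could transpose the nested list with zip and test each column with a set. The small readings list here is made-up illustration data, not the temp_date list above.

```python
# Hypothetical data: 4 days x 5 hours, with column 2 constant
readings = [
    [13, 19, 10, 18, 14],
    [14, 18, 10, 16, 11],
    [11, 20, 10, 19, 12],
    [19, 11, 10, 19, 20],
]

# zip(*readings) yields the columns; a column is constant when it
# contains exactly one distinct value
constant = [j for j, col in enumerate(zip(*readings)) if len(set(col)) == 1]
print(constant)  # [2]
```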

Generate some sample temperatures (a bit narrower than yours so we can see them), with one suspicious column:

import numpy as np

# Generate sample temperatures in [0, 20) and make column 2 constant
t = np.random.random_sample((14, 5)) * 20
t[:, 2] = 5

It looks like this:

array([[10.92206418,  9.00678018,  5.        ,  6.83022007, 16.18869687],
       [14.98451533,  2.04903653,  5.        , 12.49089931,  7.93300109],
       [ 0.63397121,  5.27492337,  5.        , 10.70274734, 18.68862265],
       [ 7.31692528, 17.98960002,  5.        , 13.94986875,  3.83450356],
       [ 3.20441573, 11.31828108,  5.        , 12.7831887 ,  6.69083798],
       [10.52480423, 14.99047775,  5.        , 12.18751519, 19.43634789],
       [15.95100606, 17.74638291,  5.        ,  8.06684746,  8.06391555],
       [14.91391738, 12.78786562,  5.        ,  7.57760045, 19.73240734],
       [ 2.90594641, 15.00832554,  5.        ,  2.25471882,  2.3352564 ],
       [ 7.05680473, 10.68381728,  5.        ,  8.9835386 ,  5.2305576 ],
       [ 1.32183032,  3.5445554 ,  5.        , 15.68051617, 13.08684098],
       [16.78607292, 12.07334951,  5.        , 16.97163501, 11.05617307],
       [18.75894622, 13.1007517 ,  5.        ,  5.91909606,  1.02953968],
       [14.00847642, 13.69674151,  5.        , 13.49089591,  9.30763748]])

Now look at the standard deviation down the columns; it will be zero where there is no variation:

np.std(t, axis=0)

That looks like this:

array([5.97455208, 4.72880646, 0.        , 3.97108072, 6.13620197])
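
To turn that result into column indices, one option is to compare it against zero. The sketch below uses made-up sample data. It also shows a max-equals-min check, which is an addition here rather than part of the answer: a standard deviation computed in floating point is not guaranteed to come out as exactly zero for every constant value, whereas the maximum and minimum of a constant column are always identical.

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.uniform(0, 20, size=(14, 5))   # made-up temperatures
t[:, 2] = 5                            # one constant column

# Columns whose standard deviation is (numerically) zero
by_std = np.flatnonzero(np.isclose(np.std(t, axis=0), 0.0))

# Alternative: a column is constant exactly when its max equals its min
by_range = np.flatnonzero(t.max(axis=0) == t.min(axis=0))

print(by_std, by_range)  # [2] [2]
```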

Or, calculate the forward differences between each row and the one below:

d = t[1:,...] - t[:-1,...]

and take their absolute values:

np.abs(d)

That looks like this:

array([[ 4.06245114,  6.95774365,  0.        ,  5.66067925,  8.25569578],
       [14.35054412,  3.22588684,  0.        ,  1.78815198, 10.75562156],
       [ 6.68295407, 12.71467666,  0.        ,  3.24712141, 14.85411909],
       [ 4.11250954,  6.67131894,  0.        ,  1.16668005,  2.85633441],
       [ 7.3203885 ,  3.67219667,  0.        ,  0.59567351, 12.74550991],
       [ 5.42620183,  2.75590516,  0.        ,  4.12066773, 11.37243234],
       [ 1.03708868,  4.9585173 ,  0.        ,  0.489247  , 11.6684918 ],
       [12.00797096,  2.22045993,  0.        ,  5.32288163, 17.39715094],
       [ 4.15085831,  4.32450827,  0.        ,  6.72881978,  2.8953012 ],
       [ 5.7349744 ,  7.13926187,  0.        ,  6.69697756,  7.85628339],
       [15.4642426 ,  8.52879411,  0.        ,  1.29111884,  2.03066791],
       [ 1.9728733 ,  1.02740219,  0.        , 11.05253894, 10.02663339],
       [ 4.75046981,  0.59598981,  0.        ,  7.57179985,  8.27809779]])

Now sum down the columns:

np.sum(np.abs(d), axis=0)

That looks like this:

array([ 87.07352726,  64.79266138,   0.        ,  55.73235753,
       120.99233952])
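
The difference-based steps can be condensed with np.diff, which computes the same row-to-row differences. A minimal sketch with made-up data; a column total of zero means every step down that column was zero:

```python
import numpy as np

rng = np.random.default_rng(2)
t = rng.uniform(0, 20, size=(14, 5))   # made-up temperatures
t[:, 2] = 5                            # one constant column

d = np.diff(t, axis=0)                 # same as t[1:] - t[:-1], shape (13, 5)
total = np.abs(d).sum(axis=0)          # zero only where every step is zero
print(np.flatnonzero(total == 0))      # [2]
```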
