Merging NumPy arrays and finding columns in Python

Question

I am new to Python. I have two data files in CSV format, and I load their contents into two NumPy arrays:

matrix1 = numpy.genfromtxt(fileName1)
matrix2 = numpy.genfromtxt(fileName2)

The two matrices do not have the same number of rows or columns.

>>> print(matrix1.shape)
(971, 4413)
>>> print(matrix2.shape)
(5504, 4431)

I want to combine matrix1 and matrix2 like this:

mergedMatrix = [ matrix1, matrix2 ]

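so that I can access matrix1 from mergedMatrix using index 0 and matrix2 using index 1.

I tried numpy.concatenate, but it does not work for these two matrices. So I tried the pandas merge function after converting matrix1 and matrix2 to pandas DataFrames. However, that took a long time, and all the matrices were merged into a single linear array such as [1, 2, 3, 4, 5...], leaving me no way to tell matrix1 and matrix2 apart within mergedMatrix.

So I am using: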

#mergedMatrix as a list
mergedMatrix = [matrix1, matrix2]

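My data contains Inf values. If a column in matrix1 contains an Inf value, I want to delete that column together with the corresponding column in matrix2, that is, the column with the same column number.

Questions

Is there a better way than using the list mergedMatrix?
How can I quickly find out whether a column of matrix1 contains such a value, without checking every element and its column number one by one?

Example: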

matrix1 = [[1, 2, 3],
           [3, inf,0],
           [2 , inf, inf]]
matrix2 = [[0, 4, 2, 7],
           [0, 1, 0.5, 3],
           [1, 2, 3, 9]]

mergedMatrix = [[1, 2, 3],
           [3, inf,0],
           [2 , inf, inf],
           [0, 4, 2, 7],
           [0, 1, 0.5, 3],
           [1, 2, 3, 9]]

The result should be:

mergedMatrix = [[1],
                [3],
                [2],
                [0,7],
                [0,3],
                [1,9]]

removedMatrixCols = [[2, 3],
               [inf,0],
               [inf, inf],
               [4, 2],
               [1, 0.5],
               [2, 3]]

Then I want to split the matrices:

newMatrix1 = [[1],
              [3],
              [2]]
newMatrix2 = [[0,7],
              [0,3],
              [1,9]]

removedCols1 = [[2, 3],
                [inf,0],
                [inf, inf]]

removedCols2 = [[4, 2],
                [1, 0.5],
                [2, 3]]

so that I can store each of them in its own CSV file.

Answer 1

In short, the answers are: technically yes, but not really; and yes.

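1: If you need a 3-D list, then you should use a list, but I would also put it into an array ( mergedMatrix = numpy.array([matrix1, matrix2]) ) so that you can still apply element-wise logic to the new matrix. Because the two matrices have different shapes, the result is an object array holding the two matrices rather than a true 3-D array, and recent NumPy versions require dtype=object to build it.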
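A minimal sketch of such a container, using small stand-in arrays instead of the real CSV data. Filling a pre-allocated object array with numpy.empty is my own assumption here, chosen because it behaves the same whatever the two shapes are:

```python
import numpy as np

# Small stand-ins for the two CSV matrices (shapes differ on purpose).
matrix1 = np.array([[1.0, 2.0, 3.0],
                    [3.0, np.inf, 0.0],
                    [2.0, np.inf, np.inf]])
matrix2 = np.array([[0.0, 4.0, 2.0, 7.0],
                    [0.0, 1.0, 0.5, 3.0],
                    [1.0, 2.0, 3.0, 9.0]])

# A 1-D object array with two slots, each holding one complete matrix.
merged = np.empty(2, dtype=object)
merged[0] = matrix1
merged[1] = matrix2

print(merged.shape)     # (2,)
print(merged[0].shape)  # (3, 3)
print(merged[1].shape)  # (3, 4)
```

Indexing works just like the list in the question; the object array adds little beyond that, since element-wise operations still have to be applied per matrix.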

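2: (Note: these are entirely separate questions, so strictly speaking they should have been asked as two different questions rather than merged into one, but I'll live.)

For this you can use numpy.delete to remove the columns. To delete columns rather than rows, pass the axis=1 argument, for example: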

new_mat = numpy.delete(mergedMatrix, cols_to_delete, axis=1)

where mergedMatrix and cols_to_delete are both arrays.

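Instead of looping over the array with nested for loops to find the columns that contain Inf values, you can use numpy.isinf and substitute the result for cols_to_delete above (note: cols_to_delete = numpy.flatnonzero(numpy.isinf(matrix1).any(axis=0)) gives the indices of the columns that contain at least one Inf).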
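Put together, the two steps might look like the following sketch. The small matrix is the example from the question; converting the boolean flags to integer positions with numpy.flatnonzero is my choice, not something the answer prescribes:

```python
import numpy as np

matrix1 = np.array([[1.0, 2.0, 3.0],
                    [3.0, np.inf, 0.0],
                    [2.0, np.inf, np.inf]])

# One boolean flag per column: True if any entry in that column is +/-inf.
has_inf = np.isinf(matrix1).any(axis=0)

# numpy.delete works with integer positions, so convert the flags to indices.
cols_to_delete = np.flatnonzero(has_inf)

new_mat = np.delete(matrix1, cols_to_delete, axis=1)
print(cols_to_delete)  # [1 2]
print(new_mat)
```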

Anyway, hope this helps! Cheers

Answer 2

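I can think of four solutions:

1. Use a list, as in the question. There is nothing wrong with that, and you can index the arrays with list[0][xx:yy].
2. Store the data in a dictionary such as {1: matrix1, 2: matrix2}.
3. If you really want to use pandas, you have to add an identifier column (data1, data2) before merging the data; afterwards you can group the data with groupby or set the index with df.set_index('id_column'). But I think that is overkill.
4. If you use np.vstack or np.hstack (depending on the axis along which the matrices have equal size), you lose the information about which matrix is which, unless you build a mask with a boolean ID, for example:
mask = np.ones(len(merged_matrix))
mask[0:len(matrix1)] = 0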
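Options 2 and 4 can be sketched as follows. This assumes, purely for illustration, two small matrices with the same number of rows, so np.hstack applies and the mask runs over columns; with np.vstack the mask would run over rows as in the snippet above. The names by_name, recovered1 and recovered2 are hypothetical.

```python
import numpy as np

matrix1 = np.arange(9.0).reshape(3, 3)
matrix2 = np.arange(12.0).reshape(3, 4)

# Option 2: a dictionary keeps the matrices apart under explicit keys.
by_name = {1: matrix1, 2: matrix2}

# Option 4: stack side by side (both have 3 rows) and record the origin
# of every column in a mask: 0 for matrix1, 1 for matrix2.
merged_matrix = np.hstack([matrix1, matrix2])
mask = np.ones(merged_matrix.shape[1], dtype=int)
mask[:matrix1.shape[1]] = 0

# The mask recovers each original matrix from the merged one.
recovered1 = merged_matrix[:, mask == 0]
recovered2 = merged_matrix[:, mask == 1]
print(merged_matrix.shape)  # (3, 7)
```

Note that the matrices in the question differ in both dimensions, so neither np.vstack nor np.hstack applies to them directly.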

Answer 3

Assuming you do not actually need mergedMatrix, here is how to obtain newMatrix1, newMatrix2, removedCols1 and removedCols2 without explicitly constructing mergedMatrix.

Finding the interesting values

First, let's find the inf entries:

import numpy as np
matrix1 = np.genfromtxt(fileName1)
matrix2 = np.genfromtxt(fileName2)

matrix1_infs = matrix1 == float('inf')

# or if you want to treat -inf the same as inf:
matrix1_infs = np.isinf(matrix1)

This gives you a boolean 2-D NumPy array. For your small example array, it would be

array([[False, False, False],
       [False,  True, False],
       [False,  True,  True]], dtype=bool)

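Boiling it down to columns

You are not interested in individual elements, but in which columns contain any inf value. The straightforward way to find out is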

matrix1_inf_columns = matrix1_infs.any(axis=0)

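A more obscure way, combining linear algebra and boolean algebra, is the following vector-matrix product (the vector needs one entry per row of matrix1):

matrix1_inf_columns = np.dot(np.repeat(True, matrix1.shape[0]), matrix1_infs)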

The result is the same:

array([False,  True,  True], dtype=bool)
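To check that the two routes agree, here is a small sketch on a deliberately non-square array, so that mixing up the row and column counts would show up as a shape error. The names via_any and via_dot are mine:

```python
import numpy as np

matrix1 = np.array([[1.0, np.inf, 0.0],
                    [2.0, np.inf, np.inf]])
matrix1_infs = np.isinf(matrix1)

# Direct route: does any row have True in this column?
via_any = matrix1_infs.any(axis=0)

# Obscure route: a row vector of True (one entry per row of matrix1)
# multiplied with the boolean matrix.
via_dot = np.dot(np.ones(matrix1.shape[0], dtype=bool), matrix1_infs)

print(via_any)  # [False  True  True]
print(np.array_equal(via_any, via_dot))
```

The any(axis=0) form states the intent directly, which is why it is the one used in the rest of this answer.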

Slicing with boolean index arrays

When you use a boolean NumPy array as an index into another NumPy array, something interesting happens:

>>> matrix1[:, matrix1_inf_columns] # First index is rows, second columns.
                                    # : means all. Thus here:
                                    # All rows, but only the selected columns.
array([[  2.,   3.],
       [ inf,   0.],
       [ inf,  inf]])

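Nice. That is exactly what we want for removedCols1. But it gets even crazier. What happens when you invert a boolean array with the ~ operator? (Older NumPy versions also accepted unary minus here; current versions raise a TypeError for it.)

>>> ~matrix1_inf_columns
array([ True, False, False], dtype=bool)

NumPy negates its elements! That means we can get newMatrix1 as

newMatrix1 = matrix1[:, ~matrix1_inf_columns]
# array([[ 1.],
#        [ 3.],
#        [ 2.]])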

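Of course, the boolean index array does not know that it was originally constructed from matrix1, so in principle we can just as easily use it to index matrix2. The catch is that matrix2 has more columns than matrix1. Older NumPy versions accepted a boolean index array shorter than the indexed dimension and assumed False for the missing entries:

removedCols2 = matrix2[:, matrix1_inf_columns]
# array([[ 4. ,  2. ],
#        [ 1. ,  0.5],
#        [ 2. ,  3. ]])

With the inverted index, however, the assumed False drops the last column:

>>> matrix2[:, ~matrix1_inf_columns]
array([[ 0.],
       [ 0.],
       [ 1.]])

That is not the complete newMatrix2 we want. Current NumPy versions refuse the length mismatch altogether and raise an IndexError.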

Dimension trouble

So we have to use a larger index array.

>>> matrix1_inf_columns.resize(matrix2.shape[1])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot resize an array that references or is referenced
by another array in this way.  Use the resize function

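Ouch. The resize function? The documentation says that when the requested size is larger than the array, the function (unlike the NumPy array's resize method that I tried to use here) does not pad with zeros (False for boolean arrays) but repeats the array instead.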
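The difference between the two is easy to see on a three-element array enlarged to five elements. In this sketch, passing refcheck=False to the method is my addition so that the example does not depend on which other references to the array happen to exist:

```python
import numpy as np

flags = np.array([True, False, True])

# The function returns a new, larger array filled by repeating the input.
repeated = np.resize(flags, 5)

# The method enlarges in place and pads with zeros (False for booleans).
padded = flags.copy()
padded.resize(5, refcheck=False)

print(repeated)  # [ True False  True  True False]
print(padded)    # [ True False  True False False]
```

Only the padding behaviour of the method is useful for extending a column mask, since repeated entries would mark columns that were never checked.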

So let's see whether we can get a deep copy rather than a view on matrix1:

>>> tmp = matrix1_inf_columns.copy()
>>> tmp.resize(matrix2.shape[1])
>>> tmp
array([False,  True,  True, False], dtype=bool)
>>> ~tmp
array([ True, False, False,  True], dtype=bool)

OK, that works. Let's plug it in as the index for matrix2.

removedCols2 = matrix2[:, tmp]
# array([[ 4. ,  2. ],
#        [ 1. ,  0.5],
#        [ 2. ,  3. ]])

Nice, so that still works.

newMatrix2 = matrix2[:, ~tmp]
# array([[ 0.,  7.],
#        [ 0.,  3.],
#        [ 1.,  9.]])

Hooray!
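The whole walkthrough can be condensed into one function. This is only a sketch under the assumption that matrix2 has at least as many columns as matrix1; the function name split_on_inf_columns is hypothetical, and the mask is padded by writing into a pre-allocated all-False array rather than by calling resize:

```python
import numpy as np

def split_on_inf_columns(matrix1, matrix2):
    """Split both matrices by the columns of matrix1 that contain inf."""
    inf_cols1 = np.isinf(matrix1).any(axis=0)
    # Extend the mask with False so it covers every column of matrix2.
    # Assumes matrix2 has at least as many columns as matrix1.
    inf_cols2 = np.zeros(matrix2.shape[1], dtype=bool)
    inf_cols2[:matrix1.shape[1]] = inf_cols1
    return (matrix1[:, ~inf_cols1], matrix2[:, ~inf_cols2],
            matrix1[:, inf_cols1], matrix2[:, inf_cols2])

matrix1 = np.array([[1.0, 2.0, 3.0],
                    [3.0, np.inf, 0.0],
                    [2.0, np.inf, np.inf]])
matrix2 = np.array([[0.0, 4.0, 2.0, 7.0],
                    [0.0, 1.0, 0.5, 3.0],
                    [1.0, 2.0, 3.0, 9.0]])

newMatrix1, newMatrix2, removedCols1, removedCols2 = split_on_inf_columns(
    matrix1, matrix2)
print(newMatrix1)
print(newMatrix2)
```

Each of the four results is an ordinary 2-D array, so it can be written to its own CSV file, for instance with numpy.savetxt and delimiter=",".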

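To infinity and beyond

If you also want to take infinite values in matrix2 into account when filtering, or if your real situation is more involved, things get more complicated. But you now know most of the concepts you need.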
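One possible reading of that extension, sketched here as an assumption rather than as the answer's own method: a column is dropped from both matrices if it contains inf in either of them, and columns that exist only in matrix2 are judged by matrix2 alone. The name drop is hypothetical, and the data is a variation of the question's example.

```python
import numpy as np

matrix1 = np.array([[1.0, 2.0, 3.0],
                    [3.0, np.inf, 0.0],
                    [2.0, np.inf, 5.0]])
matrix2 = np.array([[0.0, 4.0, 2.0, 7.0],
                    [np.inf, 1.0, 0.5, 3.0],
                    [1.0, 2.0, 3.0, 9.0]])

n1 = matrix1.shape[1]

# One flag per column of the wider matrix (assumes matrix2 is the wider one).
drop = np.isinf(matrix2).any(axis=0)
# Also drop every shared column that contains inf in matrix1.
drop[:n1] |= np.isinf(matrix1).any(axis=0)

newMatrix1 = matrix1[:, ~drop[:n1]]
newMatrix2 = matrix2[:, ~drop]
print(drop)  # [ True  True False False]
```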

3 answers

Answer 1
1 2015-07-05 12:38:54

Answer 2
0 2015-07-05 12:52:52

Answer 3
0 2015-07-05 23:20:03