Searching a two-dimensional array in Python

Question

I would like to retrieve specific rows from a large dataset (9 million rows, 1.4 GB) in Python, given two or more parameters.

For example, from this dataset:

ID1 2   10  2   2   1   2   2   2   2   2   1

ID2 10  12  2   2   2   2   2   2   2   1   2

ID3 2   22  0   1   0   0   0   0   0   1   2

ID4 14  45  0   0   0   0   1   0   0   1   1

ID5 2   8   1   1   1   1   1   1   1   1   2

Given these example parameters:

the second column must equal 2, and
the third column must be in the range 4 to 15

I should obtain:

ID1 2   10  2   2   1   2   2   2   2   2   1

ID5 2   8   1   1   1   1   1   1   1   1   2

The problem is that I don't know how to perform these operations efficiently on a two-dimensional array in Python.

This is what I tried:

line_list = []

# Load the whole file into memory
# (`file` is an already opened file object for the dataset)
for line in file:
    line_list.append(line)

# Set conditions
i = 2
start_range = 4
end_range = 15

# Iterate through the loaded list and split each line into columns
for line in line_list:
    data = line.strip().split()
    # Test whether the current line matches the conditions
    # (the fields are strings, so convert them before comparing with integers)
    if int(data[1]) == i and start_range <= int(data[2]) <= end_range:
        print(str(data))

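I want to run this process many times, but my approach is really slow, even with the data file loaded into memory.

I was thinking of using numpy arrays, but I don't know how to retrieve rows that match given conditions.

Thanks for your help!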

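UPDATE:

As suggested, I used a relational database system. I chose Sqlite3 because it is easy to use and quick to deploy.

My file was loaded through the import function in sqlite3, which took about 4 minutes.

I created an index on the second and third columns to speed up retrieval.

The queries are run from Python using the "sqlite3" module.

That was the way to go; it is much faster!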
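
For readers who want to see roughly what that looks like, here is a minimal sketch of the approach described in the update. The table name `data`, the column names `id`, `c2`, `c3`, the single composite index, and the in-memory database are all assumptions made for illustration; the real setup used a file imported through sqlite3 itself and may have indexed the columns differently.

```python
import sqlite3

# Assumption: an in-memory database stands in for the real database file.
conn = sqlite3.connect(":memory:")

# Assumption: hypothetical table and column names; only the first three
# columns of the sample rows are stored here.
conn.execute("CREATE TABLE data (id TEXT, c2 INTEGER, c3 INTEGER)")
sample = [
    ("ID1", 2, 10),
    ("ID2", 10, 12),
    ("ID3", 2, 22),
    ("ID4", 14, 45),
    ("ID5", 2, 8),
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", sample)

# Index the two columns used in the filter (one composite index is assumed).
conn.execute("CREATE INDEX idx_c2_c3 ON data (c2, c3)")


def find_rows(connection, value, low, high):
    """Return rows whose second column equals `value` and whose
    third column lies in the inclusive range [low, high]."""
    cursor = connection.execute(
        "SELECT id, c2, c3 FROM data "
        "WHERE c2 = ? AND c3 BETWEEN ? AND ? ORDER BY id",
        (value, low, high),
    )
    return cursor.fetchall()


matches = find_rows(conn, 2, 4, 15)
print(matches)  # [('ID1', 2, 10), ('ID5', 2, 8)]
```

Because the conditions are passed as query parameters, the same function can be called repeatedly with different values without reloading the data.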

Answer 1

I'd go for something almost like this (untested):

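with open('somefile') as fin:
    # Lazily split each line into its whitespace-separated fields
    rows = (line.split() for line in fin)
    # Keep rows whose second column is 2 and whose third column is between 4 and 15
    take = (row for row in rows if int(row[1]) == 2 and 4 <= int(row[2]) <= 15)
    # data = list(take)
    for row in take:
        pass # do something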
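
Since the question also mentions numpy arrays, the same filter could be sketched with a boolean mask. This is not part of the answer above, only an illustration on the five sample rows; the variable names are made up for the example, and loading the full 1.4 GB file this way is assumed to fit in memory.

```python
import numpy as np

# The five sample rows from the question, used as stand-in data.
sample = """\
ID1 2 10 2 2 1 2 2 2 2 2 1
ID2 10 12 2 2 2 2 2 2 2 1 2
ID3 2 22 0 1 0 0 0 0 0 1 2
ID4 14 45 0 0 0 0 1 0 0 1 1
ID5 2 8 1 1 1 1 1 1 1 1 2
"""

rows = [line.split() for line in sample.splitlines()]

# Keep the text IDs and the numeric columns in separate arrays.
ids = np.array([row[0] for row in rows])
values = np.array([[int(x) for x in row[1:]] for row in rows])

# values[:, 0] is the second column of the file, values[:, 1] the third.
mask = (values[:, 0] == 2) & (values[:, 1] >= 4) & (values[:, 1] <= 15)

matching_ids = ids[mask]
matching_values = values[mask]
print(matching_ids.tolist())  # ['ID1', 'ID5']
```

The mask is computed once per set of conditions over whole columns, so repeated queries avoid re-splitting and re-converting every line in a Python loop.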

Answer 1
1  Accepted  2013-02-01 01:08:14
