How to search a list of tuples in Python

Question

So I have a list of tuples like this:

[(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

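I want to search this list for the tuple whose number value is equal to a given value.

So if I call search(53), it should return the index value 2.

Is there an easy way to do this?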

Answer 1

[i for i, v in enumerate(L) if v[0] == 53]
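
The question asks for a search(53) call, so here is a minimal sketch that wraps the comprehension above in a function. The name search_all and its parameter names are made up for illustration; note that it returns a list of every matching index rather than a single number.

```python
def search_all(pairs, key):
    """Return the indexes of every tuple whose first item equals key."""
    return [i for i, v in enumerate(pairs) if v[0] == key]

L = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]
print(search_all(L, 53))  # [2]
print(search_all(L, 99))  # [] when nothing matches
```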

Answer 2

You can use a list comprehension:

>>> a = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]
>>> [x[0] for x in a]
[1, 22, 53, 44]
>>> [x[0] for x in a].index(53)
2
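
One thing to keep in mind is that list.index raises ValueError when the value is absent. A small sketch of a wrapper that returns None instead (the name index_of is hypothetical, chosen only for this example):

```python
def index_of(pairs, key):
    """Index of the first tuple whose first item equals key, or None."""
    try:
        return [x[0] for x in pairs].index(key)
    except ValueError:
        # list.index signals a missing value with ValueError
        return None

a = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]
print(index_of(a, 53))  # 2
print(index_of(a, 99))  # None
```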

Answer 3

tl;dr

A generator expression is probably the most performant and simplest solution to your problem:

l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

result = next((i for i, v in enumerate(l) if v[0] == 53), None)
# 2

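Explanation

Several answers provide a simple solution to this question with list comprehensions. Those answers are perfectly correct, but they are not optimal. Depending on your use case, a few simple modifications can bring significant benefits.

The main problem I see with using a list comprehension for this use case is that the entire list will be processed, even though you only want to find one element.

Python provides a simple construct that is ideal here. It is called the generator expression. Here is an example: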

# Our input list, same as before
l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

# Call next on our generator expression.
next((i for i, v in enumerate(l) if v[0] == 53), None)

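In our trivial example we can expect this method to perform basically the same as a list comprehension, but what if we are working with a larger data set? That is where the advantage of the generator method comes into play. Rather than constructing a new list, we use your existing list as the iterable and call next() to get the first item from the generator.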
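
To make the early exit visible, here is a small sketch that records how many items the search actually consumes. The helpers first_index and tracked are hypothetical names introduced only for this illustration.

```python
def first_index(pairs, key, default=None):
    """Index of the first tuple whose first item equals key, else default."""
    return next((i for i, v in enumerate(pairs) if v[0] == key), default)

inspected = []

def tracked(pairs):
    """Yield items from pairs, recording each one that is actually consumed."""
    for item in pairs:
        inspected.append(item)
        yield item

l = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]
print(first_index(tracked(l), 22))  # 1
print(len(inspected))               # 2: the search stopped at the match
print(first_index(l, 99))           # None, the default for a missing key
```

The last two tuples are never touched, which is exactly what a list comprehension cannot avoid.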

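Let's look at how these methods perform on some larger data sets. These are large lists of 10000000 + 1 elements, with the target at the beginning (best case) or at the end (worst case). We can verify that a list comprehension performs the same on both lists:

List comprehensions

"Worst case"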

worst_case = ([(False, 'F')] * 10000000) + [(True, 'T')]
print([i for i, v in enumerate(worst_case) if v[0] is True])

# [10000000]
#          2 function calls in 3.885 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    3.885    3.885    3.885    3.885 so_lc.py:1(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

"Best case"

best_case = [(True, 'T')] + ([(False, 'F')] * 10000000)
print([i for i, v in enumerate(best_case) if v[0] is True])

# [0]
#          2 function calls in 3.864 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    3.864    3.864    3.864    3.864 so_lc.py:1(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

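Generator expressions

Here is my hypothesis for generators: they will perform significantly better in the best case, and about the same in the worst case. The gain comes mostly from the fact that a generator is evaluated lazily, meaning it only computes what is required to yield a value.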

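Worst case

worst_case = ([(False, 'F')] * 10000000) + [(True, 'T')]
print(next((i for i, v in enumerate(worst_case) if v[0] == True), None))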

# 10000000
#          5 function calls in 1.733 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         2    1.455    0.727    1.455    0.727 so_lc.py:10(<genexpr>)
#         1    0.278    0.278    1.733    1.733 so_lc.py:9(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
#         1    0.000    0.000    1.455    1.455 {next}

Best case

best_case = [(True, 'T')] + ([(False, 'F')] * 10000000)
print(next((i for i, v in enumerate(best_case) if v[0] == True), None))

# 0
#          5 function calls in 0.316 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.316    0.316    0.316    0.316 so_lc.py:6(<module>)
#         2    0.000    0.000    0.000    0.000 so_lc.py:7(<genexpr>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
#         1    0.000    0.000    0.000    0.000 {next}

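What?! The best case blows away the list comprehension, but I wasn't expecting our worst case to outperform the list comprehension to such an extent. How is that? Frankly, I can only speculate without further research.

Take all of this with a grain of salt: I have not run any robust profiling here, just some very basic testing. It should be enough to show that a generator expression is more performant for this type of list search.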
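
If you want to repeat the comparison yourself, a rough sketch with the standard timeit module could look like the following. The list size N, the repeat count, and the function names are arbitrary choices made for this example, not the setup used for the profiles above, and the timings will vary by machine and Python version.

```python
import timeit

N = 100000  # far smaller than the 10000000 used above, to keep the run short
best_case = [(True, 'T')] + [(False, 'F')] * N
worst_case = [(False, 'F')] * N + [(True, 'T')]

def with_listcomp(data):
    # Builds the full list of matches, then takes the first one.
    return [i for i, v in enumerate(data) if v[0] is True][0]

def with_genexpr(data):
    # Stops as soon as the first match is produced.
    return next((i for i, v in enumerate(data) if v[0] is True), None)

for name, data in (("best", best_case), ("worst", worst_case)):
    for func in (with_listcomp, with_genexpr):
        seconds = timeit.timeit(lambda: func(data), number=20)
        print("%s case, %s: %.4f s" % (name, func.__name__, seconds))
```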

Note that this is all basic, built-in Python. We don't need to import anything or use any libraries.

I first saw this search technique in Peter Norvig's Udacity cs212 course.

Answer 4

Your tuples are basically key-value pairs (a Python dict), so:

l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]
val = dict(l)[53]

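Edit: aha, you say you want the index of (53, "xuxa"). If that is really what you want, you'll have to iterate through the original list, or perhaps build a more complicated dictionary: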

d = dict((n,i) for (i,n) in enumerate(e[0] for e in l))
idx = d[53]

Answer 5

Hmm... well, the simple way that comes to mind is to convert it to a dict

d = dict(thelist)

and access d[53].

EDIT: Oops, I misread your question the first time. It sounds like you actually want the index at which a given number is stored. In that case, try

dict((t[0], i) for i, t in enumerate(thelist))

instead of a plain old dict conversion. Then d[53] would be 2.
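
If the same number can appear more than once, a dict built this way keeps the index of the last occurrence, because later keys overwrite earlier ones. A minimal sketch that keeps the first occurrence instead (build_index is a hypothetical helper name, and the extra (53, "again") entry is added only to show the duplicate case):

```python
def build_index(pairs):
    """Map each first item to the index of its first occurrence."""
    index = {}
    for i, t in enumerate(pairs):
        index.setdefault(t[0], i)  # keep the earliest index on duplicates
    return index

thelist = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia"), (53, "again")]
d = build_index(thelist)
print(d[53])      # 2
print(d.get(99))  # None for a missing number
```

Building the dict costs one pass over the list, so this pays off mainly when you look up many numbers against the same list.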

Answer 6

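Supposing the list may be long and the numbers may repeat, consider using the SortedList type from the Python sortedcontainers module. SortedList automatically keeps the tuples in order by number and allows fast searching.

For example: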

from sortedcontainers import SortedList
sl = SortedList([(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")])

# Get the index of 53:

index = sl.bisect((53,))

# With the index, get the tuple:

tup = sl[index]

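Because it performs a binary search, this will be a lot faster than the list comprehension suggestions. Note that the index refers to the position in the sorted list, not in the original list. The dictionary suggestion will be faster still, but it won't work if there can be duplicate numbers with different strings.

If there are duplicate numbers with different strings, you need one more step: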

end = sl.bisect((53 + 1,))

results = sl[index:end]

By bisecting for 54, we find the end index of our slice. On long lists this will be significantly faster than the accepted answer.
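
The same bisecting idea can be sketched with the standard library's bisect module on a list that is sorted once up front. This is a swapped-in illustration, not the sortedcontainers API; find_all is a hypothetical name, the extra (53, "abel") entry is added to show duplicates, and the number + 1 trick assumes integer keys.

```python
from bisect import bisect_left

pairs = sorted([(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia"), (53, "abel")])

def find_all(sorted_pairs, number):
    """Return every tuple whose first item equals number, via binary search."""
    # (number,) sorts before any (number, text) tuple, so this is the slice start.
    start = bisect_left(sorted_pairs, (number,))
    # The first position at or after (number + 1,) is the slice end.
    end = bisect_left(sorted_pairs, (number + 1,))
    return sorted_pairs[start:end]

print(find_all(pairs, 53))  # [(53, 'abel'), (53, 'xuxa')]
print(find_all(pairs, 7))   # []
```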

Answer 7

Just another way.

list(zip(*a))[0].index(53)

Answer 8

[k for k, v in l if v == 'delicia']

Here l is the list of tuples: [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

Instead of converting it to a dict, we use a list comprehension.

*Key* in Key,Value in list, where value = **delicia**
