How to search a list of tuples in Python

Question

So I have a list of tuples such as this:

[(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

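I want to search this list for a tuple whose number value is equal to some given value.

So that if I call search(53), it returns the index value 2.

Is there an easy way to do this?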

Answer 1

[i for i, v in enumerate(L) if v[0] == 53]
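
Since the question asks for a search(53) call, here is a minimal sketch that wraps the comprehension above in a function. The name search comes from the question; the items parameter and the choice to return every matching index (rather than only the first) are assumptions made for illustration.

```python
def search(items, key):
    """Return the indices of all tuples whose first element equals key."""
    return [i for i, v in enumerate(items) if v[0] == key]


L = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]
print(search(L, 53))  # [2]
print(search(L, 99))  # []
```

An empty list signals that nothing matched, so the caller does not need to handle an exception.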

Answer 2

You can use a list comprehension:

>>> a = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]
>>> [x[0] for x in a]
[1, 22, 53, 44]
>>> [x[0] for x in a].index(53)
2
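
One thing worth keeping in mind with this approach: list.index raises ValueError when the value is missing. The sketch below shows one possible way to handle that; index_of is a hypothetical helper name, and returning None for a missing key is an assumption about the desired behaviour.

```python
a = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]


def index_of(pairs, key):
    """Hypothetical helper: index of the first tuple whose first element is key, or None."""
    try:
        return [x[0] for x in pairs].index(key)
    except ValueError:
        # list.index raises ValueError when the value is not present
        return None


print(index_of(a, 53))  # 2
print(index_of(a, 99))  # None
```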

Answer 3

tl;dr

A generator expression is probably the most performant and simplest solution to your problem:

l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

result = next((i for i, v in enumerate(l) if v[0] == 53), None)
# 2

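Explanation

Several answers provide a simple solution to this question with list comprehensions. These answers are perfectly correct, but they are not optimal. Depending on your use case, a few simple modifications can bring significant benefits.

The main problem I see with using a list comprehension for this use case is that the entire list is processed, even though you only want to find one element.

Python provides a simple construct that is ideal here. It is called a generator expression. Here is an example: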

# Our input list, same as before
l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

# Call next on our generator expression.
next((i for i, v in enumerate(l) if v[0] == 53), None)

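In our trivial example we can expect this method to perform basically the same as the list comprehension, but what if we are working with a larger data set? That is where the advantage of the generator approach comes into play. Rather than constructing a new list, we use your existing list as our iterable and call next() to get the first item from the generator.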
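
To make the early exit visible, here is a small sketch that records which tuples are actually inspected. It uses a generator function instead of a generator expression so that the bookkeeping fits inside the loop; first_index and examined are hypothetical names introduced only for this illustration.

```python
l = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]

examined = []  # records every tuple the loop actually looks at


def first_index(pairs, key):
    """Hypothetical helper: index of the first matching tuple, or None."""
    def candidates():
        for i, v in enumerate(pairs):
            examined.append(v)
            if v[0] == key:
                yield i
    # next() pulls only one value, so iteration stops at the first match
    return next(candidates(), None)


print(first_index(l, 22))  # 1
print(len(examined))       # 2: only the first two tuples were inspected
```

A list comprehension over the same data would have inspected all four tuples before returning.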

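Let's look at how these methods perform differently on some larger data sets. These are big lists made of 10000000 + 1 elements, with our target at the beginning (best case) or at the end (worst case). We can verify that both of these lists perform equally with the following list comprehension:

List comprehensions

"Worst case"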

worst_case = ([(False, 'F')] * 10000000) + [(True, 'T')]
print([i for i, v in enumerate(worst_case) if v[0] is True])

# [10000000]
#          2 function calls in 3.885 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    3.885    3.885    3.885    3.885 so_lc.py:1(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

"Best case"

best_case = [(True, 'T')] + ([(False, 'F')] * 10000000)
print([i for i, v in enumerate(best_case) if v[0] is True])

# [0]
#          2 function calls in 3.864 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    3.864    3.864    3.864    3.864 so_lc.py:1(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

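Generator expressions

Here is my hypothesis for generators: we will see that generators perform significantly better in the best case, but similarly in the worst case. This performance gain is mostly due to the fact that the generator is evaluated lazily, meaning it only computes what is required to yield a value.

Worst case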

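worst_case = ([(False, 'F')] * 10000000) + [(True, 'T')]
print(next((i for i, v in enumerate(worst_case) if v[0] == True), None))

# 10000000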
#          5 function calls in 1.733 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         2    1.455    0.727    1.455    0.727 so_lc.py:10(<genexpr>)
#         1    0.278    0.278    1.733    1.733 so_lc.py:9(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
#         1    0.000    0.000    1.455    1.455 {next}

Best case

best_case  = [(True, 'T')] + ([(False, 'F')] * 10000000)
print(next((i for i, v in enumerate(best_case) if v[0] == True), None))

# 0
#          5 function calls in 0.316 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.316    0.316    0.316    0.316 so_lc.py:6(<module>)
#         2    0.000    0.000    0.000    0.000 so_lc.py:7(<genexpr>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
#         1    0.000    0.000    0.000    0.000 {next}

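WHAT?! The best case blows away the list comprehensions, but I wasn't expecting our worst case to outperform the list comprehensions to such an extent. How is that? Frankly, I could only speculate without further research.

Take all of this with a grain of salt. I have not run any robust profiling here, just some very basic testing. This should be sufficient to appreciate that a generator expression is more performant for this type of list searching.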
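
If you want to repeat the comparison yourself, the following is a rough sketch using the standard-library timeit module. The list size n, the repeat count, and the two function names are assumptions chosen to keep the run short; the numbers it prints will differ from the profiles above and from machine to machine.

```python
import timeit

n = 100000  # assumption: far smaller than the 10000000 used above, to keep the run short
best_case = [(True, 'T')] + [(False, 'F')] * n
worst_case = [(False, 'F')] * n + [(True, 'T')]


def with_list_comprehension(data):
    # Always walks the whole list and builds a new list of matches
    return [i for i, v in enumerate(data) if v[0] is True]


def with_generator(data):
    # Stops as soon as the first match is produced
    return next((i for i, v in enumerate(data) if v[0] is True), None)


for name, data in (("best", best_case), ("worst", worst_case)):
    lc = timeit.timeit(lambda: with_list_comprehension(data), number=20)
    gen = timeit.timeit(lambda: with_generator(data), number=20)
    print("%s case: list comprehension %.4fs, generator %.4fs" % (name, lc, gen))
```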

Note that this is all basic, built-in Python. We don't need to import anything or use any libraries.

I first saw this search technique in the Udacity cs212 course with Peter Norvig.

Answer 4

Your tuples are basically key-value pairs, that is, a Python dict, so:

l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]
val = dict(l)[53]

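Edit: aha, you say you want the index value of (53, "xuxa"). If this is really what you want, you will have to iterate through the original list, or perhaps build a more complicated dictionary: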

d = dict((n,i) for (i,n) in enumerate(e[0] for e in l))
idx = d[53]

Answer 5

Hmm... well, the simple way that comes to mind is to convert it to a dict

d = dict(thelist)

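and access d[53].

EDIT: Oops, I misread your question the first time. It sounds like you actually want the index at which a given number is stored. In that case, try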

d = dict((t[0], i) for i, t in enumerate(thelist))

instead of a plain old dict conversion. Then d[53] would be 2.
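
One detail of the dictionary approach is easy to miss: if a number appears more than once, later pairs overwrite earlier ones, so the dict ends up holding the last index. The sketch below contrasts that with a first-occurrence index built with setdefault. The extra (53, "xuxa2") entry is made-up data added only to show the difference.

```python
# The final tuple is hypothetical data, added to create a duplicate number
thelist = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia"), (53, "xuxa2")]

# Last occurrence wins: later pairs overwrite earlier ones
last_index = dict((t[0], i) for i, t in enumerate(thelist))

# First occurrence wins: setdefault only stores a key the first time it is seen
first_index = {}
for i, t in enumerate(thelist):
    first_index.setdefault(t[0], i)

print(last_index[53])       # 4
print(first_index[53])      # 2
print(first_index.get(99))  # None
```

Building the dict costs one pass over the list, so this pays off mainly when you perform many lookups against the same data.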

Answer 6

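Supposing the list may be long and the numbers may repeat, consider using the SortedList type from the Python sortedcontainers module. The SortedList type automatically maintains the tuples in order by number and allows fast searching.

For example: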

from sortedcontainers import SortedList
sl = SortedList([(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")])

# Get the index of 53:

index = sl.bisect((53,))

# With the index, get the tuple:

tup = sl[index]

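Because it performs a binary search, this will be a lot faster than the list comprehension suggestions. Note that index is a position in the sorted list, not in the original list. The dictionary suggestions will be faster still, but they won't work if there could be duplicate numbers with different strings.

If there are duplicate numbers with different strings, then you need to take one more step: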

end = sl.bisect((53 + 1,))

results = sl[index:end]

By bisecting for 54, we find the end index for our slice. This will be significantly faster on long lists compared with the accepted answer.
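
If you need the position in the original list rather than in the sorted one, a similar binary search can be sketched with the standard-library bisect module. This is a swapped-in technique, not part of sortedcontainers: it sorts (number, original index) pairs once and then bisects them. The names keyed and original_index are hypothetical.

```python
from bisect import bisect_left

l = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]

# Sort (number, original_index) pairs once; each lookup is then a binary search
keyed = sorted((t[0], i) for i, t in enumerate(l))


def original_index(key):
    """Hypothetical helper: index in l of the first tuple whose number is key, or None."""
    # (key,) sorts before every (key, i), so bisect_left lands on the first match
    pos = bisect_left(keyed, (key,))
    if pos < len(keyed) and keyed[pos][0] == key:
        return keyed[pos][1]
    return None


print(original_index(53))  # 2
print(original_index(44))  # 3
print(original_index(99))  # None
```

The one-time sort costs O(n log n), so this only makes sense when the same list is searched repeatedly.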

Answer 7

Just another way.

list(zip(*a))[0].index(53)

Answer 8

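[k for k, v in l if v == 'delicia']

Here l is the list of tuples: [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

Instead of converting it to a dict, we are using a list comprehension.

Read it as: the key of each (key, value) pair in the list where the value is 'delicia'.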

8 answers

Answer 1: score 86, accepted, 2010-05-26 22:47:33
Answer 2: score 48, 2010-05-26 22:48:27
Answer 3: score 43, 2012-06-02 19:36:40
Answer 4: score 26, 2010-05-26 22:49:53
Answer 5: score 12, 2010-05-26 22:47:19
Answer 6: score 6, 2014-04-10 23:31:26
Answer 7: score 1, 2013-07-23 19:55:47
Answer 8: score -1, 2017-04-24 23:52:22
