How to search a list of tuples in Python

So I have a list of tuples such as this:

[(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

I want to search this list for a tuple whose number value is equal to something.

So that if I do search(53) it will return the index value of 2.

Is there an easy way to do this?

[i for i, v in enumerate(L) if v[0] == 53]

You can use a list comprehension:

>>> a = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]
>>> [x[0] for x in a]
[1, 22, 53, 44]
>>> [x[0] for x in a].index(53)
2
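One thing to watch with the approach above: list.index raises ValueError when the number is not present. A minimal sketch of a wrapper that handles this might look like the following. The name search comes from the question; the default parameter is an assumption added for illustration.

```python
def search(pairs, key, default=None):
    """Return the index of the first tuple whose first item equals key."""
    firsts = [p[0] for p in pairs]  # builds a full list of the first items
    try:
        return firsts.index(key)
    except ValueError:
        # list.index raises ValueError when the key is absent
        return default

a = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia")]
print(search(a, 53))  # 2
print(search(a, 99))  # None
```

Returning a default instead of raising is a design choice; re-raising may be preferable if a missing number indicates a bug.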

tl;dr

A generator expression is probably the most performant and simple solution to your problem:

l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

result = next((i for i, v in enumerate(l) if v[0] == 53), None)
# 2

Explanation

There are several answers that provide a simple solution to this question with list comprehensions. While these answers are perfectly correct, they are not optimal. Depending on your use case, there may be significant benefits to making a few simple modifications.

The main problem I see with using a list comprehension for this use case is that the entire list will be processed, although you only want to find 1 element.

Python provides a simple construct which is ideal here. It is called the generator expression. Here is an example:

# Our input list, same as before
l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

# Call next on our generator expression.
next((i for i, v in enumerate(l) if v[0] == 53), None)

We can expect this method to perform basically the same as list comprehensions in our trivial example, but what if we're working with a larger data set? That's where the advantage of using the generator method comes into play. Rather than constructing a new list, we'll use your existing list as our iterable, and use next() to get the first item from our generator.
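To make the lazy evaluation visible, here is a small sketch that counts how many elements each approach actually consumes. The counting helper and the filler data are hypothetical, introduced only for this illustration.

```python
def counting(iterable, counter):
    """Yield items from iterable, recording how many were consumed."""
    for item in iterable:
        counter[0] += 1
        yield item

# Target at the front, followed by 1000 non-matching tuples.
data = [(53, "xuxa")] + [(0, "filler")] * 1000

seen_gen = [0]
first = next((i for i, v in enumerate(counting(data, seen_gen)) if v[0] == 53), None)

seen_lc = [0]
matches = [i for i, v in enumerate(counting(data, seen_lc)) if v[0] == 53]

print(first, seen_gen[0])   # 0 1
print(matches, seen_lc[0])  # [0] 1001
```

The generator stops after the first match, while the list comprehension walks all 1001 elements even though the answer was at index 0.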

Let's look at how these methods perform differently on some larger data sets. These are large lists, made of 10000000 + 1 elements, with our target at the beginning (best) or end (worst). We can verify that both of these lists will perform equally using the following list comprehension:

List comprehensions

"Worst case"

worst_case = ([(False, 'F')] * 10000000) + [(True, 'T')]
print([i for i, v in enumerate(worst_case) if v[0] is True])

# [10000000]
#          2 function calls in 3.885 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    3.885    3.885    3.885    3.885 so_lc.py:1(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

"Best case"

best_case = [(True, 'T')] + ([(False, 'F')] * 10000000)
print([i for i, v in enumerate(best_case) if v[0] is True])

# [0]
#          2 function calls in 3.864 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    3.864    3.864    3.864    3.864 so_lc.py:1(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Generator expressions

Here's my hypothesis for generators: we'll see that generators perform significantly better in the best case, but about the same in the worst case. This performance gain is mostly due to the fact that the generator is evaluated lazily, meaning it will only compute what is required to yield a value.

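Worst case

worst_case = ([(False, 'F')] * 10000000) + [(True, 'T')]
print(next((i for i, v in enumerate(worst_case) if v[0] == True), None))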

# 10000000
#          5 function calls in 1.733 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         2    1.455    0.727    1.455    0.727 so_lc.py:10(<genexpr>)
#         1    0.278    0.278    1.733    1.733 so_lc.py:9(<module>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
#         1    0.000    0.000    1.455    1.455 {next}

Best case

best_case = [(True, 'T')] + ([(False, 'F')] * 10000000)
print(next((i for i, v in enumerate(best_case) if v[0] == True), None))

# 0
#          5 function calls in 0.316 seconds
#
#    Ordered by: standard name
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.316    0.316    0.316    0.316 so_lc.py:6(<module>)
#         2    0.000    0.000    0.000    0.000 so_lc.py:7(<genexpr>)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
#         1    0.000    0.000    0.000    0.000 {next}

WHAT?! The best case blows away the list comprehensions, but I wasn't expecting our worst case to outperform the list comprehensions to such an extent. How is that? Frankly, I can only speculate without further research.

Take all of this with a grain of salt: I have not run any robust profiling here, just some very basic testing. This should be sufficient to appreciate that a generator expression is more performant for this type of list searching.
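For anyone who wants to repeat the comparison with repeated runs, a rough sketch using the standard timeit module could look like this. The list size N, the repeat count, and the two function names are assumptions chosen to keep the run short; timings will vary by machine and Python version, so none are shown.

```python
import timeit

N = 100000  # much smaller than the 10,000,000 used above, to keep the run short
worst_case = [(False, 'F')] * N + [(True, 'T')]
best_case = [(True, 'T')] + [(False, 'F')] * N

def with_list_comprehension(data):
    # Always scans the whole list and returns every matching index.
    return [i for i, v in enumerate(data) if v[0] is True]

def with_generator(data):
    # Stops at the first match; returns None when nothing matches.
    return next((i for i, v in enumerate(data) if v[0] is True), None)

for label, data in (("best", best_case), ("worst", worst_case)):
    for func in (with_list_comprehension, with_generator):
        seconds = timeit.timeit(lambda: func(data), number=20)
        print("%-5s %-24s %.4f s" % (label, func.__name__, seconds))
```

Wrapping both approaches in functions puts them on an equal footing, since code run at module level and code run inside a function do not necessarily perform the same.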

Note that this is all basic, built-in Python. We don't need to import anything or use any libraries.

I first saw this technique for searching in the Udacity cs212 course with Peter Norvig.

Your tuples are basically key-value pairs, like the items of a Python dict, so:

l = [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]
val = dict(l)[53]

Edit: aha, you say you want the index value of (53, "xuxa"). If this is really what you want, you'll have to iterate through the original list, or perhaps make a more complicated dictionary:

d = dict((n,i) for (i,n) in enumerate(e[0] for e in l))
idx = d[53]

Hmm... well, the simple way that comes to mind is to convert it to a dict

d = dict(thelist)

and access d[53].

EDIT: Oops, misread your question the first time. It sounds like you actually want to get the index where a given number is stored. In that case, try

d = dict((t[0], i) for i, t in enumerate(thelist))

instead of a plain old dict conversion. Then d[53] would be 2.
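One caveat with the number-to-index dictionary: if a number repeats, later entries overwrite earlier ones, so the lookup returns the last index rather than the first. A minimal sketch of both behaviors follows; the extra (53, "other") tuple and the names last_index and first_index are assumptions added for illustration.

```python
thelist = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia"), (53, "other")]

# A plain dict comprehension keeps the LAST index for a repeated key.
last_index = {t[0]: i for i, t in enumerate(thelist)}

# setdefault only stores a key the first time it is seen,
# so this keeps the FIRST index for a repeated key.
first_index = {}
for i, t in enumerate(thelist):
    first_index.setdefault(t[0], i)

print(last_index[53])       # 4
print(first_index[53])      # 2
print(first_index.get(99))  # None
```

Building the dictionary costs one full pass over the list, so it pays off mainly when many lookups are made against the same list.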

Supposing the list may be long and the numbers may repeat, consider using the SortedList type from the Python sortedcontainers module. The SortedList type will automatically maintain the tuples in order by number and allow for fast searching.

For example:

from sortedcontainers import SortedList
sl = SortedList([(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")])

# Get the index of 53:

index = sl.bisect((53,))

# With the index, get the tuple:

tup = sl[index]

This will work a lot faster than the list comprehension suggestion by doing a binary search. Note that the index is a position in the sorted list, not in the original list. The dictionary suggestion will be faster still but won't work if there could be duplicate numbers with different strings.

If there are duplicate numbers with different strings then you need to take one more step:

end = sl.bisect((53 + 1,))

results = sl[index:end]

By bisecting for 54, we will find the end index for our slice. This will be significantly faster on long lists as compared with the accepted answer.
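If installing sortedcontainers is not an option, the same binary-search idea can be sketched with the standard library's bisect module instead of SortedList. This is a swapped-in technique, not the SortedList API. The find_all helper, the keyed list, and the extra (53, "abc") tuple are assumptions for illustration, and like the slice above it assumes integer numbers. It stores each original index next to its number, so the result refers to positions in the original list.

```python
from bisect import bisect_left

l = [(1, "juca"), (22, "james"), (53, "xuxa"), (44, "delicia"), (53, "abc")]

# Sort (number, original_index) pairs once; later lookups are binary searches.
keyed = sorted((t[0], i) for i, t in enumerate(l))

def find_all(number):
    """Return the original indices of every tuple whose number matches."""
    lo = bisect_left(keyed, (number,))      # first entry >= (number,)
    hi = bisect_left(keyed, (number + 1,))  # first entry >= (number + 1,)
    return [i for _, i in keyed[lo:hi]]

print(find_all(53))  # [2, 4]
print(find_all(7))   # []
```

Unlike SortedList, a plain sorted list is not kept in order automatically, so it has to be rebuilt or updated with bisect.insort whenever the data changes.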

Just another way.

list(zip(*a))[0].index(53)

[k for k, v in l if v == 'delicia']

Here l is the list of tuples: [(1,"juca"),(22,"james"),(53,"xuxa"),(44,"delicia")]

And instead of converting it to a dict, we are using a list comprehension.

That is: take the key from each (key, value) pair in the list where the value is 'delicia'.
