Fastest way to index an element by the first element of a tuple in a list of tuples in Python

list_ = [(1, 'a'), (2, 'b'), (3, 'c')]
item1 = 1
item2 = 'c'
#hypothetical:
assert list_.index_by_first_value(item1) == 0
assert list_.index_by_second_value(item2) == 2

What would be the fastest way to emulate the hypothetical index_by_first_value / index_by_second_value methods in Python?

To spell it out: given a list of tuples such as list_, how would you find the index of the tuple whose first (or second) value equals the element you are looking for?

My best guess would be this:

[i[0] for i in list_].index(item1)
[i[1] for i in list_].index(item2)

But I'm interested in seeing what you will come up with. Any ideas?

At first, I thought along the same lines as Nick T. Your method is fine if the number of tuples (N) is small. But a linear search is O(N): as the number of tuples increases, the time increases in proportion. You can get O(1) lookup time with a dict that maps the zeroth element of each tuple to its index:

{el[0]:idx for idx,el in enumerate(list_)}

But the cost of converting the list to a dict may be too high! Here are my results:

>>> from timeit import timeit as t
>>> t('[i[0] for i in list_].index(1)', "import random;list_=[(i,'a') for i in range(10)]; random.shuffle(list_)")
1.557116985321045
>>> t('[i[0] for i in list_].index(1)', "import random;list_=[(i,'a') for i in range(100)]; random.shuffle(list_)")
7.415766954421997
>>> t('{el[0]:idx for idx,el in enumerate(list_)}[1]', "import random;list_=[(i,'a') for i in range(10)]; random.shuffle(list_)")
2.1753010749816895
>>> t('{el[0]:idx for idx,el in enumerate(list_)}[1]', "import random;list_=[(i,'a') for i in range(100)]; random.shuffle(list_)")
15.062835216522217

So the list-to-dict conversion wipes out any benefit we get from the O(1) lookups. But just to show that the dict really is fast if we can avoid doing the conversion more than once:

>>> t('dict_[1]', "import random;list_=[(i,'a') for i in range(10)];random.shuffle(list_);dict_={el[0]:idx for idx,el in enumerate(list_)}")
0.050583839416503906
>>> t('dict_[1]', "import random;list_=[(i,'a') for i in range(100)];random.shuffle(list_);dict_={el[0]:idx for idx,el in enumerate(list_)}")
0.05001211166381836
>>> t('dict_[1]', "import random;list_=[(i,'a') for i in range(1000)];random.shuffle(list_);dict_={el[0]:idx for idx,el in enumerate(list_)}")
0.050894975662231445
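The pattern these timings point to is "build once, look up many times". A minimal sketch of that, assuming the list does not change between lookups; the function names are taken from the hypothetical methods in the question, and by_first / by_second are illustrative names only:

```python
list_ = [(1, 'a'), (2, 'b'), (3, 'c')]

# Build each lookup table once (O(N)); every later query is an O(1) dict lookup.
by_first = {el[0]: idx for idx, el in enumerate(list_)}
by_second = {el[1]: idx for idx, el in enumerate(list_)}

def index_by_first_value(item):
    return by_first[item]    # raises KeyError if no tuple starts with item

def index_by_second_value(item):
    return by_second[item]   # raises KeyError if no tuple has item second

print(index_by_first_value(1), index_by_second_value('c'))
```

The tables go stale as soon as list_ is modified, so they would have to be rebuilt (or updated alongside the list) after any insert, delete, or reorder.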

Searching a list is O(n). Convert it to a dictionary, and lookups take O(1).

>>> list_ = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> dict(list_)
{1: 'a', 2: 'b', 3: 'c'}
>>> dict((k, v) for v, k in list_)
{'a': 1, 'c': 3, 'b': 2}

If you want the original index, you can enumerate the list:

>>> dict((kv[0], (i, kv[1])) for i, kv in enumerate(list_))
{1: (0, 'a'), 2: (1, 'b'), 3: (2, 'c')}

>>> dict((kv[1], (i, kv[0])) for i, kv in enumerate(list_))
{'a': (0, 1), 'c': (2, 3), 'b': (1, 2)}
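One caveat worth checking before switching to a dict: keys must be unique. A small sketch with a made-up list containing a repeated first value shows how the dict result can differ from list.index:

```python
list_ = [(1, 'a'), (2, 'b'), (1, 'c')]   # note the repeated key 1

by_first = dict((kv[0], (i, kv[1])) for i, kv in enumerate(list_))

# Later entries overwrite earlier ones, so key 1 ends up pointing at index 2,
# whereas the list-based search returns the first match, index 0.
dict_result = by_first[1]
list_result = [kv[0] for kv in list_].index(1)
print(dict_result, list_result)
```

If first-match behaviour matters, one option is to fill the dict in a loop with setdefault so the earliest index is kept.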

EDIT: Just kidding. As the lists grow longer, it looks like the manual for loop takes less time. Updated to generate random lists via kojiro's method:

Just some timing tests for your information while keeping the data as a list. The good thing about preserving list form versus a dictionary is that it extends to tuples of any length.

import timeit
from operator import itemgetter
import random

list_= [('a', i) for i in range(10)]
random.shuffle(list_)

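def a():
    # list comprehension, then list.index
    return [i[1] for i in list_].index(1)

def b():
    # Python 2: zip returns a list; on Python 3 use list(zip(*list_))[1].index(1)
    return zip(*list_)[1].index(1)

def c():
    # Python 2: map returns a list; on Python 3 use list(map(itemgetter(1), list_)).index(1)
    return map(itemgetter(1), list_).index(1)

def d():
    # manual loop; stops at the first match
    for index, value in enumerate(list_):
        if 1 == value[1]:
            return index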

With timeit:

C:\Users\Jesse\Desktop>python -m timeit -s "import test" "test.a()"
1000000 loops, best of 3: 1.21 usec per loop

C:\Users\Jesse\Desktop>python -m timeit -s "import test" "test.b()"
1000000 loops, best of 3: 1.2 usec per loop

C:\Users\Jesse\Desktop>python -m timeit -s "import test" "test.c()"
1000000 loops, best of 3: 1.45 usec per loop

C:\Users\Jesse\Desktop>python -m timeit -s "import test" "test.d()"
1000000 loops, best of 3: 0.922 usec per loop
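Note that d() silently returns None when nothing matches, while the .index-based versions raise ValueError. A sketch of the same manual loop generalized to any tuple position and any target, with a hypothetical name index_by, might look like this (it should run unchanged on Python 2 and 3):

```python
def index_by(seq, pos, target):
    """Return the index of the first tuple in seq whose element at pos equals target."""
    for index, value in enumerate(seq):
        if value[pos] == target:
            return index   # stop at the first match, like list.index
    # Mirror list.index by raising instead of returning None
    raise ValueError("%r not found at position %d" % (target, pos))

list_ = [('a', 3), ('b', 1), ('c', 2)]
print(index_by(list_, 1, 1))
print(index_by(list_, 0, 'c'))
```

Because pos is a parameter, the same function works for tuples of any length without building a separate structure per field.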

What is fastest? It depends on how many times you need to use it, and whether you are able to create an index dictionary from the very beginning.

As the others have mentioned, a dictionary is much faster once you have it, but it is costly to transform the list into a dictionary. I'm going to show what I get on my computer so that there are numbers to compare. Here's what I got:

>>> import timeit
>>> timeit.timeit('mydict = {val[0]:(ind, val[1]) for ind, val in enumerate(mylist)}', 'mylist = [(i, "a") for i in range(1000)]')
200.36049539601527

Surprisingly, this is significantly slower than creating the list in the first place:

>>> timeit.timeit('mylist = [(i, "a") for i in range(1000)]')
70.15259253453814

So how does this compare to creating a dictionary in the first place?

>>> timeit.timeit('mydict = {i:("a", i) for i in range(1000)}')
90.78464277950229

Obviously, this is not always possible because you are not always the one creating the list, but I wanted to include it for comparison.

Summary of initializations:

  • Creating a list - 70.15
  • Creating a dictionary - 90.78
  • Indexing an existing list - 70.15 + 200.36 = 270.51

So now, supposing you already have a list or dictionary set up, how long does a lookup take?

>>> timeit.timeit('[i[0] for i in mylist].index(random.randint(0,999))', 'import random; mylist = [(i, "a") for i in range(1000)]')
68.15473008213394

However, this creates a new temporary list each time, so let's look at the breakdown:

>>> timeit.timeit('indexed = [i[0] for i in mylist]', 'import random; mylist = [(i, "a") for i in range(1000)];')
55.86422327528999
>>> timeit.timeit('indexed.index(random.randint(0,999))', 'import random; mylist = [(i, "a") for i in range(1000)]; indexed = [i[0] for i in mylist]')
12.302146224677017

55.86 + 12.30 = 68.16, which is consistent with the 68.15 from the previous result. Now the dictionary:

>>> timeit.timeit('mydict[random.randint(0,999)]', 'import random; mylist = [(i, "a") for i in range(1000)]; mydict = {val[0]:(ind, val[1]) for ind, val in enumerate(mylist)}')
1.5201382921450204

Of course, in each of these cases I'm using random.randint, so let's time that to factor it out:

>>> timeit.timeit('random.randint(0,999)', 'import random')
1.4206546251180043

So now a summary of using the index:

  • Using a list - (68.16-1.42) = 66.74 first time, (12.30-1.42) = 10.88 after that
  • Using a dictionary - (1.52-1.42) = 0.10 each time

Now let's figure out how many accesses it takes for the dictionary to become more useful. First, formulas for total time as a function of the number of accesses x:

  • List - 55.86 + 10.88x
  • Dictionary - 200.36 + 0.10x
  • Initial dictionary - 20.63 + 0.10x

Based on these formulas, a dictionary becomes faster if you need to access it at least 14 times (setting 55.86 + 10.88x = 200.36 + 0.10x gives x of about 13.4). If you can create a dictionary from the get-go instead of a list, then the extra overhead of creating a dictionary instead of a list (90.78 - 70.15 = 20.63) is more than offset by the overhead of creating a list of just the first values in the tuples (55.86).
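The break-even point can be checked with a few lines of arithmetic. This sketch only plugs in the figures measured above (seconds per million runs, on the answerer's machine); the variable names are illustrative:

```python
import math

# Setup cost and per-lookup cost, taken from the timings above
list_setup, list_per_lookup = 55.86, 10.88
dict_setup, dict_per_lookup = 200.36, 0.10

# Solve list_setup + list_per_lookup * x == dict_setup + dict_per_lookup * x for x
break_even = (dict_setup - list_setup) / (list_per_lookup - dict_per_lookup)

print(round(break_even, 2))      # about 13.4 accesses
print(math.ceil(break_even))     # first whole number of accesses where the dict wins
```

With different hardware or a different Python version the constants will change, so the threshold of 14 should be read as specific to these measurements.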

So which is fastest? It depends on how many times you need to use it, and whether you are able to create an index dictionary from the very beginning.

Note: I'm using Python 2.7.5. Timings in Python 3.x could be very different, and will probably also differ between machines. I'd be curious to see what someone else gets on their machine.

All times are in seconds, but timed over one million runs, so an individual run takes about the same number of microseconds.

@Nick T

I think some time is wasted enumerating the list and then converting it to a dictionary, so even though a dict lookup is O(1), creating the dict in the first place is too costly to make it a viable option for large lists.

This is the test I used to determine it:

import time
l = [(i, chr(i)) for i in range(1000000)]
def test1():
    t1 = time.time()
    ([i[0] for i in l].index(10872))
    t2 = time.time()
    return t2 - t1

def test2():
    t1 = time.time()
    (dict((kv[0], (i, kv[1])) for i, kv in enumerate(l))[10872][0])
    t2 = time.time()
    return t2 - t1

def test3():
    sum1 = []
    sum2 = []
    for i in range(1000):
        sum1.append(test1())
        sum2.append(test2())
    print(sum(sum1)/1000)
    print(sum(sum2)/1000)

test3()
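The test above rebuilds the dict on every call, so it measures build cost plus lookup cost together. A rough sketch that times the two separately, assuming Python 3.3+ (for time.perf_counter) and a smaller list than the one above so it finishes quickly; the names n, target, and index are illustrative:

```python
import time

n = 100000                      # smaller than the 1,000,000 used above (assumption)
l = [(i, str(i)) for i in range(n)]
target = 10872

# One-off cost: build the index dict
start = time.perf_counter()
index = {kv[0]: i for i, kv in enumerate(l)}
build_time = time.perf_counter() - start

# Per-query cost with the prebuilt dict
start = time.perf_counter()
via_dict = index[target]
dict_lookup_time = time.perf_counter() - start

# Per-query cost with the list comprehension + index approach
start = time.perf_counter()
via_list = [kv[0] for kv in l].index(target)
list_lookup_time = time.perf_counter() - start

print(build_time, dict_lookup_time, list_lookup_time)
```

A single measurement like this is noisy; for numbers worth comparing, the timeit module with repeated runs is the safer choice.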

EDIT: Haha Kojiro, you beat me to it!
