
What is the most efficient way to add an element to a list only if it isn't there yet?

I have the following code in Python:

def point_to_index(point):
    if point not in points:
        points.append(point)
    return points.index(point)

This code is awfully inefficient, especially since I expect points to grow to hold a few million elements.

If the point isn't in the list, I traverse the list 3 times:

  1. look for it and decide it isn't there
  2. go to the end of the list and add a new element
  3. go to the end of the list again until I find the index

If it is in the list, I traverse it twice:

  1. look for it and decide it is there
  2. go almost to the end of the list until I find the index

Is there any more efficient way to do this? For instance, I know that:

  • I'm more likely to call this function with a point that isn't in the list.
  • If the point is in the list, it's likelier to be near the end than near the beginning.

So if I could have the line:

if point not in points:

search the list from the end to the beginning, it would improve performance when the point is already in the list.

However, I don't want to do:

if point not in reversed(points):

because I imagine that reversed(points) itself will come at a huge cost.

Nor do I want to add new points to the beginning of the list (assuming I knew how to do that in Python), because that would change the indices, which must remain constant for the algorithm to work.

The only improvement I can think of is to implement the function with only one pass, if possible from the end to the beginning. The bottom line is:

  • Is there a good way to do this?
  • Is there a better way to optimize the function?

Edit: I've gotten suggestions for implementing this with only one pass. Is there any way for index() to go from the end to the beginning?

Edit: People have asked why the index is critical. I'm trying to describe a 3D surface using the OFF file format. This format describes a surface using its vertices and faces. First the vertices are listed, and the faces are described using a list of indices of vertices. That's why once I add a vertex to the list, its index must not change.

Edit: There have been some suggestions (such as igor's) to use a dict. This is a good solution for scanning the list. However, when I'm done I need to print out the list in the same order it was created. If I use a dict, I need to print out its keys sorted by value. Is there a good way to do that?

Edit: I implemented www.brool.com's suggestion. This was the simplest and fastest. It is essentially an ordered dict, but without the overhead. The performance is great!

You want to use a set:

>>> x = set()
>>> x
set([])
>>> x.add(1)
>>> x
set([1])
>>> x.add(1)
>>> x
set([1])

A set contains only one instance of any item you add, and it will be a lot more efficient than iterating a list manually.

This wikibooks page looks like a good primer if you haven't used sets in Python before.

This will traverse at most once:

def point_to_index(point):
    try: 
        return points.index(point)
    except ValueError:
        points.append(point)
        return len(points)-1

You may also want to try this version, which takes into account that matches are likely to be near the end of the list. Note that reversed() has almost no cost even on very large lists: it returns an iterator, so it does not create a copy and does not traverse the list more than once.

def point_to_index(point):
    for index, this_point in enumerate(reversed(points)):
        if point == this_point:
            return len(points) - (index+1)
    else:
        points.append(point)
        return len(points)-1

You might also consider keeping a parallel dict or set of points to check for membership, since both of those types can do membership tests in O(1). There would be, of course, a substantial memory cost.
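A minimal sketch of the parallel-dict idea, assuming the points are hashable (for example, tuples of floats). The name index_of is a hypothetical helper introduced here for illustration; the list still provides the stable output order.

```python
points = []    # vertices in insertion order; a point's position is its stable index
index_of = {}  # hypothetical helper: maps each point to its position in `points`

def point_to_index(point):
    """Return the index of `point`, appending it first if it is new."""
    index = index_of.get(point)
    if index is None:
        # New point: its index is the current length of the list.
        index = len(points)
        points.append(point)
        index_of[point] = index
    return index

first = point_to_index((0.0, 0.0, 0.0))
second = point_to_index((1.0, 0.0, 0.0))
again = point_to_index((0.0, 0.0, 0.0))  # already present, so nothing is appended
```

Each call is one hash lookup on average instead of a list scan, at the cost of storing every point twice.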

Obviously, if the points were ordered somehow, you would have many other options for speeding this code up, notably using a binary search for membership tests.
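As a rough illustration of the binary-search idea, the sketch below assumes the points are comparable tuples and keeps a separate sorted list of (point, index) pairs next to the insertion-ordered list. The name sorted_pairs is hypothetical. Only the lookup becomes O(log n); inserting into the sorted list still shifts elements, so it remains O(n).

```python
import bisect

points = []        # insertion order; a point's position is its stable index
sorted_pairs = []  # hypothetical helper: (point, index) pairs kept sorted by point

def point_to_index(point):
    # (point,) sorts just before any (point, index) pair, so bisect_left
    # lands on the matching pair if one exists.
    pos = bisect.bisect_left(sorted_pairs, (point,))
    if pos < len(sorted_pairs) and sorted_pairs[pos][0] == point:
        return sorted_pairs[pos][1]
    # Not found: append to the ordered list and record the pair in sorted position.
    index = len(points)
    points.append(point)
    sorted_pairs.insert(pos, (point, index))
    return index

a = point_to_index((2.0, 0.0, 0.0))
b = point_to_index((1.0, 0.0, 0.0))
c = point_to_index((2.0, 0.0, 0.0))  # found by binary search
```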

If you're worried about memory usage, but want to optimize the common case, keep a dictionary with the last n points and their indexes. Here points_dict is the dictionary and max_cache is the size of the cache.

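def point_to_index(point):
    # Fast path: the point is among the last max_cache points.
    if point in points_dict:
        return points_dict[point]
    try:
        # Slow path: linear scan of the full list.
        return points.index(point)
    except ValueError:
        # New point: evict the oldest cached entry if the cache is full.
        if len(points) >= max_cache:
            del points_dict[points[len(points)-max_cache]]
        points.append(point)
        points_dict[point] = len(points)-1
        return len(points)-1

def point_to_index(point):
    try:
        return points.index(point)
    except ValueError:
        points.append(point)
        return len(points)-1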

Update: Added in Nathan's exception code.

As others said, consider using set or dict. You don't explain why you need the indices. If they are needed only to assign unique ids to the points (and I can't easily come up with another reason for using them), then dict will indeed work much better, e.g.,

points = {}
def point_to_index(point):
    if point in points:
        return points[point]
    else:
        points[point] = len(points)
        return len(points) - 1
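To write the vertices out afterwards, the keys can be recovered in creation order by sorting on the stored index. A minimal sketch with made-up sample points:

```python
# Sample data in the shape the dict-based function above would produce.
points = {(0.0, 1.0, 0.0): 2, (0.0, 0.0, 0.0): 0, (1.0, 0.0, 0.0): 1}

# Sort the keys by their stored index to get creation order back.
ordered = sorted(points, key=points.get)

for x, y, z in ordered:
    print(x, y, z)
```

On Python 3.7 and later, a plain dict already iterates in insertion order, so the sort is only needed on older versions.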

What you really want is an ordered dict (key insertion determines the order):
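One way this could look, as a sketch using collections.OrderedDict from the standard library and assuming hashable points. This is an illustration of the idea, not necessarily the code the answer originally showed.

```python
from collections import OrderedDict

points = OrderedDict()  # maps point -> index; iteration follows insertion order

def point_to_index(point):
    # len(points) is evaluated before the call, so a new point gets the next
    # free index; an existing point keeps and returns its stored index.
    return points.setdefault(point, len(points))

i0 = point_to_index((0.0, 0.0, 0.0))
i1 = point_to_index((1.0, 0.0, 0.0))
i2 = point_to_index((0.0, 0.0, 0.0))  # already present

# Iterating the dict yields the points in the order they were first added.
vertices = list(points)
```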
