A faster or better way to look up data than a loop?

Question

I have an array (a Python list) of objects of class Person like the one below, with thisRate initially set to None:

class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None

I loaded around 21K Person objects into this array; the names are not sorted.

Then I loaded a second array from a file containing the thisRate data, about 13K rows, also not sorted by name:

person_data = []

# read from file; each row is a dict (one example row shown)
row = {}
row['name'] = 'Peter'
row['thisRate'] = '0.12334'

person_data.append(row)

Now, with these two arrays, whenever a name matches between them, I assign the thisRate from person_data to that Person's thisRate.

What I am doing is a loop like this:

for person in persons:
    data = None
    try:
        data = next(personData for personData in person_data
                        if personData['name'] == person.name)
    except StopIteration:
        print("No rate for this person: {}".format(person.name))

    if data:
        person.thisRate = float( data['thisRate'] )

This lookup inside the loop

data = next(personData for personData in person_data
                if personData['name'] == person.name)

runs fine, but takes 21 seconds on my machine with Python 2.7.13.

My question is: is there a faster or better way to achieve the same thing with the two arrays I have?

Answer 1

Yes. Make a dictionary from name to thisRate:

import csv

nd = {}

with open(<whatever>) as f:
    reader = csv.DictReader(f)
    for row in reader:
        # convert once here so person.thisRate ends up a float, as in the question
        nd[row['name']] = float(row['thisRate'])
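One behavioural difference to be aware of: the question's next(...) lookup takes the first row whose name matches, while the loop above keeps the last one when a name appears more than once in the file. If that matters, dict.setdefault keeps the first instead. A minimal sketch, assuming a CSV with `name` and `thisRate` header columns; the helper `build_rate_index` and the sample lines are hypothetical:

```python
import csv


def build_rate_index(lines):
    """Map name -> float rate, keeping the FIRST row seen for each name.

    Hypothetical helper mirroring the first-match behaviour of the
    original next(...) lookup. `lines` is any iterable of CSV lines
    (an open file works) with 'name' and 'thisRate' header columns.
    """
    nd = {}
    for row in csv.DictReader(lines):
        # setdefault only stores the value if the name is not present yet
        nd.setdefault(row['name'], float(row['thisRate']))
    return nd


# Hypothetical sample lines standing in for the real rate file
sample = [
    "name,thisRate",
    "Peter,0.12334",
    "Mary,0.5",
    "Peter,0.9",  # duplicate name: ignored, the first Peter row wins
]
nd = build_rate_index(sample)
print(nd['Peter'])  # 0.12334
```

csv.DictReader accepts any iterable of strings, so the same helper can be called with an open file object in place of the list.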

Now use this dictionary to do a single pass over your Person list:

for person in persons:
    thisRate = nd.get(person.name, None)
    person.thisRate = thisRate
    if thisRate is None:
        print("No rate for this person: {}".format(person.name))

Dictionaries have a .get method that lets you supply a default value in case the key is not in the dict. I used None (which is in fact the default default value), but you can use whatever you want.
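Putting the pieces together, here is a minimal self-contained sketch of this approach. It assumes the Person class from the question and uses a few hypothetical in-memory rows in place of the real file. Since person_data is already loaded in the question, the dict can be built straight from it:

```python
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None


# Hypothetical sample data standing in for ~21K persons and ~13K rate rows
persons = [Person(1, 'Peter'), Person(2, 'Mary'), Person(3, 'Zoe')]
person_data = [
    {'name': 'Mary', 'thisRate': '0.5'},
    {'name': 'Peter', 'thisRate': '0.12334'},
]

# Pass 1: build name -> rate once, converting the string rate to float
nd = {row['name']: float(row['thisRate']) for row in person_data}

# Pass 2: one average O(1) dict lookup per person instead of a scan
missing = []
for person in persons:
    person.thisRate = nd.get(person.name)  # None if the name has no rate
    if person.thisRate is None:
        missing.append(person.name)
        print("No rate for this person: {}".format(person.name))
```

Running it prints a single line, `No rate for this person: Zoe`. Testing `is None` rather than truthiness matters here, because a legitimate rate of 0.0 is falsy.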

This is a linear-time solution: one pass over the file to build the dictionary, then one pass over persons with an average constant-time lookup for each. Your solution was quadratic time, because you are essentially doing:

for person in persons:
    for data in person_data:
        if data['name'] == person.name:
            person.thisRate = float(data['thisRate'])
            break
    else:
        print("No rate for this person: {}".format(person.name))

It is just written in a way that hides this fundamentally nested for-loop inside a generator expression. (That is not really a good use case for a generator expression; you should have used a plain for-loop to begin with, and then you wouldn't have to deal with catching StopIteration.)
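As an aside, next() takes an optional default as its second argument, so the original lookup could return None instead of raising StopIteration, although it would still be quadratic. The following rough timing sketch compares that form against the dict approach. The data is synthetic and scaled down, the function names `nested_lookup` and `dict_lookup` are made up for illustration, and the absolute times will vary by machine:

```python
import random
import timeit

# Hypothetical, scaled-down data (the question has ~21K persons, ~13K rates)
names = ['person%d' % i for i in range(1000)]
random.Random(0).shuffle(names)  # unsorted, like the real data
person_names = names  # stands in for [p.name for p in persons]
person_data = [{'name': n, 'thisRate': '0.5'} for n in names[:700]]
random.Random(1).shuffle(person_data)  # rate rows in a different order


def nested_lookup():
    """Original approach, minus try/except: one linear scan per person."""
    result = {}
    for name in person_names:
        # next() returns the default (None) instead of raising StopIteration
        data = next((d for d in person_data if d['name'] == name), None)
        result[name] = float(data['thisRate']) if data is not None else None
    return result


def dict_lookup():
    """Dict approach: build the dict once, then one lookup per person."""
    nd = {d['name']: float(d['thisRate']) for d in person_data}
    return {name: nd.get(name) for name in person_names}


assert nested_lookup() == dict_lookup()
print("nested scan: %.4f s" % timeit.timeit(nested_lookup, number=1))
print("dict lookup: %.4f s" % timeit.timeit(dict_lookup, number=1))
```

The nested version's cost grows with len(persons) × len(person_data), so at the question's 21K × 13K sizes the gap should be much larger than in this small run.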

(Answer 1: score 4, accepted, 2017-04-06 01:15:16)

比循环查找数据更快或更更好的方法？

问题描述

1 个解决方案

解决方案1 4 已采纳 2017-04-06 01:15:16

解决方案1
4 已采纳 2017-04-06 01:15:16