Faster or better way than looping to find data?
I have an array of objects of class Person like the one below, with thisRate initially set to None:
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None
I loaded around 21K Person objects into an array, not sorted by name.
Then I loaded another array from a file that holds the thisRate data, about 13K rows, also not sorted by name:
person_data = []
# read from file; each row ends up looking like this
row = {}
row['name'] = 'Peter'
row['thisRate'] = '0.12334'
person_data.append(row)
Now with these two arrays, whenever a name matches between them, I assign thisRate from person_data to Person.thisRate.
What I am doing is a loop like this:
for person in persons:
    data = None
    try:
        data = next(personData for personData in person_data
                    if personData['name'] == person.name)
    except StopIteration:
        print("No rate for this person: {}".format(person.name))
    if data:
        person.thisRate = float(data['thisRate'])
This lookup

data = next(personData for personData in person_data
            if personData['name'] == person.name)

runs fine and takes 21 seconds on my machine with Python 2.7.13.
My question is: is there a faster or better way to achieve the same thing with the two arrays I have?
Yes. Make a dictionary from name to thisRate:
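import csv

nd = {}
with open(<whatever>) as f:  # <whatever> stands for the path to your rate file
    reader = csv.DictReader(f)
    for row in reader:
        # values are stored as read (strings); wrap in float() if you need numbers
        nd[row['name']] = row['thisRate']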
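If the rows are already sitting in the person_data list from the question, the same mapping could probably be built from that list instead of re-reading the file. A minimal sketch, assuming each row is a dict with 'name' and 'thisRate' keys as in the question; the sample rows and the up-front float conversion are illustrative assumptions, not part of the original code:

```python
# Sample rows standing in for the ~13K rows loaded from the file (made-up data).
person_data = [
    {'name': 'Peter', 'thisRate': '0.12334'},
    {'name': 'Mary', 'thisRate': '0.5'},
]

# Map each name to its rate, converting the string to a float once, up front.
# If a name appears more than once, the last row wins.
nd = {row['name']: float(row['thisRate']) for row in person_data}
```

Building the dict takes one pass over person_data, after which every lookup by name is a hash lookup rather than a scan.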
Now, use this dictionary to do a single pass over your Person list:
for person in persons:
    thisRate = nd.get(person.name, None)
    person.thisRate = thisRate
    if thisRate is None:
        print("No rate for this person: {}".format(person.name))
Dictionaries have a .get method which allows you to provide a default value in case the key is not in the dict. I used None (which is actually the default value of that default argument), but you can use whatever you want.
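To make the behaviour of .get concrete, here is a small sketch; the rates mapping and the '0.0' fallback are made-up values for illustration only:

```python
rates = {'Peter': '0.12334'}  # hypothetical sample mapping

present = rates.get('Peter')        # key exists: returns the stored value
missing = rates.get('Paul')         # key absent, no default given: returns None
fallback = rates.get('Paul', '0.0') # key absent, explicit default: returns '0.0'

print(present, missing, fallback)
```

Unlike rates['Paul'], which raises KeyError, .get never raises for a missing key, so no try/except is needed around the lookup.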
This is a linear-time solution. Your solution was quadratic time, because you are essentially doing:
for person in persons:
    for data in person_data:
        if data['name'] == person.name:
            person.thisRate = data['thisRate']
            break
    else:
        print("No rate for this person: {}".format(person.name))
Just in a fashion that obscures this fundamentally nested for-loop inside of a generator expression. That is not really a good use case for a generator expression: you should have just used a for-loop to begin with, and then you would not have to deal with a try/except around a StopIteration.
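The difference between the two approaches can be illustrated by counting operations on synthetic data. This is only a sketch: the helper names (make_data, nested_scan, dict_lookup), the generated names, and the sizes (one tenth of the 21K and 13K in the question) are assumptions for illustration, and the numbers are operation counts, not timings.

```python
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None

def make_data(n_persons, n_rates):
    """Build synthetic persons and rate rows (made-up names, for illustration)."""
    persons = [Person(i, 'name{}'.format(i)) for i in range(n_persons)]
    person_data = [{'name': 'name{}'.format(i), 'thisRate': '0.5'}
                   for i in range(n_rates)]
    return persons, person_data

def nested_scan(persons, person_data):
    """Quadratic approach: scan the rows for every person; return comparisons made."""
    comparisons = 0
    for person in persons:
        for data in person_data:
            comparisons += 1
            if data['name'] == person.name:
                person.thisRate = float(data['thisRate'])
                break
    return comparisons

def dict_lookup(persons, person_data):
    """Linear approach: build the dict once, then do one lookup per person."""
    nd = {row['name']: row['thisRate'] for row in person_data}
    lookups = 0
    for person in persons:
        lookups += 1
        rate = nd.get(person.name)
        if rate is not None:
            person.thisRate = float(rate)
    return lookups

persons_a, data_a = make_data(2100, 1300)
n_nested = nested_scan(persons_a, data_a)

persons_b, data_b = make_data(2100, 1300)
n_dict = dict_lookup(persons_b, data_b)

print("nested comparisons: {}".format(n_nested))
print("dict lookups: {}".format(n_dict))
```

The nested version performs on the order of persons × rows comparisons, while the dictionary version does one pass to build the dict plus one lookup per person, which is why the gap widens quickly as both lists grow.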