
Faster or better way than looping to find data?

I have an array of objects of class Person like the one below, with thisRate initially set to None:

class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate= None

I loaded around 21K Person objects into an array, not sorted by name.

Then I loaded another array from a file that holds the thisRate data, about 13K rows, also not sorted by name:

person_data = []

# read from file; each row ends up as a dict like this one
row = {}
row['name'] = 'Peter'
row['thisRate'] = '0.12334'

person_data.append(row)

Now, with these 2 arrays, whenever a name matches between them, I assign thisRate from person_data to Person.thisRate.

What I am doing is a loop like this:

for person in persons:
    data = None
    try:
        data = next(personData for personData in person_data
                        if personData['name'] == person.name)
    except StopIteration:
        print("No rate for this person: {}".format(person.name))

    if data:
        person.thisRate = float( data['thisRate'] )

This loop

data = next(personData for personData in person_data
                if personData['name'] == person.name)

runs fine and takes 21 seconds on my machine with Python 2.7.13.

My question is: is there a faster or better way to achieve the same thing with the 2 arrays I have?

Yes. Make a dictionary from name to thisRate:

import csv

nd = {}

with open(<whatever>) as f:
    reader = csv.DictReader(f)
    for row in reader:
        # convert once here, as the question's loop does with float(...)
        nd[row['name']] = float(row['thisRate'])

Now, use this dictionary to do a single pass over your Person list:

for person in persons:
    thisRate = nd.get(person.name, None)
    person.thisRate = thisRate
    if thisRate is None:
        print("No rate for this person: {}".format(person.name))
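
As a rough end-to-end sketch of the two steps above, the following uses an in-memory CSV in place of the real file. The sample data, the helper names build_rate_index and assign_rates, and the use of io.StringIO are assumptions for illustration only; it is written for Python 3 (on Python 2.7 the in-memory file would need a different setup).

```python
import csv
import io


class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None


# Hypothetical stand-in for the rates file (assumed to have a header row).
SAMPLE_CSV = "name,thisRate\nPeter,0.12334\nMary,0.5\n"


def build_rate_index(fileobj):
    """Map each name to its rate (as a float) in one pass over the rows."""
    return {row['name']: float(row['thisRate'])
            for row in csv.DictReader(fileobj)}


def assign_rates(persons, rates):
    """Set thisRate on each person; return the names that had no rate."""
    missing = []
    for person in persons:
        person.thisRate = rates.get(person.name)
        if person.thisRate is None:
            missing.append(person.name)
    return missing


persons = [Person(1, 'Peter'), Person(2, 'Paul'), Person(3, 'Mary')]
rates = build_rate_index(io.StringIO(SAMPLE_CSV))
missing = assign_rates(persons, rates)
```

Collecting the unmatched names in a list instead of printing inside the loop is a design choice of this sketch, not part of the answer; it makes the result easier to check or log afterwards.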

Dictionaries have a .get method that lets you provide a default value in case the key is not in the dict. I used None (which is already the default when you omit the second argument), but you can use whatever you want.
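
To make the .get behaviour concrete, here is a small sketch with a one-entry dictionary; the variable names and sample values are made up for illustration.

```python
rates = {'Peter': 0.12334}

found = rates.get('Peter')         # key present: returns the stored value
absent = rates.get('Paul')         # key missing, no default given: returns None
fallback = rates.get('Paul', 0.0)  # key missing, explicit default: returns 0.0

# Unlike rates['Paul'], none of these calls raises KeyError,
# and .get never adds the missing key to the dict.
```

One thing to watch: with a default such as 0.0, a missing person becomes indistinguishable from one whose rate really is zero, which is why None plus an explicit "is None" check is probably the safer choice here.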

This is a linear-time solution. Your solution was quadratic time, because you are essentially doing:

for person in persons:
    for data in person_data:
        if data['name'] == person.name:
            person.thisRate = data['thisRate']
            break
    else:
        print("No rate for this person: {}".format(person.name))

Your version just obscures this fundamentally nested for-loop inside a generator expression. That is not really a good use case for a generator expression; you should have used a plain for-loop to begin with, and then you would not have to deal with try/except around StopIteration.
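
To get a feel for the difference in work, the sketch below counts basic operations for both approaches on small synthetic data. The function names and the data are assumptions for illustration, and the counts are a simplification (they ignore hashing cost). With the sizes from the question, the nested loop could make up to roughly 21,000 × 13,000 = 273 million comparisons in the worst case, against about 34,000 dictionary inserts and lookups.

```python
def nested_loop_comparisons(names, data_names):
    """Count name comparisons made by the nested-loop approach."""
    comparisons = 0
    for name in names:
        for candidate in data_names:
            comparisons += 1
            if candidate == name:
                break  # stop scanning at the first match
    return comparisons


def dict_operations(names, data_names):
    """Count dict operations: one insert per data row, one lookup per name."""
    index = {}
    operations = 0
    for candidate in data_names:
        index[candidate] = True
        operations += 1
    for name in names:
        index.get(name)
        operations += 1
    return operations


# Synthetic data: 200 names, of which 120 have a data row (in reverse order).
names = ['p{}'.format(i) for i in range(200)]
data_names = ['p{}'.format(i) for i in reversed(range(120))]

nested = nested_loop_comparisons(names, data_names)
hashed = dict_operations(names, data_names)
```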

