
Faster or better way than looping to find data?

I have a list of objects of class Person, defined as below, with thisRate initially set to None:

class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None

I loaded around 21K Person objects into a list, not sorted by name.

Then I loaded another list from a file that holds the thisRate data, about 13K rows, also not sorted by name:

person_data = []

# read from file; each row is a dict such as:
row = {'name': 'Peter', 'thisRate': '0.12334'}

person_data.append(row)

Now, with these two lists, whenever a name matches between them, I want to assign thisRate from person_data to that Person's thisRate.

What I am doing is a loop like this:

for person in persons:
    data = None
    try:
        data = next(personData for personData in person_data
                        if personData['name'] == person.name)
    except StopIteration:
        print("No rate for this person: {}".format(person.name))

    if data:
        person.thisRate = float( data['thisRate'] )

This lookup

data = next(personData for personData in person_data
                if personData['name'] == person.name)

works fine, but takes 21 seconds on my machine with Python 2.7.13.

My question is: is there a faster or better way to achieve the same thing with the two lists I have?

Yes. Make a dictionary mapping name to thisRate:

import csv

nd = {}

with open(<whatever>) as f:
    reader = csv.DictReader(f)
    for row in reader:
        # convert here so person.thisRate ends up a float, as in the question
        nd[row['name']] = float(row['thisRate'])
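As a self-contained sketch of this step (written for Python 3; the sample CSV text and the helper name build_rate_index are made up for illustration, and the column names are assumed to be name and thisRate as in the question), the mapping can be built from any file-like object:

```python
import csv
import io


def build_rate_index(fileobj):
    """Map each name to its rate as a float (hypothetical helper)."""
    index = {}
    for row in csv.DictReader(fileobj):
        # A later row with the same name overwrites an earlier one.
        index[row['name']] = float(row['thisRate'])
    return index


# Made-up sample data standing in for the real file.
sample = io.StringIO("name,thisRate\nPeter,0.12334\nMary,0.5\n")
rates = build_rate_index(sample)
print(rates)
```

Note that if the file can contain the same name more than once, a plain dictionary keeps only the last rate seen, whereas the original next(...) search returned the first match.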

Now, use this dictionary to do a single pass over your Person list:

for person in persons:
    thisRate = nd.get(person.name, None)
    person.thisRate = thisRate
    if thisRate is None:
        print("No rate for this person: {}".format(person.name))

Dictionaries have a .get method that lets you provide a default value to return when the key is not in the dict. I passed None explicitly (which is already the default when the second argument is omitted), but you can use whatever you want.
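A minimal illustration of how .get behaves, using a made-up one-entry mapping:

```python
rates = {'Peter': 0.12334}  # made-up sample mapping

print(rates.get('Peter'))        # key present: returns the stored value
print(rates.get('Nobody'))       # key absent, no default given: returns None
print(rates.get('Nobody', 0.0))  # key absent: returns the supplied default
```

Unlike rates['Nobody'], none of these calls raises KeyError, which is why no try/except is needed in the loop.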

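This is a linear-time solution: building the dictionary takes one pass over the rate data, and each lookup takes constant time on average. Your solution was quadratic time, because you are essentially doing: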

for person in persons:
    for data in person_data:
        if data['name'] == person.name:
            person.thisRate = data['thisRate']
            break
    else:
        print("No rate for this person: {}".format(person.name))

Your version just obscures this fundamentally nested for loop inside a generator expression. That is not really a good use case for a generator expression; you should have just used a for loop to begin with, and then you would not have to deal with catching StopIteration in a try/except.
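To make the difference concrete, here is a rough sketch that counts the work done by each approach on synthetic data. The function names, the data sizes and the generated names are all invented for the demonstration; it counts operations rather than measuring time, so it says nothing precise about the 21 seconds reported in the question.

```python
def nested_scan_comparisons(names, rows):
    """Count name comparisons made by the nested-loop approach."""
    comparisons = 0
    for name in names:
        for row in rows:
            comparisons += 1
            if row['name'] == name:
                break  # first match found, stop scanning
    return comparisons


def dict_operations(names, rows):
    """Count dictionary inserts plus lookups made by the dict approach."""
    index = {row['name']: row['thisRate'] for row in rows}  # one pass over rows
    lookups = 0
    for name in names:
        index.get(name)
        lookups += 1
    return len(rows) + lookups


# Synthetic data: 2,000 people and 1,000 rate rows.
names = ['person{}'.format(i) for i in range(2000)]
rows = [{'name': 'person{}'.format(i), 'thisRate': '0.1'} for i in range(1000)]

nested = nested_scan_comparisons(names, rows)
linear = dict_operations(names, rows)
print(nested, linear)
```

On this data the nested scan makes 1,500,500 comparisons, while the dictionary approach needs 3,000 operations. The gap widens as both lists grow, since the first count scales with the product of the two lengths and the second with their sum.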
