Creating a dict of empty sets through a loop, then filling those sets using another loop - more efficient method? (Python)

Question

In this answer , I am creating a dict of empty set s by iterating through a list. Then I iterate through the same list & filling those sets. An MRE :

# imports we need
import time
import numpy as np
np.random.seed(42)

What I am doing

An example list of letters. Note that at least one letter will appear more than once.

letters=[np.random.choice([letter for letter in string.ascii_lowercase]) for _ in range(1000)]

Result:

['w', 'n', 'k', 'o', 'm', 'r', ...

Creating a dict with letters as keys, empty sets as values:

letterdict={letter:set() for letter in letters}

Iterating through the letters list again, each entry in the list with the corresponding letter will be a set where the indices of that letter appears in the letters list:

for index, letter in enumerate(letters):
    letterdict[letter].add(index)

letterdict will look like:

{'w': {0, 12, 62, 67, 69, ...

How long it was

This process took:

start = time.time()
letterdict={letter:set() for letter in letters}
for index, letter in enumerate(letters):
    letterdict[letter].add(index)
end = time.time()
print(end-start)

0.000538... sec.

The question

Is there a way to make the creation of letterdict quicker? Afterall, I am iterating through letters twice.

My thoughts: If I could make it in one loop, when it encounters a letter for the first time, I could create a set , and put the index of the letter in it. When encountering the letter for the second time, it could not reset the set , just add the index. However, checking that a letter is already encountered or not seems tedious (ie slow).

In the MRE, assume, that we do not know what all the letters are, so replacing the first loop by {letter:set() for letter in string.ascii_lowercase} is not really useful.

Answer 1

You probably want a collections.defaultdict .

This will create the empty set on lookup if the key is not already present:

from collections import defaultdict

letterdict = defaultdict(set)
for index, letter in enumerate(letters):
     letterdict[letter].add(index)

This way you don't have to initialize the dictionary with empty sets.

You could instead just use the .setdefault() method on a normal dict.

letterdict = {}
for index, letter in enumerate(letters):
     letterdict.setdefault(letter, set()).add(index)

This has the overhead of creating a new set object each time whether it's needed or not, but the builtin set() types seem to construct pretty quickly. It wasn't any slower on the small samples I used to time it. CPython might pool these somehow when they get deleted quickly, like it does for tuples.

Answer 2

Okay, so I think there is no significant gain in time with this method, but you can do the same job with one loop, by using a try except block.

start = time.time()
for index, letter in enumerate(letters):
    try:
        letterdict[letter].add(index)
    except:
        letterdict[letter] = {index}
end = time.time()
print(end-start)

Basically, we create a set while looping though the list, therefore no need for another loop.

My time, first is your method, second is mine:

0.0009996 #Your method
0.0009992 #My method

As I said, no significant gain, but according to several runs, it slightly faster.

Creating a dict of empty sets through a loop, then filling those sets using another loop - more efficient method? (Python)

Question

What I am doing

How long it was

The question

2 answers

solution1
1 ACCPTED 2020-07-25 21:24:39

solution2
1 2020-07-25 21:26:15

Creating a dict of empty sets through a loop, then filling those sets using another loop - more efficient method? (Python)

Question

What I am doing

How long it was

The question

2 answers

solution1 1 ACCPTED 2020-07-25 21:24:39

solution2 1 2020-07-25 21:26:15

solution1
1 ACCPTED 2020-07-25 21:24:39

solution2
1 2020-07-25 21:26:15