What's an efficient way to filter a Python dictionary based on whether an element in a value list exists?

Question

I have a dictionary (table) defined like this:

table = {{"id": [1, 2, 3]}, {"file": ['good1.txt', 'bad2.txt', 'good3.txt']}}

and I have a list of bad candidates that should be removed:

to_exclude = ['bad0.txt', 'bad1.txt', 'bad2.txt']

I'd like to filter the table by removing every row whose file can be found in to_exclude, so that I get:

filtered = {{"id": [1, 3]}, {"file": ['good1.txt', 'good3.txt']}}

I guess I could use a for loop to check the entries one by one, but I was wondering what the most efficient, Pythonic way to solve this problem is.
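For comparison, the plain row-by-row loop mentioned above might look like the sketch below. It assumes the table is really a single dict mapping each column name to an equal-length list (the literal above is a set of dicts, which Python rejects), and it keeps each id paired with its file, so the row with id 3 survives alongside good3.txt.

```python
table = {"id": [1, 2, 3], "file": ["good1.txt", "bad2.txt", "good3.txt"]}
to_exclude = ["bad0.txt", "bad1.txt", "bad2.txt"]

# Baseline: walk the rows one by one and copy the ones we keep.
filtered = {"id": [], "file": []}
for row_id, file in zip(table["id"], table["file"]):
    # Membership test against a list scans the whole list for each row.
    if file not in to_exclude:
        filtered["id"].append(row_id)
        filtered["file"].append(file)

print(filtered)  # {'id': [1, 3], 'file': ['good1.txt', 'good3.txt']}
```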

Could someone provide some guidance on this? Thanks.

Answer 1

The most efficient thing to do is to convert to_exclude into a set (a membership test on a set takes O(1) time on average, versus O(n) for a list) and then do the straightforward search:

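# convert to a set so each membership test is O(1) on average
to_exclude_set = set(to_exclude)
# indices of the rows whose file is not excluded
keep = [i for i, file in enumerate(table["file"]) if file not in to_exclude_set]
# apply the same indices to every column so ids stay paired with their files
table = {key: [values[i] for i in keep]
         for key, values in table.items()
        }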
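To see why the set conversion matters, here is a rough timing sketch using the standard timeit module. The 10,000 generated file names and the probe name are made up for illustration. Absolute timings depend on the machine, but a list lookup has to compare against every element when the name is absent, while a set lookup hashes the name once.

```python
import timeit

# Made-up exclusion list, large enough for the difference to show.
to_exclude = [f"bad{i}.txt" for i in range(10_000)]
to_exclude_set = set(to_exclude)

# A name that is not excluded: the list check must compare against every element.
probe = "good1.txt"

list_time = timeit.timeit(lambda: probe in to_exclude, number=1_000)
set_time = timeit.timeit(lambda: probe in to_exclude_set, number=1_000)
print(f"list: {list_time:.4f} s, set: {set_time:.4f} s")
```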

Answer 2

I'm assuming you miswrote your data structure. What you have is a set of two dictionaries, which is impossible: dictionaries are not hashable, so they can't be set elements. I'm hoping your actual data is:

data = {"id": [1, 2, 3], "file": [.......]}

a dictionary with two keys.
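A quick way to confirm the point about hashability: evaluating the question's literal raises a TypeError, while a single dict with two keys is accepted. The exact wording of the error message varies between Python versions, so the sketch only captures it rather than predicting it.

```python
# The question's literal: two dicts inside braces, which Python treats as a set display.
error_message = None
try:
    table = {{"id": [1, 2, 3]}, {"file": ["good1.txt", "bad2.txt", "good3.txt"]}}
except TypeError as exc:
    # Set elements must be hashable, and dicts are not, so this branch runs.
    error_message = str(exc)
print("set of dicts failed:", error_message)

# One dict with two keys, each holding a column list, is fine.
data = {"id": [1, 2, 3], "file": ["good1.txt", "bad2.txt", "good3.txt"]}
print(sorted(data))
```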

So for me, the simplest would be:

# Create a set for faster membership testing
to_exclude_set = set(to_exclude)
# Create (id, file) pairs for the rows we want to keep
pairs = [(row_id, file) for row_id, file in zip(data["id"], data["file"])
         if file not in to_exclude_set]
# Recreate the data structure (unpack each pair into its id and its file)
result = {"id": [row_id for row_id, _ in pairs],
          "file": [file for _, file in pairs]}
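If the table grows more columns than id and file, the same idea can be generalized by turning columns into rows with zip(*...) and back again. This is a sketch under the assumption that every column has the same length and that filtering happens on the "file" column; names such as kept_rows and file_pos are illustrative only.

```python
data = {"id": [1, 2, 3], "file": ["good1.txt", "bad2.txt", "good3.txt"]}
to_exclude_set = {"bad0.txt", "bad1.txt", "bad2.txt"}

keys = list(data)              # column names, in insertion order
file_pos = keys.index("file")  # position of the column we filter on

# zip(*columns) turns columns into rows: (1, 'good1.txt'), (2, 'bad2.txt'), ...
kept_rows = [row for row in zip(*data.values())
             if row[file_pos] not in to_exclude_set]

# zip(*rows) turns the kept rows back into columns; it yields nothing when
# no rows survive, so fall back to one empty tuple per column.
columns = list(zip(*kept_rows)) or [()] * len(keys)
result = {key: list(col) for key, col in zip(keys, columns)}
print(result)  # {'id': [1, 3], 'file': ['good1.txt', 'good3.txt']}
```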

Answer 1: score 0, posted 2021-12-04 00:41:10
Answer 2: score 0, posted 2021-12-04 07:00:24
