
What's an efficient way to filter a Python dictionary based on whether an element of a value list appears in another list?

I have a dictionary (table) defined like this:

table = {{"id": [1, 2, 3]}, {"file": ['good1.txt', 'bad2.txt', 'good3.txt']}}

and I have a list of bad candidates that should be removed:

to_exclude = ['bad0.txt', 'bad1.txt', 'bad2.txt']

I want to filter the table so that every row whose file appears in to_exclude is removed. The expected result is:

filtered = {{"id": [1, 3]}, {"file": ['good1.txt', 'good3.txt']}}

I guess I could use a for loop to check the entries one by one, but I was wondering what the most efficient, Pythonic way to solve this problem is.

Could someone provide some guidance on this? Thanks.

The most efficient approach is to convert to_exclude into a set, so that each membership test takes constant time on average, and then do a straightforward filter:

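# Use a set so each membership test is O(1) on average
to_exclude_set = set(to_exclude)

# Decide once per row whether to keep it, based on the "file" column
keep = [file not in to_exclude_set for file in table["file"]]

# Apply the same row mask to every column so the lists stay aligned
table = {key: [value for value, kept in zip(values, keep) if kept]
         for key, values in table.items()}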
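To drop whole rows, every column needs to be filtered with the same keep/drop decision, otherwise "id" and "file" end up with different lengths. Here is a minimal sketch of that idea as a reusable helper, assuming the table is really a single dictionary of equal-length lists. The name filter_rows and its parameters are hypothetical, not from the original answer.

```python
from itertools import compress


def filter_rows(table, column, excluded):
    """Return a new dict of columns, dropping rows whose table[column] value is excluded."""
    excluded_set = set(excluded)  # average O(1) membership tests
    # One boolean per row: True means the row is kept
    keep = [value not in excluded_set for value in table[column]]
    # compress() yields only the items whose selector is true
    return {key: list(compress(values, keep)) for key, values in table.items()}


# Sample values taken from the question, written as a single dictionary
table = {"id": [1, 2, 3], "file": ["good1.txt", "bad2.txt", "good3.txt"]}
to_exclude = ["bad0.txt", "bad1.txt", "bad2.txt"]

filtered = filter_rows(table, "file", to_exclude)
print(filtered)  # {'id': [1, 3], 'file': ['good1.txt', 'good3.txt']}
```

The helper builds a new dictionary and leaves the input untouched, which may or may not be what you want for very large tables.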

I'm assuming you miswrote your data structure. As written, it is a set containing two dictionaries, which is impossible because dictionaries are not hashable and so cannot be set members. I'm hoping your actual data is:

data = {"id": [1, 2, 3], "file": [.......]}

a dictionary with two keys.
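As a quick check of the point about hashability, this short sketch tries to build the structure as it was written in the question and records the error Python raises. The variable name error_message is only for illustration.

```python
error_message = None
try:
    # Outer braces with no key: value pairs form a set literal,
    # and each element here is a dict, which is unhashable
    table = {{"id": [1, 2, 3]}, {"file": ["good1.txt", "bad2.txt", "good3.txt"]}}
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```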

So for me, the simplest would be:

# Create a set for faster testing
to_exclude_set = set(to_exclude)
# Create (id, file) pairs for the pairs we want to keep
pairs = [(id, file) for id, file in zip(data["id"], data["file"])
          if file not in to_exclude_set]
# Recreate the data structure
result = { 'id': [id for id, _ in pairs],
           'file': [file for _, file in pairs] }
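For reference, here is a runnable sketch of the pair-based approach using the sample values from the question. It assumes the data is a single dictionary with "id" and "file" keys, and it uses the hypothetical names row_id and name to avoid shadowing the built-in id.

```python
# Sample values from the question, as a single dictionary
data = {"id": [1, 2, 3], "file": ["good1.txt", "bad2.txt", "good3.txt"]}
to_exclude = ["bad0.txt", "bad1.txt", "bad2.txt"]

# Set for fast membership tests
to_exclude_set = set(to_exclude)

# Keep only the (id, file) pairs whose file is not excluded
pairs = [(row_id, name)
         for row_id, name in zip(data["id"], data["file"])
         if name not in to_exclude_set]

# Split the surviving pairs back into two aligned lists
result = {"id": [row_id for row_id, _ in pairs],
          "file": [name for _, name in pairs]}

print(result)  # {'id': [1, 3], 'file': ['good1.txt', 'good3.txt']}
```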
