
Filter list of dictionaries based on the value of a key

If two dictionaries in a list have the same value for a key, I would like the list filtered so that only one of those dictionaries remains. I do not care about the second (or third) dictionary that matches.

crcs = [
        {'compress_name': 'file1.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress1.zip'},
        {'compress_name': 'file3.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress2.zip'},
        {'compress_name': 'file2.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress2.zip'},
        {'compress_name': 'file5.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress3.zip'}
    ]

The expected result is a list of two dictionaries with differing "crc" values.

[
        {'compress_name': 'file1.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress1.zip'},
        {'compress_name': 'file2.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress2.zip'},
    ]

or any other combination of dictionaries whose CRC values are 55A0669C and 66B07710. The list of dictionaries could be 400 or more items long.

I'm using Python 3.7.

If only crc needs to be unique, then you can use

crcs = [
    # Backslashes are escaped so that '\f' is not read as a form feed
    {'compress_name': 'file1.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress1.zip'},
    {'compress_name': 'file3.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress2.zip'},
    {'compress_name': 'file2.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress2.zip'},
    {'compress_name': 'file5.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress3.zip'},
]

crcs_all = set()   # crc values seen so far
crcs_uniq = []     # first dictionary found for each crc value

for entry in crcs:
    crc = entry['crc']
    if crc not in crcs_all:
        crcs_all.add(crc)
        crcs_uniq.append(entry)

print(crcs_uniq)

That will give you

    [ {'compress_name': 'file1.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress1.zip'}, 
      {'compress_name': 'file2.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress2.zip'}]
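
The same first-occurrence filter can also be sketched with a dictionary keyed by crc. This is only an illustration: the helper name unique_by_key is made up here, not part of the original answer, and it relies on dictionaries preserving insertion order, which Python guarantees from 3.7 onward.

```python
def unique_by_key(records, key):
    """Return the first record seen for each distinct value of `key`."""
    first_seen = {}
    for record in records:
        # setdefault stores the record only if this key value is new
        first_seen.setdefault(record[key], record)
    return list(first_seen.values())


crcs = [
    {'compress_name': 'file1.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress1.zip'},
    {'compress_name': 'file3.bin', 'crc': '55A0669C', 'name': 'R:\\filepath\\system\\compress2.zip'},
    {'compress_name': 'file2.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress2.zip'},
    {'compress_name': 'file5.bin', 'crc': '66B07710', 'name': 'R:\\filepath\\system\\compress3.zip'},
]

crcs_uniq = unique_by_key(crcs, 'crc')
print(crcs_uniq)
```

Each lookup is a hash lookup, so the cost should stay roughly linear in the length of the list, which is comfortable for 400 or more items.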

Solution

You could cast the list of dictionaries into a pandas DataFrame and then select the unique crc values. Next, you could get the first occurrence of each crc value by calling list.index(crc) on the full list of crc values, and store those positions in a list, unique_idx. We use unique_idx to select the relevant rows from the DataFrame df and then extract that data as a dict.

Short Solution

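import pandas as pd

# crcs is the list of dictionaries from the question
df = pd.DataFrame(crcs)
unique_crcs = df.crc.unique().tolist()
all_crcs = df.crc.tolist()  # every crc value, in row order

# Position of the first row carrying each unique crc
unique_idx = []
for crc in unique_crcs:
    unique_idx.append(all_crcs.index(crc))

dfu = df.iloc[unique_idx]
dfu.T.to_dict()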

Output:

{0: {'compress_name': 'file1.bin',
  'crc': '55A0669C',
  'name': 'R:\\filepath\\system\\compress1.zip'},
 2: {'compress_name': 'file2.bin',
  'crc': '66B07710',
  'name': 'R:\\filepath\\system\\compress2.zip'}}
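
If a list of dictionaries is wanted, as in the question, rather than a dict keyed by row index, pandas can do the de-duplication directly. The following is a minimal sketch using DataFrame.drop_duplicates and to_dict(orient='records'); the variable name unique_records is chosen here for illustration only.

```python
import pandas as pd

crcs = [
    {'compress_name': 'file1.bin', 'crc': '55A0669C', 'name': r'R:\filepath\system\compress1.zip'},
    {'compress_name': 'file3.bin', 'crc': '55A0669C', 'name': r'R:\filepath\system\compress2.zip'},
    {'compress_name': 'file2.bin', 'crc': '66B07710', 'name': r'R:\filepath\system\compress2.zip'},
    {'compress_name': 'file5.bin', 'crc': '66B07710', 'name': r'R:\filepath\system\compress3.zip'},
]

df = pd.DataFrame(crcs)

# keep='first' (the default) retains the first row for each distinct crc
dfu = df.drop_duplicates(subset='crc', keep='first')

# orient='records' yields one dictionary per remaining row
unique_records = dfu.to_dict(orient='records')
print(unique_records)
```

This skips the manual index bookkeeping, at the cost of hiding how the first occurrence is chosen.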

Detailed Solution

1. Make Data

import pandas as pd

crcs = [{'compress_name': 'file1.bin', 'crc': '55A0669C', 'name': r'R:\filepath\system\compress1.zip'}, 
        {'compress_name': 'file3.bin', 'crc': '55A0669C', 'name': r'R:\filepath\system\compress2.zip'}, 
        {'compress_name': 'file2.bin', 'crc': '66B07710', 'name': r'R:\filepath\system\compress2.zip'}, 
        {'compress_name': 'file5.bin', 'crc': '66B07710', 'name': r'R:\filepath\system\compress3.zip'} ]

df = pd.DataFrame(crcs)
print(df)

Output:

  compress_name       crc                              name
0     file1.bin  55A0669C  R:\filepath\system\compress1.zip
1     file3.bin  55A0669C  R:\filepath\system\compress2.zip
2     file2.bin  66B07710  R:\filepath\system\compress2.zip
3     file5.bin  66B07710  R:\filepath\system\compress3.zip

2. Select Unique CRC Rows

unique_crcs = df.crc.unique().tolist()
all_crcs = df.crc.to_list()

unique_idx = []
uniques = dict()
for crc in unique_crcs:
    idx = all_crcs.index(crc)
    uniques.update({crc: idx})
    unique_idx.append(idx)

print(uniques)
print(all_crcs)

Output:

{'55A0669C': 0, '66B07710': 2}
['55A0669C', '55A0669C', '66B07710', '66B07710']
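
The step above works because list.index returns the position of the first match only. Calling it once per unique value rescans the list each time, so for longer lists the same mapping could instead be built in a single pass. The sketch below assumes the all_crcs list shown in the output; the name first_index is hypothetical.

```python
all_crcs = ['55A0669C', '55A0669C', '66B07710', '66B07710']

# list.index stops at the first match, so duplicates further on are ignored
print(all_crcs.index('66B07710'))

# Single pass: record an index only the first time a crc value appears
first_index = {}
for idx, crc in enumerate(all_crcs):
    if crc not in first_index:
        first_index[crc] = idx

unique_idx = list(first_index.values())
print(first_index)
print(unique_idx)
```

Here first_index plays the role of uniques, and unique_idx can be passed to df.iloc in the same way.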

3. Make Dict with Unique CRC Records Only

dfu = df.iloc[unique_idx]
dfu.T.to_dict()

Output:

{0: {'compress_name': 'file1.bin',
  'crc': '55A0669C',
  'name': 'R:\\filepath\\system\\compress1.zip'},
 2: {'compress_name': 'file2.bin',
  'crc': '66B07710',
  'name': 'R:\\filepath\\system\\compress2.zip'}}
