
Remove duplicates from dict of lists with pandas

I'm trying to extract a dictionary from a dataframe with no duplicates.

Here is the dataframe:

{'Country': {0: 'Japan', 1: 'China', 2: 'USA', 3: 'Russia', 4: 'Japan', 5: 'Japan', 6: 'China'},
 'Port': {0: 'Yokohama', 1: 'Ningbo', 2: 'Baltimore', 3: 'Moscow', 4: 'Tokyo', 5: 'Tokyo', 6: 'Shanghai'}}

I set the countries as keys and removed duplicate keys. Now I need to remove the duplicates from each list:

import pandas as pd

a = {'Country': {0: 'Japan', 1: 'China', 2: 'USA', 3: 'Russia', 4: 'Japan', 5: 'Japan', 6: 'China'},
     'Port': {0: 'Yokohama', 1: 'Ningbo', 2: 'Baltimore', 3: 'Moscow', 4: 'Tokyo', 5: 'Tokyo', 6: 'Shanghai'}}
df = pd.DataFrame(a)

a_dict = df.groupby(['Country'])['Port'].apply(list).to_dict()
print(a_dict)

Output:

{'China': ['Ningbo', 'Shanghai'], 'Japan': ['Yokohama', 'Tokyo', 
'Tokyo'], 'Russia': ['Moscow'], 'USA': ['Baltimore']}

Expected output:

{'China': ['Ningbo', 'Shanghai'], 'Japan': ['Yokohama', 'Tokyo'], 
'Russia': ['Moscow'], 'USA': ['Baltimore']}

Use drop_duplicates along with your code:

d = df.drop_duplicates().groupby(['Country'])['Port'].apply(list).to_dict()

print(d)
{'China': ['Ningbo', 'Shanghai'], 'Japan': ['Yokohama', 'Tokyo'], 
 'Russia': ['Moscow'], 'USA': ['Baltimore']}
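
If the frame ever gains extra columns, the same idea still works by limiting de-duplication to the two relevant columns with the subset parameter. A minimal sketch (not part of the original answer), using the same df built from the question's dict:

import pandas as pd

a = {'Country': {0: 'Japan', 1: 'China', 2: 'USA', 3: 'Russia', 4: 'Japan', 5: 'Japan', 6: 'China'},
     'Port': {0: 'Yokohama', 1: 'Ningbo', 2: 'Baltimore', 3: 'Moscow', 4: 'Tokyo', 5: 'Tokyo', 6: 'Shanghai'}}
df = pd.DataFrame(a)

# Drop rows that repeat the same (Country, Port) pair, then group as before
d = (df.drop_duplicates(subset=['Country', 'Port'])
       .groupby('Country')['Port']
       .apply(list)
       .to_dict())
print(d)
# {'China': ['Ningbo', 'Shanghai'], 'Japan': ['Yokohama', 'Tokyo'],
#  'Russia': ['Moscow'], 'USA': ['Baltimore']}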

GroupBy.apply with set

df.groupby('Country')['Port'].apply(set).map(list).to_dict()

If you don't care whether your output is a dict of lists or a dict of sets, this simplifies to:

df.groupby('Country')['Port'].apply(set).to_dict()
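
Note that set does not preserve the order in which ports first appear. If order matters, a small sketch (an alternative not given in the answer) replaces set with dict.fromkeys, which de-duplicates while keeping first-seen order:

# De-duplicate each group while preserving first-appearance order
df.groupby('Country')['Port'].apply(lambda s: list(dict.fromkeys(s))).to_dict()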

defaultdict

from collections import defaultdict

# Collect each country's ports into a set so duplicates are dropped as they arrive
d = defaultdict(set)
for c, p in zip(df['Country'], df['Port']):
    d[c].add(p)

# Convert each set back to a list
{k: list(v) for k, v in d.items()}
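
For the sample frame, this yields the expected mapping; since each value is collected into a set, the order of ports inside each list is not guaranteed. A quick check (assuming the df and d built above):

result = {k: list(v) for k, v in d.items()}
print(result)
# e.g. {'Japan': ['Yokohama', 'Tokyo'], 'China': ['Ningbo', 'Shanghai'],
#       'USA': ['Baltimore'], 'Russia': ['Moscow']}  # inner list order may vary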
