Python dictionaries: merging multiple value lists into a single list of unique values

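I'm just learning Python, using Python 2.7. I have a CSV file with two columns. The columns are:

Coll_id: the entry can be either an individual collector or a group

Participant_Coll_id: if Coll_id is an individual collector, this value is NULL. If Coll_id is a group, there is one row for each participant in that group.

A sample is here: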

Coll_id,Participant_Coll_id  
ARA,ARG  
ARA,RAT  
ARG,NULL  
BRSAR,SGMB  
BRSAR,SANTM  
BRSAR,CRSR  
BRSAR,RAT  
CRSR,NULL  
DBY,NULL  
HZIE,NULL  
RAT,NULL  
SANTM,NULL  
SGMB,NULL  
ARG,NULL  
DRS,CRSR  
DRS,RAT  
DRS,ARG  

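For each collector (coll_id), I'm trying to create a list of all the other collectors they have collected with. I've pulled together code to do the following, and it is now very close: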

import csv
from collections import defaultdict

#This is giving me a dictionary with each COLL_ID having a list of PARTICIPANT_COLL_IDs

with open('colls_mv1.csv', 'r') as f:
    reader = csv.DictReader(f, ['COLL_ID', 'PARTICIPANT_COLL_ID'])
    data1 = defaultdict(list)

    for line in reader:
        data1[line['COLL_ID']].append(line['PARTICIPANT_COLL_ID'])


#And this is giving me a dictionary with each PARTICIPANT_COLL_ID having a list of COLL_IDs
with open('colls_mv1.csv', 'r') as f:
    reader = csv.DictReader(f, ['COLL_ID', 'PARTICIPANT_COLL_ID'])
    data2 = defaultdict(list)

    for line in reader:
        if line['PARTICIPANT_COLL_ID'] != 'NULL':
            data2[line['PARTICIPANT_COLL_ID']].append(line['COLL_ID'])

dict3 = {k: [data1[i] for i in v] for k, v in data2.items()}

print dict3

I get the following output:

{'SGMB': [['SGMB', 'SANTM', 'CRSR', 'RAT']], 'CRSR': [['SGMB', 'SANTM', 'CRSR', 'RAT'], ['CRSR', 'RAT', 'ARG']], 'RAT': [['ARG', 'RAT'], ['SGMB', 'SANTM', 'CRSR', 'RAT'], ['CRSR', 'RAT', 'ARG']], 'PARTICIPANT_COLL_ID': [['PARTICIPANT_COLL_ID']], 'ARG': [['ARG', 'RAT'], ['CRSR', 'RAT', 'ARG']], 'SANTM': [['SGMB', 'SANTM', 'CRSR', 'RAT']]}

What I want is to merge the value lists for each key, remove duplicates, and remove the key itself from its value list:

{'SGMB': ['SANTM', 'CRSR', 'RAT'], 'CRSR': ['SGMB', 'SANTM', 'RAT', 'ARG'], 'RAT': ['ARG', 'SGMB', 'SANTM', 'CRSR'], 'PARTICIPANT_COLL_ID': [['PARTICIPANT_COLL_ID']], 'ARG': ['RAT', 'CRSR'], 'SANTM': ['SGMB', 'CRSR', 'RAT']}

Iterate over the lists, drop the key, and dedupe:

>>> res = {'SGMB': [['SGMB', 'SANTM', 'CRSR', 'RAT']], 'CRSR': [['SGMB', 'SANTM', 'CRSR', 'RAT'], ['CRSR', 'RAT', 'ARG']], 'RAT': [['ARG', 'RAT'], ['SGMB', 'SANTM', 'CRSR', 'RAT'], ['CRSR', 'RAT', 'ARG']], 'PARTICIPANT_COLL_ID': [['PARTICIPANT_COLL_ID']], 'ARG': [['ARG', 'RAT'], ['CRSR', 'RAT', 'ARG']], 'SANTM': [['SGMB', 'SANTM', 'CRSR', 'RAT']]}
>>> newres = {k: list({x for t in v for x in t if x != k}) for k, v in res.iteritems()}
>>> newres
{'SGMB': ['CRSR', 'SANTM', 'RAT'], 'CRSR': ['SANTM', 'SGMB', 'RAT', 'ARG'], 'RAT': ['CRSR', 'SANTM', 'SGMB', 'ARG'], 'PARTICIPANT_COLL_ID': [], 'ARG': ['CRSR', 'RAT'], 'SANTM': ['CRSR', 'RAT', 'SGMB']}
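Because a set has no defined order, the order of each merged list above can vary between runs. The following is a minimal sketch of the same comprehension wrapped in a function, with `sorted` added for reproducible output and `.items()` in place of `.iteritems()` so it also runs on Python 3. The name `merge_unique` and the trimmed-down `res` are illustrative only, not part of the original answer.

```python
def merge_unique(nested):
    """Flatten each key's list of lists into a sorted list of unique values,
    leaving out the key itself."""
    return {
        key: sorted({item for group in lists for item in group if item != key})
        for key, lists in nested.items()
    }


# Two entries taken from the output shown in the question.
res = {
    'SGMB': [['SGMB', 'SANTM', 'CRSR', 'RAT']],
    'ARG': [['ARG', 'RAT'], ['CRSR', 'RAT', 'ARG']],
}

merged = merge_unique(res)
print(merged)
```

Sorting is a presentation choice; if order does not matter, `list(...)` as in the answer is enough.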

Demo: http://ideone.com/87HKM9
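As an alternative to post-processing `dict3`, the co-collector lists could be built directly from the CSV rows using sets. This is only a sketch under some assumptions: it is written for Python 3, it reads an inline sample (`SAMPLE`, a shortened copy of the data in the question) instead of `colls_mv1.csv`, and the function name `co_collectors` is made up for illustration. It also skips the header row explicitly. In the question's code the header is read as data because field names are passed to `csv.DictReader`, which is where the stray `'PARTICIPANT_COLL_ID'` key comes from.

```python
import csv
import io
from collections import defaultdict

# Shortened copy of the sample data from the question.
SAMPLE = """Coll_id,Participant_Coll_id
ARA,ARG
ARA,RAT
ARG,NULL
DRS,CRSR
DRS,RAT
DRS,ARG
"""


def co_collectors(csv_file):
    """Map each participant to the sorted list of others sharing a group with them."""
    groups = defaultdict(set)
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row so it is not treated as data
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        coll_id, participant = row[0].strip(), row[1].strip()
        if participant != 'NULL':
            groups[coll_id].add(participant)

    partners = defaultdict(set)
    for members in groups.values():
        for member in members:
            # everyone in the same group except the member itself
            partners[member] |= members - {member}
    return {name: sorted(others) for name, others in partners.items()}


result = co_collectors(io.StringIO(SAMPLE))
print(result)
```

With a real file, the same function could be called on the handle returned by `open('colls_mv1.csv', newline='')`.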
