[英]Create a list of character-offsets in a dictionary from multiple columns of multiple csv files
我有兩個類似的csv文件,如下所示:
{http://www.omg.org/XMI}id,begin,end,Character
45440,34,45,Miss Parker
45455,137,147,Farrington
48976,295,298,Mr Alleyne
45533,890,900,Mr Alleyne
49020,2147,2154,Mr Alleyne
49020,2147,2154,Mr Alleyne
48606,2689,2696,Farrington
46858,3690,3693,Farrington
48680,5280,5291,clients
46880,5373,5376,Farrington
46728,5396,5407,clients
49057,5673,5683,clients
48734,6145,6155,Mr Alleyne
48734,6145,6155,Mr Alleyne
46699,6661,6664,Miss Delacour
49094,6969,6972,Farrington
48841,8451,8461,Mr Alleyne
48849,8466,8479,Miss Delacour
我希望能夠創建一個包含唯一字符提及作為鍵的字典,並添加其偏移量'begin'
和'end'
,而忽略列'{http://www.omg.org/XMI}id'
每次在兩個文件中提及時,各自的唯一字符又稱為密鑰。
我想要的輸出應如下所示:
print(dict_of_mentions)
輸出:
{'Farrington': [(137,147),(2689,2696) #etc...],
'Mr Alleyne': [(295,298), (890,900) #etc...], #rest of characters... }
到目前為止,我的代碼如下所示:
import tkinter
from tkinter import filedialog
def character_mentions():
filenames = filedialog.askopenfilenames()
for filename in filenames:
reader = csv.DictReader(open(filename))
dict_of_mentions = {}
for row in reader:
key = row.pop('Character')
if key in dict_of_mentions:
#implement duplicate row handling here
pass
dict_of_mentions[key] = row
print(dict_of_mentions)
輸出看起來像這樣:
{'Miss Parker': OrderedDict([('{http://www.omg.org/XMI}id', '45440'), ('begin', '34'), ('end', '45')]) 'Farrington': OrderedDict([('{http://www.omg.org/XMI}id', '46645'), ('begin', '22012'), ('end', '22014')]), 'Mr Alleyne': OrderedDict([('{http://www.omg.org/XMI}id', '47297'), ('begin', '13952'), ('end', '13962')]), 'clients': OrderedDict([('{http://www.omg.org/XMI}id', '49057'), ('begin', '5673'), ('end', '5683')]), 'Miss Delacour': OrderedDict([('{http://www.omg.org/XMI}id', '45867'), ('begin', '9101'), ('end', '9109')]), 'Everyone': OrderedDict([('{http://www.omg.org/XMI}id', '45836'), ('begin', '11896'), ('end', '11900')]), "Terry Kelly's clerk": OrderedDict([('{http://www.omg.org/XMI}id', '49278'), ('begin', '11971'), ('end', '11980')]), 'crowd': OrderedDict([('{http://www.omg.org/XMI}id', '49337'), ('begin', '12458'), ('end', '12471')]), 'office-girls': OrderedDict([('{http://www.omg.org/XMI}id', '49359'), ('begin', '12537'), ('end', '12549')]), 'Higgins': OrderedDict([('{http://www.omg.org/XMI}id', '45936'), ('begin', '13925'), ('end', '13927')]), 'friends': OrderedDict([('{http://www.omg.org/XMI}id', '49592'), ('begin', '17499'), ('end', '17506')]), 'boys': OrderedDict([('{http://www.omg.org/XMI}id', '47949'), ('begin', '17638'), ('end', '17649')]), 'one of the young women': OrderedDict([('{http://www.omg.org/XMI}id', '46257'), ('begin', '19945'), ('end', '19954')]), 'Weathers': OrderedDict([('{http://www.omg.org/XMI}id', '49643'), ('begin', '19881'), ('end', '19891')]), 'curate': OrderedDict([('{http://www.omg.org/XMI}id', '46142'), ('begin', '19094'), ('end', '19101')]), 'Ada': OrderedDict([('{http://www.omg.org/XMI}id', '46364'), ('begin', '20313'), ('end', '20316')]), 'Tom': OrderedDict([('{http://www.omg.org/XMI}id', '49804'), ('begin', '21852'), ('end', '21855')])}
任何幫助表示贊賞!
僅當不存在具有該名稱的密鑰時,這才構成密鑰
for row in reader:
key = row.pop('Character')
if key not in dict_of_mentions:
dict_of_mentions[key] = row
您可以使用itertools.groupby
輕松完成此操作
>>> import csv
>>> from itertools import groupby
>>> l = list(csv.reader(open('file.csv')))
>>> f = lambda x: x[-1]
>>> {k:[tuple(x[1:3]) for x in v] for k,v in groupby(sorted(l[1:], key=f), f)}
{'Farrington': [('137', '147'), ('2689', '2696'), ('3690', '3693'), ('5373', '5376'), ('6969', '6972')], 'Miss Delacour': [('6661', '6664'), ('8466', '8479')], 'Miss Parker': [('34', '45')], 'Mr Alleyne': [('295', '298'), ('890', '900'), ('2147', '2154'), ('2147', '2154'), ('6145', '6155'), ('6145', '6155'), ('8451', '8461')], 'clients': [('5280', '5291'), ('5396', '5407'), ('5673', '5683')]}
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.