Iterate through list of dictionaries with conditions
Suppose test is a large list of dictionaries (this is just a sample):
test = [
    {'alignedWord': 'welcome',
     'case': 'success',
     'end': 0.9400000000000001,
     'start': 0.56,
     'word': 'Welcome'},
    {'alignedWord': 'to',
     'case': 'success',
     'end': 1.01,
     'start': 0.94,
     'word': 'to'},
    {'alignedWord': 'story',
     'case': 'not-found-in-audio',
     'word': 'Story'},
    {'alignedWord': 'in',
     'case': 'success',
     'end': 1.4100000000000001,
     'start': 1.34,
     'word': 'in'},
    {'alignedWord': 'a',
     'case': 'success',
     'end': 1.44,
     'start': 1.41,
     'word': 'a'},
    {'alignedWord': 'bottle',
     'case': 'success',
     'end': 1.78,
     'start': 1.44,
     'word': 'Bottle'},
]
The output should be a JSON object for each consecutive chunk where case == 'success' and duration_s < 10:
Output:
{"text": "Welcome to", "duration_s": 0.45}
{"text": "in a Bottle", "duration_s": 0.44}
duration = ('end' - 'start')  # of the text
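To make the duration rule concrete, here is a minimal sketch. It assumes the duration of a chunk is the `end` of its last word minus the `start` of its first word, which matches the expected 0.45 for "Welcome to". The helper name `chunk_duration` is hypothetical and not part of the question.

```python
def chunk_duration(words, ndigits=2):
    """Return the end of the last word minus the start of the first word, rounded."""
    return round(words[-1]['end'] - words[0]['start'], ndigits)


# The first two entries of the sample list, reduced to the keys used here.
chunk = [
    {'word': 'Welcome', 'start': 0.56, 'end': 0.9400000000000001},
    {'word': 'to', 'start': 0.94, 'end': 1.01},
]

print(chunk_duration(chunk))
```

Because the words in a chunk are contiguous in this sample, summing the per-word durations gives the same value up to floating-point rounding.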
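I added a new dictionary without the start and end keys in the middle of the test list. Does this work for you now? I also changed the duration, as you clarified.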
import json
from collections import OrderedDict

list_of_dicts = test  # the list of dictionaries from the question

# add a 'duration' key to each dict (makes the code in the loop clearer)
for dict_ in list_of_dicts:
    try:
        dict_['duration'] = dict_['end'] - dict_['start']
    except KeyError:
        # entries without 'start'/'end' get a sentinel duration
        dict_['duration'] = 999

# initialize result_dict with the keys we'll add to
rolling_duration = 0
result_dict = OrderedDict([('text', ''), ('duration', 0)])

# loop directly through the objects, as mentioned in the comments
for dict_ in list_of_dicts:
    rolling_duration = rolling_duration + dict_['duration']
    # print(dict_['word'], dict_['duration'], rolling_duration)
    if dict_['case'] == 'success' and rolling_duration < 10:
        result_dict['text'] = (result_dict['text'] + " " + dict_['word']).lstrip()
        result_dict['duration'] = round(rolling_duration, 2)
    else:
        # print accrued results and reset dict / rolling duration
        if result_dict['text'] != '':
            print(json.dumps(result_dict))
        result_dict = OrderedDict([('text', ''), ('duration', 0)])
        rolling_duration = 0

# print the final result_dict after exiting the loop, if anything accrued
if result_dict['text'] != '':
    print(json.dumps(result_dict))
{"text": "Welcome to", "duration": 0.45}
{"text": "in a Bottle", "duration": 0.44}
This can be solved with a generator that yields the final dictionaries as needed:
def split(it):
    it = iter(it)
    acc, duration = [], 0  # defaults
    for item in it:
        if item['case'] != 'success':  # split when there's a non-success
            if acc:
                yield {'text': ' '.join(acc), 'duration': duration}
                acc, duration = [], 0  # reset defaults
        else:
            tmp_duration = item['end'] - item['start']
            if tmp_duration + duration >= 10:  # split when the duration is too long
                if acc:
                    yield {'text': ' '.join(acc), 'duration': duration}
                acc, duration = [item['word']], tmp_duration  # new defaults
            else:
                acc.append(item['word'])
                duration += tmp_duration
    if acc:  # give the remaining items
        yield {'text': ' '.join(acc), 'duration': duration}
A simple test gives:
>>> list(split(test))
[{'duration': 0.45000000000000007, 'text': 'Welcome to'},
{'duration': 0.44000000000000017, 'text': 'in a Bottle'}]
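The sample data never reaches the 10-second limit, so the "duration is too long" branch of the generator is not exercised by that test. The sketch below is a self-contained variant with a configurable limit, run on made-up timings, to show how that branch behaves. The name `split_with_limit`, the `limit` parameter, and the `words` data are assumptions for illustration and are not part of the original answer.

```python
def split_with_limit(items, limit=10):
    """Yield chunks of consecutive 'success' words whose summed duration stays below limit."""
    acc, duration = [], 0
    for item in items:
        if item['case'] != 'success':
            # a non-success entry closes the current chunk
            if acc:
                yield {'text': ' '.join(acc), 'duration': duration}
            acc, duration = [], 0
            continue
        word_duration = item['end'] - item['start']
        if duration + word_duration >= limit:
            # adding this word would reach the limit: close the chunk, start a new one
            if acc:
                yield {'text': ' '.join(acc), 'duration': duration}
            acc, duration = [item['word']], word_duration
        else:
            acc.append(item['word'])
            duration += word_duration
    if acc:
        yield {'text': ' '.join(acc), 'duration': duration}


# Made-up timings (integers, to avoid floating-point noise in the output).
words = [
    {'case': 'success', 'word': 'one', 'start': 0, 'end': 4},
    {'case': 'success', 'word': 'two', 'start': 4, 'end': 8},
    {'case': 'success', 'word': 'three', 'start': 8, 'end': 12},
]

result = list(split_with_limit(words, limit=10))
print(result)
# [{'text': 'one two', 'duration': 8}, {'text': 'three', 'duration': 4}]
```

Note that a single word whose own duration reaches the limit still starts a chunk by itself; whether such words should be dropped instead depends on the use case.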
This can easily be dumped to JSON:
>>> import json
>>> json.dumps(list(split(test)))
'[{"text": "Welcome to", "duration": 0.45000000000000007}, {"text": "in a Bottle", "duration": 0.44000000000000017}]'
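The question asks for one JSON object per chunk with a rounded `duration_s` key, while `json.dumps` on the whole list gives a single array with unrounded floats. A possible way to bridge that gap is sketched below. The helper `write_chunks` is hypothetical, and `io.StringIO` stands in for a real file opened with `open(path, 'w')`.

```python
import io
import json


def write_chunks(chunks, fp, ndigits=2):
    """Write one JSON object per line, renaming 'duration' to 'duration_s' and rounding it."""
    for chunk in chunks:
        record = {
            'text': chunk['text'],
            'duration_s': round(chunk['duration'], ndigits),
        }
        fp.write(json.dumps(record) + '\n')


# Values copied from the test output above.
chunks = [
    {'text': 'Welcome to', 'duration': 0.45000000000000007},
    {'text': 'in a Bottle', 'duration': 0.44000000000000017},
]

buffer = io.StringIO()  # stand-in for a file object
write_chunks(chunks, buffer)
print(buffer.getvalue(), end='')
```

Since `write_chunks` accepts any iterable, the generator can be passed to it directly without building a list first.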