Iterate through list of dictionaries with conditions

Suppose test is a large list of dictionaries (this is just a sample):

test = [
{'alignedWord': 'welcome',
  'case': 'success',
  'end': 0.9400000000000001,
  'start': 0.56,
  'word': 'Welcome'},

 {'alignedWord': 'to',
  'case': 'success',
  'end': 1.01,
  'start': 0.94,
  'word': 'to'},

 {'alignedWord': 'story',
  'case': 'not-found-in-audio',
  'word': 'Story'},

 {'alignedWord': 'in',
  'case': 'success',
  'end': 1.4100000000000001,
  'start': 1.34,
  'word': 'in'},

 {'alignedWord': 'a',
  'case': 'success',
  'end': 1.44,
  'start': 1.41,
  'word': 'a'},

 {'alignedWord': 'bottle',
  'case': 'success',
  'end': 1.78,
  'start': 1.44,
  'word': 'Bottle'} ]

The output should be JSON, one object for each contiguous block where case == 'success' and duration_s < 10:

Output:

{"text": "Welcome to", "duration_s": 0.45}
{"text": "in a Bottle", "duration_s": 0.44}

where duration_s is the sum of (end - start) over the words in the text.

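I added a new dictionary without start and end keys in the middle of the test list. Does this work for you now? I also changed the duration calculation, as you clarified.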

import json
from collections import OrderedDict

list_of_dicts = test  # the list defined in the question

# add a 'duration' key to each dict (makes the code in the loop clearer);
# entries without 'start'/'end' get a large sentinel that forces a split
for dict_ in list_of_dicts:
  try:
    dict_['duration'] = dict_['end'] - dict_['start']
  except KeyError:
    dict_['duration'] = 999


# initialize result_dict with the keys we'll add to
rolling_duration = 0
result_dict = OrderedDict([('text', ''), ('duration', 0)])

# loop directly over the dicts rather than over indices
for dict_ in list_of_dicts:
  rolling_duration = rolling_duration + dict_['duration']
  #print(dict_['word'], dict_['duration'], rolling_duration)

  if dict_['case'] == 'success' and rolling_duration < 10:
    result_dict['text'] = (result_dict['text'] + " " + dict_['word']).lstrip()
    result_dict['duration'] = round(rolling_duration, 2)

  # print accrued results and reset dict / rolling duration
  else:
    if result_dict['text'] != '':
      print(json.dumps(result_dict))
    result_dict = OrderedDict([('text', ''), ('duration', 0)])
    rolling_duration = 0
    # a successful word that overflowed the limit starts the next block
    # instead of being dropped
    if dict_['case'] == 'success' and dict_['duration'] < 10:
      rolling_duration = dict_['duration']
      result_dict['text'] = dict_['word']
      result_dict['duration'] = round(rolling_duration, 2)

# print the final block, if there is one, after exiting the loop
if result_dict['text'] != '':
  print(json.dumps(result_dict))

{"text": "Welcome to", "duration": 0.45}
{"text": "in a Bottle", "duration": 0.44}
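
For comparison, the same grouping can be sketched with itertools.groupby, which first splits the list into runs of consecutive 'success' entries and so needs no 999 sentinel. This is only a sketch, assuming every 'success' entry carries both 'start' and 'end'; the function name blocks, the limit parameter and the shortened sample list are made up for illustration and are not part of the original answer.

```python
import json
from itertools import groupby

def blocks(items, limit=10):
    """Yield {'text', 'duration_s'} dicts for runs of consecutive successes."""
    for ok, run in groupby(items, key=lambda d: d['case'] == 'success'):
        if not ok:
            continue  # non-success entries only act as separators
        words, total = [], 0.0
        for d in run:
            dur = d['end'] - d['start']
            # close the current block if adding this word would reach the limit
            if words and total + dur >= limit:
                yield {'text': ' '.join(words), 'duration_s': round(total, 2)}
                words, total = [], 0.0
            # note: a single word longer than the limit still forms its own block
            words.append(d['word'])
            total += dur
        if words:
            yield {'text': ' '.join(words), 'duration_s': round(total, 2)}

# shortened, hypothetical sample in the same shape as the question's data
sample = [
    {'case': 'success', 'start': 0.56, 'end': 0.94, 'word': 'Welcome'},
    {'case': 'success', 'start': 0.94, 'end': 1.01, 'word': 'to'},
    {'case': 'not-found-in-audio', 'word': 'Story'},
    {'case': 'success', 'start': 1.34, 'end': 1.41, 'word': 'in'},
]

for block in blocks(sample):
    print(json.dumps(block))
```

Because groupby only merges adjacent items with the same key, a 'not-found-in-audio' entry always ends the current block, which is the behaviour the question asks for.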

This can be solved with a generator that yields the final dictionaries as needed:

def split(it):
    it = iter(it)
    acc, duration = [], 0  # defaults
    for item in it:
        if item['case'] != 'success':   # split when there's a non-success
            if acc:
                yield {'text': ' '.join(acc), 'duration': duration}
                acc, duration = [], 0  # reset defaults

        else:
            tmp_duration = item['end'] - item['start']

            if tmp_duration + duration >= 10:  # split when the duration is too long
                if acc:
                    yield {'text': ' '.join(acc), 'duration': duration}
                acc, duration = [item['word']], tmp_duration  # new defaults

            else:
                acc.append(item['word'])
                duration += tmp_duration

    if acc:  # give the remaining items
        yield {'text': ' '.join(acc), 'duration': duration}

A simple test gives:

>>> list(split(test))
[{'duration': 0.45000000000000007, 'text': 'Welcome to'},
 {'duration': 0.44000000000000017, 'text': 'in a Bottle'}]

This can easily be dumped to JSON:

>>> import json
>>> json.dumps(list(split(test)))
'[{"text": "Welcome to", "duration": 0.45000000000000007}, {"text": "in a Bottle", "duration": 0.44000000000000017}]'
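
Since the question asks for one JSON object per block with a duration_s key, the dictionaries can also be written one per line rather than as a single array. A minimal sketch, assuming the blocks have already been computed as in the list shown above; the helper name write_blocks and the file name blocks.jsonl are hypothetical, and rounding to two decimals is an assumption based on the question's expected output.

```python
import json
import os
import tempfile

def write_blocks(blocks, path):
    """Write one JSON object per line, renaming 'duration' to 'duration_s'."""
    with open(path, 'w', encoding='utf-8') as fh:
        for block in blocks:
            record = {'text': block['text'],
                      'duration_s': round(block['duration'], 2)}
            fh.write(json.dumps(record) + '\n')

# Stand-in for list(split(test)); values copied from the test output above.
blocks = [
    {'duration': 0.45000000000000007, 'text': 'Welcome to'},
    {'duration': 0.44000000000000017, 'text': 'in a Bottle'},
]

path = os.path.join(tempfile.gettempdir(), 'blocks.jsonl')  # hypothetical file name
write_blocks(blocks, path)

# read the file back to check what was written
with open(path, encoding='utf-8') as fh:
    lines = fh.read().splitlines()
print(lines)
```

Writing line by line also works directly on the generator, so the full result list never has to be held in memory for a large input.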

