Iterating over a list of dictionaries with conditions

Question

Suppose test is a large list of dictionaries (this is only a sample):

test = [
{'alignedWord': 'welcome',
  'case': 'success',
  'end': 0.9400000000000001,
  'start': 0.56,
  'word': 'Welcome'},

 {'alignedWord': 'to',
  'case': 'success',
  'end': 1.01,
  'start': 0.94,
  'word': 'to'},

 {'alignedWord': 'story',
  'case': 'not-found-in-audio',
  'word': 'Story'},

 {'alignedWord': 'in',
  'case': 'success',
  'end': 1.4100000000000001,
  'start': 1.34,
  'word': 'in'},

 {'alignedWord': 'a',
  'case': 'success',
  'end': 1.44,
  'start': 1.41,
  'word': 'a'},

 {'alignedWord': 'bottle',
  'case': 'success',
  'end': 1.78,
  'start': 1.44,
  'word': 'Bottle'} ]

The output should be JSON, with one object for each consecutive chunk of words where case == 'success' and duration_s < 10:

Output:

{"text": "Welcome to", "duration_s": 0.45}
{"text": "in a Bottle", "duration_s": 0.44}

Here duration_s = end - start of the text.

Answer 1

I added a new dictionary without start and end keys to the middle of the test list; does this work for you now? As you clarified, I also changed the duration.

import json
from collections import OrderedDict

list_of_dicts = test  # the list from the question

# add a 'duration' key to each dict (makes the code in the loop clearer);
# entries without 'start'/'end' get a large sentinel that forces a split
for dict_ in list_of_dicts:
  try:
    dict_['duration'] = dict_['end'] - dict_['start']
  except KeyError:
    dict_['duration'] = 999


# initialize result_dict with keys we'll add to
rolling_duration = 0
result_dict = OrderedDict([('text', ''), ('duration', 0)])

# looping directly through objects as mentioned in comments
for dict_ in list_of_dicts:
  rolling_duration = rolling_duration + dict_['duration']
  #print(dict_['word'], dict_['duration'], rolling_duration)

  if dict_['case'] == 'success' and rolling_duration < 10:
    result_dict['text'] = (result_dict['text'] + " " + dict_['word']).lstrip()
    result_dict['duration'] = round(rolling_duration, 2)

  # print accrued results and reset dict / rolling duration
  else:
    if result_dict['text'] != '':
      print(json.dumps(result_dict))
    result_dict = OrderedDict([('text', ''), ('duration', 0)])
    rolling_duration = 0
    # a successful word that pushed the total past the limit starts the next chunk
    if dict_['case'] == 'success' and dict_['duration'] < 10:
      rolling_duration = dict_['duration']
      result_dict['text'] = dict_['word']
      result_dict['duration'] = round(rolling_duration, 2)

# print the final result_dict after exiting the loop, if it holds any text
if result_dict['text'] != '':
  print(json.dumps(result_dict))

{"text": "Welcome to", "duration": 0.45}

{"text": "in a Bottle", "duration": 0.44}
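
As a side note, the "consecutive chunk" part of the loop above can also be expressed with itertools.groupby from the standard library. This is only a sketch of an alternative, not the code from this answer: the function name chunks, the max_duration parameter, and the duration_s output key are assumptions chosen to match the question, and the sample list is a shortened copy of the question's data.

```python
from itertools import groupby

# Shortened copy of the question's sample data
sample = [
    {'word': 'Welcome', 'case': 'success', 'start': 0.56, 'end': 0.9400000000000001},
    {'word': 'to', 'case': 'success', 'start': 0.94, 'end': 1.01},
    {'word': 'Story', 'case': 'not-found-in-audio'},
    {'word': 'in', 'case': 'success', 'start': 1.34, 'end': 1.4100000000000001},
    {'word': 'a', 'case': 'success', 'start': 1.41, 'end': 1.44},
    {'word': 'Bottle', 'case': 'success', 'start': 1.44, 'end': 1.78},
]

def chunks(words, max_duration=10):
    """Yield one dict per run of consecutive 'success' words (hypothetical helper)."""
    # groupby yields (key, run) pairs for each stretch of equal keys
    for ok, run in groupby(words, key=lambda w: w['case'] == 'success'):
        if not ok:
            continue  # skip runs of non-success words entirely
        acc, total = [], 0.0
        for w in run:
            d = w['end'] - w['start']
            # close the current chunk if adding this word would reach the limit
            if acc and total + d >= max_duration:
                yield {'text': ' '.join(acc), 'duration_s': round(total, 2)}
                acc, total = [], 0.0
            acc.append(w['word'])
            total += d
        if acc:
            yield {'text': ' '.join(acc), 'duration_s': round(total, 2)}

for chunk in chunks(sample):
    print(chunk)
```

Note that in this sketch a single word that is itself longer than max_duration still ends up as its own chunk; whether that is acceptable depends on the data.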

Answer 2

This can be solved with a generator that yields the final dictionaries on demand:

def split(it):
    it = iter(it)
    acc, duration = [], 0  # defaults
    for item in it:
        if item['case'] != 'success':   # split when there's a non-success
            if acc:
                yield {'text': ' '.join(acc), 'duration': duration}
                acc, duration = [], 0  # reset defaults

        else:
            tmp_duration = item['end'] - item['start']

            if tmp_duration + duration >= 10:  # split when the duration is too long
                if acc:
                    yield {'text': ' '.join(acc), 'duration': duration}
                acc, duration = [item['word']], tmp_duration  # new defaults

            else:
                acc.append(item['word'])
                duration += tmp_duration

    if acc:  # give the remaining items
        yield {'text': ' '.join(acc), 'duration': duration}

A quick test gives:

>>> list(split(test))
[{'duration': 0.45000000000000007, 'text': 'Welcome to'},
 {'duration': 0.44000000000000017, 'text': 'in a Bottle'}]
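
The sample data never gets near the 10-second limit, so the test above only exercises the non-success split. To see the duration branch fire, here is a small sketch that repeats the same logic with the limit exposed as a parameter (the name max_duration is my own addition) and runs it on made-up words with whole-second timings, which are purely illustrative.

```python
def split(items, max_duration=10):
    """Group consecutive 'success' words, starting a new chunk at the duration limit."""
    acc, duration = [], 0
    for item in items:
        if item['case'] != 'success':
            # a non-success word closes the current chunk
            if acc:
                yield {'text': ' '.join(acc), 'duration': duration}
            acc, duration = [], 0
        else:
            tmp_duration = item['end'] - item['start']
            if tmp_duration + duration >= max_duration:
                # the chunk would get too long: emit it and start over with this word
                if acc:
                    yield {'text': ' '.join(acc), 'duration': duration}
                acc, duration = [item['word']], tmp_duration
            else:
                acc.append(item['word'])
                duration += tmp_duration
    if acc:
        yield {'text': ' '.join(acc), 'duration': duration}

# Made-up words (not from the question), all 'success', 13 seconds in total
words = [
    {'word': 'a', 'case': 'success', 'start': 0, 'end': 4},
    {'word': 'b', 'case': 'success', 'start': 4, 'end': 8},
    {'word': 'c', 'case': 'success', 'start': 8, 'end': 12},
    {'word': 'd', 'case': 'success', 'start': 12, 'end': 13},
]

print(list(split(words)))
# [{'text': 'a b', 'duration': 8}, {'text': 'c d', 'duration': 5}]
```

Adding 'c' to 'a b' would bring the total to 12, so the first chunk is closed at 8 seconds and 'c' opens the next one.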

This can easily be dumped as JSON:

>>> import json
>>> json.dumps(list(split(test)))
'[{"text": "Welcome to", "duration": 0.45000000000000007}, {"text": "in a Bottle", "duration": 0.44000000000000017}]'
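
If you want the exact shape from the question (one JSON object per line, a duration_s key, and rounded values), something along these lines might do. It is a sketch under assumptions: write_json_lines is a hypothetical helper, rounding to two decimals is inferred from the expected output in the question, and the results list is copied from the test output above.

```python
import io
import json

# Results in the shape produced by the generator (values copied from the test output)
results = [
    {'text': 'Welcome to', 'duration': 0.45000000000000007},
    {'text': 'in a Bottle', 'duration': 0.44000000000000017},
]

def write_json_lines(chunks, fh, ndigits=2):
    """Write one JSON object per line to the open text file fh (hypothetical helper)."""
    for chunk in chunks:
        record = {
            'text': chunk['text'],
            'duration_s': round(chunk['duration'], ndigits),  # rename and round
        }
        fh.write(json.dumps(record) + '\n')

# Demonstrated with an in-memory buffer; any writable text file object works too
buf = io.StringIO()
write_json_lines(results, buf)
print(buf.getvalue(), end='')
# {"text": "Welcome to", "duration_s": 0.45}
# {"text": "in a Bottle", "duration_s": 0.44}
```

To write to disk, pass a real file object instead of the buffer, for example one opened with open('chunks.jsonl', 'w', encoding='utf-8') (the file name is just a placeholder).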

2 answers

Answer 1
0, accepted, 2017-04-15 00:41:46

Answer 2
0, 2017-04-15 02:35:31
