
Python Split Nested Dictionary with No Key Value Pairs

I have a nested dictionary with no key-value pairs. I am trying to separate it into a dataframe with separate columns and do not need to preserve the original structure. The intention is to turn each visible row into an actual row in a dataframe, with the columns named word, start_time, and end_time. I have tried to flatten it with flatdict, but since there are no named keys it does not work.

Here is an example of the nested dictionary stored in the variable word_timestamps.

[[['hello', 3.06, 3.32]],
 [['hi', 4.2, 4.32],
  ['can', 4.54, 4.62],
  ['i', 4.66, 4.7],
  ['please', 4.74, 4.86],
  ['speak', 4.9, 5.04],
  ['to', 5.06, 5.14],
  ['ashley', 5.2, 5.56]],
 [['yeah', 6.84, 6.94],
  ['may', 7.04, 7.12],
  ['i', 7.12, 7.12],
  ['ask', 7.18, 7.28],
  ["who's", 7.36, 7.46],
  ['calling', 7.54, 7.86]]]

I can view individual "rows" of this successfully using the format word_timestamps[0]. This returns:

[['hello', 3.06, 3.32]]

Or I can access a single word using word_timestamps[0][0][0], which returns 'hello'.

How do I flatten the dictionary and get rid of the entire structure?

Edit: added everything below.

I used [value for sublist in word_timestamps for value in sublist], which returned the same answer as below. The full code used was:

df_word_timestamps = pd.DataFrame([value for sublist in word_timestamps for value in sublist], columns =["word", "from", "to"])

Which results in:

         word    from      to
0       hello    3.06    3.32
1          hi    4.20    4.32
2         can    4.54    4.62
3           i    4.66    4.70
4      please    4.74    4.86
...       ...     ...     ...
1179    right  399.98  400.08
1180  bye-bye  400.64  400.86
1181   thanks  401.70  401.92
1182      bye  402.02  402.16
1183      bye  402.88  403.04

The reason I did this was so I could join a second dataframe on the matching start/stop times. The second dataframe contains the person who spoke these words. Together, they let me create a labeled transcript.

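You're basically "adding" a set of sublists. The structure is a list of lists rather than a dictionary, so concatenating the sublists with sum, using an empty list as the start value, flattens it by one level.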

word_timestamps = [[['hello', 3.06, 3.32]],
 [['hi', 4.2, 4.32],
  ['can', 4.54, 4.62],
  ['i', 4.66, 4.7],
  ['please', 4.74, 4.86],
  ['speak', 4.9, 5.04],
  ['to', 5.06, 5.14],
  ['ashley', 5.2, 5.56]],
 [['yeah', 6.84, 6.94],
  ['may', 7.04, 7.12],
  ['i', 7.12, 7.12],
  ['ask', 7.18, 7.28],
  ["who's", 7.36, 7.46],
  ['calling', 7.54, 7.86]]]

combine = sum(word_timestamps, [])
print(combine)
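
One caveat with sum(word_timestamps, []): it builds a new list at every addition, so it can get slow when there are many sublists. As an alternative sketch (not part of the original answer), itertools.chain.from_iterable flattens one level in a single pass. The shortened sample data and the names flat, columns, and records below are illustrative assumptions.

```python
from itertools import chain

# Shortened sample in the same shape as the question's word_timestamps.
word_timestamps = [
    [["hello", 3.06, 3.32]],
    [["hi", 4.2, 4.32], ["can", 4.54, 4.62]],
]

# Flatten exactly one level: each inner [word, start, end] list stays intact.
flat = list(chain.from_iterable(word_timestamps))

# Same result as the other two approaches in this thread.
assert flat == sum(word_timestamps, [])
assert flat == [row for group in word_timestamps for row in group]

# Optional: attach the column names the question asks for.
columns = ("word", "start_time", "end_time")
records = [dict(zip(columns, row)) for row in flat]
print(records[0])
# {'word': 'hello', 'start_time': 3.06, 'end_time': 3.32}
```

Either flat (with a columns argument) or records can then be passed to pd.DataFrame to get one row per word.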
