Python Split Nested Dictionary with No Key Value Pairs
I have a nested dictionary with no key-value pairs. I am trying to separate its contents into a dataframe with separate columns, and I do not need to preserve the original structure. The intention is to turn each visible row into an actual row in a dataframe, with the columns named word, start_time, and end_time. I have tried to flatten it with flatdict, but since there are no named keys it does not work.
Here is an example of the nested dictionary, stored in the variable word_timestamps:
[[['hello', 3.06, 3.32]],
[['hi', 4.2, 4.32],
['can', 4.54, 4.62],
['i', 4.66, 4.7],
['please', 4.74, 4.86],
['speak', 4.9, 5.04],
['to', 5.06, 5.14],
['ashley', 5.2, 5.56]],
[['yeah', 6.84, 6.94],
['may', 7.04, 7.12],
['i', 7.12, 7.12],
['ask', 7.18, 7.28],
["who's", 7.36, 7.46],
['calling', 7.54, 7.86]]]
I can view an individual "row" of this successfully using word_timestamps[0]. This returns:
[['hello', 3.06, 3.32]]
Or I can access a single word using word_timestamps[0][0][0], which returns 'hello'.
How do I flatten the dictionary and get rid of the entire structure?
Edit: added everything below.
I used [value for sublist in word_timestamps for value in sublist], which returned the same result as the answer below. The full code used was:
df_word_timestamps = pd.DataFrame([value for sublist in word_timestamps for value in sublist], columns =["word", "from", "to"])
Which results in:
         word    from      to
0       hello    3.06    3.32
1          hi    4.20    4.32
2         can    4.54    4.62
3           i    4.66    4.70
4      please    4.74    4.86
...       ...     ...     ...
1179    right  399.98  400.08
1180  bye-bye  400.64  400.86
1181   thanks  401.70  401.92
1182      bye  402.02  402.16
1183      bye  402.88  403.04
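For reference, the same flattening can be sketched with the column names the question originally asked for (word, start_time, end_time) instead of from and to. This is only a minimal sketch: it assumes pandas is installed and uses just the three groups of sample data shown earlier, not the full transcript.

```python
import pandas as pd

# Sample data copied from the question (first three groups only).
word_timestamps = [
    [["hello", 3.06, 3.32]],
    [["hi", 4.2, 4.32], ["can", 4.54, 4.62], ["i", 4.66, 4.7],
     ["please", 4.74, 4.86], ["speak", 4.9, 5.04], ["to", 5.06, 5.14],
     ["ashley", 5.2, 5.56]],
    [["yeah", 6.84, 6.94], ["may", 7.04, 7.12], ["i", 7.12, 7.12],
     ["ask", 7.18, 7.28], ["who's", 7.36, 7.46], ["calling", 7.54, 7.86]],
]

# Drop the outer grouping so every [word, start, end] triple becomes one row.
rows = [triple for group in word_timestamps for triple in group]

df_word_timestamps = pd.DataFrame(rows, columns=["word", "start_time", "end_time"])
print(df_word_timestamps.head())
```

The comprehension removes exactly one level of nesting, which is all this structure needs, since each innermost list is already one row.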
The reason I did this was so I could join a second dataframe on the matching start/stop times. The second dataframe contains the person who spoke these words. Together, they let me create a labeled transcript.
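The join described above might look roughly like the following. The layout of the second dataframe is not shown in the question, so df_speakers, its speaker column, and the speaker labels are all hypothetical and made up purely for illustration. The sketch also assumes the start and stop times match exactly in both dataframes.

```python
import pandas as pd

# A few word rows taken from the question's sample data.
df_words = pd.DataFrame(
    [["hello", 3.06, 3.32], ["hi", 4.2, 4.32], ["yeah", 6.84, 6.94]],
    columns=["word", "start_time", "end_time"],
)

# Hypothetical second dataframe: the speaker labels and this layout are
# assumptions for illustration, not data from the question.
df_speakers = pd.DataFrame(
    [[3.06, 3.32, "speaker_a"], [4.2, 4.32, "speaker_b"], [6.84, 6.94, "speaker_a"]],
    columns=["start_time", "end_time", "speaker"],
)

# A left join keeps every word, even when no speaker row matches its times.
labeled = df_words.merge(df_speakers, on=["start_time", "end_time"], how="left")
print(labeled)
```

An exact merge only lines up rows whose times are identical. If the two sources round their timestamps differently, a nearest-key join such as pandas.merge_asof may be a better fit.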
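You're basically "adding" a set of sublists. The structure is a list of lists rather than a dictionary, so calling sum() with an empty list as the start value concatenates the sublists into one flat list of rows.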
word_timestamps = [[['hello', 3.06, 3.32]],
[['hi', 4.2, 4.32],
['can', 4.54, 4.62],
['i', 4.66, 4.7],
['please', 4.74, 4.86],
['speak', 4.9, 5.04],
['to', 5.06, 5.14],
['ashley', 5.2, 5.56]],
[['yeah', 6.84, 6.94],
['may', 7.04, 7.12],
['i', 7.12, 7.12],
['ask', 7.18, 7.28],
["who's", 7.36, 7.46],
['calling', 7.54, 7.86]]]
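# Starting from an empty list, sum() concatenates the sublists,
# which removes one level of nesting.
combine = sum(word_timestamps, [])
print(combine)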
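One caveat: sum(word_timestamps, []) builds a new list at every step, which can become slow on long inputs. A commonly used alternative is itertools.chain.from_iterable from the standard library. Here is a sketch on a shortened version of the sample data:

```python
from itertools import chain

# Shortened sample in the same shape as the question's data.
word_timestamps = [
    [["hello", 3.06, 3.32]],
    [["hi", 4.2, 4.32], ["can", 4.54, 4.62]],
    [["yeah", 6.84, 6.94]],
]

# chain.from_iterable walks each sublist in turn without building
# intermediate lists; list() materializes the flattened rows.
combine = list(chain.from_iterable(word_timestamps))
print(combine)
```

The result has the same shape as the sum() version, so it can be passed to pd.DataFrame in the same way.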