
Python: Split a Nested List (No Key-Value Pairs) into DataFrame Columns

I have a nested list (I first called it a dictionary, but it has no key-value pairs). I am trying to split it into a dataframe with separate columns, and I do not need to preserve the original structure. The intention is to turn each visible row into an actual row in the dataframe, with the columns named word, start_time, and end_time. I tried to flatten it with flatdict, but since there are no named keys, that does not work.

Here is an example of the nested list, stored in the variable word_timestamps:

[[['hello', 3.06, 3.32]],
 [['hi', 4.2, 4.32],
  ['can', 4.54, 4.62],
  ['i', 4.66, 4.7],
  ['please', 4.74, 4.86],
  ['speak', 4.9, 5.04],
  ['to', 5.06, 5.14],
  ['ashley', 5.2, 5.56]],
 [['yeah', 6.84, 6.94],
  ['may', 7.04, 7.12],
  ['i', 7.12, 7.12],
  ['ask', 7.18, 7.28],
  ["who's", 7.36, 7.46],
  ['calling', 7.54, 7.86]]]

I can view individual "rows" of this with word_timestamps[0], which returns:

[['hello', 3.06, 3.32]]

Or I can access a single word with word_timestamps[0][0][0], which returns 'hello'.
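The three index positions correspond to three nesting levels: an outer group (judging by the data, something like one utterance), a single [word, start, end] entry inside that group, and one field of that entry. A small illustration on the first two groups of the sample data; the variable names are only for illustration:

```python
# First two groups of the sample data from the question
word_timestamps = [
    [['hello', 3.06, 3.32]],
    [['hi', 4.2, 4.32],
     ['can', 4.54, 4.62]],
]

group = word_timestamps[1]         # level 1: one group of entries
entry = word_timestamps[1][1]      # level 2: one [word, start, end] entry
field = word_timestamps[1][1][0]   # level 3: a single field of that entry

# An entry can be unpacked straight into named variables
word, start, end = entry

print(group)             # [['hi', 4.2, 4.32], ['can', 4.54, 4.62]]
print(entry)             # ['can', 4.54, 4.62]
print(word, start, end)  # can 4.54 4.62
```

Flattening only needs to remove level 1, so that every level-2 entry becomes one dataframe row.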

How do I flatten this list and get rid of the nested structure entirely?
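One standard-library option is itertools.chain.from_iterable, which yields the items of each inner group in order and so removes exactly one level of nesting. A minimal sketch, assuming (as in the sample) that every entry has exactly three fields; the column names are the ones asked for above:

```python
from itertools import chain

import pandas as pd

# Abbreviated sample data from the question
word_timestamps = [[['hello', 3.06, 3.32]],
                   [['hi', 4.2, 4.32],
                    ['can', 4.54, 4.62],
                    ['i', 4.66, 4.7]],
                   [['yeah', 6.84, 6.94],
                    ['may', 7.04, 7.12]]]

# Concatenate the groups lazily; each item is one [word, start, end] entry
rows = list(chain.from_iterable(word_timestamps))

# Each 3-element entry becomes one row with three named columns
df = pd.DataFrame(rows, columns=["word", "start_time", "end_time"])
print(df)
```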

Edit: added everything below.

I used the list comprehension [value for sublist in word_timestamps for value in sublist], which gives the same result as the answer below. The full code was:

import pandas as pd

df_word_timestamps = pd.DataFrame([value for sublist in word_timestamps for value in sublist], columns=["word", "from", "to"])
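The nested comprehension reads left to right like nested for loops. Despite its name, value here is a whole [word, start, end] entry rather than a scalar, which is why pandas can split each one into three columns. A small sketch showing the equivalent explicit loops on abbreviated sample data:

```python
# Abbreviated sample data from the question
word_timestamps = [[['hello', 3.06, 3.32]],
                   [['hi', 4.2, 4.32], ['can', 4.54, 4.62]]]

# One-line form used in the question
flat = [value for sublist in word_timestamps for value in sublist]

# Equivalent explicit loops: the for-clauses nest in the same left-to-right order
flat_loops = []
for sublist in word_timestamps:   # each outer group
    for value in sublist:         # each [word, start, end] entry in that group
        flat_loops.append(value)

assert flat == flat_loops
print(flat)
```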

Which results in:

        word    from      to
0      hello    3.06    3.32
1         hi    4.20    4.32
2        can    4.54    4.62
3          i    4.66    4.70
4     please    4.74    4.86
...      ...     ...     ...
1179   right  399.98  400.08
1180 bye-bye  400.64  400.86
1181  thanks  401.70  401.92
1182     bye  402.02  402.16
1183     bye  402.88  403.04

I did this so that I could join it with a second dataframe on the matching start/stop times. The second dataframe contains the person who spoke each word, so together they give a labeled transcript.
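If the second dataframe really has identical start/stop values, a plain DataFrame.merge on ["from", "to"] is enough (exact float matches can be fragile). The sketch below instead assumes, hypothetically, that the speaker table holds speaking turns with their own boundaries, and uses pandas.merge_asof, a swapped-in technique that attaches to each word the latest turn starting at or before it. The speakers table, its columns (speaker, seg_start, seg_end), and its values are invented for illustration:

```python
import pandas as pd

# Words table shaped like the question's output (columns "word", "from", "to")
words = pd.DataFrame(
    [['hello', 3.06, 3.32], ['hi', 4.2, 4.32], ['can', 4.54, 4.62],
     ['yeah', 6.84, 6.94], ['may', 7.04, 7.12]],
    columns=["word", "from", "to"],
)

# HYPOTHETICAL speaker table: one row per speaking turn (made-up values)
speakers = pd.DataFrame({
    "speaker": ["A", "B", "A"],
    "seg_start": [3.0, 4.1, 6.8],
    "seg_end": [3.5, 5.6, 7.9],
})

# merge_asof requires both frames to be sorted on their join keys
labeled = pd.merge_asof(
    words.sort_values("from"),
    speakers.sort_values("seg_start"),
    left_on="from",
    right_on="seg_start",
    direction="backward",  # latest turn whose start is <= the word's start
)

# Clear the label where the word starts after the matched turn has ended
labeled.loc[labeled["from"] > labeled["seg_end"], "speaker"] = None
print(labeled[["word", "from", "to", "speaker"]])
```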

You're basically "adding" (concatenating) a set of sublists:

word_timestamps = [[['hello', 3.06, 3.32]],
 [['hi', 4.2, 4.32],
  ['can', 4.54, 4.62],
  ['i', 4.66, 4.7],
  ['please', 4.74, 4.86],
  ['speak', 4.9, 5.04],
  ['to', 5.06, 5.14],
  ['ashley', 5.2, 5.56]],
 [['yeah', 6.84, 6.94],
  ['may', 7.04, 7.12],
  ['i', 7.12, 7.12],
  ['ask', 7.18, 7.28],
  ["who's", 7.36, 7.46],
  ['calling', 7.54, 7.86]]]

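# sum() starts from the empty list [] and concatenates each sublist onto it with +,
# which removes one level of nesting
combine = sum(word_timestamps, [])
print(combine)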
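One caveat: with lists, sum(..., []) creates a new list at every + step, so its cost grows quadratically with the number of groups. For a transcript of about 1,200 words this is unlikely to matter, but a linear alternative from the standard library is functools.reduce with operator.iconcat, which extends a single accumulator list in place. A small sketch on abbreviated data:

```python
import operator
from functools import reduce

# Abbreviated sample data from the question
word_timestamps = [[['hello', 3.06, 3.32]],
                   [['hi', 4.2, 4.32], ['can', 4.54, 4.62]],
                   [['yeah', 6.84, 6.94]]]

# sum(): builds a brand-new list for every "+" step
combined_sum = sum(word_timestamps, [])

# reduce + iconcat: performs `acc += group` on one accumulator, starting
# from the fresh [] passed as the initial value, so the input is not mutated
combined_reduce = reduce(operator.iconcat, word_timestamps, [])

assert combined_sum == combined_reduce
print(combined_reduce)
```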
