How to build a pandas DataFrame from a nested for loop
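I am using the Google Cloud Video Intelligence API and trying to get the results into a pandas DataFrame. The API's output class is repeatcompositecontainer, so my idea is to build the DataFrame inside the for loop that the API function uses.
This is how the API function processes the results: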
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
With the help of this Stack Overflow post, I created an empty list and appended the results to it so that I could convert it to a pandas DataFrame later, like this:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        df.append({'Description': category_entity.description})
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        df.append({'Confidence': segment.confidence, 'Start': start_time, 'End': end_time})
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
When I append only in the last for loop, I get a nicely structured DataFrame, like this:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence         End  Start
  0.704168  599.682416    0.0
  0.737053  599.682416    0.0
  0.832496  599.682416    0.0
  0.427637  599.682416    0.0
  0.518693  599.682416    0.0
However, when I add the same logic to the other for loop, I get a distorted DataFrame, like this:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description         End  Start
       NaN   technology         NaN    NaN
  0.741133          NaN  599.682416    0.0
       NaN     keyboard         NaN    NaN
  0.328138          NaN  599.682416    0.0
       NaN       person         NaN    NaN
  0.436333          NaN  599.682416    0.0
       NaN       person         NaN    NaN
Is there a way to fix this and get a DataFrame like the following?
>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description         End  Start
  0.741133   technology  599.682416    0.0
  0.328138     keyboard  599.682416    0.0
  0.436333       person  599.682416    0.0
What can I try next?
Change your code as follows:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    label_row = {}  # Create a dictionary for the label
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        # Add the description (with several categories, the last one wins)
        label_row['Description'] = category_entity.description
    for j, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        row_segment_info = {'Confidence': segment.confidence, 'Start': start_time, 'End': end_time}
        # Add the segment info for this row
        label_row.update(row_segment_info)
        # Now add the row; append a copy so a later segment cannot overwrite it
        df.append(dict(label_row))
        print('\tSegment {}: {}'.format(j, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
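The loop above needs a real API result to run. As a rough way to try the same row-building idea without one, here is a minimal sketch that uses stand-in objects. The helpers `offset`, `make_label`, `to_seconds` and `build_rows` are hypothetical names invented for this illustration, and the fake objects only mimic the attributes the loop reads; they are not the API's actual classes.

```python
from types import SimpleNamespace


def offset(seconds, nanos=0):
    """Hypothetical stand-in for a time offset with seconds and nanos."""
    return SimpleNamespace(seconds=seconds, nanos=nanos)


def make_label(description, confidence, start, end):
    """Build a fake segment label exposing only the fields the loop reads."""
    segment = SimpleNamespace(
        segment=SimpleNamespace(start_time_offset=start, end_time_offset=end),
        confidence=confidence,
    )
    return SimpleNamespace(
        category_entities=[SimpleNamespace(description=description)],
        segments=[segment],
    )


def to_seconds(ts):
    """Convert a seconds/nanos pair to float seconds."""
    return ts.seconds + ts.nanos / 1e9


def build_rows(segment_labels):
    """Return one complete dict per segment, appended exactly once."""
    rows = []
    for segment_label in segment_labels:
        label_row = {}
        for category_entity in segment_label.category_entities:
            label_row['Description'] = category_entity.description
        for segment in segment_label.segments:
            # A fresh dict per segment, so rows never share state.
            rows.append({
                **label_row,
                'Confidence': segment.confidence,
                'Start': to_seconds(segment.segment.start_time_offset),
                'End': to_seconds(segment.segment.end_time_offset),
            })
    return rows


# Made-up sample values, loosely modelled on the output in the question.
labels = [
    make_label('technology', 0.741133, offset(0), offset(599, 682416000)),
    make_label('keyboard', 0.328138, offset(0), offset(599, 682416000)),
]
rows = build_rows(labels)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per segment, with the Description, Confidence, Start and End columns all filled in.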
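In summary: you are appending a row to the list in each sub-loop. You only want to append each row once, after it holds both the description and the segment information.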
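To see where the NaN values in the question come from: when `pd.DataFrame` receives a list of dicts, each dict becomes its own row, the columns are the union of all keys, and any key a dict lacks is filled with NaN. The plain-Python sketch below imitates that behaviour with `None` in place of NaN. It is only an illustration of the idea, not how pandas is implemented, and the two sample dicts are taken from the values shown in the question.

```python
# Two dicts, as produced by two separate append calls in the question's loop.
records = [
    {'Description': 'technology'},
    {'Confidence': 0.741133, 'Start': 0.0, 'End': 599.682416},
]

# Columns are the union of all keys (sorted only to match the question's output).
columns = sorted({key for record in records for key in record})

# Each dict becomes one row; keys it lacks come out as None (NaN in pandas).
table = [[record.get(column) for column in columns] for record in records]

print(columns)
for line in table:
    print(line)
```

Each row is half empty because its dict carried only half of the keys, which is why the description and the segment fields need to go into the same dict before it is appended.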