How to build a pandas dataframe from a nested for loop
I am working with the Google Cloud Video Intelligence API and I am trying to get the results into a pandas dataframe. The output class of the API is RepeatedCompositeContainer, so my thought was to build the dataframe inside the for loop used in the API function.
This is how the API function processes the results:
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
With the help of this Stack Overflow article, I created an empty list and appended the results to it, to be converted into a pandas dataframe later, as below:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        df.append({'Description': category_entity.description})
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        df.append({'Confidence': segment.confidence, 'Start': start_time, 'End': end_time})
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
When I tried this for only the last for loop, it gave me a nicely structured dataframe, as below:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence         End  Start
  0.704168  599.682416    0.0
  0.737053  599.682416    0.0
  0.832496  599.682416    0.0
  0.427637  599.682416    0.0
  0.518693  599.682416    0.0
However, when I added the same logic to the category for loop as well, it gave a distorted dataframe, as below:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description         End  Start
       NaN   technology         NaN    NaN
  0.741133          NaN  599.682416    0.0
       NaN     keyboard         NaN    NaN
  0.328138          NaN  599.682416    0.0
       NaN       person         NaN    NaN
  0.436333          NaN  599.682416    0.0
       NaN       person         NaN    NaN
I was hoping there is a way to fix this and get a dataframe like the one below:
>>> frame = pd.DataFrame(df)
>>> frame
Confidence  Description         End  Start
  0.741133   technology  599.682416    0.0
  0.328138     keyboard  599.682416    0.0
  0.436333       person  599.682416    0.0
What can I try next?
Change your code as follows:
df = []
# Process video/segment level label annotations
segment_labels = result.annotation_results[0].segment_label_annotations
for i, segment_label in enumerate(segment_labels):
    print('Video label description: {}'.format(
        segment_label.entity.description))
    label_row = {}  # Create a dictionary for the label
    for category_entity in segment_label.category_entities:
        print('\tLabel category description: {}'.format(
            category_entity.description))
        # Add the description
        label_row['Description'] = category_entity.description
    for i, segment in enumerate(segment_label.segments):
        start_time = (segment.segment.start_time_offset.seconds +
                      segment.segment.start_time_offset.nanos / 1e9)
        end_time = (segment.segment.end_time_offset.seconds +
                    segment.segment.end_time_offset.nanos / 1e9)
        positions = '{}s to {}s'.format(start_time, end_time)
        confidence = segment.confidence
        row_segment_info = {'Confidence': segment.confidence, 'Start': start_time, 'End': end_time}
        # Add the segment info for this row
        label_row.update(row_segment_info)
        # Now add the row; append a copy so a later segment cannot overwrite it
        df.append(dict(label_row))
        print('\tSegment {}: {}'.format(i, positions))
        print('\tConfidence: {}'.format(confidence))
    print('\n')
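The code above needs a real API `result` to run. The sketch below tries the same row-building pattern on stand-in objects instead. The `SimpleNamespace` objects and the helpers `offset`, `make_label`, `to_seconds` and `build_rows` are hypothetical names invented for illustration; they only mimic the attributes used in the loop and are not part of the Video Intelligence client library.

```python
from types import SimpleNamespace as NS


def offset(seconds, nanos=0):
    # Hypothetical stand-in for the API's time offset object.
    return NS(seconds=seconds, nanos=nanos)


def make_label(category, confidence, end_seconds, end_nanos):
    # Hypothetical stand-in for one segment label annotation.
    segment = NS(
        segment=NS(start_time_offset=offset(0),
                   end_time_offset=offset(end_seconds, end_nanos)),
        confidence=confidence,
    )
    return NS(category_entities=[NS(description=category)],
              segments=[segment])


def to_seconds(time_offset):
    # Combine whole seconds and nanoseconds into float seconds.
    return time_offset.seconds + time_offset.nanos / 1e9


def build_rows(segment_labels):
    rows = []
    for segment_label in segment_labels:
        label_row = {}  # one dictionary per label
        for category_entity in segment_label.category_entities:
            label_row['Description'] = category_entity.description
        for segment in segment_label.segments:
            label_row.update({
                'Confidence': segment.confidence,
                'Start': to_seconds(segment.segment.start_time_offset),
                'End': to_seconds(segment.segment.end_time_offset),
            })
            rows.append(dict(label_row))  # one complete row per segment
    return rows


labels = [
    make_label('technology', 0.741133, 599, 682416000),
    make_label('keyboard', 0.328138, 599, 682416000),
]
rows = build_rows(labels)
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per segment with all four columns filled in.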
In summary: you were appending a separate row in each sub-loop. You want to build one row and append it only once.
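To see why the separate appends produce the NaN pattern, here is a minimal sketch using values taken from the question. It assumes pandas is installed; the variable names are made up for illustration. Each dict in the list becomes one row, and any column a dict lacks is filled with a missing value.

```python
import pandas as pd

# Two dicts per label, as in the question: each dict becomes its own row.
split_rows = [
    {'Description': 'technology'},
    {'Confidence': 0.741133, 'Start': 0.0, 'End': 599.682416},
]
split_frame = pd.DataFrame(split_rows)
print(split_frame)

# One merged dict per label: a single complete row.
merged_rows = [
    {'Description': 'technology', 'Confidence': 0.741133,
     'Start': 0.0, 'End': 599.682416},
]
merged_frame = pd.DataFrame(merged_rows)
print(merged_frame)
```

The first frame has two half-empty rows and the second has one full row, which is the difference between the distorted and the desired output.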