How can I extract the audio embeddings (features) from Google's AudioSet?

I'm talking about the audio features dataset available at https://research.google.com/audioset/download.html as a tar.gz archive of frame-level audio tfrecords.

Extracting everything else from the tfrecord files works fine (I can extract the keys video_id, start_time_seconds, end_time_seconds, and labels), but the actual embeddings needed for training do not seem to be there at all. When I iterate over the contents of any tfrecord file from the dataset, only those four keys are printed.

This is the code I'm using:

import tensorflow as tf
import numpy as np

def readTfRecordSamples(tfrecords_filename):

    record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)

    for string_record in record_iterator:
        example = tf.train.Example()
        example.ParseFromString(string_record)
        print(example)  # this prints the abovementioned 4 keys but NOT audio_embeddings

        # the first label can be then parsed like this:
        label = (example.features.feature['labels'].int64_list.value[0])
        print('label 1: ' + str(label))

        # this, however, does not work:
        #audio_embedding = (example.features.feature['audio_embedding'].bytes_list.value[0])

readTfRecordSamples('embeddings/01.tfrecord')

Is there any trick to extracting the 128-dimensional embeddings? Or are they really not in this dataset?

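Solved it: the tfrecord files need to be read as sequence examples, not as examples. As far as I can tell, the four keys live in the context part of each record, while the embeddings are stored as a per-frame feature list, which tf.train.Example has no field for, so they are silently dropped when parsing. The above code works if the line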

example = tf.train.Example()

is replaced by

example = tf.train.SequenceExample()

The embeddings and all other content can then be viewed by simply running

print(example)
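To get the embeddings as numbers rather than printed text, something along these lines should work. It is only a sketch under a few assumptions: that the embeddings sit under the key 'audio_embedding' in the record's feature_lists, that each frame is a single byte string holding 128 quantized 8-bit values, and that Python 3 and TensorFlow 2 (eager mode) are used. The function names decode_embeddings and read_embeddings are made up for illustration.

```python
def decode_embeddings(example):
    """Return one list of ints (0-255) per frame from a parsed SequenceExample.

    Assumes the frames are stored under feature_lists['audio_embedding'],
    each as a single byte string (Python 3: iterating bytes yields ints).
    """
    frames = example.feature_lists.feature_list['audio_embedding'].feature
    return [list(frame.bytes_list.value[0]) for frame in frames]


def read_embeddings(tfrecords_filename):
    """Yield (video_id, labels, embeddings) for every record in one tfrecord file."""
    # Imported here so that decode_embeddings can be used without TensorFlow.
    import tensorflow as tf

    for raw_record in tf.data.TFRecordDataset(tfrecords_filename):
        # Parse as a SequenceExample, not an Example.
        example = tf.train.SequenceExample.FromString(raw_record.numpy())
        context = example.context.feature
        video_id = context['video_id'].bytes_list.value[0].decode('utf-8')
        labels = list(context['labels'].int64_list.value)
        yield video_id, labels, decode_embeddings(example)
```

With the file from the question, `for video_id, labels, emb in read_embeddings('embeddings/01.tfrecord'): print(video_id, labels, len(emb))` would print one line per record, with len(emb) being the number of frames in that segment. If a NumPy array is preferred, wrapping the result in np.array(emb, dtype=np.uint8) should do.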
