How to extract audio embeddings (features) from Google's AudioSet?

Question

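I am talking about the audio features dataset available at https://research.google.com/audioset/download.html as a tar.gz archive consisting of frame-level audio tfrecords.

Extracting everything else from the tfrecord files works fine (I can extract the keys video_id, start_time_seconds, end_time_seconds, and labels), but the actual embeddings needed for training do not seem to be there at all. When I iterate over the contents of any tfrecord file from the dataset, only the four keys video_id, start_time_seconds, end_time_seconds, and labels are printed.

This is the code I am using: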

import tensorflow as tf
import numpy as np

def readTfRecordSamples(tfrecords_filename):

    record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)

    for string_record in record_iterator:
        example = tf.train.Example()
        example.ParseFromString(string_record)
        print(example)  # this prints the abovementioned 4 keys but NOT audio_embeddings

        # the first label can be then parsed like this:
        label = (example.features.feature['labels'].int64_list.value[0])
        print('label 1: ' + str(label))

        # this, however, does not work:
        #audio_embedding = (example.features.feature['audio_embedding'].bytes_list.value[0])

readTfRecordSamples('embeddings/01.tfrecord')

Is there any trick to extracting the 128-dimensional embeddings? Or are they really not in this dataset?

Answer 1

Solved it: the tfrecord files need to be read as sequence examples, not as examples. The code above works if the line

example = tf.train.Example()

is replaced with

example = tf.train.SequenceExample()

Then the embeddings and everything else can be seen by simply running

print(example)
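Going one step beyond printing, here is a minimal sketch of how the values might be pulled out of each parsed record. It assumes the layout described on the AudioSet download page: the four keys live in the SequenceExample's context, and the embeddings live in a feature list named audio_embedding, one bytes value of 128 8-bit quantized numbers per frame. The function names decode_audio_embeddings and read_audioset_embeddings are made up for this illustration, and the reader assumes TensorFlow 2.x with eager execution (where tf.python_io is no longer available).

```python
def decode_audio_embeddings(example):
    """Extract context fields and per-frame embeddings from a parsed
    tf.train.SequenceExample (field names assumed from the AudioSet docs)."""
    ctx = example.context.feature
    context = {
        'video_id': ctx['video_id'].bytes_list.value[0].decode('utf-8'),
        'start_time_seconds': ctx['start_time_seconds'].float_list.value[0],
        'end_time_seconds': ctx['end_time_seconds'].float_list.value[0],
        'labels': list(ctx['labels'].int64_list.value),
    }

    # Each frame is stored as one bytes string; iterating over bytes in
    # Python 3 yields ints in 0..255, i.e. the quantized embedding values.
    frames = example.feature_lists.feature_list['audio_embedding'].feature
    embeddings = [list(frame.bytes_list.value[0]) for frame in frames]
    return context, embeddings


def read_audioset_embeddings(tfrecord_path):
    """Yield (context, embeddings) for every record in one tfrecord file."""
    # Imported here so the decoding helper above has no TensorFlow dependency.
    import tensorflow as tf  # assumption: TensorFlow 2.x, eager mode

    for raw_record in tf.data.TFRecordDataset(tfrecord_path):
        example = tf.train.SequenceExample()
        example.ParseFromString(raw_record.numpy())
        yield decode_audio_embeddings(example)
```

For example, iterating over read_audioset_embeddings('embeddings/01.tfrecord') should give one (context, embeddings) pair per clip, where embeddings is a list of frames with 128 integers each. If a NumPy array is preferred, np.array(embeddings, dtype=np.uint8) converts the result.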

Answer 1
3 2017-09-14 12:17:00
