Get the maximum amplitude of an audio file per second

Question

I know there are some similar questions here, but most of them concern generating waveform images, which is not what I want.

My goal is to generate a waveform visualization for an audio file, similar to SoundCloud, but not as an image. I'd like to have the max amplitude data for each second (or half second) of an audio clip in an array. I could then use this data to create a CSS-based visualization.

Ideally I'd like to get an array that holds the max amplitude value for each second as a percentage of the maximum amplitude of the entire audio file. Here's an example:

[
    0.0,  # Relative max amplitude of first second of audio clip (0%)
    0.04,  # Relative max amplitude of second second of audio clip (4%)
    0.15,  # Relative max amplitude of third second of audio clip (15%)
    # Some more
    1.0,  # The highest amplitude of the whole audio clip will be 1.0 (100%)
]

I assume I'll have to use at least numpy and Python's wave module, but I'm not sure how to get the data I want. I'd like to use Python, but I'm not completely against using some kind of command-line tool.

Answer 1

If you allow gstreamer, here is a little script that could do the trick. It accepts any audio file that gstreamer can handle.

Construct a gstreamer pipeline, use audioconvert to reduce the channels to 1, and use the level module to get the peaks
Run the pipeline until EOS is hit
Normalize the peaks from the min/max found.

Snippet:

import os, sys, pygst
pygst.require('0.10')
import gst, gobject
gobject.threads_init()

def get_peaks(filename):
    global do_run

    pipeline_txt = (
        'filesrc location="%s" ! decodebin ! audioconvert ! '
        'audio/x-raw-int,channels=1,rate=44100,endianness=1234,'
        'width=32,depth=32,signed=(bool)True !'
        'level name=level interval=1000000000 !'
        'fakesink' % filename)
    pipeline = gst.parse_launch(pipeline_txt)

    level = pipeline.get_by_name('level')
    bus = pipeline.get_bus()
    bus.add_signal_watch()

    peaks = []
    do_run = True

    def show_peak(bus, message):
        global do_run
        if message.type == gst.MESSAGE_EOS:
            pipeline.set_state(gst.STATE_NULL)
            do_run = False
            return
        # filter only on level messages
        if message.src is not level or \
           not message.structure.has_key('peak'):
            return
        peaks.append(message.structure['peak'][0])

    # connect the callback
    bus.connect('message', show_peak)

    # run the pipeline until we got eos
    pipeline.set_state(gst.STATE_PLAYING)
    ctx = gobject.gobject.main_context_default()
    while ctx and do_run:
        ctx.iteration()

    return peaks

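def normalize(peaks):
    # min-max scale the peaks (reported in dB by the level element) to 0.0-1.0
    _min = min(peaks)
    _max = max(peaks)
    d = _max - _min
    if d == 0:
        # every interval has the same peak; avoid dividing by zero
        return [0.0 for x in peaks]
    return [(x - _min) / d for x in peaks]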

if __name__ == '__main__':
    filename = os.path.realpath(sys.argv[1])
    peaks = get_peaks(filename)

    print 'Sample is %d seconds' % len(peaks)
    print 'Minimum is', min(peaks)
    print 'Maximum is', max(peaks)

    peaks = normalize(peaks)
    print peaks
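
One caveat about the normalization step: the level element reports its peaks in dB, so the min/max scaling above is linear over decibels, not over raw amplitude. If what you want is each second as a fraction of the loudest second, a minimal sketch could convert the values first. This assumes the dB values follow the usual 20*log10(amplitude) convention; db_to_relative_amplitude is a hypothetical helper name, not part of gstreamer.

```python
def db_to_relative_amplitude(peaks_db):
    """Convert peak levels in dB to linear values scaled so the loudest is 1.0."""
    # Assumption: dB = 20 * log10(amplitude), hence amplitude = 10 ** (dB / 20).
    linear = [10.0 ** (db / 20.0) for db in peaks_db]
    loudest = max(linear) if linear else 0.0
    if loudest == 0.0:
        # empty input, or values so low that they underflow to zero
        return [0.0 for _ in linear]
    return [value / loudest for value in linear]


# Hypothetical dB readings, one per second
relative = db_to_relative_amplitude([-350.0, -20.0, -6.0, -2.0])
print(relative)
```

With this conversion a near-silent second (around -350 dB) ends up at practically 0, and the loudest second is exactly 1.0.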

And one output example:

$ python gstreamerpeak.py 01\ Tron\ Legacy\ Track\ 1.mp3 
Sample is 182 seconds
Minimum is -349.999999922
Maximum is -2.10678956719
[0.0, 0.0, 0.9274581631597019, 0.9528318436488018, 0.9492396611762614,
0.9523404330322813, 0.9471685835966183, 0.9537281219301242, 0.9473486577135167,
0.9479292126411365, 0.9538221105563514, 0.9483845795252251, 0.9536790832823281,
0.9477264933378022, 0.9480077366961968, ...
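
If gstreamer is not an option and the input is an uncompressed WAV file, the wave-module route mentioned in the question can also work. Below is a rough sketch for Python 3 using only the standard library (numpy would make it faster but is not required). It assumes 16-bit PCM data and takes the peak across all channels; peaks_per_second and window_seconds are names made up for this example.

```python
import array
import sys
import wave


def peaks_per_second(path, window_seconds=1.0):
    """Return each window's peak as a fraction of the file's overall peak."""
    peaks = []
    with wave.open(path, 'rb') as wav:
        if wav.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit PCM')
        frames_per_window = max(1, int(wav.getframerate() * window_seconds))
        while True:
            raw = wav.readframes(frames_per_window)
            if not raw:
                break
            samples = array.array('h', raw)  # signed 16-bit samples
            if sys.byteorder == 'big':
                samples.byteswap()  # WAV sample data is little-endian
            # channels are interleaved, so this is the peak over all of them
            peaks.append(max(abs(s) for s in samples))
    loudest = max(peaks) if peaks else 0
    if loudest == 0:
        # empty or completely silent file
        return [0.0 for _ in peaks]
    return [p / float(loudest) for p in peaks]
```

Calling peaks_per_second('clip.wav') gives one value per second, and peaks_per_second('clip.wav', 0.5) gives one per half second ('clip.wav' is a placeholder path). Compressed formats such as mp3 would need to be decoded to WAV first.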

Answer 1
3, accepted, 2012-02-19 00:39:01