Pipe bytes from subprocess to file-like object in Python

I'd like to accomplish the following in Python. I want to call a subprocess (ffmpeg in this case, using the ffmpy3 wrapper) and pipe the process's output directly to a file-like object that can be consumed by another function's open() call. Since audio and video data can become quite big, I explicitly never want to load the process's output into memory as a whole, but only "stream" it in a buffered fashion. Here is some example code.

async def convert_and_process(file: FileIO):
    ff = ffmpy3.FFmpeg(
        inputs={str(file.name): None},
        outputs={'pipe:1': '-y -ac 1 -ar 16000 -acodec pcm_s16le -f wav'}
    )

    stdout: StreamReader = (await ff.run_async(stdout=subprocess.PIPE)).stdout

    with wave.open(help_needed, 'rb') as wf:
        # do stuff with wave file
        pass

Here is the code of run_async; it's just a simple wrapper around asyncio.create_subprocess_exec().

My problem is basically just to turn the StreamReader returned by run_async() into a file-like object that can be consumed by wave.open(). Moreover, does this approach really avoid loading all output into memory, which is what Popen.wait() or Popen.communicate() would do?

I was thinking that os.pipe() might be useful, but I'm not sure how.

If your example is a true representation of your ultimate goal (reading audio samples in blocks), then you can accomplish it much more easily with just FFmpeg and its subprocess.Popen.stdout. If there is more to it than using the wave library to read a memory-mapped .wav file, then please ignore this answer or clarify.
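As a rough illustration of the subprocess.Popen.stdout idea, here is a minimal synchronous sketch. The helper name iter_stdout_blocks and the default blocksize are my own choices for illustration, not part of any library. Only one block is held at a time, so the child's output is never loaded into memory as a whole.

```python
import subprocess


def iter_stdout_blocks(cmd, blocksize=4096):
    """Yield the child's stdout in chunks of at most `blocksize` bytes.

    `cmd` is an argument list as accepted by subprocess.Popen.
    """
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        while True:
            # Blocks until `blocksize` bytes are available or the pipe hits EOF.
            block = proc.stdout.read(blocksize)
            if not block:  # b"" means the child closed its stdout
                break
            yield block


# Hypothetical usage with FFmpeg writing raw 16-bit mono PCM to stdout
# ("input.mp4" is a placeholder file name):
#
# cmd = ["ffmpeg", "-i", "input.mp4", "-ac", "1", "-ar", "16000",
#        "-f", "s16le", "pipe:1"]
# for block in iter_stdout_blocks(cmd, blocksize=2048):
#     ...  # process 1024 int16 samples
```

Because the generator pulls data on demand, the OS pipe buffer provides back-pressure: FFmpeg simply blocks on write until the consumer reads the next block.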

First, a shameless plug: if you are willing to try another library, my ffmpegio can do what you want. Here is an example:

import ffmpegio

#audio stream reader
with ffmpegio.open(file,'ra', blocksize=1024, ac=1, ar=16000, 
                   sample_fmt='s16le') as f:
    for block in f: # block: [1024xchannels] ndarray
        do_your_thing(block)

The blocksize argument sets the number of samples to retrieve at a time (so 1024 audio samples in this example).

This library is still pretty young, so if you have any issues, please report them on its GitHub Issues board.

Second, if you prefer to implement it yourself, it's actually fairly straightforward if you know the FFmpeg output stream format AND you need only one stream (multiple streams could also be done easily on non-Windows platforms, I think). For your example above, try the following:

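import asyncio
import subprocess

import ffmpy3
import numpy as np

# Run this inside a coroutine (e.g. convert_and_process above).
ff = ffmpy3.FFmpeg(
    inputs={str(file.name): None},
    outputs={'pipe:1': '-y -ac 1 -ar 16000 -acodec pcm_s16le -f s16le'}
)
stdout = (await ff.run_async(stdout=subprocess.PIPE)).stdout

nsamples = 1024  # read 1024 samples at a time
itemsize = 2     # bytes per sample: int16 x 1 channel

while True:
    try:
        # StreamReader methods are coroutines, so they must be awaited;
        # readexactly() never returns a block that splits a sample
        b = await stdout.readexactly(nsamples * itemsize)
    except asyncio.IncompleteReadError as e:
        b = e.partial  # last, shorter block before EOF
    if not b:  # empty bytes: the stream has ended
        break
    x = np.frombuffer(b, dtype=np.int16)
    # do stuff with audio samples in x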
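For anyone who wants to try the read loop without ffmpy3 or NumPy, here is a self-contained sketch that calls asyncio.create_subprocess_exec() directly. The name iter_int16_blocks is hypothetical, and struct stands in for np.frombuffer; it assumes the child writes an even number of bytes of little-endian int16 data.

```python
import asyncio
import struct


async def iter_int16_blocks(cmd, nsamples=1024):
    """Run `cmd` and yield its stdout as tuples of little-endian int16 samples.

    Each tuple holds `nsamples` values, except possibly the last one.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    while True:
        try:
            data = await proc.stdout.readexactly(nsamples * 2)
        except asyncio.IncompleteReadError as exc:
            data = exc.partial  # whatever arrived before EOF
        if not data:
            break
        # "<" = little-endian, "h" = signed 16-bit integer
        yield struct.unpack(f"<{len(data) // 2}h", data)
    await proc.wait()  # reap the child once its output is exhausted
```

It is consumed with `async for block in iter_int16_blocks(cmd): ...`, so only one block is in memory at any time.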

Note that in the ffmpy3 example I changed -f wav to -f s16le, so only the raw samples are sent to stdout. Reading n bytes from stdout is then essentially identical to Wave_read.readframes(n), except for what n means: stdout counts bytes, whereas readframes() counts frames (one sample per channel).
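To make the difference between the two n's concrete, here is a small sketch using only the standard library. The helper frames_to_bytes is my own name, and the in-memory WAV is generated purely for the comparison.

```python
import io
import wave


def frames_to_bytes(nframes, nchannels=1, sampwidth=2):
    """Number of bytes that hold `nframes` frames of interleaved PCM audio."""
    return nframes * nchannels * sampwidth


# Build a tiny mono 16-bit WAV in memory so the comparison needs no files.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 2048)  # 2048 silent frames

buf.seek(0)
with wave.open(buf, "rb") as r:
    frames = r.readframes(1024)  # n counts frames here

# The same amount of audio read from a raw s16le pipe needs n in bytes.
nbytes = frames_to_bytes(1024)
```

For the mono 16-bit stream in this question, a request for 1024 frames therefore corresponds to reading 1024 * 1 * 2 bytes from the pipe.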
