
Pipe bytes from subprocess to file-like object in Python

I'd like to accomplish the following in Python. I want to call a subprocess (ffmpeg in this case, using the ffmpy3 wrapper) and pipe the process's output directly into a file-like object that can be consumed by another function's open() call. Since audio and video data can become quite big, I explicitly never want to load the process's output into memory as a whole, but only "stream" it in a buffered fashion. Here is some example code:

import subprocess
import wave
from asyncio import StreamReader
from io import FileIO

import ffmpy3

async def convert_and_process(file: FileIO):
    ff = ffmpy3.FFmpeg(
        inputs={str(file.name): None},
        outputs={'pipe:1': '-y -ac 1 -ar 16000 -acodec pcm_s16le -f wav'}
    )

    stdout: StreamReader = (await ff.run_async(stdout=subprocess.PIPE)).stdout

    with wave.open(help_needed, 'rb') as wf:  # help_needed: the file-like object I'm missing
        # do stuff with wave file
        pass

Here is the code of run_async; it's just a simple wrapper around asyncio.create_subprocess_exec().

My problem is basically just to turn the StreamReader returned by run_async() into a file-like object that can be consumed by wave.open(). Moreover, does this approach actually avoid loading all of the output into memory, as Popen.wait() or Popen.communicate() would do?

I was thinking that os.pipe() might be useful, but I'm not sure how.

If your example is a true representation of your ultimate goal (reading audio samples in blocks), then you can accomplish it much more easily with just FFmpeg and its subprocess.Popen.stdout. If there is more to it than using the wave library to read a memory-mapped .wav file, then please ignore this answer or clarify.
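As a quick illustration of that Popen.stdout route, here is a minimal sketch (the input.mp4 path is hypothetical and the conversion flags simply mirror the question; when ffmpeg writes -f wav to a pipe it cannot seek back to patch the RIFF/data size fields, so wf.getnframes() should not be trusted here):

import subprocess
import wave

# hypothetical source file, only for illustration
cmd = ['ffmpeg', '-y', '-i', 'input.mp4',
       '-ac', '1', '-ar', '16000', '-acodec', 'pcm_s16le',
       '-f', 'wav', 'pipe:1']

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)

# proc.stdout is a real file object backed by an OS pipe (the same kind of
# pipe os.pipe() would give you), so wave.open() can consume it directly;
# only the OS pipe buffer is ever held in memory.
with wave.open(proc.stdout, 'rb') as wf:
    while True:
        frames = wf.readframes(1024)  # 1024 frames = 2048 bytes for mono 16-bit
        if not frames:
            break
        # do stuff with frames

proc.wait()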

First, a shameless plug: if you are willing to try another library, my ffmpegio can do what you want to do. Here is an example:

import ffmpegio

# audio stream reader
with ffmpegio.open(file, 'ra', blocksize=1024, ac=1, ar=16000,
                   sample_fmt='s16le') as f:
    for block in f:  # block: [1024 x channels] ndarray
        do_your_thing(block)

The blocksize argument sets the number of samples to retrieve at a time (so 1024 audio samples per block in this example).
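At ar=16000 that works out to 1024 / 16000 = 0.064 s, i.e. each block covers 64 ms of audio.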

This library is still pretty young, so if you run into any issues please report them on its GitHub Issues board.

Second, if you prefer to implement it yourself, it's actually fairly straightforward if you know the FFmpeg output stream formats AND you only need one stream (multiple streams could also be handled easily on non-Windows platforms, I think). For your example above, try the following:

import asyncio
import numpy as np

ff = ffmpy3.FFmpeg(
    inputs={str(file.name): None},
    outputs={'pipe:1': '-y -ac 1 -ar 16000 -acodec pcm_s16le -f s16le'}
)
stdout = (await ff.run_async(stdout=subprocess.PIPE)).stdout

nsamples = 1024  # read 1024 samples at a time
itemsize = 2     # bytes per sample: int16 x 1 channel
nbytes = nsamples * itemsize

while True:
    try:
        # StreamReader.read() is a coroutine and may return short chunks,
        # so await readexactly() to get full blocks
        b = await stdout.readexactly(nbytes)
    except asyncio.IncompleteReadError as e:
        b = e.partial  # whatever was left when ffmpeg closed its stdout
    if not b:
        break
    x = np.frombuffer(b, dtype=np.int16)
    # do stuff with audio samples in x
    if len(b) < nbytes:
        break  # that was the last (short) block

Note that I changed -f wav to -f s16le so that only the raw samples are sent to stdout. Then stdout.read(n) is essentially identical to wave.readframes(n), except for what n means in each case (bytes vs. frames).
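To make that difference concrete, a small worked example (assuming the mono, 16-bit, 16 kHz stream produced by the command above):

nframes = 1024
channels, sampwidth = 1, 2                # -ac 1, pcm_s16le
nbytes = nframes * channels * sampwidth   # 2048 bytes

# wf.readframes(1024) on the WAV-wrapped stream and reading 2048 bytes of
# the raw s16le stream yield the same 1024 samples.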
