How to use Python subprocess with bytes instead of files

Question

I can convert an mp4 to wav using ffmpeg by doing this:

ffmpeg -i test.mp4 -vn test.wav

I can also use subprocess to do the same, as long as my input and output are filepaths.

But what if I wanted to use ffmpeg directly on bytes or a "file-like" object like io.BytesIO()?

Here's an attempt at it:

import subprocess
from io import BytesIO
b = BytesIO()

with open('test.mp4', 'rb') as stream:
    command = ['ffmpeg', '-i']
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=b)
    proc.communicate(input=stream.read())
    proc.wait()
    proc.stdin.close()
    proc.stdout.close()

Gives me:

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-84-0ddce839ebc9> in <module>
      5 with open('test.mp4', 'rb') as stream:
      6     command = ['ffmpeg', '-i']
----> 7     proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=b)
...
   1486                 # Assuming file-like object
-> 1487                 c2pwrite = stdout.fileno()
   1488 
   1489             if stderr is None:

UnsupportedOperation: fileno

Of course, I could use temp files to funnel my bytes, but I'd like to be able to avoid writing to the disk (because this step is just one link in a pipeline of transformations).
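
For context, the traceback comes from Popen asking the stdout object for an OS-level file descriptor, which an in-memory buffer does not have. The following is a minimal sketch of that behaviour and of the subprocess.PIPE alternative; the child command is a stand-in (the Python interpreter itself), not ffmpeg, and the variable names are only illustrative.

```python
import io
import subprocess
import sys

buffer = io.BytesIO()

# An in-memory buffer has no OS-level file descriptor, so fileno() raises.
try:
    buffer.fileno()
    has_descriptor = True
except io.UnsupportedOperation:
    has_descriptor = False

# subprocess.PIPE asks Popen to create a real OS pipe instead; the bytes are
# collected in Python and can be placed in a BytesIO afterwards.
# Stand-in child (assumption for illustration): prints a fixed byte string.
child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'hello')"]
proc = subprocess.Popen(child, stdout=subprocess.PIPE)
out, _ = proc.communicate()
buffer.write(out)
```

So the buffer can still end up holding the output; it just cannot be handed to Popen directly.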

Answer 1

Based on @thorwhalen's answer, here's how it works from bytes to bytes. What you were probably missing, @thorwhalen, is the pipe-to-pipe way of sending and receiving data when interacting with a process. After the bytes are written, stdin must be closed so that the process sees end-of-input and can finish reading.

import shlex
import subprocess


def from_bytes_to_bytes(
        input_bytes: bytes,
        action: str = "-f wav -acodec pcm_s16le -ac 1 -ar 44100") -> bytes:
    command = f"ffmpeg -y -i /dev/stdin -f nut {action} -"
    ffmpeg_cmd = subprocess.Popen(
        shlex.split(command),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        shell=False
    )
    # communicate() writes the bytes to the process's stdin, closes the pipe
    # so ffmpeg sees end-of-input, and reads stdout until EOF. Handling the
    # write and the read together avoids the deadlock that can occur when a
    # large input is written in full before any output is read.
    output, _ = ffmpeg_cmd.communicate(input_bytes)
    return output
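
The point about closing stdin can be seen in isolation without ffmpeg. This is a minimal sketch, assuming a stand-in child process (a small Python script that upper-cases its stdin in chunks); pipe_bytes, CHILD_SOURCE and UPPERCASE are hypothetical names introduced only for this illustration.

```python
import subprocess
import sys
from typing import List


def pipe_bytes(command: List[str], input_bytes: bytes) -> bytes:
    """Send input_bytes to command's stdin and return what it writes to stdout."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() writes the input, closes stdin so the child sees EOF,
    # and reads stdout until the child closes it.
    output, _ = proc.communicate(input_bytes)
    return output


# Stand-in child (not ffmpeg): streams stdin to stdout in upper case,
# 4096 bytes at a time, and stops when stdin reaches EOF.
CHILD_SOURCE = (
    "import sys\n"
    "while True:\n"
    "    chunk = sys.stdin.buffer.read(4096)\n"
    "    if not chunk:\n"
    "        break\n"
    "    sys.stdout.buffer.write(chunk.upper())\n"
)
UPPERCASE = [sys.executable, "-c", CHILD_SOURCE]

# Larger than a typical OS pipe buffer, so the child produces output
# while input is still being sent.
payload = b"abc" * 100_000
result = pipe_bytes(UPPERCASE, payload)
```

Because the child only stops when it reads EOF, it would never exit if stdin were left open; communicate() takes care of that step.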

Answer 2

Here is a partial answer: three functions showing how this can be done from file to file (for completeness), from bytes to file, and from file to bytes. The bytes to bytes solution is fighting back though.

import shlex
import subprocess

def from_file_to_file(input_file: str, output_file: str, action="-f wav -acodec pcm_s16le -ac 1 -ar 44100"):
    # Build the argument list directly so paths containing spaces survive intact.
    command = ["ffmpeg", "-i", input_file, *shlex.split(action), "-vn", output_file]
    subprocess.call(command)


def from_file_to_bytes(input_file: str, action="-f wav -acodec pcm_s16le -ac 1 -ar 44100") -> bytes:
    # "-" as the output name tells ffmpeg to write the result to stdout.
    command = ["ffmpeg", "-i", input_file, *shlex.split(action), "-"]
    ffmpeg_cmd = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        shell=False
    )
    # communicate() reads stdout until EOF and waits for ffmpeg to exit.
    output, _ = ffmpeg_cmd.communicate()
    return output


def from_bytes_to_file(input_bytes: bytes, output_file: str, action="-f wav -acodec pcm_s16le -ac 1"):
    command = ["ffmpeg", "-i", "/dev/stdin", *shlex.split(action), "-vn", output_file]
    # Only stdin needs a pipe: the result goes to output_file, not to stdout.
    ffmpeg_cmd = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        shell=False
    )
    # communicate() writes the bytes, closes stdin, and waits for ffmpeg to exit.
    ffmpeg_cmd.communicate(input_bytes)
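
None of the functions above check whether the command actually succeeded. One possible way to add that, sketched here with subprocess.run and stand-in child processes rather than ffmpeg; run_with_bytes, ECHO and FAIL are hypothetical names used only for this example.

```python
import subprocess
import sys
from typing import List


def run_with_bytes(command: List[str], input_bytes: bytes) -> bytes:
    """Run command with input_bytes on stdin; return stdout or raise on failure."""
    completed = subprocess.run(
        command,
        input=input_bytes,        # sent to the child's stdin, then stdin is closed
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # keep diagnostics for the error message
    )
    if completed.returncode != 0:
        message = completed.stderr.decode(errors="replace")
        raise RuntimeError(f"command failed ({completed.returncode}): {message}")
    return completed.stdout


# Stand-in children (not ffmpeg): one echoes stdin, one exits with an error.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
FAIL = [sys.executable, "-c",
        "import sys; sys.stderr.write('bad input'); sys.exit(2)"]

echoed = run_with_bytes(ECHO, b"\x00\x01\x02")
try:
    run_with_bytes(FAIL, b"")
    failure = None
except RuntimeError as exc:
    failure = str(exc)
```

Capturing stderr this way also keeps the child's log output out of the terminal, which may or may not be what you want while debugging.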

Answer 3

This is the solution I came up with recently, although I used AWS and GCP bucket objects as the input and output. I'm not a Python expert by any means, but this got me the results I was after.

You need to install ffmpeg on your local machine and add it to your PATH environment variable so that the ffmpeg command is available.

If you're using the cloud, ffmpeg comes pre-installed on Google Cloud Functions, and there is a Lambda Layer on the repository library for AWS that you can leverage.

Hopefully someone gets use out of this. :)

import subprocess

# tested against 'wav', 'mp3', 'flac', 'mp4'
desired_output = 'mp3'
track_input = 'C:\\Users\\.....\\track.wav'
track_output = f'C:\\Users\\......\\output_track.{desired_output}'

encoded_type = []
format_for_conversion = desired_output

# for m4a, encode the audio as AAC and write it using the adts format
if desired_output == 'm4a':
    encoded_type = ['-c:a', 'aac']
    format_for_conversion = 'adts'

with open(track_input, 'rb') as in_track_file:
    input_track_data = in_track_file.read()

# pipe:0 refers to stdin, pipe:1 refers to stdout
# (an argument list works on every platform without shell=True)
ffmpeg_command = ['ffmpeg', '-i', 'pipe:0', *encoded_type,
                  '-f', format_for_conversion, 'pipe:1']

ffmpeg_process = subprocess.Popen(ffmpeg_command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# communicate() returns a (stdout, stderr) tuple
output_bytes, _ = ffmpeg_process.communicate(input_track_data)

# 'wb' truncates any existing file before writing
with open(track_output, 'wb') as f:
    f.write(output_bytes)
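
Since the question was about avoiding the disk, the bytes returned by communicate() do not have to be written to a file: they can be wrapped in io.BytesIO and handed to the next stage as a file-like object. A minimal sketch follows; the WAV data is generated with the standard wave module as a stand-in for ffmpeg's output, so whether a particular ffmpeg result parses cleanly this way is an assumption that is not demonstrated here.

```python
import io
import wave

# Build a tiny WAV in memory as a stand-in for the bytes ffmpeg would return.
source = io.BytesIO()
with wave.open(source, "wb") as writer:
    writer.setnchannels(1)                  # mono
    writer.setsampwidth(2)                  # 16-bit samples
    writer.setframerate(44100)
    writer.writeframes(b"\x00\x00" * 441)   # 441 silent frames (10 ms)
output_bytes = source.getvalue()

# Wrap the bytes in a file-like object for the next pipeline stage.
stream = io.BytesIO(output_bytes)
with wave.open(stream, "rb") as reader:
    channels = reader.getnchannels()
    rate = reader.getframerate()
    frames = reader.getnframes()
```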

solution1
8 ACCEPTED 2020-05-11 16:24:53

solution2
3 2020-05-11 12:59:05

solution3
2 2020-09-12 13:56:30
