
How to convert 16-bit audio created with Android's AudioRecord to 12-bit audio through bit shifting?

I am attempting to convert 16 bit audio into 12 bit audio. However, I am quite inexperienced with such conversions and believe my approach is possibly incorrect or flawed.

The use case, as context for the code snippets below, is an Android app that the user speaks into; the audio is then transmitted to an IoT device for immediate playback. The IoT device expects mono, 12-bit, unsigned, little-endian audio at an 8 kHz sample rate, with each sample's data stored in the first twelve bits (0-11) and the final four bits (12-15) set to zero. Audio data needs to be received in packets of 1000 bytes.
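
To make that layout concrete, here is a minimal sketch of how a single 12-bit sample would occupy two bytes under this reading of the device's requirements. The names `TwelveBitLayout` and `encodeSample` are made up for illustration and are not part of any device or Android API.

```java
final class TwelveBitLayout {
    /** Packs an unsigned 12-bit value (0..4095) into two little-endian bytes. */
    static byte[] encodeSample(int value) {
        if (value < 0 || value > 0x0FFF) {
            throw new IllegalArgumentException("not a 12-bit value: " + value);
        }
        return new byte[] {
            (byte) (value & 0xFF),        // bits 0-7 are sent first (little endian)
            (byte) ((value >> 8) & 0x0F)  // bits 8-11; bits 12-15 stay zero
        };
    }
}
```

For example, the value 0x0ABC would be sent as the bytes 0xBC, 0x0A.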

The audio is captured in the Android app with AudioRecord, which is instantiated as follows:

    int bufferSize = 1000;
    this.audioRecord = new AudioRecord(
            MediaRecorder.AudioSource.MIC,
            8000,
            AudioFormat.CHANNEL_IN_MONO,
            AudioFormat.ENCODING_PCM_16BIT,
            bufferSize
    );

In a while loop, the AudioRecord is read in 1000-byte packets, and each packet is modified to match the specification in the use case. I'm not sure this part is relevant, but for completeness:

    byte[] buffer = new byte[1000];
    audioRecord.read(buffer, 0, buffer.length);
    byte[] modifiedBytes = convert16BitTo12Bit(buffer);

Then the modifiedBytes are sent off to the device.

Here are the methods which modify the bytes. Basically, to conform to the specifications, I am shifting the bits in each 16 bit set (tossing the least significant 4) and adding zeroes to the final four spots. I do this through BitSet.

    /**
     * Takes a byte array presented as 16 bit audio and converts it to 12 bit audio through bit
     * manipulation. Packets must be of 1000 bytes or no manipulation will occur and the input
     * will be immediately returned.
     */
    private byte[] convert16BitTo12Bit(byte[] input) {
        if (input.length == 1000) {
            for (int i = 0; i < input.length; i += 2) {
                Log.d(TAG, "convert16BitTo12Bit: pass #" + (i / 2));
                byte[] chunk = new byte[2];
                System.arraycopy(input, i, chunk, 0, 2);
                if (!isEmptyByteArray(chunk)) {
                    byte[] modifiedBytes = convertChunk(chunk);
                    System.arraycopy(
                            modifiedBytes,
                            0,
                            input,
                            i,
                            modifiedBytes.length
                    );
                }
            }
            return input;
        }
        Log.d(TAG, "convert16BitTo12Bit: Failed - input is not 1000 in length; it is " + input.length);
        return input;
    }

    /**
     * Converts 2 bytes 16 bit audio into 12 bit audio. If the input is not 2 bytes, the input
     * will be returned without manipulation.
     */
    private byte[] convertChunk(byte[] chunk) {
        if (chunk.length == 2) {
            BitSet bitSet = BitSet.valueOf(chunk);
            Log.d(TAG, "convertChunk: bitSet starts as " + bitSet.toString());
            modifyBitSet(bitSet);
            Log.d(TAG, "convertChunk: bitSet ends as " + bitSet.toString());
            return bitSet.toByteArray();
        }
        Log.d(TAG, "convertChunk: Failed = chunk is not 2 in length; it is " + chunk.length);
        return chunk;
    }

    /**
     * Removes the first four bits and shifts the rest to leave the final four bits as 0.
     */
    private void modifyBitSet(BitSet bitSet) {
        for (int i = 4; i < bitSet.length(); i++) {
            bitSet.set(i - 4, bitSet.get(i));
        }
        if (bitSet.length() > 8) {
            bitSet.clear(12, 16);
        } else {
            bitSet.clear(4, 8);
        }
    }

    /**
     * Returns true if the byte array input contains all zero bits.
     */
    private boolean isEmptyByteArray(byte[] input) {
        BitSet bitSet = BitSet.valueOf(input);
        return bitSet.isEmpty();
    }

Unfortunately, this approach produces subpar results. The audio is quite noisy and it is difficult to make out what someone is saying (but you can hear that words are being spoken).
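
One detail of the code above that may be worth checking (an observation, not a confirmed diagnosis of the noise): `BitSet.toByteArray()` returns only as many bytes as are needed to hold the highest set bit. A shifted sample whose upper byte becomes zero therefore comes back as a one-byte or empty array, and the `System.arraycopy` call then leaves the old upper byte of `input` in place. The sketch below uses a simplified stand-in for the shift to reproduce just that behavior; `BitSetShiftDemo` and `shiftWithBitSet` are hypothetical names.

```java
import java.util.BitSet;

final class BitSetShiftDemo {
    /** Simplified stand-in for the shift: move bits 4..15 down by four, clear the top four. */
    static byte[] shiftWithBitSet(byte[] chunk) {
        BitSet bits = BitSet.valueOf(chunk); // bit 0 is the lowest bit of chunk[0]
        for (int i = 4; i < 16; i++) {
            bits.set(i - 4, bits.get(i));
        }
        bits.clear(12, 16);
        // The returned length is (bits.length() + 7) / 8, so it can be 0 or 1 here.
        return bits.toByteArray();
    }
}
```

With the input bytes 0xF0, 0x0F the result is a single byte, and with 0x08, 0x00 it is an empty array, so in both cases fewer than two bytes would be copied back.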

I also have been playing around with just saving the bytes to a file and playing it back on Android through AudioTrack. I noticed that if I just remove the first four bits and do not shift anything, the audio actually sounds good, as such:

    private void modifyBitSet(BitSet bitSet) {
        bitSet.clear(0, 4);
    }

However, when played through the device, it sounds even worse, and I don't even think I can make out any words.

Clearly, my approach is not working here. The central question is: how would one convert a 16-bit sample into 12-bit audio and maintain audio quality, given the requirement that the final four bits must be zero? Additionally, given my larger approach of using AudioRecord to obtain the audio, would a solution to that question fit this use case?

Please let me know if there is anything more I can provide to clarify these questions and my intent.

Given that the audio is 16 bits but must be changed to 12 with four zeros at the end, four bits somewhere do have to be tossed.

Yes, of course and there is no other way, is there?

This is something quick that I can come up with right now. It is certainly not fully tested, though; I only tested it with inputs of 2 and 4 bytes. I'll leave it to you to test it.

    //Reminder :: Converts as many bytes as possible.
    //Reminder :: To calculate the required size for store:
    //((bytes.length & 1) == 0) ? Math.round((bytes.length * 6) / 8F) : Math.round(((bytes.length - 1) * 6) / 8F)
    //Return :: Number of converted bytes.
    public static final int convert16BitTo12Bit(final byte[] bytes, final byte[] store) 
    {
        final int size = bytes.length;
        int storeIndex = 0;
        //Copy the first 2 bytes into store.
        store[storeIndex++] = bytes[0]; store[storeIndex] = bytes[1];
        if(size < 4) {
            store[storeIndex] = (byte)(store[storeIndex] & 0xF0);
            return 2;
        }
        //Not final: both are reassigned on every loop iteration.
        int result;
        byte tmp;
        //  11111111 11110000 00000000 00000000
        //+              11111111 11110000      (<< 12)
        //= 11111111 11111111 11111111 00000000 (1)
        //-----------------------------------------
        //  11111111 00000000 00000000 00000000 (1)
        //+          11111111 11110000          (<< 16)
        //= 11111111 11111111 11110000 00000000 (2)
        //-----------------------------------------
        //  11110000 00000000 00000000 00000000 (2)
        //+     1111 11111111 0000              (<< 20)
        //= 11111111 11111111 00000000 00000000 (3)
        //-----------------------------------------
        //  00000000 00000000 00000000 00000000 (3)
        //+ 11111111 11110000                   (<< 24)
        //= 11111111 11110000 00000000 00000000
        for(int i=2, shiftBits = 12; i < size; i += 2) {
            if(shiftBits == 24) {
                //Copy 2 bytes from bytes[] into store[] and move on.
                store[storeIndex] = bytes[i];
                //Never store byte 0 (Garbage).
                tmp = (byte)(bytes[i + 1] & 0xF0); //Bit order: 11110000.
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 12; //Reset
            } else if(shiftBits == 20) {
                result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));
                store[storeIndex] = (byte)((result >> 24) & 0xFF);
                tmp = (byte)((result >> 16) & 0xFF);
                //Never store byte 0 (Garbage).
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 24;
            } else if(shiftBits == 16) {
                result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 16) | ((bytes[i + 1] & 0xFF) << 8));
                store[storeIndex] = (byte)((result >> 16) & 0xFF);
                tmp = (byte)((result >> 8) & 0xF0);
                //Never store byte 0 (Garbage).
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 20;
            } else {
                result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 12) | ((bytes[i + 1] & 0xFF) << 4));
                store[storeIndex] = (byte)((result >> 16) & 0xFF);
                tmp = (byte)((result >> 8) & 0xFF);
                //Never store byte 0 (Garbage).
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 16;
            }
        }
        return ++storeIndex;
    }

Explanations

result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));
  • This merges two integers into one.
((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
  • The first integer is built from the two bytes already in store, which always sit at the same bit positions (24 and 16).
(((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));
  • The second integer is built from the two current input bytes, whose bit positions change with shiftBits.
(...) | (...)
  • The bitwise OR (the vertical bar) in the middle merges the two integers we've just created into one.
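
The `& 0xFF` masks in these expressions matter because Java bytes are signed and are sign-extended when promoted to `int`. A minimal sketch of combining two bytes into one 16-bit value follows; `ByteMergeDemo` and `merge` are hypothetical names used only for illustration.

```java
final class ByteMergeDemo {
    /** Combines a high and a low byte into one unsigned 16-bit value held in an int. */
    static int merge(byte high, byte low) {
        // Without "& 0xFF", a byte such as (byte) 0x80 would widen to 0xFFFFFF80,
        // and its sign bits would overwrite the other operand in the OR.
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }
}
```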

Usage

Using this method is pretty straightforward.

    byte[] buffer = new byte[1000];
    byte[] store;
    if((buffer.length & 1) == 0) { //Even.
        store = new byte[Math.round((buffer.length * 6) / 8F)];
    } else { //Odd.
        store = new byte[Math.round(((buffer.length - 1) * 6) / 8F)];
    }
    audioRecord.read(buffer, 0, buffer.length);
    int convertedByteSize = convert16BitTo12Bit(buffer, store);
    System.out.println("size: " + convertedByteSize);
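
The sizing rule from the reminder comment can also be written with integer arithmetic only. This is a sketch rather than part of the method above: `packedSize` is a hypothetical helper, and it assumes 12 bits of output for every complete 16-bit input sample, with a trailing odd byte ignored.

```java
final class PackedSizeDemo {
    /** Bytes needed to hold 12 bits per complete 16-bit input sample, rounded up. */
    static int packedSize(int inputLength) {
        int samples = inputLength / 2;   // a trailing odd byte is not a full sample
        return (samples * 12 + 7) / 8;   // ceiling of samples * 1.5
    }
}
```

For a 1000-byte buffer this gives 750, the same value the `Math.round` expression produces.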

I have discovered a solution that produces clear audio. First, it is important to recall the requirements for the use case: 12-bit unsigned mono audio, which the device reads as little endian in packets of 1000 bytes.

The initialization and configuration of the AudioRecord as described in the question is fine.

Once the 1000 bytes of audio are read from AudioRecord, they can be wrapped in a ByteBuffer set to little endian, and then viewed as a ShortBuffer so that manipulation happens at the 16-bit level:

        // Audio specifications of device is in little endian.
        ByteBuffer byteBuffer = ByteBuffer.wrap(input).order(ByteOrder.LITTLE_ENDIAN);
        // Turn into a ShortBuffer so bitwise manipulation can occur on the 16 bit level.
        ShortBuffer shortBuffer = byteBuffer.asShortBuffer();
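
As a small illustration of why this view is convenient: a ShortBuffer obtained this way shares its content with the wrapped array, so samples written through it land directly in the original bytes. The sketch below assumes nothing beyond `java.nio`; `ShortViewDemo` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

final class ShortViewDemo {
    /** Reads the first 16-bit sample from a little-endian byte array. */
    static short firstSample(byte[] input) {
        ShortBuffer shorts = ByteBuffer.wrap(input)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer();
        return shorts.get(0);
    }

    /** Overwrites the first sample; the change is written into the wrapped array. */
    static void setFirstSample(byte[] input, short value) {
        ByteBuffer.wrap(input)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer()
                .put(0, value);
    }
}
```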

Next, in a loop, take each short and modify it to 12 bit unsigned:

        for (int i = 0; i < shortBuffer.capacity(); i++) {
            short currentShort = shortBuffer.get(i);
            shortBuffer.put(i, convertShortTo12Bit(currentShort));
        }

This can be accomplished by shifting each 16-bit sample four places to the right, which turns it into a signed 12-bit value, and then adding 2048 to convert it to unsigned. As a safety step, we also mask off the four most significant bits (12-15), which the device requires to be zero; with an arithmetic shift the sum already lies in the range 0-4095, so no bits should actually remain there:

    private static short convertShortTo12Bit(short input) {
        int inputAsInt = input;
        // Arithmetic shift keeps the sign: the range becomes -2048..2047.
        inputAsInt >>= 4;
        // Offset to unsigned: the range becomes 0..4095.
        inputAsInt += 2048;
        // Keep bits 0-11 and force bits 12-15 to zero.
        return (short) (inputAsInt & 0x0FFF);
    }
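
Putting the pieces together, a whole packet could be converted in place roughly as follows. This is a self-contained sketch under the same assumptions as above (signed 16-bit little-endian input, an even number of bytes); `PacketConverter` and `convertPacket` are names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

final class PacketConverter {
    /** Signed 16-bit sample to unsigned 12-bit, top four bits zero. */
    static short convertShortTo12Bit(short input) {
        // Arithmetic shift (>>) preserves the sign, giving -2048..2047 before the offset.
        return (short) (((input >> 4) + 2048) & 0x0FFF);
    }

    /** Converts every sample of the packet in place and returns the same array. */
    static byte[] convertPacket(byte[] input) {
        ShortBuffer samples = ByteBuffer.wrap(input)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer();
        for (int i = 0; i < samples.capacity(); i++) {
            samples.put(i, convertShortTo12Bit(samples.get(i)));
        }
        return input;
    }
}
```

Silence (a sample of 0) maps to 2048, the most negative sample maps to 0, and the most positive sample maps to 4095.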

If one wishes to return 12 bits to 16 bits, do the reverse for each short (subtract 2048 and shift four spaces to the left).
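
A minimal sketch of that reverse step, assuming the input holds an unsigned 12-bit sample in its low bits; `TwelveBitDecoder` and `convert12BitToShort` are hypothetical names. Note that the four bits discarded during the original conversion cannot be recovered, so they come back as zeroes.

```java
final class TwelveBitDecoder {
    /** Expands an unsigned 12-bit sample (0..4095) back to signed 16-bit PCM. */
    static short convert12BitToShort(short input) {
        int value = input & 0x0FFF;           // keep only the 12 data bits
        return (short) ((value - 2048) << 4); // back to signed, then restore the scale
    }
}
```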
