RTMP Broadcast packet body structure for Twitch

Question

I'm currently working on a project similar to OBS, where I'm capturing screen data, encoding it with the x264 library, and then broadcasting it to a Twitch server.

Currently, the servers are accepting the data, but no video plays: it buffers for a moment, then returns the error code "2000: network error".

Like OBS Classic, I'm splitting the NALs provided by x264 by type and then adjusting each one:

int frame_size = x264_encoder_encode(encoder, &nals, &num_nals, &pic_in, &pic_out);

    //sort the NAL's into their types and make necessary adjustments

    int timeOffset = int(pic_out.i_pts - pic_out.i_dts);

    timeOffset = htonl(timeOffset);//host to network translation, ensure the bytes are in the right format
    BYTE *timeOffsetAddr = ((BYTE*)&timeOffset) + 1;

    videoSection sect;
    bool foundFrame = false;

    uint8_t * spsPayload = NULL;
    int spsSize = 0;

    for (int i = 0; i<num_nals; i++) {
        //std::cout << "VideoEncoder: EncodedImages Size: " << encodedImages->size() << std::endl;
        x264_nal_t &nal = nals[i];
        //std::cout << "NAL is:" << nal.i_type << std::endl;

        //need to account for pps/sps, seems to always be the first frame sent
        if (nal.i_type == NAL_SPS) {
            spsSize = nal.i_payload;
            spsPayload = (uint8_t*)malloc(spsSize);
            memcpy(spsPayload, nal.p_payload, spsSize);
        } else if (nal.i_type == NAL_PPS){
            //pps always happens after sps
            if (spsPayload == NULL) {
                std::cout << "VideoEncoder: critical error, sps not set" << std::endl;
            }
            uint8_t * payload = (uint8_t*)malloc(nal.i_payload + spsSize);
            memcpy(payload, spsPayload, spsSize);
            memcpy(payload, nal.p_payload + spsSize, nal.i_payload);
            sect = { nal.i_payload + spsSize, payload, nal.i_type };
            encodedImages->push(sect);
        } else if (nal.i_type == NAL_SEI || nal.i_type == NAL_FILLER) { 
            //these need some bytes at the start removed
            BYTE *skip = nal.p_payload;
            while (*(skip++) != 0x1);
            int skipBytes = (int)(skip - nal.p_payload);

            int newPayloadSize = (nal.i_payload - skipBytes);

            uint8_t * payload = (uint8_t*)malloc(newPayloadSize);
            memcpy(payload, nal.p_payload + skipBytes, newPayloadSize);
            sect = { newPayloadSize, payload, nal.i_type };
            encodedImages->push(sect);

        } else if (nal.i_type == NAL_SLICE_IDR || nal.i_type == NAL_SLICE) { 
            //these packets need an additional section at the start
            BYTE *skip = nal.p_payload;
            while (*(skip++) != 0x1);
            int skipBytes = (int)(skip - nal.p_payload);

            std::vector<BYTE> bodyData;
            if (!foundFrame) {
                if (nal.i_type == NAL_SLICE_IDR) { bodyData.push_back(0x17); } else { bodyData.push_back(0x27); } //add a 17 or a 27 as appropriate
                bodyData.push_back(1);
                bodyData.push_back(*timeOffsetAddr);

                foundFrame = true;
            }

            //put into the payload the bodyData followed by the nal payload
            uint8_t * bodyDataPayload = (uint8_t*)malloc(bodyData.size());
            memcpy(bodyDataPayload, bodyData.data(), bodyData.size() * sizeof(BYTE));

            int newPayloadSize = (nal.i_payload - skipBytes);

            uint8_t * payload = (uint8_t*)malloc(newPayloadSize + sizeof(bodyDataPayload));
            memcpy(payload, bodyDataPayload, sizeof(bodyDataPayload));
            memcpy(payload + sizeof(bodyDataPayload), nal.p_payload + skipBytes, newPayloadSize);
            int totalSize = newPayloadSize + sizeof(bodyDataPayload);
            sect = { totalSize, payload, nal.i_type };
            encodedImages->push(sect);
        } else {
            std::cout << "VideoEncoder: Nal type did not match expected" << std::endl;
            continue;
        }
    }

The NAL payload data is then put into a struct, videoSection, in a queue buffer:

//used to transfer encoded data
struct videoSection {
    int frameSize;
    uint8_t* payload;
    int type;
};

The broadcaster then picks it up, makes a few more changes, and calls RTMP_SendPacket():

videoSection sect = encodedImages->front();
encodedImages->pop();

//std::cout << "Broadcaster: Frame Size: " << sect.frameSize << std::endl;

//two methods of sending RTMP data, _sendpacket and _write. Using sendpacket for greater control

RTMPPacket * packet;

unsigned char* buf = (unsigned char*)sect.payload;

int type = buf[0]&0x1f; //& 0x1f keeps the low 5 bits, which hold the NAL unit type
int len = sect.frameSize;
long timeOffset = GetTickCount() - rtmp_start_time;

//assign space packet will need
packet = (RTMPPacket *)malloc(sizeof(RTMPPacket)+RTMP_MAX_HEADER_SIZE + len + 9);
memset(packet, 0, sizeof(RTMPPacket) + RTMP_MAX_HEADER_SIZE);

packet->m_body = (char *)packet + sizeof(RTMPPacket) + RTMP_MAX_HEADER_SIZE;
packet->m_nBodySize = len + 9;

//std::cout << "Broadcaster: Packet Size: " << sizeof(RTMPPacket) + RTMP_MAX_HEADER_SIZE + len + 9 << std::endl;
//std::cout << "Broadcaster: Packet Body Size: " << len + 9 << std::endl;

//set body to point to the packetbody
unsigned char *body = (unsigned char *)packet->m_body;
memset(body, 0, len + 9);



//NAL_SLICE_IDR represents keyframe
//first element determines packet type
body[0] = 0x27;//inter-frame h.264
if (sect.type == NAL_SLICE_IDR) {
    body[0] = 0x17; //h.264 codec id
}


//-------------------------------------------------------------------------------
//this section taken from https://stackoverflow.com/questions/25031759/using-x264-and-librtmp-to-send-live-camera-frame-but-the-flash-cant-show
//in an effort to understand packet format. it does not resolve my previous issues formatting the data for twitch to play it

//sets body to be NAL unit
body[1] = 0x01;
body[2] = 0x00;
body[3] = 0x00;
body[4] = 0x00;

//>> is a shift right
//shift len to the right, and AND it
/*body[5] = (len >> 24) & 0xff;
body[6] = (len >> 16) & 0xff;
body[7] = (len >> 8) & 0xff;
body[8] = (len) & 0xff;*/

//end code sourced from https://stackoverflow.com/questions/25031759/using-x264-and-librtmp-to-send-live-camera-frame-but-the-flash-cant-show
//-------------------------------------------------------------------------------

//copy from buffer into rest of body
memcpy(&body[9], buf, len);

//DEBUG

//save individual packet body to a file with name rtmp[packetnum]
//determine why some packets do not have 0x27 or 0x17 at the start
//still happening, makes no sense given the above code

/*std::string fileLocation = "rtmp" + std::to_string(packCount++);
std::cout << fileLocation << std::endl;
const char * charConversion = fileLocation.c_str();

FILE* saveFile = NULL;
saveFile = fopen(charConversion, "w+b");//open as write and binary
if (!fwrite(body, len + 9, 1, saveFile)) {
    std::cout << "VideoEncoder: Error while trying to write to file" << std::endl;
}
fclose(saveFile);*/

//END DEBUG

//other packet details
packet->m_hasAbsTimestamp = 0;
packet->m_packetType = RTMP_PACKET_TYPE_VIDEO;
if (rtmp != NULL) {
    packet->m_nInfoField2 = rtmp->m_stream_id;
}
packet->m_nChannel = 0x04;
packet->m_headerType = RTMP_PACKET_SIZE_LARGE;
packet->m_nTimeStamp = timeOffset;

//send the packet
if (rtmp != NULL) {
    RTMP_SendPacket(rtmp, packet, TRUE);
}

I can see in the inspector that Twitch is receiving the data at a steady 3kbps, so I'm sure something is wrong with how I'm adjusting the data before sending it. Can anyone advise me on what I'm doing wrong here?

Answer 1

The problems start even before the code you included. When you configure x264, be sure to set:

b_aud = 0;
b_repeat_headers = 0;
b_annexb = 0;

This tells x264 to generate the format needed by RTMP. Then you can skip all the per-NAL preprocessing.

For SPS/PPS, use x264_encoder_headers to retrieve them after x264_encoder_open. Encode them into an "extradata" buffer as documented in "Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream". This extradata goes into an RTMP "sequence header" packet that is sent before any frames. Set the AVCPacketType (body[1] in your case) accordingly: 0 for the sequence header, 1 for everything else. The sequence header packet looks like this:
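
To illustrate the layout this produces: with b_annexb = 0, each NAL should carry a 4-byte big-endian length in place of the Annex B start code. The following is only a minimal sketch of walking such a buffer. NalView and split_length_prefixed are hypothetical names made up for this example; they are not part of x264 or librtmp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one NAL unit inside a length-prefixed buffer.
struct NalView {
    const uint8_t* data;  // first byte of the NAL (its header byte)
    size_t size;          // NAL size in bytes, excluding the 4-byte prefix
    int type;             // nal_unit_type: low 5 bits of the header byte
};

// Splits [4-byte big-endian length][NAL bytes]... into views.
// Stops at the first empty or truncated entry.
std::vector<NalView> split_length_prefixed(const uint8_t* buf, size_t len) {
    std::vector<NalView> out;
    size_t pos = 0;
    while (len - pos >= 4) {
        size_t n = (size_t(buf[pos]) << 24) | (size_t(buf[pos + 1]) << 16) |
                   (size_t(buf[pos + 2]) << 8) | size_t(buf[pos + 3]);
        pos += 4;
        if (n == 0 || n > len - pos) break;  // malformed or truncated input
        out.push_back({buf + pos, n, buf[pos] & 0x1f});
        pos += n;
    }
    return out;
}
```

This is not needed for sending, since the length-prefixed bytes go into the RTMP body unchanged, but it can help when dumping packets to check what is actually on the wire.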

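body[0] = 0x17; // keyframe (1) + AVC codec id (7)
body[1] = 0;    // AVCPacketType 0: sequence header
body[2] = 0;    // composition time is always 0 here
body[3] = 0;
body[4] = 0;
memcpy(&body[5], extradata, extradata_size);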
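
For the extradata itself, here is a minimal sketch of packing one SPS and one PPS into the AVCDecoderConfigurationRecord layout. build_avc_extradata is a hypothetical helper name, not an x264 or librtmp function. It assumes sps and pps point at the raw NAL bytes, starting at the NAL header byte with no start code and no length prefix; if x264_encoder_headers hands you length-prefixed NALs, you would most likely need to skip the first 4 bytes of each payload.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs one SPS and one PPS into an
// AVCDecoderConfigurationRecord. Returns an empty vector on bad input.
std::vector<uint8_t> build_avc_extradata(const uint8_t* sps, size_t sps_len,
                                         const uint8_t* pps, size_t pps_len) {
    std::vector<uint8_t> out;
    if (sps_len < 4 || sps_len > 0xFFFF || pps_len == 0 || pps_len > 0xFFFF) {
        return out;
    }
    out.push_back(1);       // configurationVersion
    out.push_back(sps[1]);  // AVCProfileIndication
    out.push_back(sps[2]);  // profile_compatibility
    out.push_back(sps[3]);  // AVCLevelIndication
    out.push_back(0xFF);    // 6 reserved bits + lengthSizeMinusOne = 3 (4-byte lengths)
    out.push_back(0xE1);    // 3 reserved bits + number of SPS = 1
    out.push_back(uint8_t(sps_len >> 8));    // SPS length, big-endian
    out.push_back(uint8_t(sps_len & 0xFF));
    out.insert(out.end(), sps, sps + sps_len);
    out.push_back(1);       // number of PPS
    out.push_back(uint8_t(pps_len >> 8));    // PPS length, big-endian
    out.push_back(uint8_t(pps_len & 0xFF));
    out.insert(out.end(), pps, pps + pps_len);
    return out;
}
```

The returned bytes and their size would then play the role of extradata and extradata_size in the memcpy above.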

body[2] through body[4] MUST be set to the frame cts (pts - dts) if you have B-frames. If you want to set it to zero, configure x264 for baseline profile, but this will result in reduced image quality. Use the return value of x264_encoder_encode as the frame size, and write the whole frame in one go.

int frame_size = x264_encoder_encode(encoder, &nals, &num_nals, &pic_in, &pic_out);
if (frame_size > 0) { // negative means error, 0 means no output yet
    int cts = int(pic_out.i_pts - pic_out.i_dts);
    body[0] = pic_out.b_keyframe ? 0x17 : 0x27; // 0x17 = keyframe, 0x27 = inter frame
    body[1] = 1; // AVCPacketType 1: NAL units
    body[2] = (cts >> 16) & 0xff;
    body[3] = (cts >> 8) & 0xff;
    body[4] = cts & 0xff;
    // x264 keeps the payloads of all output NALs sequential in memory
    memcpy(&body[5], nals->p_payload, frame_size);
}
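
The 5-byte header is the same for the sequence header and for frame packets, so it can be factored out. A minimal sketch follows; write_avc_tag_header is a hypothetical name, and it assumes the cts is already in milliseconds. In the FLV video tag layout, 0x17 marks an AVC keyframe and 0x27 an AVC inter frame.

```cpp
#include <cstdint>

// Hypothetical helper: writes the 5 bytes that precede the AVC data
// in the body of an RTMP video message.
void write_avc_tag_header(uint8_t* body, bool keyframe, bool sequence_header,
                          int32_t cts_ms) {
    // Work on the unsigned bit pattern so shifts are well defined
    // even if cts_ms is negative.
    uint32_t cts = static_cast<uint32_t>(cts_ms);
    body[0] = keyframe ? 0x17 : 0x27;    // frame type nibble + codec id 7 (AVC)
    body[1] = sequence_header ? 0 : 1;   // AVCPacketType
    body[2] = uint8_t((cts >> 16) & 0xFF);  // composition time, 24-bit big-endian
    body[3] = uint8_t((cts >> 8) & 0xFF);
    body[4] = uint8_t(cts & 0xFF);
}
```

For the sequence header you would call it with keyframe = true, sequence_header = true and a cts of 0, then copy the extradata to &body[5].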

Finally, Twitch requires that you also send an AAC audio stream. Also be sure to set the keyframe interval to 2 seconds.
