
Faster encoding of realtime 3D graphics with OpenGL and x264

I am working on a system that renders 3D graphics on a server, compresses each frame to video as soon as it is rendered, and sends the stream to a client. I already have the code working, but I feel it could be much faster (it is already a bottleneck in the system).

Here is what I am doing:

First I grab the framebuffer

glReadBuffer( GL_FRONT );
glReadPixels( 0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer ); 

Then I flip the framebuffer, because the image comes out vertically flipped after the sws_scale colorspace conversion (glReadPixels returns rows bottom-to-top, which looks like a bug downstream). I am flipping in advance, nothing fancy.

typedef unsigned char byte;

void VerticalFlip(int width, int height, byte* pixelData, int bytesPerPixel)
{
    const int rowBytes = width * bytesPerPixel;
    byte* temp = new byte[rowBytes];
    height--; // remember the row index ends at height-1

    for (int y = 0; y < (height + 1) / 2; y++)
    {
        // swap row y with its mirror row (height - y)
        memcpy(temp, &pixelData[y * rowBytes], rowBytes);
        memcpy(&pixelData[y * rowBytes], &pixelData[(height - y) * rowBytes], rowBytes);
        memcpy(&pixelData[(height - y) * rowBytes], temp, rowBytes);
    }
    delete[] temp;
}

Then I convert it to YUV420p

convertCtx = sws_getContext(width, height, PIX_FMT_RGB24, width, height, PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL);

uint8_t *src[3] = {buffer, NULL, NULL};
int srcstride = width * 3; // bytes per RGB24 row

sws_scale(convertCtx, src, &srcstride, 0, height, pic_in.img.plane, pic_in.img.i_stride);

Then I pretty much just call the x264 encoder. I am already using the zerolatency preset.

int frame_size = x264_encoder_encode(_encoder, &nals, &i_nals, _inputPicture, &pic_out);

My guess is that there should be a faster way to do this: capturing the frame and converting it to YUV420p. It would be nice to do the conversion on the GPU and only copy the result to system memory afterwards, and hopefully there is a way to do the color conversion without flipping first.

If there is no better way, at least this question may help someone trying to do this, to do it the same way I did.

First, use asynchronous texture reads with PBOs. Here is an example. It speeds up the read by using 2 PBOs that work asynchronously, without stalling the pipeline the way a direct glReadPixels call does. In my app I got an 80% performance boost when I switched to PBOs. Additionally, on some GPUs glGetTexImage() is faster than glReadPixels(), so try it out.

But if you really want to take the video encoding to the next level, you can do it via CUDA using the NVIDIA codec library. I recently asked the same question, so this may be helpful.
