iPhone OpenGL ES performance tuning

I've been trying for quite a while to optimize the framerate of my game without making any real progress. I'm running the newest iPhone SDK and have an iPhone 3G device running 3.1.2.

I issue around 150 draw calls, rendering about 1900 triangles in total (all objects are textured using two texture layers and multitexturing; most textures come from the same texture atlas, stored as a PVRTC 2bpp compressed texture). This renders on my phone at around 30 fps, which seems far too low to me for only 1900 triangles.

I tried many things to optimize the performance, including batching the objects together, transforming the vertices on the CPU, and rendering them in a single draw call. This yields 8 draw calls (as opposed to 150), but performance is about the same (the framerate drops to around 26 fps).
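For readers unfamiliar with the batching approach described above, here is a minimal sketch of what "transforming the vertices on the CPU" might look like. The names `Vec3`, `Mat4`, `transformPoint` and `appendToBatch` are hypothetical and not taken from the asker's code. Only positions are handled here; normals would need the rotation part of the matrix applied as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Column-major 4x4 matrix, the layout OpenGL ES expects for glLoadMatrixf.
struct Mat4 { float m[16]; };

// Apply a model matrix to a position (w is assumed to be 1).
Vec3 transformPoint(const Mat4& mat, const Vec3& p) {
    Vec3 out;
    out.x = mat.m[0] * p.x + mat.m[4] * p.y + mat.m[8]  * p.z + mat.m[12];
    out.y = mat.m[1] * p.x + mat.m[5] * p.y + mat.m[9]  * p.z + mat.m[13];
    out.z = mat.m[2] * p.x + mat.m[6] * p.y + mat.m[10] * p.z + mat.m[14];
    return out;
}

// Pre-transform one object's positions into world space and append them
// to a shared buffer, so many objects can be drawn with one draw call.
void appendToBatch(std::vector<Vec3>& batch, const Mat4& model,
                   const Vec3* positions, std::size_t count) {
    assert(positions != NULL || count == 0);
    batch.reserve(batch.size() + count);
    for (std::size_t i = 0; i < count; ++i) {
        batch.push_back(transformPoint(model, positions[i]));
    }
}
```

Note that this moves the per-vertex matrix multiply from the GPU to the CPU, which may explain why fewer draw calls did not help if the CPU was already the limiting factor.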

I'm using 32-byte vertices stored in an interleaved array (12 bytes position, 12 bytes normal, 8 bytes UV). I'm rendering triangle lists, and the vertices are ordered in tri-strip order.
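The vertex layout described above could be declared roughly as follows. The struct and field names are assumptions for illustration; only the sizes come from the question.

```cpp
#include <cstddef>

// Hypothetical name; mirrors the 32-byte interleaved vertex described above.
struct Vertex {
    float position[3]; // 12 bytes, offset 0
    float normal[3];   // 12 bytes, offset 12
    float uv[2];       //  8 bytes, offset 24
};

// Compile-time checks that the compiler did not insert padding.
static_assert(sizeof(Vertex) == 32, "vertex must be 32 bytes");
static_assert(offsetof(Vertex, normal) == 12, "normal starts at byte 12");
static_assert(offsetof(Vertex, uv) == 24, "uv starts at byte 24");
```

With a layout like this, `sizeof(Vertex)` would be passed as the stride to `glVertexPointer`, `glNormalPointer` and `glTexCoordPointer`, with each pointer offset by the corresponding field offset.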

I did some profiling, but I don't really know how to interpret it.

  1. Instruments sampling: using Instruments with the Sampler yields this result: http://neo.cycovery.com/instruments_sampling.gif telling me that a lot of time is spent in "mach_msg_trap". I googled it, and it seems this function is called in order to wait for something else. But wait for what?

  2. Instruments OpenGL ES: Instruments with the OpenGL ES module yields this result: http://neo.cycovery.com/intstruments_openglES_debug.gif but here I really have no idea what those numbers are telling me.

  3. Shark profiling: profiling with Shark didn't tell me much either: http://neo.cycovery.com/shark_profile_release.gif The largest number is 10%, spent in DrawTriangles, and all the rest is spread across functions with very small percentages.

Can anyone tell me what else I could do to find the bottleneck, and help me interpret this profiling information?

Thanks a lot!

You're probably CPU-bound. The tiler/renderer utilization statistics in the OpenGL ES instrument show that the GPU's duty cycle is between 20% and 30% when rendering at 20-30 fps, which suggests that the GPU could run at 60 fps if fed fast enough. There are a few things you could do to get more information out of Instruments and Shark about what to pursue:

By default, Sampler shows every sample from every thread, which means that mostly-idle helper threads created by system frameworks will dominate your view. To get a better idea of what the CPU is actually doing, make sure the Detail View is showing (third button from the left in the lower left corner) and change Sample Perspective to Running Sample Times to exclude samples where a thread is idle/blocked.

I don't see any samples in the Shark trace from your app itself. That may well be because your code is fast enough that it doesn't appear anywhere in the list of hot functions, but it might also be because Shark can't find symbols for your application. You might need to configure the search paths in its preferences or manually point Shark at your app binary. Also, Shark defaults to showing a list of functions ordered by how much CPU time is spent in them. It may be useful to change the view to something more like a regular call tree, so you can visualize how your overall render loop spends its time. To do this, change the View option in the lower-right corner to "Tree (Top-Down)." (If you don't see your app name or functions here either, then Shark is definitely missing your symbols.)
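The utilization argument at the start of this answer can be checked with some rough arithmetic. The sketch below assumes that the reported utilization is the fraction of wall-clock time the GPU is busy, and that GPU cost per frame stays constant as the frame rate rises; both are simplifications, and the function names are made up for illustration.

```cpp
#include <cassert>

// Milliseconds of GPU work per frame implied by a utilization fraction
// (0.0 to 1.0) measured at a given frame rate.
double gpuMsPerFrame(double fps, double utilization) {
    assert(fps > 0.0 && utilization >= 0.0 && utilization <= 1.0);
    return (1000.0 / fps) * utilization;
}

// Frame rate the GPU alone could sustain if it were kept busy all the time.
double gpuBoundFps(double fps, double utilization) {
    assert(fps > 0.0 && utilization > 0.0 && utilization <= 1.0);
    return fps / utilization;
}
```

For example, 30% utilization at 30 fps works out to about 10 ms of GPU work per frame, well under the roughly 16.7 ms budget of a 60 fps frame. Even the least favorable pairing in the reported ranges (30% at 20 fps) implies a GPU-side ceiling of about 67 fps.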

I am unfortunately not well versed in OpenGL, but here are some things that stand out to me from the three results:

1) Judging from the Sampling instrument, might you have some kind of background web connection going?

2) The renderer utilization percentages seem low to me (though I don't know how to improve them).

3) Even though 10% seems low, that seems like a good point of attack. However, it's almost equally suspicious that so much time is spent in memcpy. Also, ValidateState takes a largish amount of time and might be holding you back.

Tool-wise, I think you are using the right tools to examine performance; you just need to think more about what the results mean for your application.

Without the full source, it's difficult to tell exactly what's happening. The Instruments trace shows 20% renderer utilization, which is a bit low. This probably means you're CPU-bound. However, if this were the case, I would expect to see more application-specific sample points in your first trace.

My advice is to roll your own timing class. Something like this (C++):

#include <sys/time.h>

class Timer
{
public:
    Timer()
    {
        gettimeofday(&m_time, NULL);
    }
    void Reset()
    {
        gettimeofday(&m_time, NULL);
    }
    // returns time since construction or Reset in microseconds.
    unsigned long GetTime() const
    {
        timeval now;
        gettimeofday(&now, NULL);
        unsigned long micros = (now.tv_sec-m_time.tv_sec)*1000000+
                               (now.tv_usec-m_time.tv_usec);
        return micros;
    }
protected:
    timeval m_time;
};

Time sections of your code to find out exactly where the time is being spent.
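One way to apply this per section, sketched under the assumption that you want totals accumulated across many frames rather than a single reading. `SectionStats`, `ScopedSection`, `averageMicros` and `report` are hypothetical helpers, not part of any SDK; the only system call used is `gettimeofday`, as in the class above.

```cpp
#include <sys/time.h>
#include <cassert>
#include <cstdio>

// Hypothetical per-section accumulator; the names are illustrative only.
struct SectionStats {
    const char* name;
    long long totalMicros;
    long calls;
};

// Wall-clock time in microseconds. gettimeofday is not monotonic, so a
// clock adjustment during a measurement would skew the result.
static long long nowMicros() {
    timeval tv;
    gettimeofday(&tv, NULL);
    return static_cast<long long>(tv.tv_sec) * 1000000LL + tv.tv_usec;
}

// Adds its own lifetime to the given section when it goes out of scope.
class ScopedSection {
public:
    explicit ScopedSection(SectionStats& stats)
        : m_stats(stats), m_start(nowMicros()) {}
    ~ScopedSection() {
        m_stats.totalMicros += nowMicros() - m_start;
        ++m_stats.calls;
    }
private:
    SectionStats& m_stats;
    long long m_start;
};

// Average cost of one pass through the section, in microseconds.
double averageMicros(const SectionStats& stats) {
    assert(stats.calls > 0);
    return static_cast<double>(stats.totalMicros) / stats.calls;
}

// Print one line per section, e.g. every few hundred frames.
void report(const SectionStats& stats) {
    std::printf("%s: %.1f us average over %ld calls\n",
                stats.name, averageMicros(stats), stats.calls);
}
```

In a render loop you would keep one `SectionStats` per phase (update, batching, draw submission, buffer swap), wrap each phase in a block containing a `ScopedSection`, and call `report` periodically to compare the phases.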

Another quick fix is to disable the Thumb instruction set. This could improve your floating-point performance by 20% or more, at the expense of executable size.

If you are calling glFlush or glFinish, remove all of those calls.
