Code translated to C/NDK/JNI less efficient than Java original
This is my first, not very deep, dive into the NDK.
For performance reasons, I would like to rewrite this code with the NDK. My C file looks like this:
#include <jni.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <android/log.h>
JNIEXPORT jbyteArray JNICALL
Java_com_company_app_tools_NV21FrameRotator_rotateNV21(JNIEnv *env, jclass thiz,
                                                       jbyteArray data, jbyteArray output,
                                                       jint width, jint height, jint rotation) {
    clock_t start, end;
    double cpu_time_used;
    start = clock();
    jbyte *dataPtr = (*env)->GetByteArrayElements(env, data, NULL);
    jbyte *outputPtr = (*env)->GetByteArrayElements(env, output, NULL);
    unsigned int frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    for (unsigned int j = 0; j < height; j++) {
        for (unsigned int i = 0; i < width; i++) {
            unsigned int yIn = j * width + i;
            unsigned int uIn = frameSize + (j >> 1u) * width + (i & ~1u);
            unsigned int vIn = uIn + 1;
            unsigned int wOut = swap ? height : width;
            unsigned int hOut = swap ? width : height;
            unsigned int iSwapped = swap ? j : i;
            unsigned int jSwapped = swap ? i : j;
            unsigned int iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
            unsigned int jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
            unsigned int yOut = jOut * wOut + iOut;
            unsigned int uOut = frameSize + (jOut >> 1u) * wOut + (iOut & ~1u);
            unsigned int vOut = uOut + 1;
            outputPtr[yOut] = (jbyte) (0xff & dataPtr[yIn]);
            outputPtr[uOut] = (jbyte) (0xff & dataPtr[uIn]);
            outputPtr[vOut] = (jbyte) (0xff & dataPtr[vIn]);
        }
    }
    (*env)->ReleaseByteArrayElements(env, data, dataPtr, 0);
    (*env)->ReleaseByteArrayElements(env, output, outputPtr, 0);
    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    /* "%f" output can exceed 9 characters (e.g. 100.000000), so use a larger, bounded buffer */
    char str[32];
    snprintf(str, sizeof(str), "%f", cpu_time_used * 1000);
    __android_log_write(ANDROID_LOG_ERROR, "NV21FrameRotator", str);
    return output;
}
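Both snippets (the linked Java one and the C one above) work fine, but when I measure the processing duration on the same device, the Java version takes about 7 ms (according to the Java-side log) and the C version 12-13 ms. Shouldn't the C version be faster? Why isn't it? Where is the problem?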
long micros = System.nanoTime() / 1000;
// ~7ms, Java
//data = rotateNV21(inputData, width, height, rotateCameraDegrees);
// ~12-13ms, C
NV21FrameRotator.rotateNV21(inputData, data, width, height, rotateCameraDegrees);
Log.d(TAG, "Last frame processing duration: " + (System.nanoTime() / 1000 - micros) + "µs");
P.S. The Java log sometimes shows a shorter duration than the native clock() measurement in the C file. Sample log:
NV21FrameRotator: 7.942000
NV21RotatorJava: Last frame processing duration: 7403µs
NV21FrameRotator: 7.229000
NV21RotatorJava: Last frame processing duration: 7166µs
NV21FrameRotator: 16.918000
NV21RotatorJava: Last frame processing duration: 20644µs
NV21FrameRotator: 19.594000
NV21RotatorJava: Last frame processing duration: 20479µs
NV21FrameRotator: 9.484000
NV21RotatorJava: Last frame processing duration: 7274µs
Edit: compile_commands.json for armeabi-v7a (old device, this is the only ABI I build):
[
{
"directory": "...app/.cxx/cmake/basicRelease/armeabi-v7a",
"command": "...sdk\\ndk\\21.0.6113669\\toolchains\\llvm\\prebuilt\\windows-x86_64\\bin\\clang.exe --target=armv7-none-linux-androideabi21 --gcc-toolchain=...sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64 --sysroot=...sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64/sysroot -DNV21FrameRotator_EXPORTS -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security -Oz -DNDEBUG -fPIC -o CMakeFiles\\NV21FrameRotator.dir\\NV21FrameRotator.c.o -c ...app\\src\\main\\cpp\\NV21FrameRotator.c",
"file": "...app\\src\\main\\cpp\\NV21FrameRotator.c"
}
]
CMakeFile:
cmake_minimum_required(VERSION 3.4.1)
add_library(NV21FrameRotator SHARED
NV21FrameRotator.c)
find_library(log-lib
log )
target_link_libraries(NV21FrameRotator
${log-lib} )
JNI overhead is quite high, especially when passing non-POD types or buffers. Calling a JNI function frequently can therefore be much slower than the Java version.
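Consider passing a direct java.nio.ByteBuffer instead, to avoid a potential copy of the byte array.
Compare Java and C performance on a real device; the emulator does not produce reliable results.
Compare Java and C performance on release builds; debug C code is slow, while Java still gets full JIT (and AOT) optimization.
You may want to look for the optimization level that suits your scenario best. By default, release builds use -Oz. To prefer speed over size, you can add this to your build.gradle: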
android { buildTypes { release { externalNativeBuild.cmake.cFlags "-O3" } } }
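When comparing the two numbers, it may also help to keep in mind that `clock()` reports processor time consumed by the whole process, while `System.nanoTime()` on the Java side measures elapsed time. That difference is one possible reason (an assumption, not something measured here) why the native figure in the question's log is sometimes larger than the Java one. The sketch below records both kinds of time around the same piece of work; `busy_work`, `time_busy_work` and `timespec_diff_ms` are hypothetical helper names, and `busy_work` merely stands in for the rotation loop.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Difference between two timespec values, in milliseconds. */
static double timespec_diff_ms(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) * 1000.0
         + (double)(end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Hypothetical stand-in for the work being timed (e.g. the rotation loop). */
static unsigned long busy_work(unsigned long n) {
    volatile unsigned long acc = 0;
    for (unsigned long k = 0; k < n; k++) {
        acc += k;
    }
    return acc;
}

/* Runs busy_work once; returns elapsed (wall) time in ms and stores the
 * process CPU time in ms into *cpu_ms. */
static double time_busy_work(unsigned long n, double *cpu_ms) {
    assert(cpu_ms != NULL);
    struct timespec wall_start, wall_end;
    clock_gettime(CLOCK_MONOTONIC, &wall_start);   /* elapsed time */
    clock_t cpu_start = clock();                   /* process CPU time */
    busy_work(n);
    clock_t cpu_end = clock();
    clock_gettime(CLOCK_MONOTONIC, &wall_end);
    *cpu_ms = (double)(cpu_end - cpu_start) * 1000.0 / CLOCKS_PER_SEC;
    return timespec_diff_ms(wall_start, wall_end);
}
```

In the JNI function, the same pattern would wrap the rotation loops instead of `busy_work`, and the elapsed-time value is the one that is comparable to the Java-side `System.nanoTime()` difference.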
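Your C code (actually, the original Java code) needs some optimization. The main inefficiency, as far as I can see, is that you recompute each U and V value four times. The simple fix is to split the loop into one pass over the Y plane and one over the UV plane.
A further optimization avoids the multiplications in the inner loop (they can be removed from the outer loop as well, but the effect is negligible):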
#include <jni.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <android/log.h>
JNIEXPORT jbyteArray JNICALL
Java_com_company_app_tools_NV21FrameRotator_rotateNV21(JNIEnv *env, jclass thiz,
                                                       jbyteArray data, jbyteArray output,
                                                       jint width, jint height, jint rotation) {
    clock_t start, end;
    double cpu_time_used;
    start = clock();
    jbyte *dataPtr = (*env)->GetByteArrayElements(env, data, NULL);
    jbyte *outputPtr = (*env)->GetByteArrayElements(env, output, NULL);
    unsigned int frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    unsigned int wOut = swap ? height : width;
    unsigned int hOut = swap ? width : height;
    unsigned int yIn = 0;
    for (unsigned int j = 0; j < height; j++) {
        unsigned int iSwapped = swap ? j : 0;
        unsigned int jSwapped = swap ? 0 : j;
        unsigned int iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
        unsigned int jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
        unsigned int yOut = jOut * wOut + iOut;
        for (unsigned int i = 0; i < width; i++) {
            outputPtr[yOut] = dataPtr[yIn];
            if (swap) {
                yOut += yflip ? -wOut : wOut;
            } else {
                yOut += xflip ? -1 : 1;
            }
            yIn++;
        }
    }
    unsigned int uIn = frameSize;
    for (unsigned int j = 0; j < height; j+=2) {
        unsigned int iSwapped = swap ? j : 0;
        unsigned int jSwapped = swap ? 0 : j;
        unsigned int iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
        unsigned int jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
        unsigned int uOut = frameSize + (jOut / 2) * wOut + (iOut & ~1u);
        for (unsigned int i = 0; i < width; i+=2) {
            unsigned int vIn = uIn + 1;
            unsigned int vOut = uOut + 1;
            outputPtr[uOut] = dataPtr[uIn];
            outputPtr[vOut] = dataPtr[vIn];
            if (swap) {
                uOut += yflip ? -wOut : wOut;
            } else {
                uOut += xflip ? -2 : 2;
            }
            uIn += 2;
        }
    }
    (*env)->ReleaseByteArrayElements(env, data, dataPtr, JNI_ABORT);
    (*env)->ReleaseByteArrayElements(env, output, outputPtr, 0);
    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    __android_log_print(ANDROID_LOG_ERROR, "NV21FrameRotator", "%.1f ms", cpu_time_used * 1000);
    return output;
}
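Since the split-loop version replaces the per-pixel index computation with running offsets, it can be worth checking on a desktop machine that both mappings still produce the same frame. The following is a minimal sketch of such a check, with the JNI calls stripped out; `rotate_naive`, `rotate_split` and `rotations_match` are hypothetical names, the step values are hoisted out of the loops for brevity, and even frame dimensions are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Per-pixel mapping, following the question's loop (no JNI). */
static void rotate_naive(const uint8_t *in, uint8_t *out,
                         unsigned width, unsigned height, int rotation) {
    unsigned frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    unsigned wOut = swap ? height : width;
    unsigned hOut = swap ? width : height;
    for (unsigned j = 0; j < height; j++) {
        for (unsigned i = 0; i < width; i++) {
            unsigned iS = swap ? j : i, jS = swap ? i : j;
            unsigned iOut = xflip ? wOut - iS - 1 : iS;
            unsigned jOut = yflip ? hOut - jS - 1 : jS;
            unsigned uIn = frameSize + (j >> 1u) * width + (i & ~1u);
            unsigned uOut = frameSize + (jOut >> 1u) * wOut + (iOut & ~1u);
            out[jOut * wOut + iOut] = in[j * width + i];
            out[uOut] = in[uIn];
            out[uOut + 1] = in[uIn + 1];
        }
    }
}

/* Split-loop mapping, following the answer's approach (no JNI). */
static void rotate_split(const uint8_t *in, uint8_t *out,
                         unsigned width, unsigned height, int rotation) {
    unsigned frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    unsigned wOut = swap ? height : width;
    unsigned hOut = swap ? width : height;
    /* Negative steps rely on well-defined unsigned wraparound. */
    unsigned yStep = swap ? (yflip ? 0u - wOut : wOut) : (xflip ? 0u - 1u : 1u);
    unsigned uStep = swap ? (yflip ? 0u - wOut : wOut) : (xflip ? 0u - 2u : 2u);
    unsigned yIn = 0, uIn = frameSize;
    for (unsigned j = 0; j < height; j++) {            /* Y plane */
        unsigned iS = swap ? j : 0, jS = swap ? 0 : j;
        unsigned iOut = xflip ? wOut - iS - 1 : iS;
        unsigned jOut = yflip ? hOut - jS - 1 : jS;
        unsigned yOut = jOut * wOut + iOut;
        for (unsigned i = 0; i < width; i++, yOut += yStep) {
            out[yOut] = in[yIn++];
        }
    }
    for (unsigned j = 0; j < height; j += 2) {         /* interleaved VU plane */
        unsigned iS = swap ? j : 0, jS = swap ? 0 : j;
        unsigned iOut = xflip ? wOut - iS - 1 : iS;
        unsigned jOut = yflip ? hOut - jS - 1 : jS;
        unsigned uOut = frameSize + (jOut / 2) * wOut + (iOut & ~1u);
        for (unsigned i = 0; i < width; i += 2, uOut += uStep, uIn += 2) {
            out[uOut] = in[uIn];
            out[uOut + 1] = in[uIn + 1];
        }
    }
}

/* Returns 1 if both versions produce identical output for a test frame. */
static int rotations_match(unsigned width, unsigned height, int rotation) {
    assert(width % 2 == 0 && height % 2 == 0);  /* assumption: even dimensions */
    size_t size = (size_t)width * height * 3 / 2;
    uint8_t *in = malloc(size), *a = calloc(size, 1), *b = calloc(size, 1);
    assert(in && a && b);
    for (size_t k = 0; k < size; k++) {
        in[k] = (uint8_t)(k * 31 + 7);          /* arbitrary test pattern */
    }
    rotate_naive(in, a, width, height, rotation);
    rotate_split(in, b, width, height, rotation);
    int same = memcmp(a, b, size) == 0;
    free(in); free(a); free(b);
    return same;
}
```

Calling `rotations_match(8, 6, r)` for `r` in 0, 90, 180 and 270 exercises every combination of `swap`, `xflip` and `yflip` that the rotation argument can produce.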