How to know which malloc is used?
The way I understand it, there exist many different malloc implementations.
Is there any way to determine which malloc is actually used on my (Linux) system?
I read that "due to ptmalloc2's threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?
I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below:
#include <cstdlib>
#include <iostream>
#include <vector>
#include <omp.h>
#include <boost/date_time/posix_time/posix_time.hpp>

// Allocate and then free mallocCnt blocks of 100 bytes using `parallelism`
// OpenMP threads, timing the two phases separately.
void parallelMalloc(int parallelism, int mallocCnt = 10000000) {
    omp_set_num_threads(parallelism);
    std::vector<char*> ptrStore(mallocCnt);
    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        ptrStore[i] = ((char*)malloc(100 * sizeof(char)));
    }
    boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();
    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        free(ptrStore[i]);
    }
    boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration malloc_time = t2 - t1;
    boost::posix_time::time_duration free_time = t3 - t2;
    std::cout << " parallelism = " << parallelism << "\t itr = " << mallocCnt << "\t malloc_time = " <<
        malloc_time.total_milliseconds() << "\t free_time = " << free_time.total_milliseconds() << std::endl;
}

// Run the benchmark with 1 to 16 threads.
int main() {
    for (int i = 1; i <= 16; i += 1) {
        parallelMalloc(i);
    }
}
which gives me an output of
parallelism = 1 itr = 10000000 malloc_time = 1225 free_time = 1517
parallelism = 2 itr = 10000000 malloc_time = 1614 free_time = 1112
parallelism = 3 itr = 10000000 malloc_time = 1619 free_time = 687
parallelism = 4 itr = 10000000 malloc_time = 2325 free_time = 620
parallelism = 5 itr = 10000000 malloc_time = 2233 free_time = 550
parallelism = 6 itr = 10000000 malloc_time = 2207 free_time = 489
parallelism = 7 itr = 10000000 malloc_time = 2778 free_time = 398
parallelism = 8 itr = 10000000 malloc_time = 1813 free_time = 389
parallelism = 9 itr = 10000000 malloc_time = 1997 free_time = 350
parallelism = 10 itr = 10000000 malloc_time = 1922 free_time = 291
parallelism = 11 itr = 10000000 malloc_time = 2480 free_time = 257
parallelism = 12 itr = 10000000 malloc_time = 1614 free_time = 256
parallelism = 13 itr = 10000000 malloc_time = 1387 free_time = 289
parallelism = 14 itr = 10000000 malloc_time = 1481 free_time = 248
parallelism = 15 itr = 10000000 malloc_time = 1252 free_time = 297
parallelism = 16 itr = 10000000 malloc_time = 1063 free_time = 281
I read that "due to ptmalloc2's threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?
glibc internally uses ptmalloc2, and this isn't a recent development. Either way, it's not terribly difficult to run getconf GNU_LIBC_VERSION and then cross-check the version to see whether ptmalloc2 is used in that version, but I'm willing to bet you'd be wasting your time.
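As a programmatic counterpart to getconf GNU_LIBC_VERSION, the sketch below asks the C library for its version at run time. It assumes a glibc-based system: gnu_get_libc_version() is a glibc extension declared in <gnu/libc-version.h>, and the helper name libcVersion is made up for illustration. It only reports the C library version; mapping that version to an allocator is still the manual cross-check described above, and it says nothing about a malloc that was swapped in at link time.

```cpp
#include <cstdio>   // on glibc, including a standard header also defines __GLIBC__
#include <string>

#ifdef __GLIBC__
#include <gnu/libc-version.h>  // glibc extension: gnu_get_libc_version()
#endif

// Hypothetical helper: returns the glibc version string,
// or a placeholder when built against a different C library.
std::string libcVersion() {
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "unknown (not glibc)";
#endif
}
```

Calling std::puts(libcVersion().c_str()) from main prints the same version number that getconf reports on a glibc system.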
I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below
Turning your example into an MVCE (omitting code here for brevity) and compiling with g++ -Wall -pedantic -O3 -pthread -fopenmp, using g++ 5.3.1, here are my results.
With OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 746 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 541 free_time = 267
parallelism = 3 itr = 10000000 malloc_time = 405 free_time = 259
parallelism = 4 itr = 10000000 malloc_time = 324 free_time = 221
parallelism = 5 itr = 10000000 malloc_time = 330 free_time = 242
parallelism = 6 itr = 10000000 malloc_time = 287 free_time = 244
parallelism = 7 itr = 10000000 malloc_time = 257 free_time = 226
parallelism = 8 itr = 10000000 malloc_time = 270 free_time = 225
parallelism = 9 itr = 10000000 malloc_time = 253 free_time = 225
parallelism = 10 itr = 10000000 malloc_time = 236 free_time = 226
parallelism = 11 itr = 10000000 malloc_time = 225 free_time = 239
parallelism = 12 itr = 10000000 malloc_time = 276 free_time = 258
parallelism = 13 itr = 10000000 malloc_time = 241 free_time = 228
parallelism = 14 itr = 10000000 malloc_time = 254 free_time = 225
parallelism = 15 itr = 10000000 malloc_time = 278 free_time = 272
parallelism = 16 itr = 10000000 malloc_time = 235 free_time = 220
23.87 user
2.11 system
0:10.41 elapsed
249% CPU
Without OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 748 free_time = 263
parallelism = 2 itr = 10000000 malloc_time = 344 free_time = 256
parallelism = 3 itr = 10000000 malloc_time = 751 free_time = 254
parallelism = 4 itr = 10000000 malloc_time = 339 free_time = 262
parallelism = 5 itr = 10000000 malloc_time = 748 free_time = 253
parallelism = 6 itr = 10000000 malloc_time = 330 free_time = 256
parallelism = 7 itr = 10000000 malloc_time = 734 free_time = 260
parallelism = 8 itr = 10000000 malloc_time = 334 free_time = 259
parallelism = 9 itr = 10000000 malloc_time = 750 free_time = 256
parallelism = 10 itr = 10000000 malloc_time = 339 free_time = 255
parallelism = 11 itr = 10000000 malloc_time = 743 free_time = 267
parallelism = 12 itr = 10000000 malloc_time = 342 free_time = 261
parallelism = 13 itr = 10000000 malloc_time = 739 free_time = 252
parallelism = 14 itr = 10000000 malloc_time = 333 free_time = 252
parallelism = 15 itr = 10000000 malloc_time = 740 free_time = 252
parallelism = 16 itr = 10000000 malloc_time = 330 free_time = 252
13.38 user
4.66 system
0:18.08 elapsed
99% CPU
The parallel run finishes about 8 seconds sooner (0:10.41 versus 0:18.08 elapsed). Still not convinced? OK. I went ahead and grabbed dlmalloc and ran make to produce libmalloc.a. My new command line is:
g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc
With OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 814 free_time = 277
I CTRL-C'd after 37 seconds.
Without OpenMP:
parallelism = 1 itr = 10000000 malloc_time = 772 free_time = 271
parallelism = 2 itr = 10000000 malloc_time = 780 free_time = 272
parallelism = 3 itr = 10000000 malloc_time = 783 free_time = 272
parallelism = 4 itr = 10000000 malloc_time = 792 free_time = 277
parallelism = 5 itr = 10000000 malloc_time = 813 free_time = 281
parallelism = 6 itr = 10000000 malloc_time = 800 free_time = 275
parallelism = 7 itr = 10000000 malloc_time = 795 free_time = 277
parallelism = 8 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 9 itr = 10000000 malloc_time = 788 free_time = 277
parallelism = 10 itr = 10000000 malloc_time = 784 free_time = 276
parallelism = 11 itr = 10000000 malloc_time = 786 free_time = 284
parallelism = 12 itr = 10000000 malloc_time = 807 free_time = 279
parallelism = 13 itr = 10000000 malloc_time = 791 free_time = 277
parallelism = 14 itr = 10000000 malloc_time = 790 free_time = 273
parallelism = 15 itr = 10000000 malloc_time = 785 free_time = 276
parallelism = 16 itr = 10000000 malloc_time = 787 free_time = 275
6.48 user
11.27 system
0:17.81 elapsed
99% CPU
Pretty significant difference. I suspect that the issue lies within your more complicated code, or that something's wrong with your benchmark.
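The MVCE behind the timings above was omitted from the answer, so the following is only a sketch of what such a benchmark could look like, not the code that produced those numbers. It replaces the question's Boost timers with std::chrono::steady_clock (an assumption, made to drop the Boost dependency), and the names Timing and timeMallocFree are hypothetical. Built with -fopenmp the loops run in parallel; built without it the pragmas are ignored and the loops run serially, which is one possible way to get a serial baseline.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type: wall-clock milliseconds for each phase.
struct Timing {
    long long mallocMs;
    long long freeMs;
    std::size_t failed;  // allocations that returned nullptr
};

// Allocate `count` blocks of `blockSize` bytes, then free them all,
// timing the allocation and deallocation phases separately.
Timing timeMallocFree(int threads, int count, std::size_t blockSize = 100) {
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::milliseconds;
#ifdef _OPENMP
    omp_set_num_threads(threads);
#else
    (void)threads;  // serial build: the thread count has no effect
#endif
    std::vector<char*> ptrs(static_cast<std::size_t>(count), nullptr);

    const auto t1 = Clock::now();
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        ptrs[i] = static_cast<char*>(std::malloc(blockSize));
    }
    const auto t2 = Clock::now();

    // Check for failed allocations outside both timed regions.
    std::size_t failed = 0;
    for (char* p : ptrs) {
        if (p == nullptr) ++failed;
    }

    const auto t3 = Clock::now();
    #pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        std::free(ptrs[i]);  // free(nullptr) is a no-op
    }
    const auto t4 = Clock::now();

    return Timing{duration_cast<milliseconds>(t2 - t1).count(),
                  duration_cast<milliseconds>(t4 - t3).count(),
                  failed};
}
```

A driver would call timeMallocFree(n, 10000000) for n from 1 to 16 and print the two durations in the same format as the question's output. Checking the returned pointers matters here: a benchmark that silently runs out of memory measures something other than allocator speed.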