
How to know which malloc is used?

The way I understand it, there are many different malloc implementations:

  • dlmalloc – General purpose allocator
  • ptmalloc2 – glibc
  • jemalloc – FreeBSD and Firefox
  • tcmalloc – Google
  • libumem – Solaris

Is there any way to determine which malloc is actually used on my (Linux) system?

I read that "due to ptmalloc2's threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?

I am asking because I do not seem to get any speedup by parallelizing the malloc loop in the code below:

#include <cstdlib>
#include <iostream>
#include <vector>
#include <omp.h>
#include <boost/date_time/posix_time/posix_time.hpp>

// Allocate mallocCnt 100-byte blocks in parallel, free them in parallel,
// and print how long each phase took.
void parallelMalloc(int parallelism, int mallocCnt = 10000000) {

    omp_set_num_threads(parallelism);

    std::vector<char*> ptrStore(mallocCnt);

    boost::posix_time::ptime t1 = boost::posix_time::microsec_clock::local_time();

    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        ptrStore[i] = static_cast<char*>(malloc(100 * sizeof(char)));
    }

    boost::posix_time::ptime t2 = boost::posix_time::microsec_clock::local_time();

    #pragma omp parallel for
    for (int i = 0; i < mallocCnt; i++) {
        free(ptrStore[i]);
    }

    boost::posix_time::ptime t3 = boost::posix_time::microsec_clock::local_time();

    boost::posix_time::time_duration malloc_time = t2 - t1;
    boost::posix_time::time_duration free_time   = t3 - t2;

    std::cout << " parallelism = "  << parallelism << "\t itr = " << mallocCnt <<  "\t malloc_time = " <<
            malloc_time.total_milliseconds() << "\t free_time = " << free_time.total_milliseconds() << std::endl;
}

int main() {
    // Run the benchmark with 1 to 16 OpenMP threads.
    for (int i = 1; i <= 16; i += 1) {
        parallelMalloc(i);
    }
}

which gives me this output:

 parallelism = 1         itr = 10000000  malloc_time = 1225      free_time = 1517
 parallelism = 2         itr = 10000000  malloc_time = 1614      free_time = 1112
 parallelism = 3         itr = 10000000  malloc_time = 1619      free_time = 687
 parallelism = 4         itr = 10000000  malloc_time = 2325      free_time = 620
 parallelism = 5         itr = 10000000  malloc_time = 2233      free_time = 550
 parallelism = 6         itr = 10000000  malloc_time = 2207      free_time = 489
 parallelism = 7         itr = 10000000  malloc_time = 2778      free_time = 398
 parallelism = 8         itr = 10000000  malloc_time = 1813      free_time = 389
 parallelism = 9         itr = 10000000  malloc_time = 1997      free_time = 350
 parallelism = 10        itr = 10000000  malloc_time = 1922      free_time = 291
 parallelism = 11        itr = 10000000  malloc_time = 2480      free_time = 257
 parallelism = 12        itr = 10000000  malloc_time = 1614      free_time = 256
 parallelism = 13        itr = 10000000  malloc_time = 1387      free_time = 289
 parallelism = 14        itr = 10000000  malloc_time = 1481      free_time = 248
 parallelism = 15        itr = 10000000  malloc_time = 1252      free_time = 297
 parallelism = 16        itr = 10000000  malloc_time = 1063      free_time = 281

Answer:

  I read that "due to ptmalloc2's threading support, it became the default memory allocator for linux." Is there any way for me to check this myself?

glibc's malloc is derived from ptmalloc2, and this isn't a recent development. Either way, it's not terribly difficult to run getconf GNU_LIBC_VERSION and then cross-check that version to see whether it uses ptmalloc2, but I'm willing to bet you'd be wasting your time.

  I am asking because I do not seem to get any speedup by parallelizing my malloc loop in the code below

Turning your example into an MCVE (code omitted here for brevity) and compiling it with g++ -Wall -pedantic -O3 -pthread -fopenmp using g++ 5.3.1, here are my results.

With OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 746   free_time = 263
 parallelism = 2     itr = 10000000  malloc_time = 541   free_time = 267
 parallelism = 3     itr = 10000000  malloc_time = 405   free_time = 259
 parallelism = 4     itr = 10000000  malloc_time = 324   free_time = 221
 parallelism = 5     itr = 10000000  malloc_time = 330   free_time = 242
 parallelism = 6     itr = 10000000  malloc_time = 287   free_time = 244
 parallelism = 7     itr = 10000000  malloc_time = 257   free_time = 226
 parallelism = 8     itr = 10000000  malloc_time = 270   free_time = 225
 parallelism = 9     itr = 10000000  malloc_time = 253   free_time = 225
 parallelism = 10    itr = 10000000  malloc_time = 236   free_time = 226
 parallelism = 11    itr = 10000000  malloc_time = 225   free_time = 239
 parallelism = 12    itr = 10000000  malloc_time = 276   free_time = 258
 parallelism = 13    itr = 10000000  malloc_time = 241   free_time = 228
 parallelism = 14    itr = 10000000  malloc_time = 254   free_time = 225
 parallelism = 15    itr = 10000000  malloc_time = 278   free_time = 272
 parallelism = 16    itr = 10000000  malloc_time = 235   free_time = 220

23.87 user 
2.11 system 
0:10.41 elapsed 
249% CPU

Without OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 748   free_time = 263
 parallelism = 2     itr = 10000000  malloc_time = 344   free_time = 256
 parallelism = 3     itr = 10000000  malloc_time = 751   free_time = 254
 parallelism = 4     itr = 10000000  malloc_time = 339   free_time = 262
 parallelism = 5     itr = 10000000  malloc_time = 748   free_time = 253
 parallelism = 6     itr = 10000000  malloc_time = 330   free_time = 256
 parallelism = 7     itr = 10000000  malloc_time = 734   free_time = 260
 parallelism = 8     itr = 10000000  malloc_time = 334   free_time = 259
 parallelism = 9     itr = 10000000  malloc_time = 750   free_time = 256
 parallelism = 10    itr = 10000000  malloc_time = 339   free_time = 255
 parallelism = 11    itr = 10000000  malloc_time = 743   free_time = 267
 parallelism = 12    itr = 10000000  malloc_time = 342   free_time = 261
 parallelism = 13    itr = 10000000  malloc_time = 739   free_time = 252
 parallelism = 14    itr = 10000000  malloc_time = 333   free_time = 252
 parallelism = 15    itr = 10000000  malloc_time = 740   free_time = 252
 parallelism = 16    itr = 10000000  malloc_time = 330   free_time = 252

13.38 user 
4.66 system 
0:18.08 elapsed 
99% CPU 

The parallel run finished about 8 seconds sooner in wall-clock time (10.41 s vs. 18.08 s). Still not convinced? OK.

I went ahead and grabbed dlmalloc and ran make to produce libmalloc.a. My new command line is g++ -Wall -pedantic -O3 -pthread -fopenmp -L$HOME/Development/test/dlmalloc/lib test.cpp -lmalloc

With OpenMP:

parallelism = 1  itr = 10000000  malloc_time = 814   free_time = 277

I hit Ctrl-C after 37 seconds.

Without OpenMP:

 parallelism = 1     itr = 10000000  malloc_time = 772   free_time = 271
 parallelism = 2     itr = 10000000  malloc_time = 780   free_time = 272
 parallelism = 3     itr = 10000000  malloc_time = 783   free_time = 272
 parallelism = 4     itr = 10000000  malloc_time = 792   free_time = 277
 parallelism = 5     itr = 10000000  malloc_time = 813   free_time = 281
 parallelism = 6     itr = 10000000  malloc_time = 800   free_time = 275
 parallelism = 7     itr = 10000000  malloc_time = 795   free_time = 277
 parallelism = 8     itr = 10000000  malloc_time = 790   free_time = 273
 parallelism = 9     itr = 10000000  malloc_time = 788   free_time = 277
 parallelism = 10    itr = 10000000  malloc_time = 784   free_time = 276
 parallelism = 11    itr = 10000000  malloc_time = 786   free_time = 284
 parallelism = 12    itr = 10000000  malloc_time = 807   free_time = 279
 parallelism = 13    itr = 10000000  malloc_time = 791   free_time = 277
 parallelism = 14    itr = 10000000  malloc_time = 790   free_time = 273
 parallelism = 15    itr = 10000000  malloc_time = 785   free_time = 276
 parallelism = 16    itr = 10000000  malloc_time = 787   free_time = 275

6.48 user 
11.27 system 
0:17.81 elapsed 
99% CPU

Pretty significant difference. I suspect that the issue lies within your more complicated code, or something's wrong with your benchmark.

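Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.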

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM