Getting segmentation fault using malloc for 2D array

I allocated a 2D array with malloc to hold the adjacency matrix of a large graph, and then set each entry to 0 or 1 according to the edge list. But I am getting a segmentation fault. Here is my code.

#include <stdio.h>
#include <stdlib.h>
int MAX = 50000;
void clustering(int **adj);

int main()
{
  int i, j, k;  
  FILE *ptr_file1;
  int **adj;

  adj = (int **)malloc(sizeof(int *)*MAX);
  for(i=0;i<MAX;++i)
  adj[i] = (int *)malloc(sizeof(int)*MAX);

  struct adjacency
  {
     int node1;
     int node2;
  };
  struct adjacency a;

  ptr_file1 = fopen("Email-Enron.txt","r"); //Opening file containing edgelist of approx  37000 nodes.

  if (!ptr_file1)
    return 1;

  while(fscanf(ptr_file1,"%d %d",&a.node1, &a.node2)!=EOF)
  {
     adj[a.node1][a.node2] = 1;                   //Getting segmentation fault here   
     adj[a.node2][a.node1] = 1; 

  printf("adj[%d][%d] = %d   adj[%d][%d] = %d\n",a.node1,a.node2,adj[a.node1][a.node2],a.node2,a.node1,adj[a.node2][a.node1]);  
  }
  clustering(adj);
  return (0);
 }

Here is my output:

......
......
adj[85][119] = 1   adj[119][85] = 1
adj[85][154] = 1   adj[154][85] = 1
adj[85][200] = 1   adj[200][85] = 1
adj[85][528] = 1   adj[528][85] = 1
adj[85][604] = 1   adj[604][85] = 1
adj[85][661] = 1   adj[661][85] = 1
adj[85][662] = 1   adj[662][85] = 1
adj[85][686] = 1   adj[686][85] = 1
adj[85][727] = 1   adj[727][85] = 1
adj[85][1486] = 1   adj[1486][85] = 1
adj[85][1615] = 1   adj[1615][85] = 1
adj[85][2148] = 1   adj[2148][85] = 1
adj[85][2184] = 1   adj[2184][85] = 1
adj[85][2189] = 1   adj[2189][85] = 1
adj[85][2190] = 1   adj[2190][85] = 1
adj[85][2211] = 1   adj[2211][85] = 1
adj[85][3215] = 1   adj[3215][85] = 1
adj[85][4583] = 1   adj[4583][85] = 1
adj[85][4585] = 1   adj[4585][85] = 1
adj[85][4586] = 1   adj[4586][85] = 1
adj[85][4589] = 1   adj[4589][85] = 1
adj[85][4590] = 1   adj[4590][85] = 1
Segmentation fault (core dumped)

What is wrong here? Please help.

The problem most likely comes from the memory allocation. On a typical machine, sizeof(int) is 4 and sizeof(int*) is 4 (32-bit OS) or 8 (64-bit OS).

First, you allocate room for 50000 pointers, that is at least 50000*4 = 200000 bytes.

Then you loop over these pointers and allocate 50,000*50,000*4 = 10,000,000,000 bytes = 10 GB in total!
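To make the arithmetic above concrete, here is a small sketch that computes the storage needed for an n x n matrix. The helper names matrix_bytes and print_requirement are made up for illustration; unsigned long long is used so that the product cannot overflow a 32-bit int.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: bytes needed for an n x n matrix of elem_size-byte cells. */
unsigned long long matrix_bytes(unsigned long long n, unsigned long long elem_size)
{
    assert(n > 0 && elem_size > 0);
    return n * n * elem_size;
}

/* Prints the requirement for the 50000-node case discussed above. */
void print_requirement(void)
{
    unsigned long long bytes = matrix_bytes(50000ULL, sizeof(int));
    printf("%llu bytes (about %.1f GB)\n", bytes, bytes / 1e9);
}
```

Calling matrix_bytes with a 1-byte cell instead of a 4-byte int gives 2.5 GB, which shows how much the element type alone matters.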

Since you do NOT check the return value of malloc(), my guess is that at some point in this loop:

for(i=0;i<MAX;++i)
    adj[i] = (int *)malloc(sizeof(int)*MAX);

malloc starts returning NULL. Let M denote the first such index. In your case I would guess that M ≥ 4591.

Later, when reading the data from your file, you try to access a NULL pointer if M ≤ a.node1.
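As a rough sketch of what checking every malloc result could look like for the row-by-row layout, consider the helpers below. The names alloc_rows_checked and free_rows are hypothetical. calloc is used for the rows so that entries which are never set read as 0, which plain malloc does not guarantee.

```c
#include <assert.h>
#include <stdlib.h>

/* Frees the first `rows` row pointers and then the pointer table itself. */
void free_rows(int **m, size_t rows)
{
    if (!m)
        return;
    for (size_t i = 0; i < rows; ++i)
        free(m[i]);
    free(m);
}

/* Hypothetical helper: allocates an n x n zero-filled matrix row by row.
 * Returns NULL (after releasing any partial allocation) if a request fails. */
int **alloc_rows_checked(size_t n)
{
    assert(n > 0);
    int **m = malloc(n * sizeof *m);
    if (!m)
        return NULL;
    for (size_t i = 0; i < n; ++i) {
        m[i] = calloc(n, sizeof **m);   /* calloc also zero-initializes the row */
        if (!m[i]) {
            free_rows(m, i);            /* only rows 0..i-1 were allocated */
            return NULL;
        }
    }
    return m;
}
```

The caller then tests the result once, for example `if (!(adj = alloc_rows_checked(MAX))) return 1;`, instead of crashing later on a NULL row.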

By the way, you could allocate 2D arrays like this:

int **array, i;
if(NULL == (array = malloc(sizeof(int*)*MAX))) {
    printf("Oops, not enough memory ...\n");
    return EXIT_FAILURE;
}
// One contiguous block for all MAX*MAX cells (the product is computed in size_t).
if(NULL == (array[0] = malloc(sizeof(int)*MAX*MAX))) {
    printf("Oops, not enough memory ...\n");
    free(array);
    return EXIT_FAILURE;
}
// Row i starts i*MAX ints after row 0.
for(i = 1; i < MAX; i++)
    array[i] = array[0] + (size_t)i*MAX;
// At this point, array is ready to use.
do_stuff();
// When you are done, freeing the memory is not tiresome:
free(array[0]);
free(array);

(Note that in C there is no need to cast the return value of malloc.)

What is the difference between this allocation and yours? In yours, each adj[i] points to a separately allocated chunk of data. As a consequence, there is little chance that these chunks will be contiguous in memory. In the one I propose, there are only two memory allocations, and the chunks of data pointed to by array[i] and array[i+1] are contiguous.

NB:

adjacency matrix of a large graph

Although an adjacency matrix is a perfectly valid way to store a graph in memory, when the graph is large you should use an adjacency list instead.
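A minimal sketch of such an adjacency list, assuming node ids are integers in the range 0..n-1 as in the question. All type and function names here (NeighborList, Graph, graph_init, graph_add_edge, graph_free) are hypothetical. Memory use grows with the number of nodes plus the number of edges rather than with the square of the node count.

```c
#include <assert.h>
#include <stdlib.h>

/* One growable neighbor array per node. All names here are illustrative. */
typedef struct {
    int *items;
    size_t len, cap;
} NeighborList;

typedef struct {
    NeighborList *nodes;
    size_t n;
} Graph;

/* Returns 0 on success, -1 if memory runs out. */
int graph_init(Graph *g, size_t n)
{
    assert(g && n > 0);
    g->nodes = calloc(n, sizeof *g->nodes);
    g->n = g->nodes ? n : 0;
    return g->nodes ? 0 : -1;
}

/* Appends v to a neighbor list, doubling the capacity when it is full. */
static int push_neighbor(NeighborList *l, int v)
{
    if (l->len == l->cap) {
        size_t cap = l->cap ? l->cap * 2 : 4;
        int *p = realloc(l->items, cap * sizeof *p);
        if (!p)
            return -1;
        l->items = p;
        l->cap = cap;
    }
    l->items[l->len++] = v;
    return 0;
}

/* Adds the undirected edge u-v; rejects out-of-range node ids. */
int graph_add_edge(Graph *g, int u, int v)
{
    if (u < 0 || v < 0 || (size_t)u >= g->n || (size_t)v >= g->n)
        return -1;
    if (push_neighbor(&g->nodes[u], v) != 0)
        return -1;
    return push_neighbor(&g->nodes[v], u);
}

/* Releases every neighbor array and the node table. */
void graph_free(Graph *g)
{
    for (size_t i = 0; i < g->n; ++i)
        free(g->nodes[i].items);
    free(g->nodes);
    g->nodes = NULL;
    g->n = 0;
}
```

In the reading loop, `graph_add_edge(&g, a.node1, a.node2)` would replace the two matrix assignments, and a nonzero return value signals either a bad node id or an allocation failure.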

50000 * 50000 ints is quite a lot: with 4-byte integers it is about 9.3 GiB (10 GB) of memory. Are you sure all of that memory actually gets allocated?

Add a check:

if (!adj[i])
   return 2;

Note that you have to compile for x64 and run on an x64 machine for this to work. Most probably you do not need that much data.

In your specific case, there is no need to allocate an array of pointers to arrays of ints. Just allocate a single array of ints of size MAX*MAX.
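A sketch of the single-array layout, where the cell (row, col) lives at position row*n + col. The helper names flat_alloc, flat_index and flat_set_edge are made up for illustration, and size_t is used for the index arithmetic so that it does not overflow an int for large n.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: one zero-filled block holding n * n ints. */
int *flat_alloc(size_t n)
{
    assert(n > 0);
    return calloc(n, n * sizeof(int));
}

/* Maps (row, col) to the position inside the flat block. */
size_t flat_index(size_t n, size_t row, size_t col)
{
    return row * n + col;
}

/* Marks the undirected edge u-v in both directions. */
void flat_set_edge(int *m, size_t n, size_t u, size_t v)
{
    m[flat_index(n, u, v)] = 1;
    m[flat_index(n, v, u)] = 1;
}
```

This needs just one allocation and one free, but it still requires n*n*sizeof(int) bytes, so the result of flat_alloc must be checked for NULL.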

First, add a debug printf before the line that crashes:

  while(fscanf(ptr_file1,"%d %d",&a.node1, &a.node2)!=EOF)
  {
     printf("%d %d\n", a.node1, a.node2);

     adj[a.node1][a.node2] = 1;                   //Getting segmentation fault here   
     adj[a.node2][a.node1] = 1; 
  }

That way you can see whether your array indices are out of range before your program crashes.

This is just a quick fix for debugging purposes, though. Really you should have proper error checking:

  while(fscanf(ptr_file1,"%d %d",&a.node1, &a.node2) == 2)   // stop on EOF or malformed input
  {
     if (a.node1 < 0 || a.node1 >= MAX || a.node2 < 0 || a.node2 >= MAX)
     {
         fprintf(stderr, "range error: a.node1 = %d, a.node2 = %d\n", a.node1, a.node2);
         exit(1);
     }

     adj[a.node1][a.node2] = 1;
     adj[a.node2][a.node1] = 1;
  }

In response to a comment: use a one-dimensional bitmap. A one-dimensional bitmap can also be indexed as if it were two-dimensional, which is useful for graphs.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define MAX 4000000

unsigned char *bitmapinit(int n);
unsigned char chkbit(unsigned char *map, int n);
void setbit(unsigned char *map, int n);
void unsetbit(unsigned char *map, int n);

int main(int argc, char *argv[])
{
        unsigned int i;
        unsigned char *bitmap = bitmapinit(MAX);
        if (!bitmap) {
                perror("malloc: ");
                exit(EXIT_FAILURE);
        }
        for (i = 0; i < MAX; i++) {
                setbit(bitmap, i);
        }
        for (i = 0; i < MAX; i += 5) {
                 unsetbit(bitmap, i);
        }
        for (i = 0; i < MAX; i++) {
                printf("bit #%u = %d\n", i, (chkbit(bitmap, i))?1:0);
        }
        return 0;
}
unsigned char *bitmapinit(int n)
{
        return calloc(sizeof(unsigned char), n / 8 + 1);
}
unsigned char chkbit(unsigned char *map, int n)
{
        return (unsigned char)map[n / 8] & (1 << (n % 8));
}
void setbit(unsigned char *map, int n)
{
        map[n / 8] = map[n / 8] | (1 << (n % 8));
}
void unsetbit(unsigned char *map, int n)
{
        map[n / 8] = map[n / 8] & ~(1 << (n % 8));
}

I can explain how it is used for graphs, if you need.

This saves space by a factor of 8 compared with one byte per entry. For a 50000 x 50000 matrix you need about 300 MB. The graph can be directed, but it cannot be a multigraph (at most one edge per ordered pair of nodes).

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdbool.h>    
#include <errno.h>

#define ROW 50
#define COL 55

unsigned int *bitmapinit(int, int);
bool chkbit(unsigned int *, int, int, int);
void setbit(unsigned int *, int, int, int);
void unsetbit(unsigned int *, int, int, int);


int main(int argc, char *argv[])
{
    unsigned int i, j;
    unsigned int *bitmap = bitmapinit(ROW, COL);
    if (!bitmap) {
        perror("malloc: ");
        exit(EXIT_FAILURE);
    }
    for (i = 0; i < ROW; i+=2)
        for (j = 0; j < COL; j+=2)
            setbit(bitmap, i, j, COL);    

    for (i = 0; i < ROW; i++) {
        for (j = 0; j < COL; j++) {
            printf("%d ",(chkbit(bitmap, i, j, COL)) ? 1 : 0);
        }
        printf("\n");
    }
    printf("\n");
    for (i = 0; i < ROW; i++)
        for (j = 0; j < COL; j++)
            setbit(bitmap, i, j, COL);

    for (i = 0; i < ROW; i += 3)
        for (j = 0; j < COL; j += 3)
            unsetbit(bitmap, i, j, COL);    

    for (i = 0; i < ROW; i++) {
        for (j = 0; j < COL; j++) {
            printf("%d ",(chkbit(bitmap, i, j, COL)) ? 1 : 0);
        }
        printf("\n");
    }
    return 0;
}

unsigned int *bitmapinit(int row, int col) // row = number of rows, col = number of columns
{
    return calloc(((size_t)row * col) / 32 + 1, sizeof(unsigned int));
}
bool chkbit(unsigned int *map, int row, int col, int n) // n = length of one row
{
    size_t bit = (size_t)row * n + col;
    return map[bit / 32] & (1u << (bit % 32));
}
void setbit(unsigned int *map, int row, int col, int n)
{
    size_t bit = (size_t)row * n + col;
    map[bit / 32] |= 1u << (bit % 32);   // 1u avoids shifting into the sign bit
}
void unsetbit(unsigned int *map, int row, int col, int n)
{
    size_t bit = (size_t)row * n + col;
    map[bit / 32] &= ~(1u << (bit % 32));
}

The program is not complicated. In effect it is a two-dimensional array in which each element can only be set to 0 or 1.

With 50000 x 50000 values, however, it will run for a long time.

To set bit (X, Y), call setbit(unsigned int *map, int Y, int X, int LenOfRow); to clear it, call unsetbit(unsigned int *map, int Y, int X, int LenOfRow); and to read its value, call chkbit(unsigned int *map, int Y, int X, int LenOfRow).

Once again, remember that the value of LenOfRow must not change between calls on the same array.
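As for how such a bitmap can serve as a graph's adjacency matrix, here is one possible sketch. The BitMatrix type and the bitmatrix_* functions are hypothetical names, not part of the code above. Each ordered pair (row, col) gets one bit, and an undirected edge simply sets both (u, v) and (v, u).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* n x n adjacency bitmap, one bit per ordered node pair. Names are illustrative. */
typedef struct {
    uint32_t *words;
    size_t n;
} BitMatrix;

/* Returns 0 on success, -1 if the allocation fails. */
int bitmatrix_init(BitMatrix *b, size_t n)
{
    assert(b && n > 0);
    b->words = calloc((n * n + 31) / 32, sizeof *b->words);
    b->n = b->words ? n : 0;
    return b->words ? 0 : -1;
}

void bitmatrix_set(BitMatrix *b, size_t row, size_t col)
{
    size_t bit = row * b->n + col;
    b->words[bit / 32] |= UINT32_C(1) << (bit % 32);
}

/* Returns 1 if the bit for (row, col) is set, 0 otherwise. */
int bitmatrix_get(const BitMatrix *b, size_t row, size_t col)
{
    size_t bit = row * b->n + col;
    return (b->words[bit / 32] >> (bit % 32)) & 1u;
}

/* Undirected edge: set both (u, v) and (v, u). For a directed graph set only one. */
void bitmatrix_add_edge(BitMatrix *b, size_t u, size_t v)
{
    bitmatrix_set(b, u, v);
    bitmatrix_set(b, v, u);
}
```

Keeping the row length inside the struct removes the need to pass the same LenOfRow to every call.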

As the others have noted, your problem is very likely the sheer size of your 2D array. So you have three options:

  1. Optimize the size of your adjacency matrix. You can cut the memory consumption by a factor of four (on most systems) by using int8_t instead of int. You can cut it by another factor of eight by using the individual bits of the integers that make up the matrix. That is a factor of 32 in total, which should be enough to get your matrix down to a manageable size.

    You could use accessors like this:

     void setAdjacent(int32_t** matrix, int x, int y) { matrix[x][y/32] |= (1 << (y & 31)); }
     int isAdjacent(int32_t** matrix, int x, int y) { return matrix[x][y/32] & (1 << (y & 31)); }

  2. Exploit the fact that your adjacency matrix is sparse. For each node, store a list of all the other nodes that it is adjacent to.

  3. Buy more RAM.


You can also use a true 2D array like this:

int32_t (*matrix)[MAX] = malloc(MAX*sizeof(*matrix));

This avoids the hassle of allocating an array for each row, and avoids the overhead of one pointer indirection. You would just need to change the signature of the accessors accordingly; their contents do not change at all.
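A sketch of accessors written against such a pointer-to-array, assuming a compiler with variable-length array support (C99). For simplicity this version stores one uint8_t per cell instead of packing bits, which is my own simplification. The names set_adjacent and is_adjacent are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* m points at rows of n cells each, so m[x][y] is ordinary 2D indexing
 * into one contiguous block. Requires variable-length array support (C99). */
void set_adjacent(size_t n, uint8_t (*m)[n], size_t x, size_t y)
{
    assert(x < n && y < n);
    m[x][y] = 1;
    m[y][x] = 1;
}

/* Returns 1 if x and y are adjacent, 0 otherwise. */
int is_adjacent(size_t n, uint8_t (*m)[n], size_t x, size_t y)
{
    assert(x < n && y < n);
    return m[x][y];
}
```

The matrix itself would be created with `uint8_t (*m)[n] = calloc(n, sizeof *m);` and released with a single `free(m);`.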
