Edited to reflect the correction in the comments; thank you, Ben.
I have looked at "Pthreads matrix multiplication error", "Dynamic Matrix Multiplication with Pthreads", and "Matrix multiplication using pthreads", but none seemed to address the issue I'm having. I am attempting to adapt the following serial version of a matrix multiplier (the kij method) to a threaded version using pthreads. Both deal solely with square matrices of size N * N; matmul takes N as an argument at runtime, and matmul_threaded takes N and t, where t is the number of threads desired. Here is matmul.c:
#include <stdio.h>
#include <stdlib.h>

void initmat(double *,long);
void kij(double *a, double *b, double *c, long N);

int main(int argc, char *argv[]) {
    double *a,*b,*c;
    long N = atol(argv[1]);
    // Allocate N-by-N matrix in the heap.
    a = (double *) malloc(N * N * sizeof(double));
    b = (double *) malloc(N * N * sizeof(double));
    c = (double *) malloc(N * N * sizeof(double));
    // Initialize the matrix.
    initmat(a,N);
    initmat(b,N);
    kij(a,b,c,N);
}

void initmat(double *mat, long N) {
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            mat[i*N+j] = i + j;
}

void kij(double *a, double *b, double *c, long N) {
    double r;
    int i, j, k;
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++) {
            r = a[N*i+k];
            for (j = 0; j < N; j++)
                c[N*i+j] += r*b[N*k+j];
        }
}
This works as expected up to N = 1000 (my edge case for this exercise). My strategy with matmul_threaded was to split the k dimension across t threads, iterate over i and j as usual within each thread, and include pointers to the arrays in each thread's package. matmul_threaded works correctly up to N = 129. For every value greater than 129 that I've tested (130...140, 200, 300, 500, 999, 1000), it segfaults.
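The split of k described above can be expressed as a zero-based, half-open range per thread. The following is a minimal sketch, not code from either program: `Range` and `thread_range` are hypothetical names, and it assumes that any leftover iterations (when t does not divide N) are handed to the first few threads.

```c
#include <assert.h>

/* Half-open range [start, end) of k values owned by one thread.
 * Hypothetical helper type, not part of matmul.c or matmul_threaded.c. */
typedef struct {
    long start;
    long end;
} Range;

/* Split n iterations over t threads. The first (n % t) threads each
 * receive one extra iteration, so the ranges cover 0..n-1 exactly once. */
Range thread_range(long n, int t, int tid) {
    assert(n >= 0 && t > 0 && tid >= 0 && tid < t);
    long base = n / t;   /* iterations every thread gets */
    long extra = n % t;  /* leftover iterations to distribute */
    Range r;
    r.start = tid * base + (tid < extra ? tid : extra);
    r.end = r.start + base + (tid < extra ? 1 : 0);
    return r;
}
```

With this layout every k in 0..N-1 is visited exactly once and no thread's range reaches N, so an index such as n*k+j stays inside an N * N array. A worker would loop with `for (k = r.start; k < r.end; k++)`.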
I am almost certain that I am accessing the arrays incorrectly, but I cannot for the life of me spot where. Here is matmul_threaded.c:
// matmul_threaded.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int tid;
    long N, nInt;
    long *a, *b, *c;
} PKG;

void *kij(void *pkg);

int main(int argc, char *argv[]) {
    long *a, *b, *c;
    int thread_count,p,q,tid;
    long N = atol(argv[1]); // moved here from below mallocs, solved
    pthread_t *thread;
    PKG *package;
    a = (long *) malloc(N * N * sizeof(long));
    b = (long *) malloc(N * N * sizeof(long));
    c = (long *) malloc(N * N * sizeof(long));
    for (p = 0; p < N; p++)
        for (q = 0; q < N; q++)
        {
            a[N*p+q] = p + q;
            b[N*p+q] = p + q;
        }
    thread_count = atoi(argv[2]);
    thread = (pthread_t *) malloc (thread_count*sizeof(pthread_t));
    package = (PKG *) malloc(thread_count*sizeof(PKG));
    for (tid = 0; tid < thread_count; tid++) {
        package[tid].tid = tid;
        package[tid].N = N;
        package[tid].nInt = N/thread_count;
        package[tid].a = a;
        package[tid].b = b;
        package[tid].c = c;
        pthread_create(&thread[tid], NULL, kij, (void *) &package[tid]);
    }
    for (tid = 0; tid < thread_count; tid++) {
        pthread_join(thread[tid], NULL);
    }
    free(thread);
    return 0;
}

void *kij(void *pkg) {
    PKG *mypkg = (PKG *) pkg;
    long i,j,k,r,n = mypkg->N;
    for(k = (mypkg->tid)*(mypkg->nInt)+1; k <= (mypkg->tid + 1)*(mypkg->nInt); k++) {
        for(i = 0; i < n; i++) {
            r = mypkg->a[n*i+k];
            for(j = 0; j < n; j++) {
                mypkg->c[n*i+j] += r * mypkg->b[n*k+j];
            }
        }
    }
    return NULL;
}
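Two things stand out in the worker above, independent of the fix already noted in the code comment. The k loop runs from tid*nInt+1 through (tid+1)*nInt inclusive, so k = 0 is never visited and, whenever t divides N, the last thread reaches k = N, where b[n*k+j] indexes past the end of b. In addition, splitting k means every thread adds into every element of c without synchronization, and c comes from malloc without being zeroed. One possible alternative, sketched below under the assumption that each thread may own a band of rows i rather than a slice of k, sidesteps both issues. `RowPkg`, `kij_rows`, and `matmul_rows` are hypothetical names, double is used as in matmul.c, and the caller is assumed to pass a zeroed c (for example from calloc).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Per-thread work package (hypothetical; mirrors PKG from the post). */
typedef struct {
    long n;
    long row_start, row_end;  /* half-open band of rows [row_start, row_end) */
    const double *a, *b;
    double *c;
} RowPkg;

/* kij loop order restricted to this thread's rows of c, so no two
 * threads ever write the same element. */
static void *kij_rows(void *arg) {
    RowPkg *p = arg;
    long n = p->n;
    for (long k = 0; k < n; k++)
        for (long i = p->row_start; i < p->row_end; i++) {
            double r = p->a[n * i + k];
            for (long j = 0; j < n; j++)
                p->c[n * i + j] += r * p->b[n * k + j];
        }
    return NULL;
}

/* Computes c += a * b with t threads; c must be zeroed by the caller.
 * Returns 0 on success, -1 if allocation or thread creation fails. */
int matmul_rows(const double *a, const double *b, double *c, long n, int t) {
    assert(n >= 0 && t > 0);
    pthread_t *th = malloc(t * sizeof *th);
    RowPkg *pk = malloc(t * sizeof *pk);
    if (th == NULL || pk == NULL) {
        free(th);
        free(pk);
        return -1;
    }
    int started = 0, rc = 0;
    for (int tid = 0; tid < t; tid++) {
        /* n*tid/t .. n*(tid+1)/t tiles 0..n with no gaps or overlap. */
        pk[tid] = (RowPkg){ n, n * tid / t, n * (tid + 1) / t, a, b, c };
        if (pthread_create(&th[tid], NULL, kij_rows, &pk[tid]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int tid = 0; tid < started; tid++)
        pthread_join(th[tid], NULL);
    free(th);
    free(pk);
    return rc;
}
```

Whether this beats a k split with per-thread partial sums depends on N and the machine; the row split is shown only because it needs no locking. It may need to be built with -pthread, depending on the toolchain.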
Hopefully I have been specific and clear enough, and thank you in advance for your time! Edit: to clarify, this doesn't print/return anything because it doesn't need to; executing it using the "time" command is sufficient, as we're just comparing the time to completion.
You're using N before you initialize it. – Ben
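Following up on this comment: the size arguments have to be read and checked before any malloc call that depends on them. Below is a minimal sketch of one way to do that, using strtol instead of atol so that bad input can be detected. `parse_positive_long` is a hypothetical helper, not part of the original programs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a strictly positive decimal long from s.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_positive_long(const char *s, long *out) {
    assert(s != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    /* Reject overflow, empty input, trailing junk, and non-positive values. */
    if (errno != 0 || end == s || *end != '\0' || v <= 0)
        return -1;
    *out = v;
    return 0;
}
```

In main, this would be called on argv[1] (after checking argc) before the three malloc calls, for example `if (argc < 3 || parse_positive_long(argv[1], &N) != 0) return 1;`.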