C Threads program
I wrote a program based on the idea of a Riemann sum to compute the value of an integral. It uses several threads, but its performance, compared to a sequential program I wrote later, is subpar. Algorithm-wise the two are identical except for the threading, so the question is: what is wrong with it?
pthread_join
is not the problem, I assume, because if one thread finishes sooner than the thread the join is currently waiting on, the later join on it will simply return immediately. Is that correct? The
free
call is probably wrong, and there is no error checking on thread creation; I'm aware of that, I deleted it along the way while testing various things. Sorry for the bad English, and thanks in advance.
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>
#include <time.h>

int counter = 0;
float sum = 0;
pthread_mutex_t mutx;

float function_res(float);

struct range {
    float left_border;
    int steps;
    float step_range;
};

void *calcRespectiveRange(void *ranges) {
    struct range *rangs = ranges;
    float left_border = rangs->left_border;
    int steps = rangs->steps;
    float step_range = rangs->step_range;
    free(rangs);
    //printf("left: %f steps: %d step range: %f\n", left_border, steps, step_range);
    int i;
    float temp_sum = 0;
    for(i = 0; i < steps; i++) {
        temp_sum += step_range * function_res(left_border);
        left_border += step_range;
    }
    sum += temp_sum;
    pthread_exit(NULL);
}

int main() {
    clock_t begin, end;
    if(pthread_mutex_init(&mutx, NULL) != 0) {
        printf("mutex error\n");
    }
    printf("enter range, amount of steps and threads: \n");
    float left_border, right_border;
    int steps_count;
    int threads_amnt;
    scanf("%f %f %d %d", &left_border, &right_border, &steps_count, &threads_amnt);
    float step_range = (right_border - left_border) / steps_count;
    int i;
    pthread_t tid[threads_amnt];
    float chunk = (right_border - left_border) / threads_amnt;
    int steps_per_thread = steps_count / threads_amnt;
    begin = clock();
    for(i = 0; i < threads_amnt; i++) {
        struct range *ranges;
        ranges = malloc(sizeof(ranges));
        ranges->left_border = i * chunk + left_border;
        ranges->steps = steps_per_thread;
        ranges->step_range = step_range;
        pthread_create(&tid[i], NULL, calcRespectiveRange, (void*) ranges);
    }
    for(i = 0; i < threads_amnt; i++) {
        pthread_join(tid[i], NULL);
    }
    end = clock();
    pthread_mutex_destroy(&mutx);
    printf("\n%f\n", sum);
    double time_spent = (double) (end - begin) / CLOCKS_PER_SEC;
    printf("Time spent: %lf\n", time_spent);
    return(0);
}

float function_res(float lb) {
    return(lb * lb + 4 * lb + 3);
}
Edit: in short, can it be improved to reduce execution time (with mutexes, for example)?
The execution time will be shortened, provided you have multiple hardware threads available.
The problem is in how you measure time:
clock
returns the processor time used by the program. That means it sums the time taken by all the threads. If your program uses 2 threads and its linear execution time is 1 second, then each thread has used 1 second of CPU time, and
clock
will return the equivalent of 2 seconds.
To get the actual elapsed time (on Linux), use
gettimeofday
. I modified your code by adding
#include <sys/time.h>
and capturing the start time before the loop:
struct timeval tv_start;
gettimeofday( &tv_start, NULL );
and the end time after it:
struct timeval tv_end;
gettimeofday( &tv_end, NULL );
and calculating the difference in seconds:
printf("CPU Time: %lf\nTime passed: %lf\n",
time_spent,
((tv_end.tv_sec * 1000*1000.0 + tv_end.tv_usec) -
(tv_start.tv_sec * 1000*1000.0 + tv_start.tv_usec)) / 1000/1000
);
(I also fixed the malloc:
malloc(sizeof(ranges))
allocates the size of a pointer (4 or 8 bytes for a 32/64-bit CPU); it should be
malloc(sizeof(struct range))
(12 bytes).)
When running with the input parameters 0 1000000000 1000000000 1
, that is, 1 billion iterations in 1 thread, the output on my machine is:
CPU Time: 4.352000
Time passed: 4.400006
When running with 0 1000000000 1000000000 2
, that is, 1 billion iterations spread over 2 threads (500 million iterations each), the output is:
CPU Time: 4.976000
Time passed: 2.500003
For completeness' sake, I tested it with the input 0 1000000000 1000000000 4
:
CPU Time: 8.236000
Time passed: 2.180114
It is a little faster, but not twice as fast as with 2 threads, and it uses double the CPU time. This is because my CPU is a Core i3, a dual-core with hyperthreading, and hyperthreads are not true hardware threads.