
How to ensure Java threads run on different cores

I am writing a multi-threaded application in Java to improve performance over the sequential version. It is a parallel version of the dynamic programming solution to the 0/1 knapsack problem. I have an Intel Core 2 Duo with Ubuntu and Windows 7 Professional on separate partitions, and I am running the program under Ubuntu.

My problem is that the parallel version actually takes longer than the sequential version. I think this may be because the threads are all being mapped to the same kernel thread, or because they are all being scheduled on the same core. Is there a way to ensure that each Java thread maps to a separate core?

I have read other posts about this problem, but nothing seems to help.

Here is the end of main() and all of run() for the KnapsackThread class (which extends Thread). Notice that the way I use slice and extra to calculate myLowBound and myHiBound ensures that no two threads overlap in their domain of dynProgMatrix. Therefore there will be no race conditions.

    dynProgMatrix = new int[totalItems+1][capacity+1];
    for (int w = 0; w<= capacity; w++)
        dynProgMatrix[0][w] = 0;
    for(int i=0; i<=totalItems; i++)
        dynProgMatrix[i][0] = 0;
    slice = Math.max(1,
            (int) Math.floor((double)(dynProgMatrix[0].length)/threads.length));
    extra = (dynProgMatrix[0].length) % threads.length;

    barrier = new CyclicBarrier(threads.length);
    for (int i = 0; i <  threads.length; i++){
        threads[i] = new KnapsackThread(Integer.toString(i));
    }
    for (int i = 0; i < threads.length; i++){
        threads[i].start();
    }

    for (int i = 0; i < threads.length; i++){
        try {
            threads[i].join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}

public void run(){
    int myRank = Integer.parseInt(this.getName());

    int myLowBound;
    int myHiBound;

    if (myRank < extra){
        myLowBound = myRank * (slice + 1);
        myHiBound = myLowBound + slice;
    }
    else{
        myLowBound = myRank * slice + extra;
        myHiBound = myLowBound + slice - 1;
    }

    if(myHiBound > capacity){
        myHiBound = capacity;
    }

    for(int i = 1; i <= totalItems; i++){
        for (int w = myLowBound; w <= myHiBound; w++){

            if (allItems[i].weight <= w){
               if (allItems[i].profit + dynProgMatrix[i-1][w-allItems[i].weight]
                        > dynProgMatrix[i-1][w])
                {
                    dynProgMatrix[i][w] = allItems[i].profit +
                                      dynProgMatrix[i-1][w- allItems[i].weight];
                }
                else{
                    dynProgMatrix[i][w] = dynProgMatrix[i-1][w];
                }
            }
            else{
                dynProgMatrix[i][w] = dynProgMatrix[i-1][w];
            }
        }
        // now place a barrier to sync up the threads
        try {
            barrier.await(); 
        } catch (InterruptedException ex) { 
            ex.printStackTrace();
            return;
        } catch (BrokenBarrierException ex) { 
            ex.printStackTrace(); 
            return;
        }
    }
}
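The claim that the slices do not overlap can be checked in isolation. The following is a minimal sketch, not part of the original program: the class SliceBoundsCheck and its helpers bounds() and tilesExactly() are hypothetical names introduced here, and the arithmetic is copied from run() above. It assumes there are at least as many columns (capacity + 1) as threads.

```java
// Standalone check of the slice/extra arithmetic used in run().
// SliceBoundsCheck, bounds() and tilesExactly() are hypothetical names
// introduced only for this illustration.
class SliceBoundsCheck {

    // Returns {low, high} (both inclusive) for one thread, mirroring run().
    static int[] bounds(int rank, int threadCount, int capacity) {
        int columns = capacity + 1;                      // dynProgMatrix[0].length
        int slice = Math.max(1, columns / threadCount);  // integer division = floor here
        int extra = columns % threadCount;
        int low;
        int high;
        if (rank < extra) {
            low = rank * (slice + 1);
            high = low + slice;
        } else {
            low = rank * slice + extra;
            high = low + slice - 1;
        }
        return new int[] { low, Math.min(high, capacity) };
    }

    // True if the ranges of all threads cover 0..capacity with no gap or overlap.
    static boolean tilesExactly(int threadCount, int capacity) {
        int expectedLow = 0;
        for (int rank = 0; rank < threadCount; rank++) {
            int[] b = bounds(rank, threadCount, capacity);
            if (b[0] != expectedLow || b[1] < b[0]) {
                return false;
            }
            expectedLow = b[1] + 1;
        }
        return expectedLow == capacity + 1;
    }

    public static void main(String[] args) {
        for (int rank = 0; rank < 4; rank++) {
            int[] b = bounds(rank, 4, 10);
            System.out.println("thread " + rank + ": columns " + b[0] + ".." + b[1]);
        }
        System.out.println("tiles exactly: " + tilesExactly(4, 10));
    }
}
```

Note that this only covers the columns each thread writes. Each thread also reads dynProgMatrix[i-1][w - weight], which can fall in another thread's slice of the previous row, so the barrier after every row is still what keeps those reads safe.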

Update:

I have written another version of the knapsack solver that uses brute force. This version needs very little synchronization because I only update a bestSoFar variable at the end of each thread's execution. Therefore, the threads should execute almost completely in parallel, except for that small critical section at the end.

I ran this against the sequential brute-force version and it still takes longer. I don't see any other explanation than that my threads are being run sequentially, either because they are being mapped to the same core or to the same native thread.

Does anybody have any insight?

I doubt that it is due to all threads using the same core. Scheduling is up to the OS, but you should be able to see what is going on if you bring up the OS's performance monitor, which will typically show how busy each core is.

Possible reasons for it taking longer:

  • Lots of synchronization (either necessary or unnecessary)
  • The tasks take such a short time that thread creation accounts for a significant proportion of the total
  • Context switching, if you're creating too many threads. For CPU-intensive tasks, create as many threads as you have cores.
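As a rough illustration of the last two points, here is a minimal sketch that sizes a fixed thread pool to the core count reported by Runtime.getRuntime().availableProcessors() and gives each worker one large chunk, so thread creation happens once per worker rather than once per small task. The class CoreSizedPool, the method parallelSum() and the summing workload are hypothetical stand-ins, not the knapsack code from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: one worker per core, one chunk of work per worker.
class CoreSizedPool {

    // Sums 0..n-1 by splitting the range into one contiguous chunk per worker.
    static long parallelSum(int n, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers; // ceiling division
            for (int t = 0; t < workers; t++) {
                final int from = Math.min(n, t * chunk);
                final int to = Math.min(n, from + chunk);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += i;
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores reported by the JVM: " + cores);
        System.out.println("sum = " + parallelSum(1_000_000, cores));
    }
}
```

Each worker only synchronizes once, when its result is collected with get(), which keeps the synchronization cost small relative to the work done.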

I had the same problem for a while. I had a CPU-hungry program that I divided into 2 threads (dual-core CPU), but one beautiful day, while processing some more data, it just stopped using both cores. I raised the maximum heap size (-Xmx1536m in my case), and it worked fine again.
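One way to check whether heap pressure might be a factor is to print the maximum heap size the JVM is running with and the time it has spent in garbage collection so far. This is only a sketch under that assumption: the class HeapReport and its methods are hypothetical names, and it uses the standard Runtime and java.lang.management APIs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that reports heap limit and accumulated GC time.
class HeapReport {

    // Maximum heap the JVM will try to use, in MiB (controlled by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Approximate total time spent in garbage collection so far, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
        System.out.println("time in GC so far: " + totalGcMillis() + " ms");
    }
}
```

Calling these two methods at the end of the real program, once with the default heap and once with a larger -Xmx value, would show whether garbage collection time drops along with the total run time.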

I suggest you take a look at how long each of your worker threads runs before it terminates. Perhaps one of the threads has a much more difficult task. If that's the case, then the overhead caused by synchronization and so on will easily eat up what you've gained from threading.
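A minimal sketch of such a measurement follows, assuming System.nanoTime() around each worker's body is precise enough. The class WorkerTiming, the method timeWorkers() and the busy loop are hypothetical stand-ins for the real per-thread work.

```java
// Hypothetical example: record how long each worker thread runs.
class WorkerTiming {

    // Runs one thread per entry of iterations and returns each thread's
    // wall-clock run time in nanoseconds.
    static long[] timeWorkers(long[] iterations) throws InterruptedException {
        long[] elapsedNanos = new long[iterations.length];
        long[] sink = new long[iterations.length]; // stores loop results so the work is observable
        Thread[] workers = new Thread[iterations.length];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long start = System.nanoTime();
                long x = 0;
                for (long i = 0; i < iterations[id]; i++) {
                    x += i ^ (x >>> 3); // stand-in for the real workload
                }
                sink[id] = x;
                elapsedNanos[id] = System.nanoTime() - start;
            }, "worker-" + t);
        }
        for (Thread w : workers) {
            w.start();
        }
        for (Thread w : workers) {
            w.join(); // after join(), the worker's writes are visible here
        }
        return elapsedNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        // Deliberately uneven workloads to show an imbalance.
        long[] elapsed = timeWorkers(new long[] { 50_000_000L, 200_000_000L });
        for (int t = 0; t < elapsed.length; t++) {
            System.out.println("worker-" + t + ": " + elapsed[t] / 1_000_000 + " ms");
        }
    }
}
```

If one worker consistently reports a much longer time than the others, the total run time is bounded by that worker, and the remaining threads spend the difference idle or waiting at a barrier.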
