Java thread creation overhead

Question

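Conventional wisdom tells us that high-volume enterprise Java applications should use thread pooling in preference to spawning new worker threads. java.util.concurrent makes this straightforward.

There are, however, situations where thread pooling is not a good fit. The specific example I am currently wrestling with is InheritableThreadLocal, which allows ThreadLocal variables to be "passed down" to any spawned threads. This mechanism breaks when using thread pools, because the worker threads are generally not spawned from the request thread but already exist.

There are ways around this (the thread locals can be passed in explicitly), but that is not always appropriate or practical. The simplest solution is to spawn new worker threads on demand and let InheritableThreadLocal do its job.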
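To make the difference concrete, here is a minimal sketch of the behaviour described above. The class name `InheritanceDemo`, the variable `USER`, and the value `"alice"` are hypothetical and chosen only for illustration. The pool's single worker is deliberately created before the value is set, which mimics a pre-existing pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class InheritanceDemo {
    // Hypothetical per-request value; the name USER is illustrative only.
    static final InheritableThreadLocal<String> USER = new InheritableThreadLocal<>();

    /** Reads USER from a thread spawned right now by the calling thread. */
    static String readFromNewThread() throws InterruptedException {
        final String[] seen = new String[1];
        Thread worker = new Thread(() -> seen[0] = USER.get());
        worker.start();
        worker.join();
        return seen[0];
    }

    /** Reads USER from whichever pre-existing worker the pool hands the task to. */
    static String readFromPool(ExecutorService pool) throws Exception {
        Callable<String> read = USER::get;
        return pool.submit(read).get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> { }).get(); // create the worker before any value is set
        USER.set("alice");
        System.out.println("new thread sees:    " + readFromNewThread());
        System.out.println("pooled worker sees: " + readFromPool(pool));
        pool.shutdown();
    }
}
```

The freshly spawned thread copies the parent's inheritable values at construction time, so it sees the value. The pooled worker was constructed earlier and has nothing to copy.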

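This brings us back to the question: if I have a high-volume site where each user request thread spawns half a dozen worker threads (i.e. without using a thread pool), is this going to cause problems for the JVM? We are potentially talking about a couple of hundred new threads being created every second, each one lasting less than a second. Do modern JVMs optimize this well? I remember the days when object pooling was desirable in Java because object creation was expensive. That has since become unnecessary. I am wondering whether the same applies to thread pooling.

I would benchmark it if I knew what to measure, but my fear is that the problems may be more subtle than anything a profiler can measure.

Note: the wisdom of using thread locals is not the issue here, so please don't suggest that I avoid them.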

Answer 1

Here is an example microbenchmark:

public class ThreadSpawningPerformanceTest {
static long test(final int threadCount, final int workAmountPerThread) throws InterruptedException {
    Thread[] tt = new Thread[threadCount];
    final int[] aa = new int[tt.length];
    System.out.print("Creating "+tt.length+" Thread objects... ");
    long t0 = System.nanoTime(), t00 = t0;
    for (int i = 0; i < tt.length; i++) { 
        final int j = i;
        tt[i] = new Thread() {
            public void run() {
                int k = j;
                for (int l = 0; l < workAmountPerThread; l++) {
                    k += k*k+l;
                }
                aa[j] = k;
            }
        };
    }
    System.out.println(" Done in "+(System.nanoTime()-t0)*1E-6+" ms.");
    System.out.print("Starting "+tt.length+" threads with "+workAmountPerThread+" steps of work per thread... ");
    t0 = System.nanoTime();
    for (int i = 0; i < tt.length; i++) { 
        tt[i].start();
    }
    System.out.println(" Done in "+(System.nanoTime()-t0)*1E-6+" ms.");
    System.out.print("Joining "+tt.length+" threads... ");
    t0 = System.nanoTime();
    for (int i = 0; i < tt.length; i++) { 
        tt[i].join();
    }
    System.out.println(" Done in "+(System.nanoTime()-t0)*1E-6+" ms.");
    long totalTime = System.nanoTime()-t00;
    int checkSum = 0; //display checksum in order to give the JVM no chance to optimize out the contents of the run() method and possibly even thread creation
    for (int a : aa) {
        checkSum += a;
    }
    System.out.println("Checksum: "+checkSum);
    System.out.println("Total time: "+totalTime*1E-6+" ms");
    System.out.println();
    return totalTime;
}

public static void main(String[] kr) throws InterruptedException {
    int workAmount = 100000000;
    int[] threadCount = new int[]{1, 2, 10, 100, 1000, 10000, 100000};
    int trialCount = 2;
    long[][] time = new long[threadCount.length][trialCount];
    for (int j = 0; j < trialCount; j++) {
        for (int i = 0; i < threadCount.length; i++) {
            time[i][j] = test(threadCount[i], workAmount/threadCount[i]); 
        }
    }
    System.out.print("Number of threads ");
    for (long t : threadCount) {
        System.out.print("\t"+t);
    }
    System.out.println();
    for (int j = 0; j < trialCount; j++) {
        System.out.print((j+1)+". trial time (ms)");
        for (int i = 0; i < threadCount.length; i++) {
            System.out.print("\t"+Math.round(time[i][j]*1E-6));
        }
        System.out.println();
    }
}
}

On 64-bit Windows 7 with Sun's 32-bit Java 1.6.0_21 Client VM on an Intel Core2 Duo E6400 @ 2.13 GHz, the results are as follows:

Number of threads  1    2    10   100  1000 10000 100000
1. trial time (ms) 346  181  179  191  286  1229  11308
2. trial time (ms) 346  181  187  189  281  1224  10651

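Conclusions: two threads do the work almost twice as fast as one, as expected, since my computer has two cores. My computer can spawn nearly 10000 threads per second, i.e. the thread creation overhead is about 0.1 milliseconds. Hence, on such a machine, a couple of hundred new threads per second pose a negligible overhead (as can also be seen by comparing the numbers in the columns for 2 and 100 threads).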
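The arithmetic behind that estimate can be sketched as follows. The timings are copied from the 100000-thread column of the table. The class name `SpawnCostEstimate` and the figure of 300 spawns per second (standing in for "a couple of hundred") are assumptions made for illustration. Because the total time also includes the work done inside the threads, the result is an upper bound on the pure spawning cost.

```java
class SpawnCostEstimate {
    /** Average cost per thread in milliseconds, given total wall time and thread count. */
    static double perThreadMillis(double totalMillis, int threads) {
        return totalMillis / threads;
    }

    public static void main(String[] args) {
        // Totals for 100000 threads, taken from the two trials in the table above.
        double trial1 = perThreadMillis(11308, 100_000);
        double trial2 = perThreadMillis(10651, 100_000);
        System.out.printf("per-thread cost: %.3f ms and %.3f ms%n", trial1, trial2);

        // Assumed load: 300 spawns per second (illustrative, not from the benchmark).
        int spawnsPerSecond = 300;
        System.out.printf("%d spawns/s cost at most about %.0f ms of each second%n",
                spawnsPerSecond, spawnsPerSecond * trial1);
    }
}
```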

Answer 2

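First of all, this will of course depend very much on which JVM you use. The OS will also play an important role. Assuming the Sun JVM (hm, do we still call it that?):

One major factor is the stack memory allocated to each thread, which you can tune with the -Xssn JVM parameter (n being the size). You will want to use the lowest value you can get away with.

This is just a guess, but I think "a couple of hundred new threads every second" is definitely beyond what the JVM is designed to handle comfortably. I suspect that a simple benchmark will quickly reveal quite unsubtle problems.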
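Besides the global flag (for example `-Xss256k` on the command line), a stack size can also be requested per thread through the four-argument `Thread` constructor. The sketch below assumes a 256 KiB request purely for illustration. The class and method names are hypothetical, and the JVM is allowed to round the value or ignore it altogether.

```java
class StackSizeHint {
    /** Creates a thread that asks for the given stack size; the JVM treats this as a hint. */
    static Thread newSmallStackThread(Runnable task, long stackBytes) {
        // A null ThreadGroup means "use the group of the creating thread".
        return new Thread(null, task, "small-stack-worker", stackBytes);
    }

    public static void main(String[] args) throws InterruptedException {
        // 256 KiB is an assumed, illustrative value, not a recommendation.
        Thread worker = newSmallStackThread(
                () -> System.out.println("running on " + Thread.currentThread().getName()),
                256 * 1024);
        worker.start();
        worker.join();
    }
}
```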

Answer 3

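For your benchmark you can use JMeter plus a profiler, which should give you a direct overview of the behaviour in such a heavily loaded environment. Just let it run for an hour and monitor memory, CPU, etc. If nothing breaks and the CPU doesn't overheat, it's OK :)
Perhaps you could get a thread pool, or customize (extend) the one you are using, by adding some code so that the appropriate InheritableThreadLocal values are set each time a Thread is acquired from the pool. Each Thread has these package-private fields: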
```
/* ThreadLocal values pertaining to this thread. This map is maintained
 * by the ThreadLocal class. */
ThreadLocal.ThreadLocalMap threadLocals = null;

/*
 * InheritableThreadLocal values pertaining to this thread. This map is
 * maintained by the InheritableThreadLocal class.
 */
ThreadLocal.ThreadLocalMap inheritableThreadLocals = null;
```
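You can use these (through reflection) in combination with Thread.currentThread() to get the desired behaviour. However, this is a bit ad hoc, and furthermore I can't tell whether it (with the reflection) would introduce even more overhead than just creating the threads.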
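A reflection-free variant of the same idea is sketched below. Note that this is a swapped-in technique, not the reflection approach described in this answer: it extends `ThreadPoolExecutor` and overrides `execute` so that the submitter's value is captured and re-installed on the worker for the duration of the task. The class name `ContextPropagatingPool` and the variable `CONTEXT` are hypothetical, and only a single, known thread local is handled.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ContextPropagatingPool extends ThreadPoolExecutor {
    // Hypothetical request-scoped value; the name CONTEXT is illustrative only.
    static final InheritableThreadLocal<String> CONTEXT = new InheritableThreadLocal<>();

    ContextPropagatingPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    public void execute(Runnable task) {
        final String captured = CONTEXT.get(); // evaluated on the submitting thread
        super.execute(() -> {
            String previous = CONTEXT.get();   // evaluated on the worker thread
            CONTEXT.set(captured);
            try {
                task.run();
            } finally {
                // Restore the worker's earlier state so values do not leak between tasks.
                if (previous == null) {
                    CONTEXT.remove();
                } else {
                    CONTEXT.set(previous);
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ContextPropagatingPool pool = new ContextPropagatingPool(1);
        Callable<String> read = CONTEXT::get;
        CONTEXT.set("request-1");
        System.out.println(pool.submit(read).get());
        CONTEXT.set("request-2");
        System.out.println(pool.submit(read).get());
        pool.shutdown();
    }
}
```

Since `submit` delegates to `execute`, both entry points are covered. The trade-off is that every thread local to be propagated has to be known to the pool in advance.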

Answer 4

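I am wondering whether it is necessary to spawn new threads on each user request if their typical lifecycle is as short as a second. Could you use some kind of notify/wait queue where you spawn a given number of (daemon) threads, and they all wait until there is a task to solve? If the task queue gets long, you spawn additional threads, but not at a 1:1 ratio. It will most likely perform better than spawning hundreds of new threads whose lifecycles are so short.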
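One way to approximate this with the standard library is a `ThreadPoolExecutor` with a bounded queue, sketched below. The class name `ElasticDaemonPool` and all sizes are assumptions chosen for illustration. The executor keeps a core set of daemon workers waiting on the queue, adds workers up to the maximum only once the queue is full, and falls back to running the task on the caller when both are exhausted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ElasticDaemonPool {
    /** Builds a pool of daemon workers that grows only when the bounded queue fills up. */
    static ThreadPoolExecutor create(int coreThreads, int maxThreads, int queueCapacity) {
        ThreadFactory daemonFactory = runnable -> {
            Thread t = new Thread(runnable);
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                coreThreads, maxThreads,
                30L, TimeUnit.SECONDS,                      // idle extra workers exit after 30 s
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                daemonFactory,
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure when saturated
    }

    public static void main(String[] args) throws InterruptedException {
        // All three sizes are assumed, illustrative values.
        ThreadPoolExecutor pool = create(4, 16, 100);
        for (int i = 0; i < 50; i++) {
            final int id = i;
            pool.execute(() ->
                    System.out.println("task " + id + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("largest pool size: " + pool.getLargestPoolSize());
    }
}
```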

4 answers

Answer 1
37 Accepted 2010-12-06 22:19:03

Answer 2
9 2010-01-22 12:23:44

Answer 3
1 2010-01-22 12:31:33

Answer 4
0 2010-01-22 12:38:45