How to load 100 million rows into memory

Question

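I need to load more than 100 million rows from a MySQL database into memory. My Java program fails with java.lang.OutOfMemoryError: Java heap space. My machine has 8GB of RAM, and I passed -Xmx6144m in the JVM options.

This is my code: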

public List<Record> loadTrainingDataSet() {

    ArrayList<Record> records = new ArrayList<Record>();
    try {
        Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
        s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings");
        ResultSet rs = s.getResultSet();
        int count = 0;
        while (rs.next()) {
            // ... (the loop body and the rest of the method are cut off in the original post)

Any idea how to overcome this problem?

UPDATE

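I found this article and updated my code based on the comments below. I now seem to be able to load the data into memory with the same -Xmx6144m setting, but it takes a very long time.

Here is my code: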

...
import org.apache.mahout.math.SparseMatrix;
...

@Override
public SparseMatrix loadTrainingDataSet() {
    long t1 = System.currentTimeMillis();
    SparseMatrix ratings = new SparseMatrix(NUM_ROWS, NUM_COLS);
    final int chunkSize = 1000000;

    try {
        for (int i = 1; i <= 101; i++) { // 101 chunks of 1,000,000 rows cover all 100480507 rows
            long t11 = System.currentTimeMillis();
            // MySQL's LIMIT is (offset, row_count): start where the previous chunk ended
            // and read chunkSize rows, instead of re-reading from row 0 every round
            long recStart = (long) chunkSize * (i - 1);
            try (Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                    java.sql.ResultSet.CONCUR_READ_ONLY)) {
                // Integer.MIN_VALUE makes MySQL Connector/J stream rows instead of buffering them all
                s.setFetchSize(Integer.MIN_VALUE);
                try (ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT "
                        + recStart + "," + chunkSize)) {
                    while (rs.next()) {
                        int movieId = rs.getInt("movie_id");
                        int customerId = rs.getInt("customer_id");
                        byte rating = (byte) rs.getInt("rating");
                        ratings.set(customerId, movieId, rating);
                    }
                }
            } // the ResultSet and Statement are closed here, even if an exception is thrown
            long t22 = System.currentTimeMillis();
            System.out.println("Round " + i + " completed " + (t22 - t11) / 1000 + " seconds");
        }

    } catch (Exception e) {
        System.err.println("Error while loading ratings: " + e);
    } finally {
        if (conn != null) {
            try {
                conn.close();
                System.out.println("Database connection terminated");
            } catch (Exception e) { /* ignore close errors */ }
        }
    }
    long t2 = System.currentTimeMillis();
    System.out.println("Took " + (t2 - t1) / 1000 + " seconds");
    return ratings;
}

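Loading the first 100,000 rows takes 2 seconds, and loading 29 × 100,000 rows takes 46 seconds. I stopped the process partway through because it was taking too long. Are these acceptable times? Is there a way to improve the performance of this code? I am running it on a 64-bit Windows machine with 8GB of RAM.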

Answer 1

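A hundred million records means each record can take up at most about 50 bytes if everything is to fit within 6 GB, with some room left for other allocations. In Java, 50 bytes is very little: a small object per row, plus the reference to it in an Object[], already costs roughly 32 bytes. You must find a way to use the results immediately inside your while (rs.next()) loop instead of retaining them all.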
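For scale: 6 GB spread over about 100 million rows is roughly 64 bytes per row for everything. A minimal sketch of using each row inside the loop instead of keeping it, assuming the work can be expressed as a running aggregate (here a hypothetical per-movie average rating) and that movie ids are non-negative integers no larger than a known maxMovieId. RatingStats and its methods are illustrative names, not part of any library. Memory use then grows with the number of movies, not with the number of rows.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical aggregate: per-movie rating sums and counts kept in primitive arrays,
// so memory depends on the number of movies rather than the number of rows.
class RatingStats {
    private final long[] sums;
    private final int[] counts;

    RatingStats(int maxMovieId) {
        sums = new long[maxMovieId + 1];
        counts = new int[maxMovieId + 1];
    }

    // Called once per row; only the running totals survive, the row itself is dropped.
    void accept(int movieId, byte rating) {
        sums[movieId] += rating;
        counts[movieId]++;
    }

    int count(int movieId) {
        return counts[movieId];
    }

    // NaN for movies that received no ratings.
    double average(int movieId) {
        return counts[movieId] == 0 ? Double.NaN : (double) sums[movieId] / counts[movieId];
    }

    // Consumes rows as they arrive instead of building a List<Record>.
    static RatingStats fromResultSet(ResultSet rs, int maxMovieId) throws SQLException {
        RatingStats stats = new RatingStats(maxMovieId);
        while (rs.next()) {
            stats.accept(rs.getInt("movie_id"), (byte) rs.getInt("rating"));
        }
        return stats;
    }
}
```

This only helps when the final result is much smaller than the input; if every rating really has to stay in memory, the per-row cost has to be brought far below the size of one Java object.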

Answer 2

"The problem is that I get java.lang.OutOfMemoryError in s.executeQuery itself."

You can split the query into several:

    ResultSet rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT 0,300"); // offset 0, 300 rows: rows 0-299
    // process this first result
    rs = s.executeQuery("SELECT movie_id,customer_id,rating FROM ratings LIMIT 300,300"); // offset 300, 300 rows: rows 300-599
    // process this second result
    // etc.

When no more results are returned, you can break out of the while loop.
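A sketch of the same chunked approach with the LIMIT arithmetic made explicit. In MySQL, LIMIT offset, row_count takes the number of rows to return as its second argument, so every chunk uses the same count and only the offset advances. The ORDER BY movie_id,customer_id is an addition: without an ORDER BY, MySQL does not guarantee that consecutive queries return rows in the same order, so chunks could overlap or skip rows. It assumes that column pair is unique and indexed (for example, the primary key). ChunkedLoader and RowHandler are hypothetical names.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ChunkedLoader {
    // Hypothetical per-row callback, so rows are processed rather than stored.
    interface RowHandler {
        void handle(int movieId, int customerId, byte rating);
    }

    // LIMIT offset, row_count: the second number is how many rows to return,
    // not the index of the last row. Assumes (movie_id, customer_id) is unique and indexed.
    static String chunkQuery(long offset, int chunkSize) {
        return "SELECT movie_id,customer_id,rating FROM ratings"
                + " ORDER BY movie_id,customer_id"
                + " LIMIT " + offset + "," + chunkSize;
    }

    // chunkSize must be positive. Returns the total number of rows handled.
    static long loadInChunks(Connection conn, int chunkSize, RowHandler handler) throws SQLException {
        long offset = 0;
        int rowsInChunk;
        do {
            rowsInChunk = 0;
            try (Statement s = conn.createStatement();
                 ResultSet rs = s.executeQuery(chunkQuery(offset, chunkSize))) {
                while (rs.next()) {
                    handler.handle(rs.getInt("movie_id"), rs.getInt("customer_id"),
                            (byte) rs.getInt("rating"));
                    rowsInChunk++;
                }
            }
            offset += rowsInChunk;
        } while (rowsInChunk == chunkSize); // a short or empty chunk means no rows are left
        return offset;
    }
}
```

Each chunk still makes MySQL step past offset rows before returning anything, so later chunks get slower as the offset grows.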

Answer 3

You can call stmt.setFetchSize(50); and conn.setAutoCommit(false); to avoid reading the entire ResultSet into memory.

Here is what the PostgreSQL JDBC driver documentation says:

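Getting results based on a cursor

By default, the driver collects all the results for the query at once. This can be inconvenient for large data sets, so the JDBC driver provides a means of basing a ResultSet on a database cursor and fetching only a small number of rows.

A small number of rows are cached on the client side of the connection, and when they are exhausted, the next block of rows is retrieved by repositioning the cursor.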

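Note:

Cursor-based ResultSets cannot be used in all situations. There are a number of restrictions that make the driver silently fall back to fetching the whole ResultSet at once:

- The connection to the server must use the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.
- The Connection must not be in autocommit mode. The backend closes cursors at the end of a transaction, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it.
- The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code needs to be rewritten to take advantage of this, but it also means you cannot scroll backwards or otherwise jump around in the ResultSet.
- The query must be a single statement, not multiple statements strung together with semicolons.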

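Example: Setting the fetch size to turn cursors on and off.

Changing code to cursor mode is as simple as setting the fetch size of the Statement to an appropriate value. Setting the fetch size back to 0 causes all rows to be cached (the default behavior).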

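import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorFetchExample {
    public static void main(String[] args) throws Exception {
        // Only needed for pre-JDBC 4 drivers; newer drivers register themselves
        Class.forName("com.mysql.jdbc.Driver");
        // useCursorFetch=true lets MySQL Connector/J use a server-side cursor
        // when a positive fetch size is set
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/test?useCursorFetch=true&user=root")) {
            // make sure autocommit is off
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // Turn use of the cursor on.
                st.setFetchSize(50);
                try (ResultSet rs = st.executeQuery("SELECT * FROM mytable")) {
                    while (rs.next()) {
                        System.out.println("a row was returned.");
                    }
                }

                // Turn the cursor off.
                st.setFetchSize(0);
                try (ResultSet rs = st.executeQuery("SELECT * FROM mytable")) {
                    while (rs.next()) {
                        System.out.println("many rows were returned.");
                    }
                }
            } // the statement is closed here, then the connection
        }
    }
}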
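The documentation quoted above comes from the PostgreSQL JDBC driver. With MySQL Connector/J, the cursor fetch in the example relies on useCursorFetch=true together with a positive fetch size. Connector/J also has a streaming mode, which the question's update code uses: a forward-only, read-only Statement with a fetch size of Integer.MIN_VALUE makes the driver hand over rows one at a time instead of buffering the whole result. A minimal sketch of that setup; MySqlStreaming and createStreamingStatement are hypothetical names.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class MySqlStreaming {
    // MySQL Connector/J streams a result set only when all three conditions hold:
    // TYPE_FORWARD_ONLY, CONCUR_READ_ONLY, and a fetch size of Integer.MIN_VALUE.
    static Statement createStreamingStatement(Connection conn) throws SQLException {
        Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(Integer.MIN_VALUE);
        return st;
    }
}
```

While a streaming ResultSet is open, Connector/J cannot run any other statement on the same connection, so read all rows or close the ResultSet first. Streaming only limits what the driver buffers; as Answer 1 points out, the heap still fills up if every row is then kept in a Java collection.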

Answer 4

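You have to redesign your code and load the data into memory in chunks.

Example:

1) Load the first 1 million records from the database with suitable SQL (SQL that selects only 1 million rows), then process them.
2) Load the next chunk in the same way.

setFetchSize alone will not solve this problem.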
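One way to write the "suitable SQL" for each chunk without large offsets is keyset (or "seek") pagination, named plainly here as a substitute for LIMIT offset, count: each chunk asks for the rows after the last key already seen, so the database can start from an index position instead of stepping past all earlier rows. This sketch assumes (movie_id, customer_id) is a unique index on ratings and that ids are greater than Integer.MIN_VALUE; KeysetLoader and RowHandler are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class KeysetLoader {
    // Hypothetical per-row callback.
    interface RowHandler {
        void handle(int movieId, int customerId, byte rating);
    }

    // Rows strictly after the last (movie_id, customer_id) of the previous chunk,
    // in index order. Assumes a unique index on (movie_id, customer_id).
    static final String NEXT_CHUNK_SQL =
            "SELECT movie_id,customer_id,rating FROM ratings"
            + " WHERE movie_id > ? OR (movie_id = ? AND customer_id > ?)"
            + " ORDER BY movie_id,customer_id LIMIT ?";

    // chunkSize must be positive. Returns the total number of rows handled.
    static long loadAll(Connection conn, int chunkSize, RowHandler handler) throws SQLException {
        int lastMovie = Integer.MIN_VALUE;    // below any real id, so the first chunk starts at the beginning
        int lastCustomer = Integer.MIN_VALUE;
        long total = 0;
        try (PreparedStatement ps = conn.prepareStatement(NEXT_CHUNK_SQL)) {
            int rowsInChunk;
            do {
                ps.setInt(1, lastMovie);
                ps.setInt(2, lastMovie);
                ps.setInt(3, lastCustomer);
                ps.setInt(4, chunkSize);
                rowsInChunk = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // remember the last key of this chunk; the next chunk starts after it
                        lastMovie = rs.getInt("movie_id");
                        lastCustomer = rs.getInt("customer_id");
                        handler.handle(lastMovie, lastCustomer, (byte) rs.getInt("rating"));
                        rowsInChunk++;
                    }
                }
                total += rowsInChunk;
            } while (rowsInChunk == chunkSize); // a short chunk means the table is exhausted
        }
        return total;
    }
}
```

Whether MySQL actually uses the index for the OR condition depends on the server version and table definition, so it is worth checking the plan with EXPLAIN on the real table.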


4 answers

Answer 1: 11 votes, accepted, 2013-01-26 10:09:34

Answer 2: 3 votes, 2013-01-26 10:25:46

Answer 3: 2 votes, 2016-04-05 13:53:13

Answer 4: 0 votes, 2018-08-03 06:41:27
