Improving the performance of loading 100,000 records from a database

Question

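We wrote a program that makes it easier for our other programs to work with the database, so the code shown below is used by several other programs.

One of these programs receives about 10,000 records from one of our customers and has to check whether they are already in our database. If they are not, we insert them (records can also change, in which case they have to be updated).

To keep things simple, we load every entry from the whole table (currently 120,000), create an instance of a class for each entry we get, and put them all into a HashMap.

Loading the whole table this way takes about 5 minutes. We also sometimes have to restart the program because we run into GC overhead errors on our limited hardware. Do you have any idea how we could improve the performance?

Here is the code that loads all entries (we have a global limit of 10,000 entries per query, so we use a loop):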

public Map<String, IMasterDataSet> getAllInformationObjects(ISession session) throws MasterDataException {
    IQueryExpression qe;
    IQueryParameter qp;
    
    // our main SDP class
    Constructor<?> constructorForSDPbaseClass = getStandardConstructor();
    
    SimpleDateFormat itaTimestampFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS");
    
    // search in standard time range (modification date!)
    Calendar cal = Calendar.getInstance();
    cal.set(2010, Calendar.JANUARY, 1);
    Date startDate = cal.getTime();
    Date endDate = new Date();
    Long startDateL = Long.parseLong(itaTimestampFormat.format(startDate));
    Long endDateL = Long.parseLong(itaTimestampFormat.format(endDate));

    IDescriptor modDesc = IBVRIDescriptor.ModificationDate.getDescriptor(session);

    // count once before to determine initial capacities for hash map/set
    IBVRIArchiveClass SDP_ARCHIVECLASS = getMasterDataPropertyBag().getSDP_ARCHIVECLASS();
    qe = SDP_ARCHIVECLASS.getQueryExpression(session);
    qp = session.getDocumentServer().getClassFactory()
            .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);        
    qp.setExpression(qe);  
    qp.setHitLimitThreshold(0);
    qp.setHitLimit(0);
    int nrOfHitsTotal = session.getDocumentServer().queryCount(session, qp, "*");
    int initialCapacity = (int) (nrOfHitsTotal / 0.75 + 1);

    // MD sets; and objects already done (here: document ID)
    HashSet<String> objDone = new HashSet<>(initialCapacity); 
    HashMap<String, IMasterDataSet> objRes = new HashMap<>(initialCapacity); 
    
    qp.close();
    
    // do queries until hit count is smaller than 10.000
    // use modification date
    
    boolean keepGoing = true;
    while(keepGoing) {
        // construct query expression
        // - basic part: Modification date & class type
        // a. doc. class type
        qe = SDP_ARCHIVECLASS.getQueryExpression(session);
        // b. modification date range
        qe = SearchUtil.appendQueryExpressionWithANDoperator(session, qe, 
                   new PlainExpression(modDesc.getQueryLiteral() + " BETWEEN " + startDateL + " AND " + endDateL));
        
        // 2. Query Parameter: set database; set expression
        qp = session.getDocumentServer().getClassFactory()
                .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
        
        qp.setExpression(qe);  
        
        // order by modification date; hitlimit = 0 -> no hitlimit, but the usual 10.000 max
        qp.setOrderByExpression(session.getDocumentServer().getClassFactory().getOrderByExpressionInstance(modDesc, true));
        qp.setHitLimitThreshold(0);
        qp.setHitLimit(0);

        // Do not sort by modification date;
        qp.setHints("+NoDefaultOrderBy");
        
        keepGoing = false;
        IInformationObject[] hits = null;
        IDocumentHitList hitList = null;
        hitList = session.getDocumentServer().query(qp, session);
        IDocument doc;
        if (hitList.getTotalHitCount() > 0) {
            hits = hitList.getInformationObjects();
            for (IInformationObject hit : hits) {
                String objID = hit.getID();
                if(!objDone.contains(objID)) {
                    // do something with this object and the class
                    // here: construct a new SDP sub class object and give it back via interface
                    doc = (IDocument) hit;
                    IMasterDataSet mdSet;
                    try {
                        mdSet = (IMasterDataSet) constructorForSDPbaseClass.newInstance(session, doc);
                    } catch (Exception e) {
                        // cause for this
                        String cause = (e.getCause() != null) ? e.getCause().toString() : MasterDataException.ERRMSG_PART_UNKNOWN;                            
                        throw new MasterDataException(MasterDataException.ERRMSG_NOINSTANCE_POSSIBLE, this.getClass().getSimpleName(), e.toString(), cause);
                    }                        
                    objRes.put(mdSet.getID(), mdSet);
                    objDone.add(objID);
                }                       
            }
            doc = (IDocument) hits[hits.length - 1];
            Date lastModDate = ((IDateValue) doc.getDescriptor(modDesc).getValues()[0]).getValue();
            startDateL = Long.parseLong(itaTimestampFormat.format(lastModDate));
        
            keepGoing = (hits.length >= 10000 || hitList.isResultSetTruncated());
        }
        qp.close();
    }   
    return objRes;
}

Answer 1

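Loading 120,000 rows (and more) every time will not scale well, and as the amount of data grows your solution may stop working in the future. Let the database server handle the problem instead.

Your table needs a primary key or unique key based on columns of the records. Iterate over the 10,000 records and, for each one, execute a JDBC SQL update that sets all field values, with a where clause that matches the primary/unique key exactly.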

update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?; // ... AND PKCOL2 =? ...

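This either modifies the existing row or does nothing at all: JDBC executeUpdate() returns 0 or 1, the number of rows changed. If the number of rows changed is zero, you have detected a new record that does not exist yet, so perform an insert for that new record only.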

insert into BLAH (COL1, COL2, ... PKCOL) values (?,?, ..., ?);
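
The update-then-insert pattern described above could be sketched in Java roughly as follows. This is only a minimal illustration, not a drop-in implementation: the class name UpdateThenInsert and the method save are hypothetical, BLAH, COL1, COL2 and PKCOL are the placeholder names from the statements above, and the sketch assumes all columns are strings and the key is a single column.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper; table and column names are the placeholders from the answer.
final class UpdateThenInsert {
    static final String UPDATE_SQL = "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?";
    static final String INSERT_SQL = "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)";

    /** Returns true if an existing row was updated, false if a new row was inserted. */
    static boolean save(Connection con, String col1, String col2, String pk) throws SQLException {
        try (PreparedStatement update = con.prepareStatement(UPDATE_SQL)) {
            update.setString(1, col1);
            update.setString(2, col2);
            update.setString(3, pk);
            if (update.executeUpdate() > 0) {
                return true; // the key already existed, so the row was updated
            }
        }
        // The update matched no row: the record is new, so insert it.
        try (PreparedStatement insert = con.prepareStatement(INSERT_SQL)) {
            insert.setString(1, col1);
            insert.setString(2, col2);
            insert.setString(3, pk);
            insert.executeUpdate();
            return false;
        }
    }
}
```

Calling save once per incoming record means only those 10,000 records are touched and nothing has to be loaded into a HashMap first. Preparing the statements for every call keeps the sketch short; in a real loop they would more likely be prepared once and reused.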

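You can decide whether to run all 10,000 updates first and then however many inserts are needed, or to do an update plus optional insert per record. Keep in mind that JDBC batched statements and turning auto-commit off may help speed things up.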
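
The "all updates first, then the inserts" variant with batching and auto-commit turned off might look like the sketch below. It is an illustration under several assumptions: the class BatchedUpsert and its methods are hypothetical, the table and column names are the placeholders used above, each key occurs at most once in the incoming records, and the JDBC driver reports real per-statement update counts from executeBatch(). The JDBC API allows a driver to return Statement.SUCCESS_NO_INFO instead, in which case this approach cannot tell which records are new, so the sketch simply gives up.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper; table and column names are the placeholders from the answer.
final class BatchedUpsert {
    static final String UPDATE_SQL = "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?";
    static final String INSERT_SQL = "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)";

    /** Each record is {COL1, COL2, PKCOL}. Returns the number of rows inserted. */
    static int saveAll(Connection con, List<String[]> records) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // commit once at the end instead of once per statement
        try (PreparedStatement update = con.prepareStatement(UPDATE_SQL);
             PreparedStatement insert = con.prepareStatement(INSERT_SQL)) {
            for (String[] record : records) {
                bind(update, record);
                update.addBatch();
            }
            int[] counts = update.executeBatch();

            int inserted = 0;
            for (int i = 0; i < counts.length; i++) {
                if (counts[i] == Statement.SUCCESS_NO_INFO) {
                    // Assumption violated: the driver does not report row counts.
                    throw new SQLException("driver does not report per-row update counts");
                }
                if (counts[i] == 0) { // no row matched, so this record is new
                    bind(insert, records.get(i));
                    insert.addBatch();
                    inserted++;
                }
            }
            if (inserted > 0) {
                insert.executeBatch();
            }
            con.commit();
            return inserted;
        } catch (SQLException e) {
            con.rollback(); // leave the table unchanged if anything failed
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }

    private static void bind(PreparedStatement ps, String[] record) throws SQLException {
        ps.setString(1, record[0]);
        ps.setString(2, record[1]);
        ps.setString(3, record[2]);
    }
}
```

Whether batching actually helps, and by how much, depends on the driver and the database, so it is worth measuring both variants against the real table.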
