Improving performance when loading 100,000 records from a database

Question

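We created a program that makes it easier to use the database from other programs, so the code shown here is used by several other programs.

One of those programs receives about 10,000 records from one of our customers and has to check whether they are already in our database. If they are not, we insert them into the database (they may also have changed, in which case they have to be updated).

To keep this simple, we load all entries from the whole table (currently 120,000), create a class instance for each entry we get, and put them all into a HashMap.

Loading the whole table this way takes about 5 minutes. We also sometimes have to restart the program because we run into a GC overhead error while working on limited hardware. Do you have any idea how we can improve the performance?

This is the code that loads all entries (we have a global limit of 10,000 entries per query, so we use a loop):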

public Map<String, IMasterDataSet> getAllInformationObjects(ISession session) throws MasterDataException {
    IQueryExpression qe;
    IQueryParameter qp;
    
    // our main SDP class
    Constructor<?> constructorForSDPbaseClass = getStandardConstructor();
    
    SimpleDateFormat itaTimestampFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS");
    
    // search in standard time range (modification date!)
    Calendar cal = Calendar.getInstance();
    cal.set(2010, Calendar.JANUARY, 1);
    Date startDate = cal.getTime();
    Date endDate = new Date();
    Long startDateL = Long.parseLong(itaTimestampFormat.format(startDate));
    Long endDateL = Long.parseLong(itaTimestampFormat.format(endDate));

    IDescriptor modDesc = IBVRIDescriptor.ModificationDate.getDescriptor(session);

    // count once before to determine initial capacities for hash map/set
    IBVRIArchiveClass SDP_ARCHIVECLASS = getMasterDataPropertyBag().getSDP_ARCHIVECLASS();
    qe = SDP_ARCHIVECLASS.getQueryExpression(session);
    qp = session.getDocumentServer().getClassFactory()
            .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);        
    qp.setExpression(qe);  
    qp.setHitLimitThreshold(0);
    qp.setHitLimit(0);
    int nrOfHitsTotal = session.getDocumentServer().queryCount(session, qp, "*");
    int initialCapacity = (int) (nrOfHitsTotal / 0.75 + 1);

    // MD sets; and objects already done (here: document ID)
    HashSet<String> objDone = new HashSet<>(initialCapacity); 
    HashMap<String, IMasterDataSet> objRes = new HashMap<>(initialCapacity); 
    
    qp.close();
    
    // do queries until hit count is smaller than 10.000
    // use modification date
    
    boolean keepGoing = true;
    while(keepGoing) {
        // construct query expression
        // - basic part: Modification date & class type
        // a. doc. class type
        qe = SDP_ARCHIVECLASS.getQueryExpression(session);
        // b. modification date range
        qe = SearchUtil.appendQueryExpressionWithANDoperator(session, qe, 
                   new PlainExpression(modDesc.getQueryLiteral() + " BETWEEN " + startDateL + " AND " + endDateL));
        
        // 2. Query Parameter: set database; set expression
        qp = session.getDocumentServer().getClassFactory()
                .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
        
        qp.setExpression(qe);  
        
        // order by modification date; hitlimit = 0 -> no hitlimit, but the usual 10.000 max
        qp.setOrderByExpression(session.getDocumentServer().getClassFactory().getOrderByExpressionInstance(modDesc, true));
        qp.setHitLimitThreshold(0);
        qp.setHitLimit(0);

        // going by the hint name: skip the default ORDER BY (the explicit order-by above is still set)
        qp.setHints("+NoDefaultOrderBy");
        
        keepGoing = false;
        IInformationObject[] hits = null;
        IDocumentHitList hitList = null;
        hitList = session.getDocumentServer().query(qp, session);
        IDocument doc;
        if (hitList.getTotalHitCount() > 0) {
            hits = hitList.getInformationObjects();
            for (IInformationObject hit : hits) {
                String objID = hit.getID();
                if(!objDone.contains(objID)) {
                    // do something with this object and the class
                    // here: construct a new SDP sub class object and give it back via interface
                    doc = (IDocument) hit;
                    IMasterDataSet mdSet;
                    try {
                        mdSet = (IMasterDataSet) constructorForSDPbaseClass.newInstance(session, doc);
                    } catch (Exception e) {
                        // cause for this
                        String cause = (e.getCause() != null) ? e.getCause().toString() : MasterDataException.ERRMSG_PART_UNKNOWN;                            
                        throw new MasterDataException(MasterDataException.ERRMSG_NOINSTANCE_POSSIBLE, this.getClass().getSimpleName(), e.toString(), cause);
                    }                        
                    objRes.put(mdSet.getID(), mdSet);
                    objDone.add(objID);
                }                       
            }
            doc = (IDocument) hits[hits.length - 1];
            Date lastModDate = ((IDateValue) doc.getDescriptor(modDesc).getValues()[0]).getValue();
            startDateL = Long.parseLong(itaTimestampFormat.format(lastModDate));
        
            keepGoing = (hits.length >= 10000 || hitList.isResultSetTruncated());
        }
        qp.close();
    }   
    return objRes;
}

Answer 1

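Loading 120,000 rows (and more) every time will not scale well, and as the volume of records grows your solution may stop working in the future. Instead, let the database server handle the problem.

Your table needs a primary key or unique key based on the columns of the records. Iterate through the 10,000 records and, for each one, execute a JDBC SQL update that sets all field values, with a where clause that exactly matches the primary/unique key.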

update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?; // ... AND PKCOL2 =? ...

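This either modifies an existing row or does nothing at all: JDBC executeUpdate() returns 0 or 1, indicating the number of rows changed. If the number of rows changed is zero, you have detected a new record that does not exist yet, so perform an insert for that new record only.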

insert into BLAH (COL1, COL2, ... PKCOL) values (?,?, ..., ?);
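The update-then-insert step for a single record could be sketched as follows. This is only a minimal illustration: the class and method names (UpdateThenInsert, needsInsert, updateOrInsert) are hypothetical, all three columns are assumed to be strings, and BLAH, COL1, COL2 and PKCOL are the placeholders from the statements above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class UpdateThenInsert {
    // Placeholder table and column names taken from the answer; adjust to the real schema.
    static final String UPDATE_SQL = "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?";
    static final String INSERT_SQL = "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)";

    /** An update that touched no row means the key is not in the table yet. */
    static boolean needsInsert(int updatedRows) {
        return updatedRows == 0;
    }

    /** Returns true if the record was inserted, false if an existing row was updated. */
    static boolean updateOrInsert(Connection con, String col1, String col2, String pk)
            throws SQLException {
        // Try the update first; the where clause matches the key exactly.
        try (PreparedStatement update = con.prepareStatement(UPDATE_SQL)) {
            update.setString(1, col1);
            update.setString(2, col2);
            update.setString(3, pk);
            if (!needsInsert(update.executeUpdate())) {
                return false; // an existing row was modified
            }
        }
        // No row matched the key, so the record is new.
        try (PreparedStatement insert = con.prepareStatement(INSERT_SQL)) {
            insert.setString(1, col1);
            insert.setString(2, col2);
            insert.setString(3, pk);
            insert.executeUpdate();
            return true;
        }
    }
}
```

In a real loop over 10,000 records the two prepared statements would normally be created once and reused instead of being prepared per record.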

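You can decide whether to run all 10,000 updates first and then however many inserts are needed, or to do an update plus an optional insert for each record. Keep in mind that JDBC batch statements and turning auto-commit off may help speed things up.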
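A rough sketch of the "all updates first, then the inserts" variant with batching and auto-commit turned off might look like this. The names BatchUpdateThenInsert, Row, indexesNeedingInsert and sync are hypothetical, and the columns are again assumed to be strings. One caveat: a JDBC driver is allowed to report Statement.SUCCESS_NO_INFO instead of a row count for a batch entry, in which case this approach cannot tell which rows were missing and the per-record variant is the safer choice.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class BatchUpdateThenInsert {
    static final String UPDATE_SQL = "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?";
    static final String INSERT_SQL = "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)";

    /** Hypothetical value holder for one incoming record. */
    static final class Row {
        final String pk, col1, col2;
        Row(String pk, String col1, String col2) { this.pk = pk; this.col1 = col1; this.col2 = col2; }
    }

    /** Positions whose update count is 0, i.e. records with no matching row yet. */
    static List<Integer> indexesNeedingInsert(int[] updateCounts) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < updateCounts.length; i++) {
            // Assumes the driver reports real counts, not Statement.SUCCESS_NO_INFO.
            if (updateCounts[i] == 0) {
                missing.add(i);
            }
        }
        return missing;
    }

    private static void bind(PreparedStatement ps, Row r) throws SQLException {
        ps.setString(1, r.col1);
        ps.setString(2, r.col2);
        ps.setString(3, r.pk);
    }

    /** Updates all rows in one batch, inserts the missing ones, returns the insert count. */
    static int sync(Connection con, List<Row> rows) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // one transaction for the whole run
        try {
            int[] counts;
            try (PreparedStatement update = con.prepareStatement(UPDATE_SQL)) {
                for (Row r : rows) {
                    bind(update, r);
                    update.addBatch();
                }
                counts = update.executeBatch();
            }
            List<Integer> missing = indexesNeedingInsert(counts);
            if (!missing.isEmpty()) {
                try (PreparedStatement insert = con.prepareStatement(INSERT_SQL)) {
                    for (int i : missing) {
                        bind(insert, rows.get(i));
                        insert.addBatch();
                    }
                    insert.executeBatch();
                }
            }
            con.commit();
            return missing.size();
        } catch (SQLException e) {
            con.rollback(); // leave the table unchanged on failure
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Whether batching actually helps depends on the driver and database, so it is worth measuring both variants against the real table.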
