
Fastest way to batch query an ORACLE Database with Java

In a project I am working on, I am given a list that contains a little under 1 million lines. The data maps all possible origins (000-999) to all possible destinations (000-999).

For each combination, I need to look at the database and determine whether a record with the same combination exists. If no record exists, it is added to the database. If the record does exist, it is updated with the new information.

The origin and destination together form the primary key of the table and are also indexed. This is all on an ORACLE database.

Given that I have to do this 1 million times, what is the best possible solution? My current method takes upwards of an hour to process all records.

For actually inserting and updating the records, I am using a batch query process that doesn't take much time at all.

The part that appears to take the most time is querying the database for the count of existing records.

public String batchUpdateModes(List records, String user) throws TransactionException {
    String message = "";
    ArrayList updateList = new ArrayList();
    ArrayList insertList = new ArrayList();
    Connection conn = null;
    try {
        conn = getDao().getConnection();
    } catch (SQLException e1) {
        e1.printStackTrace();
    }
    for (int i = 0; i < records.size(); i++) {
        BatchFileCommand record = (BatchFileCommand)records.get(i);
        String origin = record.getOrigZip().trim();
        String dest = record.getDestZip().trim();
        String pri = record.getPriMode().trim();
        String fcm = record.getFcmMode().trim();
        String per = record.getPerMode().trim();
        String pkg = record.getPkgMode().trim();
        String std = record.getStdMode().trim();
        String effDate = record.getEffDate();
        String discDate = "";

        TransModeObj obj = new TransModeObj(origin, dest, pri, fcm, per, pkg, std, effDate, discDate);
        obj.setUserId(user);
        try {
            Statement stmt = null;
            String findExisting = "select count(*) from trans_mode where orig_zip = " + origin + " " +
                    "and dest_zip = " + dest;
            stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(findExisting);
            int count = 0;
            while (rs.next()) {
                count = rs.getInt(1);
            }
            if (count > 0) {
                updateList.add(obj);
            }
            else {
                insertList.add(obj);
            }
            rs.close();
            stmt.close();


        } catch (SQLException e) {
            e.printStackTrace();
            message = e.getMessage();
        }
    }
    try {
        conn.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    boolean success = false;
    recordCount[0] = updateList.size();
    recordCount[1] = insertList.size();
    success = insertTransModes(insertList);
    System.out.println("Inserts Complete");
    success = updateTransModes(updateList);
    System.out.println("Updates Complete");

    if (success) {
        message = "success";
    }
    else {
        message = "The changes or additions submitted could not be completed.";
    }

    return message;
}

The easiest solution is to ditch the counts and just use a MERGE statement. This lets the database work out whether to insert or update within a single SQL statement.
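As a rough illustration of the MERGE approach, here is a minimal JDBC sketch. The table name `trans_mode` and the key columns `orig_zip` and `dest_zip` come from the question; the `pri_mode` column, the `TransModeMerge` class, the `mergeAll` method, and the use of string binds are assumptions made for the example, and the real table presumably has more columns.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class TransModeMerge {

    // orig_zip and dest_zip appear in the question; pri_mode is an assumed column name.
    static final String MERGE_SQL =
        "MERGE INTO trans_mode t "
      + "USING (SELECT ? AS orig_zip, ? AS dest_zip, ? AS pri_mode FROM dual) s "
      + "ON (t.orig_zip = s.orig_zip AND t.dest_zip = s.dest_zip) "
      + "WHEN MATCHED THEN UPDATE SET t.pri_mode = s.pri_mode "
      + "WHEN NOT MATCHED THEN INSERT (orig_zip, dest_zip, pri_mode) "
      + "VALUES (s.orig_zip, s.dest_zip, s.pri_mode)";

    // Each row is {origin, destination, priMode}. The statement is prepared once
    // with bind variables and sent to the database in batches of batchSize.
    static void mergeAll(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(MERGE_SQL)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
        }
    }
}
```

With this shape, the per-record `select count(*)` disappears entirely, and the statement text stays constant because the values are bound rather than concatenated into the SQL.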

The one drawback with MERGE is that the rowcount doesn't distinguish between rows updated and rows inserted. This is probably a cheap price to pay for the overall time saved. If you really can't do without separate counts, though, Adrian Billington has a workaround for you.
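If separate counts matter, one simple alternative (this is not the workaround referenced above, just a sketch) is to compare the table's row count before and after the merge. It assumes that no other session inserts into or deletes from `trans_mode` while the load runs; the `MergeCounts` class and its method names are made up for the example.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class MergeCounts {
    final long inserted;
    final long updated;

    MergeCounts(long inserted, long updated) {
        this.inserted = inserted;
        this.updated = updated;
    }

    // Counts all rows in trans_mode; call once before and once after the merge.
    static long countRows(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM trans_mode")) {
            rs.next();
            return rs.getLong(1);
        }
    }

    // Assumption: nothing else changes the table during the load, so the table
    // growth equals the number of inserts and the remaining merged rows are updates.
    static MergeCounts derive(long rowsBefore, long rowsAfter, long rowsMerged) {
        long inserted = rowsAfter - rowsBefore;
        return new MergeCounts(inserted, rowsMerged - inserted);
    }
}
```

For example, if the table held 100 rows before, holds 130 after, and 50 input rows were merged, then 30 rows were inserted and 20 were updated.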

As APC mentioned, MERGE is a good option when you need to either insert or update. But it may update records you didn't wish to update.

The first question is: what is the primary key that uniquely identifies your records (is it a composite of several fields)?

Another approach could be to load the primary keys of all existing records into memory beforehand and rule out duplicates from the records list (provided that you possess the required amount of RAM).
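A minimal sketch of the in-memory approach might look like the following. It reads every existing key with one query and then splits the incoming records into updates and inserts without further round trips. The class and method names are hypothetical, and it assumes the key columns come back from the database in the same textual form as the input (for example `001`, not `1`).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExistingKeyPartition {

    // Combines the two key parts into a single lookup key.
    static String key(String origin, String dest) {
        return origin + "|" + dest;
    }

    // One full scan of the key columns replaces a million single-row count queries.
    static Set<String> loadExistingKeys(Connection conn) throws SQLException {
        Set<String> keys = new HashSet<>();
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(1000); // fetch many rows per round trip
            try (ResultSet rs = st.executeQuery(
                    "SELECT orig_zip, dest_zip FROM trans_mode")) {
                while (rs.next()) {
                    keys.add(key(rs.getString(1), rs.getString(2)));
                }
            }
        }
        return keys;
    }

    // Each record is an array whose first two elements are origin and destination.
    // Records whose key already exists go to updates, the rest to inserts.
    static void partition(List<String[]> records, Set<String> existing,
                          List<String[]> updates, List<String[]> inserts) {
        for (String[] record : records) {
            if (existing.contains(key(record[0], record[1]))) {
                updates.add(record);
            } else {
                inserts.add(record);
            }
        }
    }
}
```

With at most 1,000 x 1,000 key combinations, the set holds no more than a million short strings, which is usually manageable on a typical application server heap.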

Also take a look at this and that option.
