
Java: Best way to generate SQL after parsing an XML file using JAXB and insert into the database without duplicates?

I've been assigned a task to unmarshal an XML file using JAXB, generate the corresponding SQL statements, and fire them at the database. I've used the following method to generate the list of SQL statements.

    public List<String> getSqlOfNationalityList(File file) throws JAXBException, FileNotFoundException, UnsupportedEncodingException {
        List<String> unNationalityList = new ArrayList<String>();
        JAXBContext jaxbcontext = JAXBContext.newInstance(ObjectFactory.class);
        Unmarshaller unmarshaller = jaxbcontext.createUnmarshaller();

        CONSOLIDATEDLIST consolidate = (CONSOLIDATEDLIST) unmarshaller.unmarshal(file);
        // accessing individuals properties
        INDIVIDUALS individuals = consolidate.getINDIVIDUALS();
        List<INDIVIDUAL> list = individuals.getINDIVIDUAL();

        for (INDIVIDUAL individual : list) {
            NATIONALITY nationality = individual.getNATIONALITY();
            if (nationality != null) {
                List<String> values = nationality.getVALUE();
                if (values != null) {
                    for (String value : values) {
                        // build one INSERT statement per nationality value
                        String string2 = "";
                        StringBuffer builder = new StringBuffer();
                        builder.append("INSERT INTO LIST_UN_NATIONALITY");
                        builder.append("(\"DATAID\",\"VALUE\")");
                        builder.append(" VALUES(");
                        string2 = string2.concat("'" + individual.getDATAID() + "',");
                        if ("null ".contentEquals(value + " ")) {
                            // a null (or literal "null") value is stored as a blank
                            string2 = string2.concat("' ',");
                        } else {
                            // crude escaping: replace single quotes so the literal doesn't break
                            string2 = string2.concat("'" + value.replace("'", "/") + "',");
                        }

                        if (string2.length() > 0) {
                            // drop the trailing comma
                            builder.append(string2.substring(0, string2.length() - 1));
                        }
                        builder.append(");");
                        builder.append("\r\n");
                        unNationalityList.add(builder.toString());
                    }
                }
            }
        }
        return unNationalityList;
    } // end of nationality list method

I have used the following method to read from the list and insert into the database.

    private void readListAndInsertToDb(List<String> list) {

        int duplicateCount = 0;
        int totalCount = 0;

        for (String sql : list) {
            try {
                jdbcTemplate.update(sql);
            } catch (DuplicateKeyException dke) {
                // the row already exists; count it and move on
                duplicateCount++;
            } catch (DataAccessException e) {
                e.printStackTrace();
            }
            totalCount++;
        } // end of for

        System.out.println("\r\nTotal : " + totalCount);
        System.out.println("Total duplicate : " + duplicateCount);
    }

Now the issue is, I have about 13-14 similar lists, and the XML file consists of records which may already exist in the database.

  1. How can I fire the queries without making duplicate entries in the PostgreSQL database?
  2. How can this be done in the most optimal way? It would be great if it were executed in batches.

Nono, don't generate a list of SQL statements. Especially don't interpolate them as strings! Awoogah, awoogah, SQL injection alert.

Don't use a try/catch approach for duplicate handling either.

Improvements are, from simple and easy to harder but best:

  • At bare minimum use a PreparedStatement with bind parameters. Prepare it once. Then execute it for each input, with the parameters from the current data row (the first sketch after this list shows this).

    You cannot rely on drivers throwing DuplicateKeyException, so you should also catch SQLException and check the SQLSTATE. Unless of course you plan on using one specific DBMS and your code checks that you're using the expected driver + version.

  • Better, use PostgreSQL's INSERT ... ON CONFLICT DO NOTHING feature to handle conflicts without needing exception handling. This lets you batch your inserts, doing many per transaction for better performance.

  • Further improve performance by using a multi-row VALUES list for INSERT ... ON CONFLICT DO NOTHING.

  • Even better, COPY all the data, including duplicates, into a TEMPORARY table using PgJDBC's CopyManager interface (see PGConnection.getCopyAPI()), create an index on the key used for duplicate detection, then LOCK the destination table and do a bulk

      INSERT INTO real_table SELECT ... FROM temp_table WHERE NOT EXISTS (SELECT 1 FROM real_table WHERE temp_table.key = real_table.key) 

    or similar. This will be way faster (a fuller sketch of this approach follows the list). You can use INSERT ... ON CONFLICT DO NOTHING instead if you're on a new enough PostgreSQL.
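
Here is a minimal sketch of the first three bullets combined: a single PreparedStatement with bind parameters, PostgreSQL's INSERT ... ON CONFLICT DO NOTHING, and JDBC batching inside one transaction. It is not the original poster's code: it reuses the generated JAXB types and the LIST_UN_NATIONALITY table from the question, and it assumes that table has a unique constraint covering "DATAID" and "VALUE", a plain java.sql.Connection to PostgreSQL 9.5+, and an arbitrary batch size of 500. Imports for the generated JAXB classes are omitted.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Sketch only: prepared once, executed per row, duplicates silently skipped by the server.
    void insertNationalities(Connection conn, List<INDIVIDUAL> individuals) throws SQLException {
        String sql = "INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") "
                   + "VALUES (?, ?) ON CONFLICT DO NOTHING";
        conn.setAutoCommit(false);                            // one transaction for the whole batch
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (INDIVIDUAL individual : individuals) {
                if (individual.getNATIONALITY() == null
                        || individual.getNATIONALITY().getVALUE() == null) {
                    continue;                                 // same null handling as the question
                }
                for (String value : individual.getNATIONALITY().getVALUE()) {
                    ps.setObject(1, individual.getDATAID());  // bind parameters: no quoting,
                    ps.setString(2, value);                   // no escaping, no SQL injection
                    ps.addBatch();
                    if (++pending % 500 == 0) {               // flush every 500 rows
                        ps.executeBatch();
                    }
                }
            }
            ps.executeBatch();                                // flush the remainder
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

Spring users can get the same shape with JdbcTemplate.batchUpdate and a BatchPreparedStatementSetter instead of raw JDBC.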

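And a sketch of the last bullet: stage everything with PgJDBC's Copy API, then do one set-based, de-duplicating insert. The staging table, its text column types, the tab-separated payload format, and the assumption that the Connection can be unwrapped to org.postgresql.PGConnection (pooled connections may need the pool's own unwrapping) are illustrative choices, not part of the original answer's code.

    import java.io.IOException;
    import java.io.StringReader;
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    // Sketch only: COPY everything, duplicates included, into a temp table, then one set-based insert.
    void bulkLoadNationalities(Connection conn, String tabSeparatedRows)
            throws SQLException, IOException {
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            // 1. Stage the raw rows; the temp table disappears when the transaction ends.
            //    Column types are a guess here; match them to the real table.
            st.execute("CREATE TEMPORARY TABLE staging_un_nationality "
                     + "(dataid text, val text) ON COMMIT DROP");

            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
            copyManager.copyIn("COPY staging_un_nationality (dataid, val) FROM STDIN",
                               new StringReader(tabSeparatedRows));

            // 2. Index the key used for duplicate detection, lock the destination,
            //    and insert only the rows that are not already present.
            st.execute("CREATE INDEX ON staging_un_nationality (dataid, val)");
            st.execute("LOCK TABLE LIST_UN_NATIONALITY IN SHARE ROW EXCLUSIVE MODE");
            st.execute("INSERT INTO LIST_UN_NATIONALITY (\"DATAID\", \"VALUE\") "
                     + "SELECT DISTINCT s.dataid, s.val FROM staging_un_nationality s "
                     + "WHERE NOT EXISTS (SELECT 1 FROM LIST_UN_NATIONALITY t "
                     + "WHERE t.\"DATAID\" = s.dataid AND t.\"VALUE\" = s.val)");
            conn.commit();
        } catch (SQLException | IOException e) {
            conn.rollback();
            throw e;
        }
    }

The LOCK closes the window between the NOT EXISTS check and the insert against concurrent writers; on PostgreSQL 9.5+ you could instead finish with INSERT ... ON CONFLICT DO NOTHING and drop both the lock and the NOT EXISTS test, as the answer notes.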