Inserting data into a list of maps in Java takes too much time

Question

I have a task to send an automated report daily. I write the query result set (collection) into a list of maps so that I can write that data to Excel. The following method inserts the collection data into the list of maps. The problem is that this method takes 1 hour 20 minutes to insert data from a collection that has 3000-3500 rows and 14 columns. My code runs 5 similar queries, and each query takes the same amount of time. Can you please help me optimize the code so that it takes less time?

// avoided following method

public static List<Map<String, Object>> insertAttrValues(IDfCollection dfCollection, List<String> attributes) throws DfException {

    if (dfCollection == null || attributes == null) {
        throw new MissingParameterException("collection and attributes");
    }

    List<Map<String, Object>> dataList = new ArrayList<>();

    while (dfCollection.next()) {
        Map<String, Object> map = new LinkedHashMap<>(attributes.size());

        for (String attribute: attributes) {
            map.put(attribute, dfCollection.getString(attribute));
        }
        dataList.add(map);
    }

    return dataList;
}

Edit: Sorry, I have now added the important parts of the code. I now use the collection directly instead of inserting the values into a map and processing them later.

Starting point:

@SpringBootApplication
public class ImmsClinicalReportApplication {

    public static void main(String[] args) {
        ApplicationContext applicationContext = SpringApplication.run(ImmsClinicalReportApplication.class, args);
        init(applicationContext);
    }

    private static void init(@NotNull ApplicationContext applicationContext) {
        ClinicalReportController clinicalReportController = (ClinicalReportController) applicationContext.getBean("clinicalReportController");

        IDfSession dfSession = null;

        try {
            dfSession = clinicalReportController.getSession();
            clinicalReportController.execute(dfSession);
            sendEmail(applicationContext, clinicalReportController);
        } catch (DfException | IOException e) {
            e.printStackTrace();
        } finally {
            try {
                clinicalReportController.cleanSession(dfSession);
            } catch (DfException e) {
                e.printStackTrace();
            }
        }
    }
}

@Controller("clinicalReportController")
@PropertySource("classpath:application.properties")
public class ClinicalReportController {

    private static final Logger logger = Logger.getLogger(ClinicalReportController.class);

    private final SessionHelper sessionHelper;
    private final DqlHelper dqlHelper;
    private final AppProperties appProperties;

    @Value("${report_path}")
    private String XLSX_FILE_PATH;

    private static final String[] moduleTypes = {
        "Clin Protocol", "Clin Investigator Brochure", "Clin Core Text",
        "Clin Process Documentation", "Clin Supporting Information"
    };

    @Autowired
    public ClinicalReportController(DqlHelper dqlHelper, SessionHelper sessionHelper, AppProperties appProperties) {
        this.dqlHelper = dqlHelper;
        this.sessionHelper = sessionHelper;
        this.appProperties = appProperties;
    }

    /**
     * Method that processes the report
     * @param dfSession dfSession
     * @throws DfException DfException
     * @throws IOException IOException
     */
    public void execute(IDfSession dfSession) throws DfException, IOException {

        StopWatch timer = new StopWatch();

        for (int i = 0; i < moduleTypes.length; i++) {
            // start timer
            timer.start();
            IDfCollection dfCollection = dqlHelper.query(dfSession, QueryConstant.immsQueries[i]);

            List<String> attributes = new ArrayList<>(dfCollection.getAttrCount());

            for (int j = 0; j < dfCollection.getAttrCount(); j++) {
                attributes.add(dfCollection.getAttr(j).getName());
            }

            // stop timer
            timer.stop();
            // Each query takes 20 mins of time
            /* Sample query: select d.r_object_id, d.object_name, d.title,
            d.imms_extreleased_date, d.imms_extreleased_reason, d.imms_extreleaser,
            d.imms_protocol_number, d.imms_protocol_number_rep, d.keywords,
            d.imms_compound_number, d.imms_module_type, d.imms_prereleaser,
            d.imms_prereleased_date, f.r_folder_path from imms_document d,
            dm_folder f where d.i_folder_id=f.r_object_id and i_cabinet_id='0c0033ec80000700'
            and d.imms_module_type = 'Clin Protocol' and d.imms_extreleased_date >
            date('31/12/2016', 'dd/mm/yyyy') and f.r_folder_path is not nullstring enable (ROW_BASED)*/
            logger.info("Time taken to run query: " + QueryConstant.immsQueries[i] + ": " +
                    timer.getTotalTimeSeconds()/60 + " minutes");

            // List<Map<String, Object>> resultSet = ImmsUtils.insertAttrValues(dfCollection, attributes);

            if (i == 0) {
                processReport(dfCollection, moduleTypes[i], attributes);
            } else {
                updateReport(dfCollection, moduleTypes[i], attributes);
            }
            cleanCollection(dfCollection);
        }
    }

    /**
     * Method process for remaining queries/sheets
     * @param resultSet resultSet
     * @param objectType objectType
     * @param attributes attributes
     * @throws IOException IOException
     */
    private void updateReport(IDfCollection resultSet, String objectType, List<String> attributes) throws IOException, DfException {
        Workbook workbook = new XSSFWorkbook(new FileInputStream(XLSX_FILE_PATH));
        excelWriterAndOperateOutputStream(resultSet, objectType, workbook, attributes);
    }

    /**
     * Method that writes data to excel sheets
     * @param dfCollection dfCollection
     * @param sheet2 sheet2
     * @param workbook workbook
     * @param attributes attributes
     *
     * Using the collection directly. Not sure where the issue is in the following method; writing data to the sheet also takes 50 minutes.
     */
     private void writeToSheet(@NotNull IDfCollection dfCollection, Sheet sheet2, Workbook workbook, List<String> attributes) throws DfException {
        Sheet sheet;
        Row row;

        sheet = sheet2;

        Object[] values = new Object[attributes.size()];
        StopWatch timer = new StopWatch();
        
        // moved outside of loop 
        // TODO: avoid regex, use other logic 
        String dateRegex = "^([0-9]{4})/([0-1][0-9])/([0-3][0-9])\\s([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$";
        Pattern datePattern = Pattern.compile(dateRegex);
        // avoid SDF and Date and
        // TODO: use java.time - maybe LocalDate
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        Date date = null;

        CellStyle dateCellStyle = workbook.createCellStyle();
        dateCellStyle.setDataFormat(workbook.getCreationHelper().createDataFormat().getFormat("yyyy/MM/dd HH:mm:ss"));

        timer.start();
        while (dfCollection.next()) {
            for (int i = 0; i < attributes.size(); i++) {
                values[i] = dfCollection.getString(attributes.get(i));
            }

            int lastRow = sheet.getLastRowNum();
            row = sheet.createRow(++lastRow);
            int cellNum = 0;


            for (Object value: values) {
                Cell cell = row.createCell(cellNum++);
                if (datePattern.matcher(value.toString()).matches()) {
                    try {
                        date = simpleDateFormat.parse(value.toString());
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                    cell.setCellValue(date);
                    cell.setCellStyle(dateCellStyle);
                } else {
                    cell.setCellValue(value.toString());
                }
            }
        }
        timer.stop();
        // Taking 50 mins of time to write collection data
        // Log: Time taken for writing data 54.567404175 minutes
        logger.info("Time taken for writing data " + timer.getTotalTimeSeconds()/60 + " minutes");


        // Resize all columns to fit the content size
        for (int i = 0; i < attributes.size(); i++) {
            sheet.autoSizeColumn(i);
        }
    }

    /**
     * Method to create sheet, set fonts and colors
     * @param moduleType moduleType
     * @param workbook workbook
     * @return Sheet
     */
     private Sheet createSheet(String moduleType, Workbook workbook) {
        return workbook.createSheet(moduleType);
     }

    /**
     * Method to process first query/sheet
     * @param dfCollection dfCollection
     * @param moduleType moduleType
     * @param attributes attributes
     * @throws IOException IOException
     */
     private void processReport(IDfCollection dfCollection, String moduleType, List<String> attributes) throws IOException, DfException {
        // Create a Workbook - for xlsx
        Workbook workbook = new XSSFWorkbook();

        /*CreationHelper helps us create instances of various things like DataFormat,
          Hyperlink, RichTextString etc, in a format (HSSF, XSSF) independent way*/
        
        workbook.getCreationHelper();

        excelWriterAndOperateOutputStream(dfCollection, moduleType, workbook, attributes);
    }

    /**
     * Method that writes and saves data to file
     * @param resultSet resultSet
     * @param moduleType  moduleType
     * @param workbook workbook
     * @param attributes attributes
     * @throws IOException IOException
     */
    private void excelWriterAndOperateOutputStream(IDfCollection resultSet, String moduleType, Workbook workbook, List<String> attributes) throws IOException, DfException {
        Sheet sheet = createSheet(moduleType, workbook);

        CellStyle cellStyle = setFontsAndColors(workbook);

        // Create a Row
        Row headerRow = sheet.createRow(0);
        // Create cells
        for (int i = 0; i < attributes.size(); i++) {
            Cell cell = headerRow.createCell(i);
            cell.setCellValue(attributes.get(i));
            cell.setCellStyle(cellStyle);
        }

        writeToSheet(resultSet, workbook.getSheet(moduleType), workbook, attributes);
        // Write the output to the file
        FileOutputStream fileOutputStream = new FileOutputStream(XLSX_FILE_PATH);
        workbook.write(fileOutputStream);
        // close the file
        fileOutputStream.close();
        // close the workbook
        workbook.close();
    }

    @NotNull
    private CellStyle setFontsAndColors(Workbook workbook) {
        CellStyle cellStyle = workbook.createCellStyle();

        // Create a Font for styling header cells
        Font headerFont = workbook.createFont();
        headerFont.setBold(false);
        headerFont.setFontHeightInPoints((short) 12);
        headerFont.setColor(IndexedColors.GREEN.getIndex());
        cellStyle.setFont(headerFont);
        return cellStyle;
   }

    /**
     * Get IDfSession object
     * @return IDfSession
     * @throws DfException DfException
     */
    public IDfSession getSession() throws DfException {
        IDfSession dfSession;

        IDfSessionManager sessionManager = sessionHelper.getDfSessionManager(appProperties.getRepository(), appProperties.getUsername(), appProperties.getPassword());
        dfSession = sessionManager.getSession(appProperties.getRepository());
        return dfSession;
    }

    /**
     * Clean IDfCollection
     * @param dfCollection dfCollection
     */
    public void cleanCollection(IDfCollection dfCollection) {
        dqlHelper.cleanup(dfCollection);
    }

    /**
     * Clean IDfSession
     * @param dfSession dfSession
     */
    public void cleanSession(IDfSession dfSession) throws DfException {
        sessionHelper.cleanSession(dfSession);
    }
}

Answer 1

You can make these improvements:

Fill in the POI structures directly from IDfCollection and do not copy the collection data into List<Map<String, Object>>.
Use collection.getTime(attribute) to obtain the time value instead of regex-parsing each record. You can use the condition collection.getAttrDataType(attribute) == IDfAttr.DM_TIME to determine whether the value is a time or not.
Then you can use the date directly without parsing, like this: cell.setCellValue(collection.getTime(attribute).getDate())
The same applies to numbers, which gives a better result in the Excel sheet: use collection.getInt(attribute) and collection.getDouble(attribute) instead of collection.getString(attribute). Constants like IDfAttr.DM_INTEGER and IDfAttr.DM_DOUBLE will also help here.
Move int last_row outside of the loop and do last_row++ inside the loop. Calling sheet.getLastRowNum() is then unnecessary. BTW: the camel-case name lastRow would be nicer in the Java world ;-)
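The type-based dispatch described above can be sketched without the DFC or POI libraries on the classpath. In the sketch below, `ColumnType`, `TypedRow`, `MapRow`, and `TypedCellWriter` are hypothetical stand-ins invented for illustration; in the real code the row would be the IDfCollection and the values would go straight into POI cells.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the per-column data type reported by the collection. */
enum ColumnType { STRING, INTEGER, DOUBLE, TIME }

/** Hypothetical stand-in for the current row of a result cursor (not a DFC interface). */
interface TypedRow {
    ColumnType typeOf(String attribute);
    String getString(String attribute);
    int getInt(String attribute);
    double getDouble(String attribute);
    Date getTime(String attribute);
}

/** Map-backed row used only to make the sketch runnable. */
final class MapRow implements TypedRow {
    private final Map<String, Object> values;

    MapRow(Map<String, Object> values) { this.values = values; }

    public ColumnType typeOf(String attribute) {
        Object value = values.get(attribute);
        if (value instanceof Integer) return ColumnType.INTEGER;
        if (value instanceof Double) return ColumnType.DOUBLE;
        if (value instanceof Date) return ColumnType.TIME;
        return ColumnType.STRING;
    }
    public String getString(String attribute) { return String.valueOf(values.get(attribute)); }
    public int getInt(String attribute) { return (Integer) values.get(attribute); }
    public double getDouble(String attribute) { return (Double) values.get(attribute); }
    public Date getTime(String attribute) { return (Date) values.get(attribute); }
}

final class TypedCellWriter {

    /** Reads each attribute with the getter that matches its column type; no regex, no parsing. */
    static List<Object> toCellValues(TypedRow row, List<String> attributes) {
        List<Object> cells = new ArrayList<>(attributes.size());
        for (String attribute : attributes) {
            switch (row.typeOf(attribute)) {
                case TIME:    cells.add(row.getTime(attribute));   break;
                case INTEGER: cells.add(row.getInt(attribute));    break;
                case DOUBLE:  cells.add(row.getDouble(attribute)); break;
                default:      cells.add(row.getString(attribute)); break;
            }
        }
        return cells;
    }
}
```

In the report code, each branch of the switch would call the matching cell.setCellValue overload instead of adding to a list, and the date cell style would be applied only in the time branch.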

Another thing is that you run the whole process in an outer loop for 5 similar queries, so there is probably further room for improvement, such as merging all the queries into one, for example by using better conditions, UNION if possible, or a wider condition plus a filter in the application logic.
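One way to read the "wider condition plus a filter in the application logic" idea is a single query whose predicate covers all five module types, with each returned row routed to its sheet by the value of imms_module_type. The sketch below only shows the two plain-Java pieces of that idea; `CombinedQuerySketch` and `SheetRouter` are hypothetical names, and whether an IN predicate performs well for this repository is an assumption that would need to be measured.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CombinedQuerySketch {

    /** Builds "column in ('a', 'b')" from trusted, hard-coded constants (no escaping is done). */
    static String inPredicate(String column, List<String> values) {
        return values.stream()
                .map(value -> "'" + value + "'")
                .collect(Collectors.joining(", ", column + " in (", ")"));
    }
}

/** Tracks the next free row per sheet so one pass over the result can fill several sheets. */
final class SheetRouter {
    private final Map<String, Integer> nextRowBySheet = new LinkedHashMap<>();

    SheetRouter(List<String> moduleTypes) {
        for (String type : moduleTypes) {
            nextRowBySheet.put(type, 1); // row 0 is reserved for the header
        }
    }

    /** Returns the row index to write for this module type, or -1 if the type is not reported. */
    int claimRow(String moduleType) {
        Integer next = nextRowBySheet.get(moduleType);
        if (next == null) {
            return -1;
        }
        nextRowBySheet.put(moduleType, next + 1);
        return next;
    }
}
```

With this shape the workbook is opened and written once, instead of being re-read from disk for every module type as updateReport does.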

Answer 2

I think the main problem is with the query. Try the steps below:

Instead of listing individual attributes in the select query, use *. Check the query execution time. If the query now runs quickly rather than taking minutes, try the next steps.

select * from imms_document d, dm_folder f where d.i_folder_id=f.r_object_id and i_cabinet_id='0c0033ec80000700' and d.imms_module_type = 'Clin Protocol' and d.imms_extreleased_date > date('31/12/2016', 'dd/mm/yyyy') and f.r_folder_path is not nullstring enable (ROW_BASED)

As you are using Spring Boot, include the required attributes in application.properties as shown below. You may not want all of them.

included_attributes=r_object_id,object_name,title,imms_extreleased_date,imms_extreleased_reason,imms_extreleaser,imms_protocol_number,imms_protocol_number_rep,keywords,imms_compound_number,imms_module_type,imms_prereleaser,imms_prereleased_date,r_folder_path

Do the following in your AppProperties class file:

@Component
public class AppProperties {

   /**
    *other fields
    */

    @Getter
    @Value("${included_attributes}")
    private String[] includedAttributes;

}

Now, in your execute() method, modify the code to use only the attributes you need to get the data for.

public void execute(IDfSession dfSession) throws DfException, IOException {

    StopWatch timer = new StopWatch();

    for (int i = 0; i < moduleTypes.length; i++) {
        // start timer
        timer.start();
        IDfCollection dfCollection = dqlHelper.query(dfSession, QueryConstant.immsQueries[i]);
        // stop timer
        timer.stop();
        logger.info("Time taken to run query: " + QueryConstant.immsQueries[i] + ": " +
                timer.getTotalTimeSeconds() + " seconds");    
        // attributes to be added
        List<String> attributes = new ArrayList<>();
        // Get included attributes as list
        List<String> includedAttributes = Arrays.asList(appProperties.getIncludedAttributes());

        for (int j = 0; j < dfCollection.getAttrCount(); j++) {
            // check for the attribute in included attributes and add if exists
            if (hasAttribute(includedAttributes, dfCollection.getAttr(j).getName())) {
                attributes.add(dfCollection.getAttr(j).getName());
            }
        }


        if (i == 0) {
            processReport(dfCollection, moduleTypes[i], attributes);
        } else {
            updateReport(dfCollection, moduleTypes[i], attributes);
        }
        cleanCollection(dfCollection);
    }
}

public static boolean hasAttribute(@NotNull List<String> attributes, String attribute) {
    for(String attr : attributes){
        if(attribute.contains(attr)){
            return true;
        }
    }
    return false;
}
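Note that hasAttribute relies on String.contains, which is a substring test: a column named, say, subtitle (a made-up example) would be accepted when only title is listed. If exact attribute names are what is intended, a minimal sketch of an exact-match filter could look like this; `AttributeFilter` is a hypothetical helper, not part of the answer's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class AttributeFilter {
    private final Set<String> included;

    /** Copies the configured names into a set once, so each lookup is a constant-time exact match. */
    AttributeFilter(String... includedAttributes) {
        this.included = new HashSet<>(Arrays.asList(includedAttributes));
    }

    /** True only when the column name equals one of the configured names. */
    boolean accepts(String attribute) {
        return included.contains(attribute);
    }
}
```

The filter would be built once from appProperties.getIncludedAttributes() before the loop over the collection's attributes.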

Use the collection directly for the POI structures; there is no need to insert the data into an array and iterate over it again.

private void writeToSheet(@NotNull IDfCollection dfCollection, Sheet sheet2,
                              @NotNull Workbook workbook, List<String> attributes) throws DfException {
        Sheet sheet;
        Row row;

        sheet = sheet2;

        StopWatch timer = new StopWatch();

        String dateRegex = "^([0-9]{4})/([0-1][0-9])/([0-3][0-9])\\s([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$";
        Pattern datePattern = Pattern.compile(dateRegex);

        DateTimeFormatter timeFormatter = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

        CellStyle dateCellStyle = workbook.createCellStyle();
        dateCellStyle.setDataFormat(workbook.getCreationHelper().createDataFormat().getFormat("yyyy/MM/dd HH:mm:ss"));

        int lastRow = 0;

        timer.start();
        while (dfCollection.next()) {
            row = sheet.createRow(++lastRow);
            int cellNum = 0;

            for (String attribute : attributes) {
                Object value = dfCollection.getString(attribute);

                Cell cell = row.createCell(cellNum++);

                if (datePattern.matcher(value.toString()).matches()) {
                    cell.setCellValue(LocalDateTime.parse(value.toString(), timeFormatter));
                    cell.setCellStyle(dateCellStyle);
                } else {
                    cell.setCellValue(value.toString());
                }
            }
        }
        timer.stop();
        logger.info("Time taken for writing data " + timer.getTotalTimeSeconds()/60 + " minutes");


        // Resize all columns to fit the content size
        for (int i = 0; i < attributes.size(); i++) {
            sheet.autoSizeColumn(i);
        }
    }
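If the values are still read as strings, the regex check in front of every cell can be replaced by a cheap shape test followed by a java.time parse, which is one way to address the "avoid regex" TODO from the question. This is a sketch under the assumption that date values always arrive as yyyy/MM/dd HH:mm:ss; `DateCellParser` is a hypothetical helper name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class DateCellParser {
    // DateTimeFormatter is immutable and thread-safe, so a single shared instance is enough.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    /** Returns the parsed timestamp, or empty when the value does not look like one. */
    static Optional<LocalDateTime> tryParse(String value) {
        // Cheap shape check first, so ordinary strings never reach the parser.
        if (value == null || value.length() != 19
                || value.charAt(4) != '/' || value.charAt(7) != '/'
                || value.charAt(10) != ' '
                || value.charAt(13) != ':' || value.charAt(16) != ':') {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDateTime.parse(value, FORMAT));
        } catch (DateTimeParseException e) {
            // Right shape but not a valid date or time, for example month 13.
            return Optional.empty();
        }
    }
}
```

In writeToSheet the result would decide between the date branch (set the parsed value and the date cell style) and the plain string branch.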

Answer 3

You can try ForkJoinPool or use the JDK's parallel streams :) to make use of your CPU's multiple cores. For a ForkJoinPool example, please refer to https://www.baeldung.com/java-fork-join

public static List<Map<String, Object>> insertAttrValues(Stream<Object> stream, List<String> attributes) throws RuntimeException {
    if (stream == null || attributes == null) {
        throw new RuntimeException("collection and attributes");
    }
    final int size = attributes.size();
    return stream.parallel().map(item -> {
        Map<String, Object> map = new LinkedHashMap<>(size);
        for (String attribute : attributes) {
            //map.put(attribute, item.getString(attribute));
        }
        return map;
    }).collect(Collectors.toList());
}
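The method above leaves the map.put call commented out because a plain Object has no getString. A runnable variant of the same idea is sketched below, assuming the rows have already been fetched into memory as Object[] arrays in attribute order; a cursor that is advanced with next() is read sequentially, so only the per-row conversion is parallelized here. `ParallelRowMapper` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ParallelRowMapper {

    /** Converts each in-memory row into an attribute-to-value map, in parallel. */
    static List<Map<String, Object>> toMaps(List<Object[]> rows, List<String> attributes) {
        if (rows == null || attributes == null) {
            throw new IllegalArgumentException("rows and attributes are required");
        }
        final int size = attributes.size();
        return rows.parallelStream()
                .map(row -> {
                    Map<String, Object> map = new LinkedHashMap<>(size);
                    for (int i = 0; i < size; i++) {
                        map.put(attributes.get(i), row[i]);
                    }
                    return map;
                })
                // The source list is ordered, so the collected list keeps the original row order.
                .collect(Collectors.toList());
    }
}
```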

Answer 1
4 2020-07-17 10:47:02

Answer 2
1 accepted 2020-07-18 14:05:33

Answer 3
0 2020-07-17 07:58:46