
Inserting data into list of map taking too much time in Java

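My task is to send out an automated report every day. So I basically write the query result set/collection into a list of maps so that I can write that data to Excel. Below is the method that inserts the collection data into the list of maps. The problem is that this method takes 1 hour 20 minutes to insert data from a collection with 3000-3500 rows and 14 columns. My code has 5 similar queries to run, and each one takes about the same time. Could you help me optimize the code so that it takes less time?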

// avoided following method

public static List<Map<String, Object>> insertAttrValues(IDfCollection dfCollection, List<String> attributes) throws DfException {

    if (dfCollection == null || attributes == null) {
        throw new MissingParameterException("collection and attributes");
    }

    List<Map<String, Object>> dataList = new ArrayList<>();

    while (dfCollection.next()) {
        Map<String, Object> map = new LinkedHashMap<>(attributes.size());

        for (String attribute: attributes) {
            map.put(attribute, dfCollection.getString(attribute));
        }
        dataList.add(map);
    }

    return dataList;
}

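Edit: Sorry, I am now posting the important part of the code. It uses the collection directly instead of inserting the values into a map and processing them later.

Starting point: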


@SpringBootApplication
public class ImmsClinicalReportApplication {

    public static void main(String[] args) {
        ApplicationContext applicationContext = SpringApplication.run(ImmsClinicalReportApplication.class, args);
        init(applicationContext);
    }

    private static void init(@NotNull ApplicationContext applicationContext) {
        ClinicalReportController clinicalReportController = (ClinicalReportController) applicationContext.getBean("clinicalReportController");

        IDfSession dfSession = null;

        try {
            dfSession = clinicalReportController.getSession();
            clinicalReportController.execute(dfSession);
            sendEmail(applicationContext, clinicalReportController);
        } catch (DfException | IOException e) {
            e.printStackTrace();
        } finally {
            try {
                clinicalReportController.cleanSession(dfSession);
            } catch (DfException e) {
                e.printStackTrace();
            }
        }
    }
}

@Controller("clinicalReportController")
@PropertySource("classpath:application.properties")
public class ClinicalReportController {

    private static final Logger logger = Logger.getLogger(ClinicalReportController.class);

    private final SessionHelper sessionHelper;
    private final DqlHelper dqlHelper;
    private final AppProperties appProperties;

    @Value("${report_path}")
    private String XLSX_FILE_PATH;

    private static final String[] moduleTypes = {
        "Clin Protocol", "Clin Investigator Brochure", "Clin Core Text",
        "Clin Process Documentation", "Clin Supporting Information"
    };

    @Autowired
    public ClinicalReportController(DqlHelper dqlHelper, SessionHelper sessionHelper, AppProperties appProperties) {
        this.dqlHelper = dqlHelper;
        this.sessionHelper = sessionHelper;
        this.appProperties = appProperties;
    }

    /**
     * Method that processes the report
     * @param dfSession dfSession
     * @throws DfException DfException
     * @throws IOException IOException
     */
    public void execute(IDfSession dfSession) throws DfException, IOException {

        StopWatch timer = new StopWatch();

        for (int i = 0; i < moduleTypes.length; i++) {
            // start timer
            timer.start();
            IDfCollection dfCollection = dqlHelper.query(dfSession, QueryConstant.immsQueries[i]);

            List<String> attributes = new ArrayList<>(dfCollection.getAttrCount());

            for (int j = 0; j < dfCollection.getAttrCount(); j++) {
                attributes.add(dfCollection.getAttr(j).getName());
            }

            // stop timer
            timer.stop();
            // Each query takes 20 mins of time
            /* Sample query: select d.r_object_id, d.object_name, d.title,
            d.imms_extreleased_date, d.imms_extreleased_reason, d.imms_extreleaser,
            d.imms_protocol_number, d.imms_protocol_number_rep, d.keywords,
            d.imms_compound_number, d.imms_module_type, d.imms_prereleaser,
            d.imms_prereleased_date, f.r_folder_path from imms_document d,
            dm_folder f where d.i_folder_id=f.r_object_id and i_cabinet_id='0c0033ec80000700'
            and d.imms_module_type = 'Clin Protocol' and d.imms_extreleased_date >
            date('31/12/2016', 'dd/mm/yyyy') and f.r_folder_path is not nullstring enable (ROW_BASED)*/
            logger.info("Time taken to run query: " + QueryConstant.immsQueries[i] + ": " +
                    timer.getTotalTimeSeconds()/60 + " minutes");

            // List<Map<String, Object>> resultSet = ImmsUtils.insertAttrValues(dfCollection, attributes);

            if (i == 0) {
                processReport(dfCollection, moduleTypes[i], attributes);
            } else {
                updateReport(dfCollection, moduleTypes[i], attributes);
            }
            cleanCollection(dfCollection);
        }
    }

    /**
     * Method process for remaining queries/sheets
     * @param resultSet resultSet
     * @param objectType objectType
     * @param attributes attributes
     * @throws IOException IOException
     */
    private void updateReport(IDfCollection resultSet, String objectType, List<String> attributes) throws IOException, DfException {
        Workbook workbook = new XSSFWorkbook(new FileInputStream(XLSX_FILE_PATH));
        excelWriterAndOperateOutputStream(resultSet, objectType, workbook, attributes);
    }

    /**
     * Method that writes data to excel sheets
     * @param dfCollection dfCollection
     * @param sheet2 sheet2
     * @param workbook workbook
     * @param attributes attributes
     *
     * Using collection directly. Not sure where the issue is in the following method; writing data to the sheet also takes 50 minutes.
     */
     private void writeToSheet(@NotNull IDfCollection dfCollection, Sheet sheet2, Workbook workbook, List<String> attributes) throws DfException {
        Sheet sheet;
        Row row;

        sheet = sheet2;

        Object[] values = new Object[attributes.size()];
        StopWatch timer = new StopWatch();
        
        // moved outside of loop 
        // TODO: avoid regex, use other logic 
        String dateRegex = "^([0-9]{4})/([0-1][0-9])/([0-3][0-9])\\s([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$";
        Pattern datePattern = Pattern.compile(dateRegex);
        // avoid SDF and Date and
        // TODO: use java.time - maybe LocalDate
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        Date date = null;

        CellStyle dateCellStyle = workbook.createCellStyle();
        dateCellStyle.setDataFormat(workbook.getCreationHelper().createDataFormat().getFormat("yyyy/MM/dd HH:mm:ss"));

        timer.start();
        while (dfCollection.next()) {
            for (int i = 0; i < attributes.size(); i++) {
                values[i] = dfCollection.getString(attributes.get(i));
            }

            int lastRow = sheet.getLastRowNum();
            row = sheet.createRow(++lastRow);
            int cellNum = 0;


            for (Object value: values) {
                Cell cell = row.createCell(cellNum++);
                if (datePattern.matcher(value.toString()).matches()) {
                    try {
                        date = simpleDateFormat.parse(value.toString());
                    } catch (ParseException e) {
                        e.printStackTrace();
                    }
                    cell.setCellValue(date);
                    cell.setCellStyle(dateCellStyle);
                } else {
                    cell.setCellValue(value.toString());
                }
            }
        }
        timer.stop();
        // Taking 50 mins of time to write collection data
        // Log: Time taken for writing data 54.567404175 minutes
        logger.info("Time taken for writing data " + timer.getTotalTimeSeconds()/60 + " minutes");


        // Resize all columns to fit the content size
        for (int i = 0; i < attributes.size(); i++) {
            sheet.autoSizeColumn(i);
        }
    }

    /**
     * Method to create sheet, set fonts and colors
     * @param moduleType moduleType
     * @param workbook workbook
     * @return Sheet
     */
     private Sheet createSheet(String moduleType, Workbook workbook) {
        return workbook.createSheet(moduleType);
     }

    /**
     * Method to process first query/sheet
     * @param dfCollection dfCollection
     * @param moduleType moduleType
     * @param attributes attributes
     * @throws IOException IOException
     */
     private void processReport(IDfCollection dfCollection, String moduleType, List<String> attributes) throws IOException, DfException {
        // Create a Workbook - for xlsx
        Workbook workbook = new XSSFWorkbook();

        /*CreationHelper helps us create instances of various things like DataFormat,
          Hyperlink, RichTextString etc, in a format (HSSF, XSSF) independent way*/
        
        workbook.getCreationHelper();

        excelWriterAndOperateOutputStream(dfCollection, moduleType, workbook, attributes);
    }

    /**
     * Method that writes and saves data to file
     * @param resultSet resultSet
     * @param moduleType  moduleType
     * @param workbook workbook
     * @param attributes attributes
     * @throws IOException IOException
     */
    private void excelWriterAndOperateOutputStream(IDfCollection resultSet, String moduleType, Workbook workbook, List<String> attributes) throws IOException, DfException {
        Sheet sheet = createSheet(moduleType, workbook);

        CellStyle cellStyle = setFontsAndColors(workbook);

        // Create a Row
        Row headerRow = sheet.createRow(0);
        // Create cells
        for (int i = 0; i < attributes.size(); i++) {
            Cell cell = headerRow.createCell(i);
            cell.setCellValue(attributes.get(i));
            cell.setCellStyle(cellStyle);
        }

        writeToSheet(resultSet, workbook.getSheet(moduleType), workbook, attributes);
        // Write the output to the file
        FileOutputStream fileOutputStream = new FileOutputStream(XLSX_FILE_PATH);
        workbook.write(fileOutputStream);
        // close the file
        fileOutputStream.close();
        // close the workbook
        workbook.close();
    }

    @NotNull
    private CellStyle setFontsAndColors(Workbook workbook) {
        CellStyle cellStyle = workbook.createCellStyle();

        // Create a Font for styling header cells
        Font headerFont = workbook.createFont();
        headerFont.setBold(false);
        headerFont.setFontHeightInPoints((short) 12);
        headerFont.setColor(IndexedColors.GREEN.getIndex());
        cellStyle.setFont(headerFont);
        return cellStyle;
   }

    /**
     * Get IDfSession object
     * @return IDfSession
     * @throws DfException DfException
     */
    public IDfSession getSession() throws DfException {
        IDfSession dfSession;

        IDfSessionManager sessionManager = sessionHelper.getDfSessionManager(appProperties.getRepository(), appProperties.getUsername(), appProperties.getPassword());
        dfSession = sessionManager.getSession(appProperties.getRepository());
        return dfSession;
    }

    /**
     * Clean IDfCollection
     * @param dfCollection dfCollection
     */
    public void cleanCollection(IDfCollection dfCollection) {
        dqlHelper.cleanup(dfCollection);
    }

    /**
     * Clean IDfSession
     * @param dfSession dfSession
     */
    public void cleanSession(IDfSession dfSession) throws DfException {
        sessionHelper.cleanSession(dfSession);
    }
}

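You can make the following improvements:

  1. Fill the POI structures directly from the IDfCollection; do not copy the collection data into a List<Map<String, Object>>.
  2. Use collection.getTime(attribute) to get time values instead of running a regex on every record. You can use the condition collection.getAttrDataType(attribute) == IDfAttr.DM_TIME to determine whether a value is a time.
  3. Then you can use the date directly, without parsing, like this: cell.setCellValue(collection.getTime(attribute).getDate())
  4. The same goes for numbers, which also gives better results in the Excel sheet. That means using collection.getInt(attribute) or collection.getDouble(attribute) instead of collection.getString(attribute). Constants such as IDfAttr.DM_INTEGER and IDfAttr.DM_DOUBLE help here too.
  5. Declare int last_row before the loop and do last_row++ inside the loop; then there is no need to call sheet.getLastRowNum(). By the way, the camel-case name lastRow would be better in the Java world ;-)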
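
Points 2 to 4 amount to choosing the cell value from the attribute's declared type rather than from the shape of its string. The sketch below shows only that dispatch, using plain JDK types so that it runs without DFC or POI. `AttrType`, `TypedRow` and `TypedCellValues` are hypothetical stand-ins: in real code the type would come from collection.getAttrDataType(attribute) and the result would be passed to cell.setCellValue(...).

```java
import java.time.LocalDateTime;
import java.util.Map;

// Hypothetical stand-in for the attribute type constants (IDfAttr.DM_* in DFC).
enum AttrType { STRING, INTEGER, DOUBLE, TIME }

// Hypothetical stand-in for the current row of a typed collection.
final class TypedRow {
    private final Map<String, AttrType> types;
    private final Map<String, Object> values;

    TypedRow(Map<String, AttrType> types, Map<String, Object> values) {
        this.types = types;
        this.values = values;
    }

    AttrType getAttrDataType(String attribute) { return types.get(attribute); }

    Object get(String attribute) { return values.get(attribute); }
}

final class TypedCellValues {
    // Picks the value to write from the declared type: no regex, no string parsing.
    static Object toCellValue(TypedRow row, String attribute) {
        Object raw = row.get(attribute);
        switch (row.getAttrDataType(attribute)) {
            case TIME:    return (LocalDateTime) raw;            // would get the date cell style
            case INTEGER: return ((Number) raw).intValue();      // written as a numeric cell
            case DOUBLE:  return ((Number) raw).doubleValue();   // written as a numeric cell
            default:      return String.valueOf(raw);            // everything else stays text
        }
    }
}
```

The type of a column does not change between rows, so in a real loop the type lookup could be done once per attribute before iterating rather than once per cell.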

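Another thing is that you run this whole process in an outer loop for 5 similar queries, so there may be further room for improvement, for example by turning all the queries into one with a better condition (a UNION if possible, or a broader condition plus filter logic in the application, ...).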
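
One reading of "a broader condition plus filter logic in the application" is to fetch all five module types in a single query and split the rows per sheet afterwards. Below is a sketch of the splitting step only, with rows modelled as plain maps (an assumption made so that it runs without DFC). `RowGrouping` and `byModuleType` are hypothetical names; imms_module_type is the column from the sample query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class RowGrouping {
    // Groups rows by module type, keeping the order in which each type first appears.
    // Assumes every row has a non-null imms_module_type; groupingBy rejects null keys.
    static Map<String, List<Map<String, String>>> byModuleType(List<Map<String, String>> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> row.get("imms_module_type"),
                LinkedHashMap::new,
                Collectors.toList()));
    }
}
```

Each entry of the resulting map would then correspond to one sheet. Whether a single broader query is actually faster than five narrow ones depends on the repository, so it is worth timing both.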

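I think the main problem is the query. Try the following steps:

  • Instead of listing individual attributes in the select query, use * and check the query execution time. If it runs quickly, without taking several minutes, try the next steps.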

select * from imms_document d, dm_folder f where d.i_folder_id=f.r_object_id and i_cabinet_id='0c0033ec80000700' and d.imms_module_type = 'Clin Protocol' and d.imms_extreleased_date > date('31/12/2016', 'dd/mm/yyyy') and f.r_folder_path is not nullstring enable (ROW_BASED)

  • Since you are using Spring Boot, put the required attributes in application.properties as shown below. You may not want all of them.

included_attributes=r_object_id,object_name,title,imms_extreleased_date,imms_extreleased_reason,imms_extreleaser,imms_protocol_number,imms_protocol_number_rep,keywords,imms_compound_number,imms_module_type,imms_prereleaser,imms_prereleased_date,r_folder_path

In your AppProperties class file, do the following:

@Component
public class AppProperties {

   /**
    *other fields
    */

    @Getter
    @Value("${included_attributes}")
    private String[] includedAttributes;

}

Now, in your execute() method, modify the code so that it uses only the attributes required for fetching the data.

public void execute(IDfSession dfSession) throws DfException, IOException {

    for (int i = 0; i < moduleTypes.length; i++) {
        // use a fresh StopWatch per query; a shared one would report the cumulative total
        StopWatch timer = new StopWatch();
        timer.start();
        IDfCollection dfCollection = dqlHelper.query(dfSession, QueryConstant.immsQueries[i]);
        // stop timer
        timer.stop();
        logger.info("Time taken to run query: " + QueryConstant.immsQueries[i] + ": " +
                timer.getTotalTimeSeconds() + " seconds");    
        // attributes to be added
        List<String> attributes = new ArrayList<>();
        // Get included attributes as list
        List<String> includedAttributes = Arrays.asList(appProperties.getIncludedAttributes());

        for (int j = 0; j < dfCollection.getAttrCount(); j++) {
            // check for the attribute in included attributes and add if exists
            if (hasAttribute(includedAttributes, dfCollection.getAttr(j).getName())) {
                attributes.add(dfCollection.getAttr(j).getName());
            }
        }


        if (i == 0) {
            processReport(dfCollection, moduleTypes[i], attributes);
        } else {
            updateReport(dfCollection, moduleTypes[i], attributes);
        }
        cleanCollection(dfCollection);
    }
}

public static boolean hasAttribute(@NotNull List<String> attributes, String attribute) {
    for(String attr : attributes){
        if(attribute.contains(attr)){
            return true;
        }
    }
    return false;
}
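
Note that hasAttribute does a substring test, so a configured name such as title would also accept a column called subtitle. If exact names are what is wanted (an assumption; the substring match may be deliberate), a Set lookup is a simpler variant. In this sketch `AttributeFilter` and `selectIncluded` are hypothetical names, and the column names are passed in as plain strings so that it runs without DFC.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class AttributeFilter {
    // Keeps the columns whose names appear in the configured list, preserving column order.
    static List<String> selectIncluded(List<String> columnNames, String[] includedAttributes) {
        Set<String> included = new HashSet<>(Arrays.asList(includedAttributes));
        List<String> selected = new ArrayList<>();
        for (String name : columnNames) {
            if (included.contains(name)) {   // exact match, constant-time lookup
                selected.add(name);
            }
        }
        return selected;
    }
}
```

In execute(), the column names would be collected from dfCollection.getAttr(j).getName() and the configured names from appProperties.getIncludedAttributes().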

Use the collection directly with the POI structures; there is no need to insert the data into an array and iterate over it again.

private void writeToSheet(@NotNull IDfCollection dfCollection, Sheet sheet2,
                              @NotNull Workbook workbook, List<String> attributes) throws DfException {
        Sheet sheet;
        Row row;

        sheet = sheet2;

        StopWatch timer = new StopWatch();

        String dateRegex = "^([0-9]{4})/([0-1][0-9])/([0-3][0-9])\\s([0-1][0-9]|[2][0-3]):([0-5][0-9]):([0-5][0-9])$";
        Pattern datePattern = Pattern.compile(dateRegex);

        DateTimeFormatter timeFormatter = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

        CellStyle dateCellStyle = workbook.createCellStyle();
        dateCellStyle.setDataFormat(workbook.getCreationHelper().createDataFormat().getFormat("yyyy/MM/dd HH:mm:ss"));

        int lastRow = 0;

        timer.start();
        while (dfCollection.next()) {
            row = sheet.createRow(++lastRow);
            int cellNum = 0;

            for (String attribute : attributes) {
                Object value = dfCollection.getString(attribute);

                Cell cell = row.createCell(cellNum++);

                if (datePattern.matcher(value.toString()).matches()) {
                    cell.setCellValue(LocalDateTime.parse(value.toString(), timeFormatter));
                    cell.setCellStyle(dateCellStyle);
                } else {
                    cell.setCellValue(value.toString());
                }
            }
        }
        timer.stop();
        logger.info("Time taken for writing data " + timer.getTotalTimeSeconds()/60 + " minutes");


        // Resize all columns to fit the content size
        for (int i = 0; i < attributes.size(); i++) {
            sheet.autoSizeColumn(i);
        }
    }
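
The method above still runs a regex on every value and then parses each match a second time. If the string route is kept, one possible way to drop the regex is a cheap shape check followed by a single parse. This is a sketch using only java.time; `TimestampParser` and `tryParse` are hypothetical names, and the pattern is the one used above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class TimestampParser {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");

    // Returns the parsed timestamp, or empty when the value is not in yyyy/MM/dd HH:mm:ss form.
    static Optional<LocalDateTime> tryParse(String value) {
        // Cheap shape check first, so ordinary strings never reach the parser.
        if (value == null || value.length() != 19
                || value.charAt(4) != '/' || value.charAt(7) != '/' || value.charAt(10) != ' ') {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDateTime.parse(value, FORMAT));
        } catch (DateTimeParseException e) {
            // Right shape but not a valid timestamp (for example month 13): treat as text.
            return Optional.empty();
        }
    }
}
```

In the loop, a present result would go to cell.setCellValue(...) together with dateCellStyle, and an empty one would fall through to the string branch.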

You could try ForkJoinPool or the JDK's parallel streams :) to make use of your CPU's multiple cores. For a ForkJoinPool example, see https://www.baeldung.com/java-fork-join

  public static List<Map<String, Object>> insertAttrValues(Stream<Object> stream, List<String> attributes) throws RuntimeException {
    if (stream == null || attributes == null) {
        throw new RuntimeException("collection and attributes");
    }
    final int size = attributes.size();
    return stream.parallel().map(item -> {
        Map<String, Object> map = new LinkedHashMap<>(size);
        for (String attribute : attributes) {
            //map.put(attribute, item.getString(attribute));
        }
        return map;
    }).collect(Collectors.toList());
}
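
The snippet above leaves the per-row body commented out. A runnable version of the same idea is sketched below with rows modelled as plain maps (an assumption; `ParallelProjection` and `project` are hypothetical names). Because an IDfCollection is read row by row through next(), this presumes the rows have already been read sequentially, so it can only speed up CPU-bound work done per row afterwards.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ParallelProjection {
    // Builds one ordered map per row, in parallel, keeping only the requested attributes.
    static List<Map<String, Object>> project(List<Map<String, String>> rows, List<String> attributes) {
        if (rows == null || attributes == null) {
            throw new IllegalArgumentException("rows and attributes are required");
        }
        return rows.parallelStream().map(row -> {
            Map<String, Object> map = new LinkedHashMap<>(attributes.size());
            for (String attribute : attributes) {
                map.put(attribute, row.get(attribute));
            }
            return map;
        }).collect(Collectors.toList());   // result order follows the source list
    }
}
```

For 3000-3500 rows of 14 columns, copying values like this is cheap either way, so it is worth measuring before assuming that parallelism helps.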

