Spring Boot JPARepository performance on save()

I have an issue where my Spring Boot application is very slow when inserting data.

I am extracting a large subset of data from one database and inserting it into another database.

The following is my entity.

@Entity
@Table(name = "element")
public class VXMLElementHistorical {

    @Id
    @Column(name = "elementid")
    private long elementid;

    @Column(name = "elementname")
    private String elementname;

    // Getter/setter methods...
}

I have configured a JPA repository:

public interface ElementRepository extends JpaRepository<Element, Long> {

}

and call the save() method with my object:

@Transactional
public void processData(List<sElement> hostElements)
        throws DataAccessException {

    List<Element> elements = new ArrayList<Element>();

    // Copy each source element into a new target entity.
    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elements.add(element);
    }

    try {
        elementRepository.save(elements);
    }
    // catch etc...
}

What is happening is that each insert takes between 6 and 12 seconds. I have turned on Hibernate trace logging and statistics, and when I call save(), Hibernate performs two queries: a select and an insert. The select query takes 99% of the overall time.

I have run the select query directly on the database and the result returns in nanoseconds, which leads me to believe it is not an indexing issue; however, I am no DBA.

I have created a load test in my dev environment, and with similar load sizes the overall processing time is nowhere near as long as in my prod environment.

Any suggestions?

As @M. Deinum said in a comment, you can improve performance by calling flush() and clear() after a certain number of inserts, as below.

int i = 0;
for(Element element: elements) {
    dao.save(element);
    if(++i % 20 == 0) {
        dao.flushAndClear();
    }

}

Instead of creating a list of elements and saving those, save the individual elements. Every now and then, do a flush and clear to prevent dirty checking from becoming a bottleneck.

@PersistenceContext
private EntityManager entityManager;

@Transactional
public void processData(List<sElement> hostElements)
        throws DataAccessException {

    for (int i = 0; i < hostElements.size(); i++) {
        Element element = new Element();
        element.setElementid(hostElements.get(i).getElementid());
        element.setElementname(hostElements.get(i).getElementname());
        elementRepository.save(element);
        // Flush and clear after every 50th element (not after the very first one).
        if ((i + 1) % 50 == 0) {
            entityManager.flush();
            entityManager.clear();
        }
    }
    entityManager.flush(); // flush the last records.
}

You want to flush and clear every x elements (here it is 50, but you might want to find your own best number).

Since you are using Spring Boot, you might also want to add some additional properties, such as configuring the batch size.

spring.jpa.properties.hibernate.jdbc.batch_size=50 

This will, if your JDBC driver supports it, group 50 single insert statements into one JDBC batch, i.e. 50 inserts are sent to the database as 1 batch instead of one at a time.
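A few related Hibernate settings are often set alongside the batch size. The fragment below is a sketch that goes beyond what the answer itself lists: `hibernate.order_inserts` and `hibernate.order_updates` let Hibernate sort statements by entity type so that a batch is not cut short when different entities are interleaved, and `hibernate.generate_statistics` logs session metrics (including the number of JDBC batches executed), which is one way to check that batching is actually taking effect. Whether they help in this case is an assumption worth measuring.

```properties
# Assumed additions, not part of the original answer
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
# Diagnostic only: logs per-session statistics; turn off once verified
spring.jpa.properties.hibernate.generate_statistics=true
```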

See also https://vladmihalcea.com/how-to-batch-insert-and-update-statements-with-hibernate/

Since loading the entities seems to be the bottleneck and you really just want to do inserts (i.e. you know the entities don't exist in the database), you probably shouldn't use the standard save method of Spring Data JPA.

The reason is that it performs a merge, which triggers Hibernate to load an entity that might already exist in the database.

Instead, add a custom method to your repository which does a persist on the entity manager. Since you are setting the Id in advance, make sure you have a version property so that Hibernate can determine that this indeed is a new entity.

This should make the select go away.

Other advice given in other answers is worth considering as a second step:

  • enable batching.
  • experiment with intermediate flushing and clearing of the session.
  • save one instance at a time without gathering them in a collection, since the call to merge or persist doesn't actually trigger writing to the database; only the flushing does (this is a simplification, but it will do for this context).
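The points above (persist instead of save, one entity at a time, periodic flush and clear) can be sketched as a small insert-only loop. This is a minimal illustration under stated assumptions, not the answer's own code: `PersistenceOps` and `BatchInserter` are hypothetical names, and `PersistenceOps` is only a stand-in for the `persist`, `flush` and `clear` methods of `EntityManager` so that the sketch compiles without JPA on the classpath.

```java
import java.util.List;

/**
 * Hypothetical stand-in for the EntityManager methods used below
 * (persist, flush, clear). Not a real JPA or Spring type.
 */
interface PersistenceOps {
    void persist(Object entity);
    void flush();
    void clear();
}

/** Insert-only loop: persist each entity, flush and clear every batchSize entities. */
final class BatchInserter {
    private final PersistenceOps ops;
    private final int batchSize;

    BatchInserter(PersistenceOps ops, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.ops = ops;
        this.batchSize = batchSize;
    }

    /** Persists all entities and returns the number of flushes issued. */
    int insertAll(List<?> entities) {
        int flushes = 0;
        int pending = 0;
        for (Object entity : entities) {
            // Call persist directly rather than the repository's save().
            ops.persist(entity);
            if (++pending == batchSize) {
                ops.flush();   // write the pending inserts
                ops.clear();   // detach them so the context stays small
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            // Flush whatever is left over from the last partial batch.
            ops.flush();
            ops.clear();
            flushes++;
        }
        return flushes;
    }
}
```

In the application, an implementation of `PersistenceOps` would simply delegate each method to the injected `EntityManager`, and `insertAll` would be called from inside the `@Transactional` method. A good starting point for the batch size is the value configured for `hibernate.jdbc.batch_size`, so that each flush corresponds to one JDBC batch.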
