
Spring data caching enabling batch commit?

I have a Spring Boot application that imports a huge document containing several thousand entities (~50 different classes, all sharing the same superclass), so that the persisted entities can be worked on later. The document contains a lot of internal references (using RDF).

At the moment I persist every entity as I go. If an entity (E1) has a reference to another one (E2), I check the database for an entity with that RDF id; if it does not exist yet, I first import E2 and then E1. (The reference chains can also be much longer.)

So I figured I have the following two bottlenecks: a lot of SELECTs and a lot of INSERTs. For the first one, I use the caching support that comes with Spring (delete isn't really needed for the import):

public interface IdentifiedObjectRepository<T extends IdentifiedObject> extends CrudRepository<T, Long> {

    // Cache lookups by RDF id so repeated SELECTs for the same id hit the cache
    @Cacheable(cacheNames = "cimCache", key = "#p0")
    T findOneByRdfId(String rdfId);

    // Keep the cache in sync on every save
    @CachePut(cacheNames = "cimCache", key = "#p0.rdfId")
    <S extends T> S save(S entity);

    // Deletes invalidate the whole cache (rare during the import)
    @CacheEvict(cacheNames = "cimCache", allEntries = true)
    void delete(Long id);
}

Every entity class has a repository that is a sub-interface of the one above.

As it turned out, the SELECTs don't seem to be a bottleneck (unless I did something wrong in the code above), since the import duration hasn't changed.

Now I need to address the second bottleneck. My idea is to work only on the cache and INSERT everything as one batch once the document has been completely imported, but I don't know how to do this with Spring's tool set. (I already had a solution where I built my own cache out of a lot of HashMaps.)
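For reference, the "cache first, batch-insert later" idea can be sketched in plain Java as a write-behind buffer: accumulate entities keyed by RDF id, then flush them in fixed-size batches when parsing is done. This is a minimal sketch, not the poster's actual code; the entity type, batch size, and flush callback are placeholders.

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal write-behind cache sketch (plain Java, no Spring).
// The flush callback would be e.g. repository.saveAll(...) in a real app.
class WriteBehindCache<T> {
    private final Map<String, T> byRdfId = new LinkedHashMap<>(); // keeps insertion order
    private final int batchSize;
    private final Consumer<List<T>> batchInsert;

    WriteBehindCache(int batchSize, Consumer<List<T>> batchInsert) {
        this.batchSize = batchSize;
        this.batchInsert = batchInsert;
    }

    T get(String rdfId) { return byRdfId.get(rdfId); }

    void put(String rdfId, T entity) { byRdfId.put(rdfId, entity); }

    // Called once the whole document has been parsed.
    void flush() {
        List<T> batch = new ArrayList<>(batchSize);
        for (T entity : byRdfId.values()) {
            batch.add(entity);
            if (batch.size() == batchSize) {
                batchInsert.accept(new ArrayList<>(batch)); // hand over a full batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            batchInsert.accept(batch); // final partial batch
        }
    }
}
```

With Spring Data JPA the flush callback alone is not enough: Hibernate only groups the INSERTs into real JDBC batches if `hibernate.jdbc.batch_size` is set as well.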

If I understand the problem correctly, I'd say you are either using the wrong tool for the job or using the wrong approach.

Interrupting the import of one entity to import another one that it references sounds complicated and inefficient to me.

I would do one of the following.

Option 1: Use the database

  1. Insert all the entities into the database (using Spring Data JPA or even just plain JDBC), but don't resolve the references yet; keep them as plain Strings. Make sure you use batching.

  2. Run a single UPDATE statement (per table) that sets all the references to the correct value.
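The two steps above might look roughly like this with plain JDBC. This is only a sketch: it assumes a live `Connection`, and the table and column names (`entity`, `rdf_id`, `ref_rdf_id`, `ref_id`) are invented for illustration, not taken from the poster's schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class TwoPhaseImport {

    // Phase 1: batched INSERTs; references are stored as raw RDF-id strings.
    static void insertAll(Connection con, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO entity (rdf_id, ref_rdf_id) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int count = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]); // the entity's own RDF id
                ps.setString(2, row[1]); // unresolved reference, as a string
                ps.addBatch();
                if (++count % 1000 == 0) {
                    ps.executeBatch(); // send 1000 inserts in one round trip
                }
            }
            ps.executeBatch(); // remaining rows
        }
    }

    // Phase 2: one UPDATE per table resolves the strings to real foreign keys.
    static void resolveReferences(Connection con) throws SQLException {
        String sql = "UPDATE entity e SET ref_id = "
                   + "(SELECT t.id FROM entity t WHERE t.rdf_id = e.ref_rdf_id)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.executeUpdate();
        }
    }
}
```

An index on `rdf_id` makes the phase-2 subquery cheap; without one the UPDATE degenerates into a scan per row.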

Option 2: Do the same in memory

  1. Load all entities without resolving any references. Store them in a way that lets you easily and efficiently find an entity by its reference.

  2. Resolve all the references.

  3. Save everything. Make sure batching is configured properly.
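Steps 1 and 2 of this option can be sketched in plain Java as two passes over the parsed entities: index by RDF id, then swap the string references for object references. The `Entity` class and its field names are stand-ins for the poster's RDF-based classes, not their real types.

```java
import java.util.*;

class InMemoryResolver {

    static class Entity {
        final String rdfId;
        final String refRdfId; // unresolved reference as parsed, may be null
        Entity resolvedRef;    // filled in by pass 2

        Entity(String rdfId, String refRdfId) {
            this.rdfId = rdfId;
            this.refRdfId = refRdfId;
        }
    }

    // Pass 1: index every entity by its RDF id for O(1) lookup.
    static Map<String, Entity> index(List<Entity> entities) {
        Map<String, Entity> byRdfId = new HashMap<>();
        for (Entity e : entities) {
            byRdfId.put(e.rdfId, e);
        }
        return byRdfId;
    }

    // Pass 2: resolve the string references to object references.
    static void resolve(Collection<Entity> entities, Map<String, Entity> byRdfId) {
        for (Entity e : entities) {
            if (e.refRdfId != null) {
                e.resolvedRef = byRdfId.get(e.refRdfId);
            }
        }
    }
}
```

Step 3 would then be a single `saveAll` over the resolved list, with JDBC batching enabled so the INSERTs actually go out in batches.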
