
Grails bulk insert/update optimization

I am importing a large amount of data from a CSV file (the file size is over 100 MB).

The code I'm using looks like this:

    def errorLignes = []
    def index = 1
    csvFile.toCsvReader(['charset': 'UTF-8']).eachLine { tokens ->
        if (index % 100 == 0) cleanUpGorm()
        index++

        def order = Orders.findByReferenceAndOrganization(tokens[0], organization)
        if (!order) {
            order = new Orders()
        }

        if (tokens[1]) {
            def user = User.findByReferenceAndOrganization(tokens[1], organization)
            if (user) {
                order.user = user
            } else {
                errorLignes.add(tokens)
            }
        }

        if (tokens[2]) {
            def customer = Customer.findByCustomCodeAndOrganization(tokens[2], organization)
            if (customer) {
                order.customer = customer
            } else {
                errorLignes.add(tokens)
            }
        }

        if (tokens[3]) {
            order.orderType = Integer.parseInt(tokens[3])
        }
        // etc.....................
        order.save()
    }

I'm using the cleanUpGorm method to clear the session after every 100 entries:

def cleanUpGorm() {
    println "clean up gorm"
    def session = sessionFactory.currentSession
    session.flush()
    session.clear()
    propertyInstanceMap.get().clear()
}

I also turned the second-level cache off:

hibernate {
    cache.use_second_level_cache = false
    cache.use_query_cache = false
    cache.provider_class = 'net.sf.ehcache.hibernate.EhCacheProvider'
}

The project uses Grails 2.0.4, and I am using MySQL as the database.

For every entry, I make three find calls:

  • to check if the order already exists
  • to check if the user is correct
  • to check if the customer is correct

and finally I save the order instance.

The import process is too slow, and I am wondering how I can speed up and optimise this code.

EDIT:

I found that the searchable plugin was also slowing things down. To get around this, I used the command:

searchableService.stopMirroring()

But it was still not fast enough, so I finally changed the code to use Groovy SQL instead.
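A minimal sketch of the Groovy SQL approach, assuming a table named `orders` with `reference` and `organization_id` columns (these names are illustrative, not taken from the original code):

```groovy
import groovy.sql.Sql

// 'dataSource' is the javax.sql.DataSource that Grails injects into services.
def sql = new Sql(dataSource)

// withBatch groups the inserts into JDBC batches of 100 and sends each batch
// to MySQL in one round trip, bypassing GORM/Hibernate entirely.
sql.withBatch(100, 'insert into orders (reference, organization_id) values (?, ?)') { stmt ->
    csvFile.toCsvReader(['charset': 'UTF-8']).eachLine { tokens ->
        stmt.addBatch([tokens[0], organization.id])
    }
}
sql.close()
```

On MySQL, adding `rewriteBatchedStatements=true` to the JDBC URL lets the driver collapse each batch into a single multi-row INSERT, which speeds this up further.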

I found this blog entry very useful: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/

You are already cleaning up GORM every 100 entries, but also try clearing the property instance map:

def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP
propertyInstanceMap.get().clear()

Creating database indexes might help as well, and use default-storage-engine=innodb instead of MyISAM.
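For reference, the storage engine can be set globally in the MySQL server configuration (a sketch for a typical my.cnf; the exact option name can vary between MySQL versions):

```ini
# my.cnf — make InnoDB the default engine for newly created tables
[mysqld]
default-storage-engine=innodb
```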

I'm also in the process of writing a number of services that will load some very large datasets (multiple files of up to ~17 million rows each). I initially tried the cleanUpGorm method you use, but found that, whilst it did improve things, the loading was still slow. Here's what I did to make it much faster:

  1. Investigate what it is that is actually causing the app to be slow. I installed the Grails Melody plugin, then did a run-app and opened a browser at /monitoring. I could then see which routines took time to execute and what the worst-performing queries actually were.

  2. Many of the Grails GORM methods map to a SQL where clause. You need to ensure that you have an index for each item used in a where clause, for each query that you want to make faster; otherwise the method will become considerably slower the bigger your dataset is. This includes putting indexes on the id and version columns that are injected into each of your domain classes.

  3. Ensure you have indexes set up for all of your hasMany and belongsTo relationships.

  4. If the performance is still too slow, use Spring Batch. Even if you've never used it before, it should take you no time at all to set up a batch parse of a CSV file into Grails domain objects. I suggest you use the grails-spring-batch plugin to do this and use its examples to get a working implementation going quickly. It's extremely fast, very configurable, and you don't have to mess around with cleaning up the session.
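For point 2, indexes can be declared directly in the GORM mapping block. A sketch against the asker's Orders lookup (the column and index names here are assumptions):

```groovy
class Orders {
    String reference
    Organization organization

    static mapping = {
        // Giving both properties the same index name produces a composite
        // index that backs Orders.findByReferenceAndOrganization(...).
        reference    index: 'idx_orders_reference_org'
        organization index: 'idx_orders_reference_org'
    }
}
```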

I used batch inserts while inserting records; this is much faster than the GORM cleanup method. The example below shows how to implement it.

    import groovy.time.TimeCategory
    import org.hibernate.Session
    import org.hibernate.Transaction

    Date startTime = new Date()
    Session session = sessionFactory.openSession()
    Transaction tx = session.beginTransaction()

    (1..50000).each { counter ->
        Person person = new Person()
        person.firstName      = "abc"
        person.middleName     = "abc"
        person.lastName       = "abc"
        person.address        = "abc"
        person.favouriteGame  = "abc"
        person.favouriteActor = "abc"

        session.save(person)

        // Flush and clear the session every 100 records to keep memory usage flat.
        if (counter % 100 == 0) {
            session.flush()
            session.clear()
        }

        // Log progress every 10,000 records.
        if (counter % 10000 == 0) {
            Date endTime = new Date()
            println "Records inserted => ${counter}, elapsed => ${TimeCategory.minus(endTime, startTime)}"
        }
    }

    tx.commit()
    session.close()
