Java - Does each loop iteration take longer to execute?

Question

I'm working on a J2EE project that stores postal codes, cities and countries together. We have developed a Java class that handles the import of each country file (containing every postal code and every city). The problem is that for some countries (Great Britain, Netherlands...), the file is quite large (400,000 to 800,000 lines).

I have a while() loop that reads the next line, extracts the information and stores it in my database. For the first 1,000 to 10,000 lines the process is really fast, then it seems to slow down with every pass through the loop, and it eventually throws a HeapSpaceOverflowException after 150,000 lines.

At first I thought that some object wasn't being garbage collected and was slowing down my algorithm, but I can't figure out which one. Besides, when I run this algorithm on my PC, JConsole tells me that the heap space is cleaned regularly (it seems to be garbage collected), but the process still gets slower and slower...

Below is the code of the method:

FileReader fr = new FileReader(nomFichier);
BufferedReader br = new BufferedReader(fr);

int index = 0; String ligne; String codePostal; String nomVille; 
String codePays; PPays pays; String[] colonnes;

while ((ligne = br.readLine()) != null)
{
    System.out.println("line "+ ++index);

    colonnes = ligne.split(Pattern.quote(";"));

    codePostal = colonnes[9];
    nomVille   = colonnes[8];
    codePays   = colonnes[0];

    pays = this.pc.getByCodePays(codePays);

    this.pc.getByCodePostalAndVilleAndINSEE(codePostal, nomVille, pays.getNomPays(), "");
}

The variable this.pc is injected through the @Inject annotation.

Can someone help me figure out why this code gets slower and slower?

Thanks a lot.

Edit: For completeness, I've added the code of the get...() method:

public Codepostalville getByCodePostalAndVilleAndINSEE(String codePostal, String ville, 
                                                       String pays, String codeINSEE) throws DatabaseException
{
    Codepostal cp = null; Ville v = null; PPays p = null; Codepostalville cpv = null;

    try
    {
        // First, look up the Codepostal object
        cp = (Codepostal) this.em
                        .createNamedQuery("Codepostal.findByCodePostal")
                        .setParameter("codePostal", codePostal)
                        .getSingleResult();
    }
    catch (NoResultException nre1)
    {
        // If it was not found, create it
        if (cp == null)
        {
            cp = new Codepostal();
            cp.setCodePostal(codePostal);
            cpc.getFacade().create(cp);
        } 
    }

    // Look up the city...
    try
    {
        // The city name passed by the user must be cleaned (removing any
        // hyphens, special characters...)
        // So we create a new Ville object and assign it the name to clean,
        // run the cleanup, and read back the cleaned name
        Ville purge = new Ville();
        purge.setNomVille(ville);
        purge.purgerNomVille();
        ville = purge.getNomVille();

        v = (Ville) this.em
                        .createNamedQuery("Ville.findByNomVille")
                        .setParameter("nomVille", ville)
                        .getSingleResult();
    }
    catch (NoResultException nre2)
    {
        // ... or create it if it does not exist
        if (v == null)
        {
            v = new Ville();
            v.setNomVille(ville);
            vc.getFacade().create(v);
        }
    }

    // Look up the country
    try
    {
        p = (PPays) this.em
                        .createNamedQuery("PPays.findByNomPays")
                        .setParameter("nomPays", pays)
                        .getSingleResult();
    }
    catch (NoResultException nre2)
    {
        // ... or create it if it does not exist
        if (p == null)
        {
            p = new PPays();
            p.setNomPays(pays);
            pc.getFacade().create(p);
        }
    }

    // And look up the Codepostalville object
    try
    {
        cpv = (Codepostalville) this.em
                .createNamedQuery("Codepostalville.findByIdVilleAndIdCodePostalAndIdPays")
                .setParameter("idVille", v)
                .setParameter("idCodePostal", cp)
                .setParameter("idPays", p)
                .getSingleResult();

        // If the Codepostalville object was found, update its INSEE code
        cpv.setCodeINSEE(codeINSEE);
        this.getFacade().edit(cpv);
    }
    catch (NoResultException nre3)
    {         
        if (cpv == null)
        {
            cpv = new Codepostalville();
            cpv.setIdCodePostal(cp);
            cpv.setIdVille(v);
            cpv.setCodeINSEE(codeINSEE);
            cpv.setIdPays(p);
            this.getFacade().create(cpv);
        }
    }

    return cpv;
}

Thanks again.

Edit 2: So, I have some more information. The getCodePostal...() method needs around 15 ms to execute at the very beginning of the loop, and after 10,000 lines it needs more than 100 ms (almost 10 times more!). In this new version I have disabled the commit/rollback code, so each query is committed on the fly.

I can't really work out why it needs more and more time.

I've tried to find some information about JPA's cache. My current configuration (in persistence.xml) is this:

<property name="eclipselink.jdbc.bind-parameters" value="true"/>
<property name="eclipselink.jdbc.cache-statements" value="true"/>
<property name="eclipselink.cache.size.default" value="10000"/>
<property name="eclipselink.query-results-cache" value="true"/>

I don't know whether this is the most efficient configuration, and I would appreciate some help and some explanations about JPA's cache.

Thanks.

Answer 1

You might want to read up on JPA concepts. In brief, an EntityManager is associated with a persistence context, which keeps a reference to all persistent objects manipulated through it, so that it can write any changes made to these objects back to the database.

Since you never close the persistence context, that's the likely cause of your memory leak. Moreover, a persistence provider must write changes to persistent objects to the database before issuing a query, if these changes might alter the result of the query. Detecting these changes requires iterating over all objects associated with the current persistence context. In your code, that's nearly a million objects for every query you issue.

Therefore, at the very least, you should clear the persistence context at regular intervals (say every 1000 rows).
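To get a feel for how much periodic clearing helps, here is a toy cost model rather than real JPA code. It assumes, purely for illustration, that each row issues one query, that the query scans every object currently held by the persistence context, and that each row adds one managed object. The class and method names are hypothetical. In an actual import, the clearing step would correspond to calling `EntityManager.flush()` followed by `EntityManager.clear()`.

```java
/** Toy model: counts how many managed objects are scanned for dirty checking. */
final class FlushCostModel {

    /**
     * Simulates importing the given number of rows. Before each row's query,
     * every object currently tracked by the context is scanned; afterwards the
     * row's entity becomes managed. If clearEvery is positive, the context is
     * emptied after every clearEvery rows; otherwise it is never cleared.
     */
    static long scannedObjects(int rows, int clearEvery) {
        long scanned = 0;
        int managed = 0; // objects currently tracked by the context
        for (int row = 1; row <= rows; row++) {
            scanned += managed; // dirty check performed before the query
            managed++;          // the entity for this row is now managed
            if (clearEvery > 0 && row % clearEvery == 0) {
                managed = 0;    // models clearing the persistence context
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        int rows = 800_000;
        System.out.println("never cleared:      " + scannedObjects(rows, 0));
        System.out.println("cleared every 1000: " + scannedObjects(rows, 1000));
    }
}
```

Under these assumptions, 800,000 rows cost 319,999,600,000 object scans when the context is never cleared and 399,600,000 when it is cleared every 1000 rows, roughly an 800-fold difference. The real numbers depend on the provider and on how many queries and entities each row involves, but the quadratic versus linear growth is the point.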

It's also worth noting that unless your database is on the same server, every query you issue must travel over the network to the database, and the result must travel back to the application server, before your program can continue. Depending on network latency, this can easily take a millisecond each time, and you are doing this several million times. If the import needs to be truly efficient, loading the entire table into memory and performing the existence checks there might be substantially faster.
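The in-memory existence check could look roughly like the sketch below. It is not taken from the question's code: `PostalCodeIndex`, `preload` and `idFor` are hypothetical names, the ids stand in for database-generated keys, and the insert is only counted rather than actually persisted.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory index of known postal codes, so existence checks need no query per row. */
final class PostalCodeIndex {
    private final Map<String, Long> idByCode = new HashMap<>();
    private long nextId = 1; // stand-in for a database-generated key
    private int inserts = 0; // counts simulated INSERTs

    /** Registers rows that already exist in the table (loaded once, up front). */
    void preload(Map<String, Long> existing) {
        idByCode.putAll(existing);
        for (long id : existing.values()) {
            nextId = Math.max(nextId, id + 1);
        }
    }

    /** Returns the id for a code, creating a new entry only on first sight. */
    long idFor(String code) {
        Long id = idByCode.get(code);
        if (id == null) {
            id = nextId++;
            inserts++; // a real import would persist the new row here
            idByCode.put(code, id);
        }
        return id;
    }

    int insertCount() {
        return inserts;
    }

    public static void main(String[] args) {
        PostalCodeIndex index = new PostalCodeIndex();
        index.preload(Map.of("1000", 7L));
        System.out.println(index.idFor("1000")); // already known, no insert
        System.out.println(index.idFor("2000")); // new, counted as an insert
        System.out.println("inserts: " + index.insertCount());
    }
}
```

The same idea would apply to cities and countries. The trade-off is memory: the whole key set has to fit in the heap, which seems plausible for a few hundred thousand short strings but should be measured.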

Answer 2

Problem "solved" (almost)! I have configured my persistence.xml this way:

<property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
<property name="eclipselink.jdbc.batch-writing.size" value="10000"/>

At first, it didn't change anything. But then I tried cutting my file into smaller pieces (when the file has more than 5000 rows, I read 5000 rows, store them in a StringBuilder, then read the StringBuilder to insert the 5000 rows at once). This way, my code doesn't get any slower after 20,000 rows (for now). It seems to work fine, but I still don't understand why my code was getting slower when I worked with bigger pieces of the file...
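For readers who want to try the same chunking approach, here is a minimal sketch. It is not the code I actually used: it collects the lines in a `List<String>` instead of a StringBuilder, and `ChunkedImport`, `forEachChunk` and the handler callback are hypothetical names. The handler is where each batch of rows would be written to the database.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class ChunkedImport {

    /**
     * Reads all lines from the reader and passes them to the handler in lists
     * of at most chunkSize lines. Returns the number of chunks handled.
     */
    static int forEachChunk(BufferedReader reader, int chunkSize,
                            Consumer<List<String>> handler) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunk = new ArrayList<>(chunkSize);
        int chunks = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            chunk.add(line);
            if (chunk.size() == chunkSize) {
                handler.accept(chunk);
                chunks++;
                chunk = new ArrayList<>(chunkSize); // fresh list for the next chunk
            }
        }
        if (!chunk.isEmpty()) { // last, possibly shorter, chunk
            handler.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        int chunks = forEachChunk(reader, 2, chunk -> System.out.println(chunk));
        System.out.println(chunks + " chunks");
    }
}
```

For the real files, the chunk size would be 5000 and the reader would wrap a FileReader, ideally inside a try-with-resources block so that the file is closed afterwards.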

Thanks to everyone who tried to help me on this one ;)
