简体   繁体   中英

Delete huge number of rows from an EJB Timer

I need to delete millions of rows from a table from within an EJB Timer. The problem is that the timer has a transaction timeout of 90 seconds, so I should divide the work into bite-size chunks.

Since I don't know how many rows can be deleted in 90 seconds the algorithm should loop and delete a few at a time until the time is almost up.

The problem is: How can the number of rows to delete be limited elegantly in JPA? The delete is made on all rows having a timestamp earlier than a certain date.

I guess it is possible to find the 1000th oldest row and DELETE WHERE timestamp <= {1000th-oldest-row.timestamp} This, however, is not very elegant and I would have to get to the last row in a 1000 to get the timestamp.

Secondly, the timer should trigger immediately again if the table is not clean after the 90 seconds. This can be easily solved but, again, is not very elegant.

You will still face transaction expiration issues with the solution you have.

The trick is to execute each chunk in a separate transaction as shown below in pesudo code.

@Entity

@NamedQueries ( value = {
    @NamedQuery (
        name = pagedDeleteExpiredItems
        query=    DELETE FROM MyTable
            WHERE (<table key>) IN (
                SELECT <table key> FROM (
                SELECT ROWNUM AS row_num, <table key> FROM MyTable
                WHERE timestamp <= :currentTime
                )
                WHERE row_num < :pageSize
            )
    )
})

public class MyEntity {
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW)
    int doPagedDeleteExpiredItems(Date currentTime, int pageSize) {
        Query query = em.createNamedQuery("pagedDeleteExpiredItems");
        query.setParameter("currentTime", currentTime);
        query.setParameter("pageSize", pageSize);
        int deleteCount = query.executeUpdate();
        return deleteCount;
    }
}


@EJBTimer
public class DeleteExpiredItemsTimer {

    @EJB(beanName = "MyEntity")
    MyEntity myEntity;

    @TransactionAttribute(TransactionAttributeType.NOT_SUPPORTED)
    void handleTimeout(Timer timer) {
        Date currentTime = getCurrentTime()
        int pageSize = 100
        int deleteCount;
        do {
            myEntity.doPagedDeleteExpiredItems(currentTime, pageSize);
        } while(deleteCount>0);
    }
}

We had a similar requirement and here is how we solved it. I was using EJB 3.0.

  1. Timer is started when the app. server starts (or the module is deployed) in a ServletContextListener .
  2. When the timer fires, it process up to 100 rows that are pending. You need then to order the result of the query and limit the number of row.
  3. If there was 100 rows, the timer schedules the next timeout with 0ms . That it, the transaction is committed, and the timer fires again in a new transaction.
  4. If there was less than 100 rows, the timer schedule the next timeout in 90sec .

If there are, say, 250 rows, the timer fires three time in a sequence. There is only a minor problem if there is exactly 100 row to process, in which case the timer fires twice in a sequence, but the 2nd fire processes actually nothing. But all in all, it was working OK.

One trick I've used within SQL is to DELETE TOP 1000 (or 100 or 10000, depending on the average number of rows in a page), like so:

DELETE top 1000 WHERE timestamp <= @ExpirationDate

Call this repeatedly until no rows are deleted (check with @@rowcount) or you run out of time. Can this technique be implemented in JPA?

Solved the problem by getting a sorted list of rows eligible to clean and using setFirstResult(int) to the same as setMaxResults(int). This way I get the ordering of an item approximately maxCount steps from the oldest.

Query expired = dm.createNamedQuery("getExpiredElements");
expired.setParameter("currentTime", getCurrentTime());
expired.setMaxResults(maxCount);
expired.setFirstResult(maxCount);
@SuppressWarnings("unchecked")
List<Item> expiredChunk = (List<Item>) expired.getResultList();
long lastChunkEndTime = expiredChunk.get(0).getEndTime();
Query query = em.createNamedQuery("deleteExpiredItems");
query.setParameter("currentTime", lastChunkEndTime);
int result = query.executeUpdate();
return result >= maxCount;

The function returns true (at least) if it should be executed again.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM