
How to find the root cause of a container going out of memory (OOM)

I am running a batch job in my Micronaut application which fetches 500 000 records from the db, picks batches of 100 items, and after processing each batch (which includes an API call for that batch) inserts the data into another SQLite table.

     try (Connection connection = dataSource.getConnection();
            PreparedStatement statement = connection.prepareStatement("SELECT id,item_id,type,operation FROM table WHERE serial_id = ? AND type = ? AND fail_reason IS NULL"  )) {
            statement.setString(1, serialId);
            statement.setString(2, type.name());
            ResultSet resultSet = statement.executeQuery();
            List<ItemEntity> itemEntities = new ArrayList<>(batchSize);
            int i = 0;
            while (resultSet.next()) {
                itemEntities.add(ItemEntity.builder()
                        .id(resultSet.getString("id"))
                        .itemId(resultSet.getString("item_id"))
                        .type(ItemType.valueOf(resultSet.getString("type")))
                        .operation(Operation.valueOf(resultSet.getString("operation")))
                        .build());

                i++;

                if(i == batchSize) {
                    i = 0;
                    consumer.accept(itemEntities);
                    itemEntities.clear();
                }
            }

            if(!itemEntities.isEmpty())
                consumer.accept(itemEntities);
        } catch (Exception ex) {
            log.error("error", ex);
            throw new RuntimeException("error", ex);
        }
    }

Whenever this batch is running, the container restarts with exit code 137. I have checked with the below JVM arguments:

-XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:HeapDumpPath=/var/data/heapdump.hprof

As I am not getting any heap dump file after the container restarts, I am assuming it's not the Micronaut application's heap that is causing the container to go OOM.

The container memory limit is 512m.

What are all the things I can try to debug this issue?

A technique that I find effective for debugging issues like this is what I call "isolate and strip down". The idea is to separate the factors of the problem into individual concerns and keep iterating to find out which factor contributes most to the problem. It is a problem-identification exercise. Once you have listed some candidate problems, you can shift your attention to a solution mindset.

For your particular issue, I can summarize what's happening like this (correct me if I am wrong):

  • Use a database connection to prepare a statement, bind its parameters, and execute it.
  • Get the result set. From the result set, build the item entities and send them to a consumer in batches (of about 100). Then repeat.

From here I can extract a list of concerns:

    1. querying the database and expecting a result of 500k rows
    2. processing one resultSet row into an item entity
    3. passing one item entity into the consumer

So isolate these concerns to debug them. While testing one concern, comment out or delete the other parts of the code.

For item 1, see if the resulting set of 500k rows is causing the OOM. Adjust the query to cap the result at 10k, 20k, or 30k rows. Or start smaller: with 1k, see if 100 results work. Keep adjusting the query, growing the result bit by bit, and see when it breaks.
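A minimal sketch of the item 1 isolation, assuming a hypothetical `withLimit` helper (the base query is taken from the question; the LIMIT values are just the knob to turn):

```java
import java.util.List;

public class QueryIsolation {
    // Hypothetical helper: append a LIMIT clause so the result-set size
    // can be grown step by step until the OOM appears.
    static String withLimit(String baseQuery, int limit) {
        return baseQuery + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        String base = "SELECT id,item_id,type,operation FROM table"
                + " WHERE serial_id = ? AND type = ? AND fail_reason IS NULL";
        for (int limit : List.of(1_000, 10_000, 20_000, 30_000)) {
            System.out.println(withLimit(base, limit));
            // Run the stripped-down fetch loop here (entity building and the
            // consumer commented out) and watch the container's memory.
        }
    }
}
```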

For item 2, is building the items in a loop of 100 a problem? Can you build them all the way up to 500k? Try different amounts.
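For item 2, the construction step can be tested with the database and the consumer removed entirely. A sketch, where the nested `ItemEntity` record and the field values are stand-ins for the real entity and its data:

```java
import java.util.ArrayList;
import java.util.List;

public class BuildIsolation {
    // Stand-in for the real ItemEntity, so this sketch is self-contained.
    record ItemEntity(String id, String itemId, String type, String operation) {}

    // Build `count` entities from hand-made data to see whether entity
    // construction alone exhausts memory.
    static List<ItemEntity> buildEntities(int count) {
        List<ItemEntity> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            items.add(new ItemEntity("id-" + i, "item-" + i, "TYPE_A", "INSERT"));
        }
        return items;
    }

    public static void main(String[] args) {
        // Try 100, then grow toward 500k and watch the container's memory.
        for (int count : new int[] {100, 10_000, 100_000, 500_000}) {
            System.out.println(buildEntities(count).size() + " entities built");
        }
    }
}
```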

For item 3, can you pass one item to the consumer without any issues? Can you send it mock items created by hand and process one batch of 100? Then raise it to a batch of 1000 or 5000. If that succeeds, keep rotating through and see at what point it fails.
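The item 3 experiment could look like the sketch below: hand-made batches fed straight into the consumer, with the database removed. The `ItemEntity` record and the field values are stand-ins for the real ones; the real API call and SQLite insert would go inside the lambda:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ConsumerIsolation {
    record ItemEntity(String id, String itemId, String type, String operation) {}

    // Feed `total` hand-made items into the consumer in batches of `batchSize`,
    // returning how many batches were sent.
    static int feedBatches(Consumer<List<ItemEntity>> consumer, int total, int batchSize) {
        int batchesSent = 0;
        List<ItemEntity> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < total; i++) {
            batch.add(new ItemEntity("id-" + i, "item-" + i, "TYPE_A", "INSERT"));
            if (batch.size() == batchSize) {
                consumer.accept(batch);
                batch = new ArrayList<>(batchSize); // fresh list per batch
                batchesSent++;
            }
        }
        if (!batch.isEmpty()) {      // flush the final partial batch
            consumer.accept(batch);
            batchesSent++;
        }
        return batchesSent;
    }

    public static void main(String[] args) {
        // Start with batches of 100, then raise the batch size to 1000, 5000, ...
        int sent = feedBatches(b -> { /* real api call + sqlite insert here */ }, 1_000, 100);
        System.out.println(sent + " batches sent");
    }
}
```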

Another angle is to experiment with the memory limits. I would do this afterwards, once you have some subset of the above working, and keep incrementing until you see it break.
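One note while experimenting here: exit code 137 usually means the kernel killed the process with SIGKILL (128 + 9), which would also explain why no heap dump file appears: the JVM never gets to throw OutOfMemoryError. A small sketch for checking what the JVM believes its heap ceiling is inside the container (the flag suggestions in the comments are standard JVM options, not something from the question):

```java
public class MemoryReport {
    // What the JVM thinks its max heap is. With a 512m container limit, the
    // default heap sizing can leave too little headroom for metaspace, thread
    // stacks, and native buffers, so the container is killed before any
    // OutOfMemoryError (and hence any heap dump) is produced.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %d MiB%n", maxHeapBytes() / (1024 * 1024));
        // While raising/lowering the container limit, also cap the heap
        // explicitly, e.g. -Xmx256m or -XX:MaxRAMPercentage=50, and increment
        // step by step until you see it break.
    }
}
```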

Still another route is to see if you can use Micronaut repositories and entities to solve the same problem, first in a smaller context like 100 entities, then incrementing up to 500k. From what I see above, I don't think this is a Micronaut issue, but experimenting will help rule it out with proof.
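A hedged, non-runnable sketch of what the Micronaut Data route might look like, paging through the rows instead of streaming one result set. Whether this applies depends on your setup: the repository name, the query-method name, and the dialect choice below are all assumptions (in particular, check whether your SQLite JDBC driver works with the dialect you pick):

```java
// Sketch only; adapt names and dialect to your project.
import io.micronaut.data.jdbc.annotation.JdbcRepository;
import io.micronaut.data.model.Page;
import io.micronaut.data.model.Pageable;
import io.micronaut.data.model.query.builder.sql.Dialect;
import io.micronaut.data.repository.PageableRepository;

@JdbcRepository(dialect = Dialect.ANSI) // assumption: a dialect your driver accepts
interface ItemRepository extends PageableRepository<ItemEntity, String> {
    Page<ItemEntity> findBySerialIdAndTypeAndFailReasonIsNull(
            String serialId, String type, Pageable pageable);
}

// Usage idea: start with pages of 100, grow toward 500k total.
// Pageable pageable = Pageable.from(0, 100);
// Page<ItemEntity> page;
// do {
//     page = repository.findBySerialIdAndTypeAndFailReasonIsNull(serialId, type.name(), pageable);
//     consumer.accept(page.getContent());
//     pageable = pageable.next();
// } while (page.getContent().size() == 100);
```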
