
Camel: File consumer component "bites off more than it can chew", pipeline dies from out-of-memory error

I have a route defined in Camel that goes something like this: a GET request comes in and a file gets created in the file system. The file consumer picks it up, fetches data from external web services, and sends the resulting message by POST to other web services.

Simplified code below:

    // Update request goes on queue:
    from("restlet:http://localhost:9191/update?restletMethod=post")
    .routeId("Update via POST")
    [...some magic that defines a directory and file name based on request headers...]
    .to("file://cameldest/queue?allowNullBody=true&fileExist=Ignore")

    // Update gets processed
    from("file://cameldest/queue?delay=500&recursive=true&maxDepth=2&sortBy=file:parent;file:modified&preMove=inprogress&delete=true")
    .routeId("Update main route")
    .streamCaching() //otherwise stuff can't be sent to multiple endpoints
    [...enrich message from some web service using http4 component...]
    .multicast()
        .stopOnException()
        .to("direct:sendUpdate", "direct:dependencyCheck", "direct:saveXML")
    .end();

The three endpoints in the multicast simply POST the resulting message to other web services.

This all works rather well when the queue (i.e. the file directory cameldest) is fairly empty. Files are created in cameldest/<subdir>, picked up by the file consumer and moved into cameldest/<subdir>/inprogress, and messages are sent to the three outgoing POST endpoints without problems.

However, once the incoming requests pile up to about 300,000 files, progress slows down and eventually the pipeline fails with out-of-memory errors (GC overhead limit exceeded).

By increasing logging I can see that the file consumer's polling basically never runs again: it appears to take responsibility for all files it sees on each poll, waits for them to finish processing, and only then starts another poll round. Besides (I assume) causing the resource bottleneck, this also interferes with my sorting requirements: once the queue is jammed with thousands of messages waiting to be processed, new messages that would naively sort higher up are, if they even still get picked up, stuck waiting behind those that are already "started".

Now, I've tried the maxMessagesPerPoll and eagerMaxMessagesPerPoll options. They seem to alleviate the problem at first, but after a number of poll rounds I still end up with thousands of files in "started" limbo.

The only thing that sort of worked was making the bottleneck of delay and maxMessages... so narrow that processing would, on average, finish faster than the file polling cycle.

Clearly, that is not what I want. I would like my pipeline to process files as fast as possible, but not faster. I was expecting the file consumer to wait while the route is busy.

Am I making an obvious mistake?

(I'm running a somewhat older Camel 2.14.0 on a Red Hat 7 machine with XFS, in case that is part of the problem.)

Try setting maxMessagesPerPoll to a low value on the from file endpoint to pick up at most X files per poll, which also limits the total number of in-flight messages you will have in your Camel application.

You can find more information about that option in the Camel documentation for the file component.
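As a sketch only (reusing the endpoint from the question), the consumer could cap each poll like this. Note that eagerMaxMessagesPerPoll=false makes the component scan and sort the full file list first and only then apply the limit, so the sortBy ordering still considers all files:

```java
// Sketch: same endpoint as in the question, with a per-poll cap.
// maxMessagesPerPoll limits how many files one poll round takes on;
// eagerMaxMessagesPerPoll=false applies the limit after sorting.
from("file://cameldest/queue?delay=500&recursive=true&maxDepth=2"
        + "&sortBy=file:parent;file:modified"
        + "&maxMessagesPerPoll=25&eagerMaxMessagesPerPoll=false"
        + "&preMove=inprogress&delete=true")
    .routeId("Update main route");
```

The trade-off is that sorting still has to consider every file on disk, so this bounds in-flight work but not the cost of the poll itself.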

I would propose an alternative solution, unless you really need to save the data as files.

From your restlet consumer, send each request to a message queuing app such as ActiveMQ or RabbitMQ or something similar. You will quickly end up with lots of messages on that queue, but that is OK.

Then replace your file consumer with a queue consumer. It will take some time, but each message should be processed separately and sent wherever you want. I have tested RabbitMQ with about 500,000 messages and that worked fine. This should reduce the load on the consumer as well.
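A rough sketch of what that could look like, assuming the camel-activemq component is available and a broker is configured; the queue name "updates" and the concurrentConsumers setting are illustrative, not from the question:

```java
// Illustrative sketch: the restlet consumer hands each request to a
// broker queue instead of writing a file (queue name is made up).
from("restlet:http://localhost:9191/update?restletMethod=post")
    .routeId("Update via POST")
    .to("activemq:queue:updates");

// The broker buffers the backlog; this consumer pulls messages one at
// a time instead of re-scanning a huge directory on every poll.
from("activemq:queue:updates?concurrentConsumers=5")
    .routeId("Update main route")
    .multicast()
        .stopOnException()
        .to("direct:sendUpdate", "direct:dependencyCheck", "direct:saveXML")
    .end();
```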

The short answer is that there is no answer: the sortBy option of Camel's file component is simply too memory-inefficient to accommodate my use case:

  • Uniqueness: I don't want to put a file on the queue if it's already there.
  • Priority: Files flagged as high priority should be processed first.
  • Performance: Having a few hundred thousand files, or maybe even a few million, should be no problem.
  • FIFO: (Bonus) The oldest files (by priority) should be picked up first.

The problem appears to be, if I read the source code and the documentation correctly, that all file details are held in memory to perform the sorting, no matter whether the built-in language or a custom pluggable sorter is used. The file component always creates a list of objects containing all the details, and that apparently causes an insane amount of garbage-collection overhead when polling many files frequently.

I got my use case to work, mostly, without having to resort to a database or writing a custom component, using the following steps:

  • Move from one file consumer on the parent directory cameldest/queue that recursively sorts the files in the subdirectories (cameldest/queue/high/ before cameldest/queue/low/) to two consumers, one per directory, with no sorting at all.
  • Set up only the consumer on /cameldest/queue/high/ to process files through my actual business logic.
  • Set up the consumer on /cameldest/queue/low to simply promote files from "low" to "high" (copying them over, i.e. .to("file://cameldest/queue/high");)
  • Crucially, in order to only promote from "low" to "high" when "high" is not busy, attach a route policy to "high" that throttles the other route, i.e. "low", if there are any messages in flight in "high".
  • Additionally, I added a ThrottlingInflightRoutePolicy to "high" to prevent it from inflighting too many exchanges at once.

Imagine this like check-in at the airport, where tourist-class travellers are invited over into the business-class lane if it is empty.

This worked like a charm under load; even while hundreds of thousands of files were queued in "low", new messages (files) dropped directly into "high" were processed within seconds.

The only requirement this solution doesn't cover is orderedness: there is no guarantee that older files are picked up first; rather, they are picked up randomly. One could imagine a situation where a steady stream of incoming files results in one particular file X always being unlucky and never getting picked up. The chance of that happening, though, is very low.

Possible improvement: Currently the threshold for allowing/suspending the promotion of files from "low" to "high" is set to 0 messages in flight in "high". On the one hand, this guarantees that files dropped into "high" are processed before another promotion from "low" is performed; on the other hand, it leads to a bit of a stop-start pattern, especially in a multi-threaded scenario. Not a real problem though; the performance as-is was impressive.
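The suspend/resume decision could be decoupled from Camel and given a tunable threshold. The class and method names below are hypothetical, purely to illustrate the idea; the original SuspendOtherRoutePolicy hard-codes the 0-in-flight behaviour:

```java
// Hypothetical sketch: the promotion threshold as a plain, testable value.
// With maxInflight = 0 this reproduces the original behaviour; a higher
// value would smooth out the stop-start pattern at the cost of letting
// some "low" promotions run while "high" is still working.
public class PromotionGate {
    private final int maxInflight;

    public PromotionGate(int maxInflight) {
        this.maxInflight = maxInflight;
    }

    // Suspend the "low" consumer while "high" exceeds the threshold.
    public boolean shouldSuspendLow(int highInflight) {
        return highInflight > maxInflight;
    }

    // Resume "low" once "high" has drained back to the threshold or below.
    public boolean shouldResumeLow(int highInflight) {
        return highInflight <= maxInflight;
    }
}
```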


Source:

My route definitions:

    ThrottlingInflightRoutePolicy trp = new ThrottlingInflightRoutePolicy();
    trp.setMaxInflightExchanges(50);

    SuspendOtherRoutePolicy sorp = new SuspendOtherRoutePolicy("lowPriority");

    from("file://cameldest/queue/low?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
    .routeId("lowPriority")
    .log("Copying over to high priority: ${in.headers."+Exchange.FILE_PATH+"}")
    .to("file://cameldest/queue/high");

    from("file://cameldest/queue/high?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
    .routeId("highPriority")
    .routePolicy(trp)
    .routePolicy(sorp)
    .threads(20)
    .log("Before: ${in.headers."+Exchange.FILE_PATH+"}")
    .delay(2000) // This is where business logic would happen
    .log("After: ${in.headers."+Exchange.FILE_PATH+"}")
    .stop();

My SuspendOtherRoutePolicy, loosely built like ThrottlingInflightRoutePolicy:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.camel.CamelContext;
import org.apache.camel.CamelContextAware;
import org.apache.camel.Consumer;
import org.apache.camel.Exchange;
import org.apache.camel.Route;
import org.apache.camel.impl.RoutePolicySupport;

public class SuspendOtherRoutePolicy extends RoutePolicySupport implements CamelContextAware {

    private CamelContext camelContext;
    private final Lock lock = new ReentrantLock();
    private String otherRouteId;

    public SuspendOtherRoutePolicy(String otherRouteId) {
        super();
        this.otherRouteId = otherRouteId;
    }

    @Override
    public CamelContext getCamelContext() {
        return camelContext;
    }

    @Override
    public void onStart(Route route) {
        super.onStart(route);
        if (camelContext.getRoute(otherRouteId) == null) {
            throw new IllegalArgumentException("There is no route with the id '" + otherRouteId + "'");
        }
    }

    @Override
    public void setCamelContext(CamelContext context) {
        camelContext = context;
    }

    @Override
    public void onExchangeDone(Route route, Exchange exchange) {
        //log.info("Exchange done on route " + route);
        Route otherRoute = camelContext.getRoute(otherRouteId);
        //log.info("Other route: " + otherRoute);
        throttle(route, otherRoute, exchange);
    }

    protected void throttle(Route route, Route otherRoute, Exchange exchange) {
        // this works the best when this logic is executed when the exchange is done
        Consumer consumer = otherRoute.getConsumer();

        int size = getSize(route, exchange);
        boolean stop = size > 0;
        if (stop) {
            lock.lock();
            try {
                stopConsumer(size, consumer);
            } catch (Exception e) {
                handleException(e);
            } finally {
                lock.unlock();
            }
        }

        // reload the size in case of a race condition with too many invocations at once;
        // we need to read the most current size and start the consumer if we are already too low
        size = getSize(route, exchange);
        boolean start = size == 0;
        if (start) {
            lock.lock();
            try {
                startConsumer(size, consumer);
            } catch (Exception e) {
                handleException(e);
            } finally {
                lock.unlock();
            }
        }
    }

    private int getSize(Route route, Exchange exchange) {
        return exchange.getContext().getInflightRepository().size(route.getId());
    }

    private void startConsumer(int size, Consumer consumer) throws Exception {
        boolean started = super.startConsumer(consumer);
        if (started) {
            log.info("Resuming the other consumer " + consumer);
        }
    }

    private void stopConsumer(int size, Consumer consumer) throws Exception {
        boolean stopped = super.stopConsumer(consumer);
        if (stopped) {
            log.info("Suspending the other consumer " + consumer);
        }
    }
}
