Camel: file consumer component "bites off more than it can chew", pipeline dies with out-of-memory error

Question

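I have a route defined in Camel that goes something like this: a GET request comes in and a file gets created in the file system. The file consumer picks it up, fetches data from external web services, and sends the resulting message by POST to other web services.

Simplified code below: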

    // Update request goes on queue:
    from("restlet:http://localhost:9191/update?restletMethod=post")
    .routeId("Update via POST")
    // [...some magic that defines a directory and file name based on request headers...]
    .to("file://cameldest/queue?allowNullBody=true&fileExist=Ignore");

    // Update gets processed
    from("file://cameldest/queue?delay=500&recursive=true&maxDepth=2&sortBy=file:parent;file:modified&preMove=inprogress&delete=true")
    .routeId("Update main route")
    .streamCaching() //otherwise stuff can't be sent to multiple endpoints
    // [...enrich message from some web service using http4 component...]
    .multicast()
        .stopOnException()
        .to("direct:sendUpdate", "direct:dependencyCheck", "direct:saveXML")
    .end();

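The three endpoints in the multicast simply POST the resulting message to other web services.

This all works rather well when the queue (i.e. the file directory cameldest) is fairly empty. Files are created in cameldest/<subdir>, picked up by the file consumer and moved into cameldest/<subdir>/inprogress, and the message is sent to the three outgoing POSTs without a problem.

However, once the incoming requests pile up to about 300,000 files, progress slows down and eventually the pipeline fails with an out-of-memory error (GC overhead limit exceeded).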

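By increasing logging I can see that the file consumer's poll basically never runs again, because it appears to take responsibility for all files it sees on each poll, waits for them to finish processing, and only then starts another poll round. Besides (I assume) causing the resource bottleneck, this also interferes with my sorting requirements: once the queue is jammed with thousands of messages waiting to be processed, new messages that would naively be sorted higher up are still waiting behind those that are already "started", if they are picked up at all.

Now, I have tried the maxMessagesPerPoll and eagerMaxMessagesPerPoll options. They seem to alleviate the problem at first, but after a number of poll rounds I still end up with thousands of files in "started" limbo.

The only thing that sort of worked was making the bottleneck of delay and maxMessages... so narrow that processing would, on average, finish faster than the file polling cycle.

Clearly, that is not what I want. I would like my pipeline to process files as fast as possible, but no faster. I was expecting the file consumer to wait while the route is busy.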

Am I making an obvious mistake?

(I am running a slightly older Camel 2.14.0 on a Redhat 7 machine with XFS, in case that is part of the problem.)

Answer 1

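Try setting maxMessagesPerPoll to a low value on the "from" file endpoint, so that each poll picks up at most X files. This also limits the total number of in-flight messages in your Camel application.

You can find more information about that option in the Camel documentation for the file component.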

Answer 2

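Unless you really need to save the data as files, I would propose an alternative solution.

From your restlet consumer, send each request to a message queuing application such as activemq, rabbitmq or something similar. You will quickly end up with a lot of messages on that queue, but that is fine.

Then replace your file consumer with a queue consumer. It will take some time, but each message should be processed separately and sent wherever you need it. I have tested rabbitmq with about 500,000 messages and it worked fine. This should also reduce the load on the consumer.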
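
The effect of swapping the file consumer for a queue consumer can be sketched without a broker. The following is only an in-process stand-in, assuming a `LinkedBlockingQueue` in place of activemq or rabbitmq and a hypothetical `QueueSketch` class: each worker takes one message at a time, so the number of messages in flight should never exceed the number of workers, however long the backlog is.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; the queue stands in for a real message broker.
public class QueueSketch {

    // Drains a backlog with a fixed number of workers.
    // Returns { processed message count, highest in-flight count observed }.
    public static int[] drain(int messages, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add("update-" + i);
        }

        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        AtomicInteger processed = new AtomicInteger();

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.execute(() -> {
                // poll() returns null once the backlog is empty
                while (queue.poll() != null) {
                    int now = inFlight.incrementAndGet();
                    maxInFlight.accumulateAndGet(now, Math::max);
                    // ... business logic for one message would run here ...
                    processed.incrementAndGet();
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new int[] { processed.get(), maxInFlight.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = drain(500_000, 4);
        System.out.println("processed=" + result[0] + ", max in flight=" + result[1]);
    }
}
```

With a real broker the same property would come from the consumer pulling messages on demand, rather than a poll claiming every file it can see.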

Answer 3

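The short answer is that there is no answer: the sortBy option of Camel's file component is simply too memory-inefficient to accommodate my use case:

Uniqueness: I do not want to put a file on the queue if it is already there.
Priority: files flagged as high priority should be processed first.
Performance: having a few hundred thousand files, or even a few million, should be no problem.
FIFO: (bonus) the oldest files (within each priority) should be picked up first.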

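The problem appears to be, if I read the source code and the documentation correctly, that all file details are held in memory to perform the sorting, no matter whether the built-in language or a custom pluggable sorter is used. The file component always creates a list of objects containing all the details, and that apparently causes an insane amount of garbage collection overhead when many files are polled often.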
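
To illustrate the difference in memory behaviour, here is a minimal sketch using only the JDK (this is not Camel's actual implementation): a sorted poll has to list every file before it can pick the first few, while an unsorted poll can stop scanning once it has enough. The class and method names (`PollSketch`, `pollSorted`, `pollUnsorted`) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of sorted vs. unsorted directory polling.
public class PollSketch {

    // Sorted poll: every directory entry is held in memory before the limit applies.
    public static List<Path> pollSorted(Path dir, int limit) throws IOException {
        List<Path> all = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                all.add(p);
            }
        }
        all.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));
        return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
    }

    // Unsorted poll: stop scanning as soon as the limit is reached.
    public static List<Path> pollUnsorted(Path dir, int limit) throws IOException {
        List<Path> batch = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (batch.size() >= limit) {
                    break;
                }
                batch.add(p);
            }
        }
        return batch;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("queue");
        for (int i = 0; i < 10; i++) {
            Files.createFile(dir.resolve(String.format("msg-%02d.xml", i)));
        }
        System.out.println("sorted:   " + pollSorted(dir, 3));
        System.out.println("unsorted: " + pollUnsorted(dir, 3));
    }
}
```

With a few hundred thousand files, the list built by the sorted variant is rebuilt on every poll, which is consistent with the GC overhead described above; the unsorted variant only ever holds `limit` entries.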

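I got my use case to work, mostly, without having to resort to a database or a custom component, using the following steps:

Move from one file consumer on the parent directory cameldest/queue, which recursively sorts the files in the subdirectories (cameldest/queue/high/ before cameldest/queue/low/), to two consumers, one per directory, with no sorting at all.
Set up only the consumer on /cameldest/queue/high/ to process files through my actual business logic.
Set up the consumer on /cameldest/queue/low to simply promote files from "low" to "high" (copying them over, i.e. .to("file://cameldest/queue/high");).
Crucially, in order to promote from "low" to "high" only when "high" is not busy, attach a route policy to "high" that throttles the other route, i.e. suspends "low" whenever there are any in-flight messages in "high".
Additionally, I added a ThrottlingInflightRoutePolicy to "high" to prevent it from having too many exchanges in flight at once.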

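Think of it like check-in at the airport, where tourist-class passengers are invited over into the business-class lane if that lane is empty.

This worked like a charm under load: even with hundreds of thousands of files queued in "low", new messages (files) dropped directly into "high" were processed within seconds.

The only requirement this solution does not cover is ordering: there is no guarantee that older files are picked up first; they are picked up in effectively random order. One could imagine a situation where a steady stream of incoming files results in one particular file X always being unlucky and never being picked up. The chance of that happening, though, is very low.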

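Possible improvement: currently, the threshold for allowing or suspending the promotion of files from "low" to "high" is set to 0 in-flight messages in "high". On the one hand, this guarantees that files dropped into "high" are processed before another promotion from "low" is performed. On the other hand, it leads to a bit of a stop-start pattern, especially in a multi-threaded scenario. It is not a real problem though; the performance as-is was impressive.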
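
One way to soften the stop-start pattern might be to use two thresholds instead of a single cut-off at 0: suspend "low" once "high" has more than some upper number of in-flight messages, and resume it only after the count has dropped to a lower number. The sketch below shows just that decision logic in plain Java; the class `PromotionGate` and its threshold names are hypothetical and are not part of the answer's code or of Camel.

```java
// Hypothetical two-threshold gate for deciding whether "low" may promote files.
public class PromotionGate {

    private final int suspendAbove;     // suspend "low" when in-flight > this
    private final int resumeAtOrBelow;  // resume "low" when in-flight <= this
    private boolean suspended;

    public PromotionGate(int suspendAbove, int resumeAtOrBelow) {
        if (resumeAtOrBelow < 0 || resumeAtOrBelow > suspendAbove) {
            throw new IllegalArgumentException("need 0 <= resumeAtOrBelow <= suspendAbove");
        }
        this.suspendAbove = suspendAbove;
        this.resumeAtOrBelow = resumeAtOrBelow;
    }

    // Feed in the current in-flight count of "high".
    // Returns true if promotion from "low" may run.
    public synchronized boolean update(int inflight) {
        if (inflight > suspendAbove) {
            suspended = true;
        } else if (inflight <= resumeAtOrBelow) {
            suspended = false;
        }
        // between the two thresholds the previous state is kept
        return !suspended;
    }

    public static void main(String[] args) {
        PromotionGate gate = new PromotionGate(10, 2);
        int[] samples = { 0, 8, 11, 6, 3, 2, 9 };
        for (int inflight : samples) {
            System.out.println("in-flight=" + inflight + " promote=" + gate.update(inflight));
        }
    }
}
```

`new PromotionGate(0, 0)` reproduces the current behaviour (suspend on any in-flight message, resume at 0). Larger thresholds trade away the guarantee that everything in "high" is processed before the next promotion.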

Source:

My route definitions:

    ThrottlingInflightRoutePolicy trp = new ThrottlingInflightRoutePolicy();
    trp.setMaxInflightExchanges(50);

    SuspendOtherRoutePolicy sorp = new SuspendOtherRoutePolicy("lowPriority");

    from("file://cameldest/queue/low?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
    .routeId("lowPriority")
    .log("Copying over to high priority: ${in.headers."+Exchange.FILE_PATH+"}")
    .to("file://cameldest/queue/high");

    from("file://cameldest/queue/high?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
    .routeId("highPriority")
    .routePolicy(trp)
    .routePolicy(sorp)
    .threads(20)
    .log("Before: ${in.headers."+Exchange.FILE_PATH+"}")
    .delay(2000) // This is where business logic would happen
    .log("After: ${in.headers."+Exchange.FILE_PATH+"}")
    .stop();

My SuspendOtherRoutePolicy, loosely modelled on ThrottlingInflightRoutePolicy:

    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantLock;

    import org.apache.camel.CamelContext;
    import org.apache.camel.CamelContextAware;
    import org.apache.camel.Consumer;
    import org.apache.camel.Exchange;
    import org.apache.camel.Route;
    import org.apache.camel.impl.RoutePolicySupport; // Camel 2.x package

    public class SuspendOtherRoutePolicy extends RoutePolicySupport implements CamelContextAware {

        private CamelContext camelContext;
        private final Lock lock = new ReentrantLock();
        private String otherRouteId;

        public SuspendOtherRoutePolicy(String otherRouteId) {
            super();
            this.otherRouteId = otherRouteId;
        }

        @Override
        public CamelContext getCamelContext() {
            return camelContext;
        }

        @Override
        public void onStart(Route route) {
            super.onStart(route);
            // fail early if the route to be suspended does not exist
            if (camelContext.getRoute(otherRouteId) == null) {
                throw new IllegalArgumentException("There is no route with the id '" + otherRouteId + "'");
            }
        }

        @Override
        public void setCamelContext(CamelContext context) {
            camelContext = context;
        }

        @Override
        public void onExchangeDone(Route route, Exchange exchange) {
            //log.info("Exchange done on route " + route);
            Route otherRoute = camelContext.getRoute(otherRouteId);
            //log.info("Other route: " + otherRoute);
            throttle(route, otherRoute, exchange);
        }

        protected void throttle(Route route, Route otherRoute, Exchange exchange) {
            // this works the best when this logic is executed when the exchange is done
            Consumer consumer = otherRoute.getConsumer();

            // suspend the other route while this route has anything in flight
            int size = getSize(route, exchange);
            boolean stop = size > 0;
            if (stop) {
                lock.lock(); // acquire outside try so unlock() only runs if the lock is held
                try {
                    stopConsumer(size, consumer);
                } catch (Exception e) {
                    handleException(e);
                } finally {
                    lock.unlock();
                }
            }

            // reload size in case a race condition with too many at once being invoked
            // so we need to ensure that we read the most current size and start the consumer if we are already too low
            size = getSize(route, exchange);
            boolean start = size == 0;
            if (start) {
                lock.lock();
                try {
                    startConsumer(size, consumer);
                } catch (Exception e) {
                    handleException(e);
                } finally {
                    lock.unlock();
                }
            }
        }

        // number of exchanges currently in flight on the given route
        private int getSize(Route route, Exchange exchange) {
            return exchange.getContext().getInflightRepository().size(route.getId());
        }

        private void startConsumer(int size, Consumer consumer) throws Exception {
            boolean started = super.startConsumer(consumer);
            if (started) {
                log.info("Resuming the other consumer " + consumer);
            }
        }

        private void stopConsumer(int size, Consumer consumer) throws Exception {
            boolean stopped = super.stopConsumer(consumer);
            if (stopped) {
                log.info("Suspending the other consumer " + consumer);
            }
        }
    }

Answer 1: 2 votes, 2017-02-19 08:02:56
Answer 2: 0 votes, 2017-02-20 08:07:55
Answer 3: 0 votes, accepted, 2017-02-24 10:34:36
