Can I use a message broker to stream PDF or MS Word document content as XML?
I am trying to send the content of Word documents and PDFs to Apache OpenNLP. I am wondering whether I can use ActiveMQ to read the MS Word file so that I can trigger a process in Apache Kafka to process the stream.
Any suggestions for streaming the PDF or Word content with something other than ActiveMQ are welcome.
If you use ActiveMQ "Classic" (i.e. any 5.x version) you'll have problems moving large messages, as there's no real support for that use case. However, ActiveMQ Artemis (i.e. ActiveMQ's next-generation broker) supports arbitrarily large messages, which would facilitate your use case. The nice thing about having large-message support in the broker is that you don't have to involve some other kind of storage mechanism in your solution. That makes development and maintenance of your application and environment a bit simpler.
Message queues generally shouldn't be used for file transfer. Put the files in blob storage such as S3, send the URI (e.g. "s3://bucket/file.txt") between clients, and download and process the file elsewhere.
The other option is to use Apache POI or similar tools in the producer client to parse your files, then send that data in whatever format you want (JSON, Avro, and Protobuf are generally used more often than XML in streaming tools).
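To make the first option concrete, here is a minimal sketch of sending a reference instead of the file itself. It is only an illustration under loud assumptions: a local directory stands in for blob storage (so the URIs are file:// rather than s3://), a `queue.Queue` stands in for the broker, and the function names and the JSON fields `uri` and `size` are made up for this example.

```python
import json
import queue
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def upload(store_dir: Path, name: str, data: bytes) -> str:
    """Stand-in for a blob-store upload: write the bytes, return a URI."""
    target = store_dir.resolve() / name
    target.write_bytes(data)
    # file:// in this sketch; with real blob storage this would be e.g. s3://...
    return target.as_uri()


def download(uri: str) -> bytes:
    """Stand-in for a blob-store download; only file:// URIs are handled."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
    return Path(url2pathname(parsed.path)).read_bytes()


def send_reference(broker: queue.Queue, store_dir: Path, name: str, data: bytes) -> None:
    """Producer side: store the file, then enqueue only a small JSON pointer."""
    uri = upload(store_dir, name, data)
    broker.put(json.dumps({"uri": uri, "size": len(data)}).encode("utf-8"))


def receive_and_fetch(broker: queue.Queue) -> bytes:
    """Consumer side: read the pointer, then fetch the file from storage."""
    message = json.loads(broker.get())
    return download(message["uri"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        broker = queue.Queue()  # hypothetical stand-in for Kafka or ActiveMQ
        send_reference(broker, Path(tmp), "report.pdf", b"%PDF-1.7 placeholder bytes")
        print(receive_and_fetch(broker))
```

The message on the queue stays a few dozen bytes no matter how large the document is, so the broker's message-size limits stop mattering.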
The actual file processing has nothing to do with the queue technology used.
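As a rough illustration of parsing in the producer, the sketch below pulls paragraph text out of a .docx and wraps it in JSON. It does not use Apache POI: it swaps in Python's standard library and relies on a .docx being a ZIP archive whose body lives in `word/document.xml`. The helper names, the `doc_id` and `paragraphs` fields, and the in-memory sample file are assumptions for the example, and real documents (tables, text boxes, headers) need a proper library. Note that nothing here knows which broker will carry the result.

```python
import io
import json
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
W_URI = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_NS = "{" + W_URI + "}"


def extract_paragraphs(docx_bytes: bytes) -> list:
    """Return the non-empty paragraph texts found in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t runs.
        text = "".join(run.text or "" for run in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs


def to_message(doc_id: str, paragraphs: list) -> bytes:
    """Serialize the extracted text as a JSON payload for any broker."""
    payload = {"doc_id": doc_id, "paragraphs": paragraphs}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def make_sample_docx(paragraphs: list) -> bytes:
    """Build a bare-bones archive holding only word/document.xml.

    Enough for this demo; it is not a complete file that Word would open.
    """
    body = "".join(
        "<w:p><w:r><w:t>" + escape(p) + "</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = (
        '<w:document xmlns:w="' + W_URI + '"><w:body>'
        + body
        + "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_sample_docx(["First paragraph.", "Second paragraph."])
    print(to_message("doc-1", extract_paragraphs(sample)).decode("utf-8"))
```

The bytes returned by `to_message` can be handed to a Kafka producer, an ActiveMQ client, or anything else, which is why the extraction step can be written and tested without a broker running.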