
Can I use a message broker to stream PDF or MS Word document content as XML?

I am trying to send the content of Word documents and PDFs to Apache OpenNLP. I am wondering whether I can use ActiveMQ to read the MS Word file so that I can then trigger a process that passes it to Apache Kafka for stream processing.

Any suggestions for streaming the PDF or Word content with something other than ActiveMQ are welcome.

If you use ActiveMQ "Classic" (i.e. any 5.x version), you'll have problems moving large messages, as there's no real support for that use case. However, ActiveMQ Artemis (i.e. ActiveMQ's next-generation broker) supports arbitrarily large messages, which would facilitate your use case. The nice thing about having large-message support in the broker is that you don't have to involve some other kind of storage mechanism in your solution. That makes development and maintenance of your application and environment a bit simpler.

Message queues generally shouldn't be used for file transfer. Put the files in blob storage such as S3, send the URI between clients (e.g. "s3://bucket/file.txt"), and download and process the file elsewhere.

Another option is to use Apache POI or a similar tool in the producer client to parse your files, then send that data in whatever format you want (JSON, Avro, and Protobuf are generally used more often than XML in streaming tools).
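To make the blob-storage option concrete, here is a minimal sketch of the small "pointer" message that would travel through the broker in place of the file itself. The `DocumentPointer` class and its `content_type` and `size_bytes` fields are illustrative assumptions, not part of any broker or S3 API, and the upload, download, and send calls are deliberately left out.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DocumentPointer:
    """Small message that refers to a file in blob storage instead of carrying it."""
    uri: str           # e.g. "s3://bucket/file.txt"; bucket and key are placeholders
    content_type: str  # MIME type, so the consumer knows which parser to use
    size_bytes: int    # lets the consumer reject unexpectedly large downloads


def encode_pointer(pointer: DocumentPointer) -> bytes:
    """Serialize the pointer to UTF-8 JSON, suitable as a message payload."""
    return json.dumps(asdict(pointer), sort_keys=True).encode("utf-8")


def decode_pointer(payload: bytes) -> DocumentPointer:
    """Rebuild the pointer on the consumer side before downloading the file."""
    return DocumentPointer(**json.loads(payload.decode("utf-8")))
```

The producer would upload the file first and then publish `encode_pointer(...)`; the consumer calls `decode_pointer(...)` and fetches the file from storage itself. The payload stays at roughly a hundred bytes regardless of how large the document is.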

The actual file processing has nothing to do with the queue technology used.
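As an illustration of that separation, the sketch below pulls paragraph text out of a .docx file and turns it into a JSON string without touching any broker API. It uses only the Python standard library as a stand-in for Apache POI, on the assumption that a .docx is a ZIP archive whose main body lives in `word/document.xml`. The function names and the JSON field names are hypothetical, and headers, footnotes, tabs, and PDF input are not handled.

```python
import io
import json
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data: bytes) -> List[str]:
    """Return the text of each non-empty paragraph in a .docx file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces in order.
        text = "".join(t.text or "" for t in p.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs


def to_message(filename: str, data: bytes) -> str:
    """Build a JSON payload that any producer (Kafka, ActiveMQ, ...) could send."""
    return json.dumps({"source": filename, "paragraphs": docx_paragraphs(data)})
```

The string returned by `to_message` can be handed to whichever producer client is in use, so swapping the broker does not change the parsing code.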

