
File Processing with Akka?

This is rather a design problem. I don't know how to achieve this in Akka.

User Story

- I need to parse big files (> 10 million lines) which look like:

2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %MMT-7-715036: Group = 199.19.248.164, IP = 199.19.248.164, Sending keep-alive of type DPD R-U-THERE (seq number 0x7db7a2f3)
2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %MMT-7-715046: Group = 199.19.248.164, IP = 199.19.248.164, constructing blank hash payload
2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %MMT-7-715046: Group = 199.19.248.164, IP = 199.19.248.164, constructing qm hash payload
2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %ASA-7-713236: IP = 199.19.248.164, IKE_DECODE SENDING Message (msgid=61216d3e) with payloads : HDR + HASH (8) + NOTIFY (11) + NONE (0) total length : 84
2013-05-09 11:09:01 Local4.Debug    172.22.10.111   %MMT-7-713236: IP = 199.19.248.164, IKE_DECODE RECEIVED Message (msgid=867466fe) with payloads : HDR + HASH (8) + NOTIFY (11) + NONE (0) total length : 84
  • For each line I need to generate some Event that will be sent to a server.
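The per-line parse is mechanical; a minimal sketch is below. The Event record, its field names, and the regex are illustrative assumptions, not something given in the question — the real event schema would depend on what the server expects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Event type; the field names are illustrative.
record Event(String timestamp, String facility, String host, String msgId, String body) {}

class LineParser {
    // Matches lines of the shape shown above, e.g.:
    // 2013-05-09 11:09:01 Local4.Debug    172.2.10.111    %MMT-7-715036: Group = ...
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+(\\S+)\\s+%(\\S+?):\\s+(.*)$");

    static Event parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) throw new IllegalArgumentException("unparseable line: " + line);
        return new Event(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5));
    }
}
```

Because the parse is pure (no shared state), it is safe to run it on any worker thread or actor.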

Question
- How can I read this log file efficiently in the Akka model? I have read that reading a file synchronously is better, because sequential reads mean less disk-head movement.
- In that case, there could be one FileReaderActor per file, which would read each line and send it for processing to, say, an EventProcessorRouter; the router could have many actors working on lines (from the file) and creating Events. There would be one Event per line.
- I was also thinking of sending Events in batches, to avoid too much data transfer over the network. In that case, where should I accumulate these Events? And how would I know when all Events have been generated from the inputFile?
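One way to answer the last two questions: let whoever produces the events own a small batching buffer that flushes every N events, and flush the final partial batch at end-of-file — then "all Events generated" is simply "the reader reached EOF and closed the batcher". This is a plain-Java sketch under that assumption; the Batcher name, batch size, and sink are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Accumulates events and hands them to a sink in fixed-size batches.
// close() flushes the final partial batch and returns the total count,
// which is how the caller knows every event from the file was emitted.
class Batcher<E> {
    private final int batchSize;
    private final Consumer<List<E>> sink;   // e.g. sends one batch over the network
    private final List<E> buffer = new ArrayList<>();
    private long total = 0;

    Batcher(int batchSize, Consumer<List<E>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void add(E event) {
        buffer.add(event);
        total++;
        if (buffer.size() >= batchSize) flush();
    }

    long close() {                          // call after the last line is read
        if (!buffer.isEmpty()) flush();
        return total;
    }

    private void flush() {
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

If the Events are produced by many router workers rather than by the reader itself, each worker can keep its own Batcher, and the reader can broadcast an end-of-file message so every worker flushes its final partial batch.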

Thanks

I think I know what you're asking: basically, if you read and process a file in the manner you are describing, you risk building up a massive backlog of messages if the processing takes significantly longer than the reading. Also, if you are messaging over the network, you would ideally want to minimize the number of messages sent. If your lines don't take long to process, then I wouldn't send them over the network for processing at all.

Have you considered using futures instead? I don't know if your case is as simple as "Parallel File Processing: What are recommended ways?"; in that case you should use streams. The thing with actors is that, although they are good for throttling, their main purpose is to wrap up state, and you don't have much state when processing a file. Maybe you would be better off with futures; I show an example of that in "Executing Dependent tasks in parallel in Java". But you could use actors as you say, and have the processing actors communicate with the reader actor and tell it to pause reading (say, for a second) as soon as the number of messages waiting to be processed exceeds 1,000,000, or whatever threshold you decide.
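The "tell the reader to pause" idea can also be expressed without explicit control messages: a bounded queue gives the same throttling for free, because the reader blocks on put() whenever more than a fixed number of lines are waiting. This is a plain-threads sketch of that backpressure pattern, not Akka code; the capacity, sentinel, and worker count are illustrative choices.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Reader thread feeds a bounded queue; worker threads drain it. put() blocks
// the reader when CAPACITY lines are pending, which is the throttling
// behaviour described above, just enforced by the queue instead of messages.
class ThrottledPipeline {
    static final int CAPACITY = 10_000;
    static final String EOF = "\u0000EOF";   // sentinel marking end of input

    static long run(Iterable<String> lines, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(CAPACITY);
        AtomicLong processed = new AtomicLong();

        Thread[] pool = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    for (String line; !(line = queue.take()).equals(EOF); )
                        processed.incrementAndGet();      // parse / send the Event here
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool[i].start();
        }

        for (String line : lines) queue.put(line);        // blocks when the queue is full
        for (int i = 0; i < workers; i++) queue.put(EOF); // one sentinel per worker
        for (Thread t : pool) t.join();
        return processed.get();
    }
}
```

Akka's routers with bounded mailboxes, or Akka Streams, provide the same effect at a higher level; the point of the sketch is only that backpressure is the mechanism you want, however you spell it.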
