File Operations using Akka Actor

What is the advantage of using Akka Actors over a normal file-operation approach? I tried to measure the time taken to analyze a log file. The task is to find the IP addresses that have logged on more than 50 times and display them. The normal file operation was faster than the Akka Actor model. Why is that?

Using normal file operations:

public static void main(String[] args) {
        //long startTime = System.currentTimeMillis();
        File file = new File("log.txt");
        Map<String, Long> ipMap = new HashMap<>();

        try {

                FileReader fr = new FileReader(file);
                BufferedReader br = new BufferedReader(fr);
                String line = br.readLine();

                while(line!=null) {
                    int idx = line.indexOf('-');
                    String ipAddress = line.substring(0, idx).trim();
                    long count = ipMap.getOrDefault(ipAddress, 0L);
                    ipMap.put(ipAddress, ++count);
                    line = br.readLine();
                }

                 System.out.println("================================");
                 System.out.println("||\tCount\t||\t\tIP");
                 System.out.println("================================");

                 br.close(); // closing the BufferedReader also closes the underlying FileReader
                 Map<String, Long> result = new LinkedHashMap<>(); // preserves the sorted insertion order

                    // Sort by value and put it into the "result" map
                    ipMap.entrySet().stream()
                            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                            .forEachOrdered(x -> result.put(x.getKey(), x.getValue()));

                    // Print only if count > 50
                    result.entrySet().stream().filter(entry -> entry.getValue() > 50).forEach(entry ->
                        System.out.println("||\t" + entry.getValue() + "   \t||\t" + entry.getKey())
                    );

//                  long endTime = System.currentTimeMillis();
//                  System.out.println("Time: "+(endTime-startTime));

            } catch (IOException e) {
                e.printStackTrace();
            }

    }

Using Actors:
1. The Main Class
 public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        // Create actorSystem
        ActorSystem akkaSystem = ActorSystem.create("akkaSystem");

        // Create first actor based on the specified class
        ActorRef coordinator = akkaSystem.actorOf(Props.create(FileAnalysisActor.class));

        // Create a message including the file path
        FileAnalysisMessage msg = new FileAnalysisMessage("log.txt");

        // Send a message to start processing the file. 'ask' is asynchronous:
        // it returns a Future that must complete within the timeout.
        Timeout timeout = new Timeout(6, TimeUnit.SECONDS);
        Future<Object> future = Patterns.ask(coordinator, msg, timeout);

        // Process the results
        final ExecutionContext ec = akkaSystem.dispatcher();
        future.onSuccess(new OnSuccess<Object>() {
            @Override
            public void onSuccess(Object message) throws Throwable {
                if (message instanceof FileProcessedMessage) {
                    printResults((FileProcessedMessage) message);

                    // Stop the actor system
                    akkaSystem.shutdown();
                }
            }

            private void printResults(FileProcessedMessage message) {
                System.out.println("================================");
                System.out.println("||\tCount\t||\t\tIP");
                System.out.println("================================");

                Map<String, Long> result = new LinkedHashMap<>();

                // Sort by value and put it into the "result" map
                message.getData().entrySet().stream()
                        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                        .forEachOrdered(x -> result.put(x.getKey(), x.getValue())); 

                // Print only if count > 50
                result.entrySet().stream().filter(entry -> entry.getValue() > 50).forEach(entry ->
                    System.out.println("||\t" + entry.getValue() + "   \t||\t" + entry.getKey())
                );
                long endTime = System.currentTimeMillis();
                System.out.println("Total time: "+(endTime - startTime));
            }

        }, ec);

    }

2. The File Analyser Class

public class FileAnalysisActor extends UntypedActor {

    private Map<String, Long> ipMap = new HashMap<>();
    private long fileLineCount;
    private long processedCount;
    private ActorRef analyticsSender = null;

    @Override
    public void onReceive(Object message) throws Exception {
        /*
            This actor can receive two different messages, FileAnalysisMessage or LineProcessingResult, any
            other type will be discarded using the unhandled method
         */
            //System.out.println(Thread.currentThread().getName());
        if (message instanceof FileAnalysisMessage) {

            List<String> lines = FileUtils.readLines(new File(
                    ((FileAnalysisMessage) message).getFileName()));

            fileLineCount = lines.size();
            processedCount = 0;

            // stores a reference to the original sender to send back the results later on
            analyticsSender = this.getSender();

            for (String line : lines) {
                // creates a new actor per each line of the log file
                Props props = Props.create(LogLineProcessor.class);
                ActorRef lineProcessorActor = this.getContext().actorOf(props);

                // sends a message to the new actor with the line payload
                lineProcessorActor.tell(new LogLineMessage(line), this.getSelf());
            }

        } else if (message instanceof LineProcessingResult) {

            // a result message is received after a LogLineProcessor actor has finished processing a line
            String ip = ((LineProcessingResult) message).getIpAddress();

            // increment ip counter
            Long count = ipMap.getOrDefault(ip, 0L);
            ipMap.put(ip, ++count);

            // if the file has been processed entirely, send a termination message to the main actor
            processedCount++;
            if (fileLineCount == processedCount) {
                // send done message
                analyticsSender.tell(new FileProcessedMessage(ipMap), ActorRef.noSender());
            }

        } else {
            // Ignore message
            this.unhandled(message);
        }
    }
}

3. The Logline Processor Class

public class LogLineProcessor extends UntypedActor {

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof LogLineMessage) {
            // What data each actor process?
            //System.out.println("Line: " + ((LogLineMessage) message).getData());
            // Uncomment this line to see the thread number and the actor name relationship
           //System.out.println("Thread ["+Thread.currentThread().getId()+"] handling ["+ getSelf().toString()+"]");

            // get the message payload, this will be just one line from the log file
            String messageData = ((LogLineMessage) message).getData();

            int idx = messageData.indexOf('-');
            if (idx != -1) {
                // get the ip address
                String ipAddress = messageData.substring(0, idx).trim();

                // tell the sender that we got a result using a new type of message
                this.getSender().tell(new LineProcessingResult(ipAddress), this.getSelf());
            }
        } else {
            // ignore any other message type
            this.unhandled(message);
        }
    }
}

Message Classes

1. FileAnalysis Message

public class FileAnalysisMessage {

    private String fileName;

    public FileAnalysisMessage(String file) {
        this.fileName = file;
    }

    public String getFileName() {
        return fileName;
    }
}

2. File Processed Message

public class FileProcessedMessage {

    private Map<String, Long> data;

    public FileProcessedMessage(Map<String, Long> data) {
        this.data = data;
    }

    public Map<String, Long> getData() {
        return data;
    }
}
3. LineProcessing Result

public class LineProcessingResult {

    private String ipAddress;

    public LineProcessingResult(String ipAddress) {
        this.ipAddress = ipAddress;
    }

    public String getIpAddress() {
        return ipAddress;
    }
}

4. Logline Message

public class LogLineMessage {

    private String data;

    public LogLineMessage(String data) {
        this.data = data;
    }

    public String getData() {
        return data;
    }
}

I am creating an actor for each line in the file.

With all concurrency frameworks there is always a trade-off between the amount of concurrency that is deployed vs. the complexity involved for each unit of concurrency. Akka is no exception.

In your non-akka approach you have a relatively simple sequence of steps for each line:

  1. read a line from the file
  2. split the line by '-'
  3. submit the IP address into a hashmap & increment the count

By comparison, your akka approach is much more complicated for each line:

  1. create an Actor
  2. create a LogLineMessage message
  3. send the message to the actor
  4. split the line by '-'
  5. create a LineProcessingResult message
  6. send the message back to the coordinating actor
  7. submit the IP address into a hashmap & increment the count

If we naively assumed each of the above steps took the same amount of time, then you would need two threads with akka (seven steps per line versus three) just to run at the same speed as one thread without akka.

Make Each Concurrency Unit Do More Work

Instead of having one Actor per line, have each Actor process N lines into its own sub-hashmap (e.g. each Actor processes 1000 lines):

public class LogLineMessage {

    private String[] data;

    public LogLineMessage(String[] data) {
        this.data = data;
    }

    public String[] getData() {
        return data;
    }
}

Then the Actor wouldn't be sending back something as simple as the IP address. Instead it will send a hash of counts for its subset of lines:

public class LineProcessingResult {

    private HashMap<String, Long> ipAddressCount;

    public LineProcessingResult(HashMap<String, Long> count) {
        this.ipAddressCount = count;
    }

    public HashMap<String, Long> getIpAddressCount() {
        return ipAddressCount;
    }
}

And the coordinating Actor can be responsible for combining all of the various sub-counts:

//inside of FileAnalysisActor
else if (message instanceof LineProcessingResult) {
    HashMap<String, Long> localCount = ((LineProcessingResult) message).getIpAddressCount();

    // merge this batch's sub-count into the global map
    localCount.forEach((ipAddress, count) ->
        ipMap.put(ipAddress, ipMap.getOrDefault(ipAddress, 0L) + count)
    );
}

You can then vary N to see where you get peak performance for your particular system.
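A quick way to find that sweet spot is to time a full run for several candidate batch sizes. The sketch below is illustrative only: runAnalysis is a hypothetical helper standing in for the pipeline above (create the ActorSystem, send the FileAnalysisMessage with the given batch size, and block until the FileProcessedMessage arrives), and a single pass per size is a rough measurement, since JIT warm-up and GC can skew it.

public class BatchSizeBenchmark {

    public static void main(String[] args) throws Exception {
        int[] candidates = {100, 500, 1000, 5000, 10000};

        for (int n : candidates) {
            long start = System.currentTimeMillis();
            runAnalysis(n); // hypothetical: run the full actor pipeline with batch size n
            System.out.println("N = " + n + " -> " + (System.currentTimeMillis() - start) + " ms");
        }
    }

    // hypothetical placeholder: wire up the ActorSystem, send the file-analysis
    // message with this batch size, and await the FileProcessedMessage
    private static void runAnalysis(int batchSize) throws Exception {
    }
}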

Don't Read the Whole File Into Memory

One other disadvantage of your concurrent solution is that it first reads the entire file into memory. This is unnecessary and taxing for the JVM.

Instead, read the file N lines at a time. Once you have those lines in memory, spawn off the Actor as mentioned earlier:

FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);

String[] lineBuffer = null;
int bufferCount = 0;
int N = 1000;

String line = br.readLine();

while (line != null) {
    if (0 == bufferCount) {
        lineBuffer = new String[N];
    } else if (N == bufferCount) {
        // the buffer is full: hand it off to a new actor
        Props props = Props.create(LogLineProcessor.class);
        ActorRef lineProcessorActor = this.getContext().actorOf(props);

        lineProcessorActor.tell(new LogLineMessage(lineBuffer),
                                this.getSelf());

        bufferCount = 0;
        continue; // loop back so the current line lands in a fresh buffer
    }

    lineBuffer[bufferCount] = line;
    line = br.readLine(); // advance to the next line
    bufferCount++;
}

// handle the final, partially filled buffer
if (bufferCount > 0) {
    Props props = Props.create(LogLineProcessor.class);
    ActorRef lineProcessorActor = this.getContext().actorOf(props);

    // trim the buffer so the actor doesn't see unused slots
    lineProcessorActor.tell(new LogLineMessage(Arrays.copyOf(lineBuffer, bufferCount)),
                            this.getSelf());
}

This will allow file IO, line processing, and sub-map combining to all run in parallel.
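For completeness, here is a minimal sketch (not from the original answer) of how LogLineProcessor could consume a whole batch, assuming the String[]-carrying LogLineMessage and the map-carrying LineProcessingResult shown above:

public class LogLineProcessor extends UntypedActor {

    @Override
    public void onReceive(Object message) throws Exception {
        if (message instanceof LogLineMessage) {
            // build a local sub-count for this batch of lines
            HashMap<String, Long> localCount = new HashMap<>();

            for (String line : ((LogLineMessage) message).getData()) {
                if (line == null) {
                    continue; // guard against unused slots in a partially filled buffer
                }
                int idx = line.indexOf('-');
                if (idx != -1) {
                    String ipAddress = line.substring(0, idx).trim();
                    localCount.put(ipAddress, localCount.getOrDefault(ipAddress, 0L) + 1);
                }
            }

            // one reply per batch instead of one reply per line
            this.getSender().tell(new LineProcessingResult(localCount), this.getSelf());
        } else {
            this.unhandled(message);
        }
    }
}

Note that the coordinator's completion check would then need to count batches rather than lines (comparing processedCount against the number of batches sent) before it emits the FileProcessedMessage.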
