clojure core.async - unexpected inconsistencies

I haven't done any Clojure for a couple of years, so I decided to go back and not ignore core.async this time around. Pretty cool stuff, that, but it surprised me almost immediately. Now, I understand that there's inherent indeterminism when multiple threads are involved, but something bigger than that is at play here.

The source code for my oh-so-simple example, where I am trying to copy lines from STDIN to a file:

(ns coras.core
  (:require [clojure.core.async :as a :refer [go >! <!]])
  (:gen-class))

(defn append-to-file
  "Write a string to the end of a file"
  ([filename s]
   (spit filename (str s "\n")
         :append true))
  ([s]
   (append-to-file "/tmp/journal.txt" s)))

(defn -main
  "I don't do a whole lot ... yet."
  [& args]
  (println "Initializing..")
  (let [out-chan (a/chan)]
    (loop [line (read-line)]
      (if (empty? line) :ok
          (do
            (go (>! out-chan line))
            (go (append-to-file (<! out-chan)))
            (recur (read-line)))))))

Except, of course, this turned out not to be so simple. I think I've narrowed it down to something that isn't properly cleaned up. Basically, running the main function produces inconsistent results. Sometimes I run it 4 times and see 12 lines in the output, but sometimes 4 runs produce just 10 lines. Or, as below, 3 runs produce 6 lines:

akamac.home ➜  coras git:(master) ✗ make clean
cat /dev/null > /tmp/journal.txt
lein clean
akamac.home ➜  coras git:(master) ✗ make compile
lein uberjar
Compiling coras.core
Created /Users/akarpov/repos/coras/target/uberjar/coras-0.1.0-SNAPSHOT.jar
Created /Users/akarpov/repos/coras/target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar
akamac.home ➜  coras git:(master) ✗ make run    
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜  coras git:(master) ✗ make run
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜  coras git:(master) ✗ make run
java -jar target/uberjar/coras-0.1.0-SNAPSHOT-standalone.jar < resources/input.txt
Initializing..
akamac.home ➜  coras git:(master) ✗ make check  
cat /tmp/journal.txt
line a
line z
line b
line a
line b
line z

(Basically, sometimes a run produced 3 lines, sometimes 0, sometimes 1 or 2.) The fact that the lines appear in random order doesn't bother me: go blocks do things in a concurrent/threaded manner, and all bets are off. But why don't they do all of the work all of the time? (Because I am misusing them somehow, but where?) Thanks!

There are many problems with this code. Let me walk through them quickly:

1) Every time you call (go ...), you're spinning off a new "thread" that will be executed in a thread pool. When that thread will run is undefined.

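2) You aren't waiting for these threads to complete. It's therefore possible (and very likely) that you will read several lines from the file and write several lines to the channel before a single read from the channel even occurs. Worse, -main can return while go blocks are still pending. core.async runs go blocks on daemon threads, so the JVM exits without waiting for them, and their lines never reach the file.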

3) You are firing off multiple calls to append-to-file at the same time (see #2), and these calls are not synchronized, so multiple threads may append at once. Because most OSes don't coordinate file access between writers, two threads can write to your file at the same time and overwrite each other's results.

4) Because you create a new go block for every line read, the blocks may execute in a different order than you expect. As a result, the lines in the output file may be out of order.
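Points 1 and 2 can be illustrated with a minimal sketch, assuming core.async is on the classpath. Each `go` call returns a channel immediately, and that channel delivers the body's return value only after the block has actually run. Taking from those channels with the blocking `<!!` is one way to wait for the work to finish. `run-all` is a hypothetical helper for illustration, not part of core.async.

```clojure
(ns example.go-completion
  (:require [clojure.core.async :refer [go <!!]]))

(defn run-all
  "Hypothetical helper: start one go block per item, then block until
  every go block has finished. Results come back in the order the blocks
  were started, even though the blocks themselves may run in any order."
  [f items]
  (let [;; (go ...) returns at once; the body runs later on core.async's
        ;; dispatch thread pool, and the returned channel gets its value.
        done-chans (mapv (fn [x] (go (f x))) items)]
    ;; <!! blocks the calling (non-go) thread until each result arrives,
    ;; so no go block is still pending when run-all returns.
    (mapv <!! done-chans)))
```

For example, `(run-all inc [1 2 3])` returns `[2 3 4]`. Note that a go block whose body returns nil just closes its channel, so `<!!` yields nil for it as well.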

I think all of this can be fixed by avoiding a rather common anti-pattern with core.async: don't create go blocks (or threads) inside unbounded or large loops, because that often does something you don't expect. Instead, create one core.async/thread with a loop that reads from the file and writes to the channel (it's doing IO, and you should never do IO inside a go block). Then create a second one that reads from the channel and writes to the output file.
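That two-thread layout might look like the following minimal sketch. It assumes the input arrives as a `java.io.BufferedReader` and the output is any one-argument write function, so the question's `append-to-file` could be passed in. `copy-lines!` and the channel buffer size of 16 are illustrative choices, not part of core.async.

```clojure
(ns example.copy-lines
  (:require [clojure.core.async :refer [chan thread >!! <!! close!]]))

(defn copy-lines!
  "Hypothetical helper: copy lines from rdr to write-line! using two
  long-lived threads joined by one channel. Stops at EOF or at the first
  empty line (like the question's loop). Blocks until every line has been
  written and returns the number of lines written."
  [^java.io.BufferedReader rdr write-line!]
  (let [lines (chan 16)]                       ; the "conveyor belt"; buffer size is arbitrary
    ;; Producer: the only code that reads input. Blocking IO belongs in
    ;; a thread, never in a go block.
    (thread
      (loop []
        (let [line (.readLine rdr)]
          (if (empty? line)                    ; nil at EOF, "" for a blank line
            (close! lines)                     ; no more data: let the consumer finish
            (do (>!! lines line)
                (recur))))))
    ;; Consumer: the only code that writes output, so writes never overlap
    ;; and lines keep their input order. `thread` returns a channel that
    ;; receives the loop's result, and <!! waits for it.
    (<!! (thread
           (loop [n 0]
             (if-some [line (<!! lines)]
               (do (write-line! line)
                   (recur (inc n)))
               n))))))                          ; channel closed and drained
```

Because there is exactly one consumer, writes cannot overlap and lines keep their input order, which addresses points 3 and 4. Because `copy-lines!` blocks on the consumer's channel, a caller such as `-main` cannot return before the output is written. In the question's program, the read loop could then be replaced by something like `(copy-lines! (clojure.java.io/reader *in*) append-to-file)`.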

View this as an assembly line built out of workers (go blocks) and conveyor belts (channels). If you built a factory, you wouldn't take a pile of people and pair them up, saying "you take one item, and when you're done, hand it to him". Instead, you'd organize all the people once, with conveyors between them, and let the work (or data) "flow" between the workers. Your workers should be static, and your data should be moving.

...and of course, this was a misuse of core.async on my part:

If I care about seeing all the data in the output, I must use a blocking "take" on the channel when I pass the value to my I/O code. As was pointed out, that blocking call should not be inside a go block. A single-line change was all I needed:

from:

(go (append-to-file (<! out-chan)))

to:

(append-to-file (<!! out-chan))
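For reference, the question's program with that one-line change applied might look like this. The `ns` form is an assumption: the namespace name comes from the `Compiling coras.core` line in the build output, and the `:require` and `:gen-class` clauses are inferred. With `<!!`, the main thread itself waits for each value and writes it before reading the next line. Nothing is left pending in the thread pool when `-main` returns and the JVM exits.

```clojure
(ns coras.core
  (:require [clojure.core.async :as a :refer [go >! <!!]])
  (:gen-class))

(defn append-to-file
  "Write a string to the end of a file"
  ([filename s]
   (spit filename (str s "\n") :append true))
  ([s]
   (append-to-file "/tmp/journal.txt" s)))

(defn -main
  "I don't do a whole lot ... yet."
  [& args]
  (println "Initializing..")
  (let [out-chan (a/chan)]
    (loop [line (read-line)]
      (if (empty? line)
        :ok
        (do
          ;; The put still happens in a go block...
          (go (>! out-chan line))
          ;; ...but the take now blocks the main thread until the value
          ;; arrives, and the file write happens here, outside any go block.
          (append-to-file (<!! out-chan))
          (recur (read-line)))))))
```

Because each line is written before the next one is read, the output also keeps the input order. The trade-off is that the go block no longer adds any concurrency here.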
