
How to make a writing method thread-safe?

I have multiple threads calling one method that writes contents from an object to a file, as below. When I use one thread to test this method, the output in my file is as expected. However, with multiple threads, the output in the file is messy. How can I make this thread-safe?

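import java.io.BufferedWriter;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Set;

// the method name is missing in the question; "writeDoc" is a placeholder
void writeDoc(Document doc, BufferedWriter writer) throws IOException {
    Map<Sentence, Set<Matrix>> matrices = doc.getMatrix();
    for (Sentence sentence : matrices.keySet()) {
        Set<Matrix> set = doc.getMatrix(sentence);
        for (Matrix matrix : set) {
            List<Result> results = ResultGenerator.getResult();
            writer.write(matrix + " " + matrix.frequency()); // matrix and its frequency as text
            writer.write(results.toString());
            writer.write("\n");
        }
    }
}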

Edit:

I added this line: List<Result> results = ResultGenerator.getResult(). What I really want is to use multiple threads to process this method call, since this part is expensive and takes a lot of time. The writing part is very quick, so I don't really need multiple threads for it.

Given this change, is there a way to make this method safe to call in a concurrent environment?

Essentially, you are limited by the single file at the end. There are no global variables and the method publishes nothing, so the method itself is thread-safe.

But if the processing does take a lot of time, you can use parallel streams and publish the results to a ConcurrentHashMap or a blocking queue. You would, however, still have a single consumer writing to the file.
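Here is a minimal sketch of that idea, assuming Java 8+. The expensive step runs in a parallel stream, and each finished line is published to a LinkedBlockingQueue. One consumer thread is the only code that ever touches the Writer. expensiveResult is a hypothetical stand-in for ResultGenerator.getResult(). The end-of-input sentinel is also an assumption: it must never equal a real output line.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class QueueWriterDemo {

    // Sentinel telling the consumer that no more lines will arrive
    // (assumes no real output line is ever equal to it).
    private static final String END_OF_INPUT = "<<END>>";

    // Hypothetical stand-in for the expensive ResultGenerator.getResult() call.
    static String expensiveResult(int item) {
        return "item " + item + " -> " + (item * 10);
    }

    static void process(List<Integer> items, Writer out) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Single consumer: the only thread that ever touches the Writer.
        Thread consumer = new Thread(() -> {
            try {
                String line;
                while (!(line = queue.take()).equals(END_OF_INPUT)) {
                    out.write(line);
                    out.write("\n");
                }
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Many producers: the expensive work runs on the common ForkJoinPool.
        items.parallelStream()
             .map(QueueWriterDemo::expensiveResult)
             .forEach(queue::add); // add() never blocks on an unbounded queue

        // forEach returns only after every element has been processed.
        queue.add(END_OF_INPUT);
        consumer.join(); // join() also makes the consumer's writes visible here
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> items = IntStream.rangeClosed(1, 5).boxed().collect(Collectors.toList());
        StringWriter sw = new StringWriter();
        process(items, sw);
        System.out.print(sw); // five lines, in arbitrary order
    }
}
```

Lines are written in completion order, not input order. In this sketch, an IOException in the consumer is not propagated back to the caller. Real code would need to report it, for example through a Future.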

If you need the final file in a predetermined sequential order, do not multithread, or you will not get what you expect.

If you think that multithreading will make your program run faster with respect to I/O output, you are likely mistaken: because of locking and synchronization overhead, you will actually get worse performance than with a single thread.

If you are trying to write a very big file, the ordering of Document instances is not relevant, and you think your writer method will hit a CPU bottleneck instead (the only possible cause I can see in your code is the frequency() method call), then you can have each thread hold its own BufferedWriter that writes to a temporary file, and add an additional thread that waits for all of them and then generates the final file by concatenation.
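The temporary-file approach could be sketched as follows, assuming Java 11+ (List.of, Files.readString). Each task owns its own BufferedWriter and temporary file. The calling thread plays the role of the extra thread that waits for all the parts before concatenating them. render and the grouping of documents into integer chunks are hypothetical placeholders for the real per-Document work.

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TempFileConcat {

    // Hypothetical per-document work (the CPU-heavy part, e.g. frequency()).
    static String render(int doc) {
        return "doc " + doc + ": " + (doc * 10);
    }

    static void writeAll(List<List<Integer>> chunks, Path output) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, chunks.size()));
        try {
            List<Future<Path>> parts = new ArrayList<>();
            for (List<Integer> chunk : chunks) {
                // Each task gets its own temporary file and its own BufferedWriter,
                // so no writer is ever shared between threads.
                parts.add(pool.submit(() -> {
                    Path tmp = Files.createTempFile("part-", ".txt");
                    try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
                        for (int doc : chunk) {
                            w.write(render(doc));
                            w.newLine();
                        }
                    }
                    return tmp;
                }));
            }
            // Wait for every part, then concatenate them into the final file.
            try (OutputStream out = Files.newOutputStream(output)) {
                for (Future<Path> part : parts) {
                    Path tmp = part.get(); // blocks until this part is finished
                    Files.copy(tmp, out);
                    Files.delete(tmp);
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("final-", ".txt");
        writeAll(List.of(List.of(1, 2), List.of(3, 4), List.of(5)), out);
        System.out.print(Files.readString(out));
        Files.delete(out);
    }
}
```

The parts are concatenated in submission order, so this variant happens to preserve chunk order as well. If order truly does not matter, each part could instead be copied as soon as its task completes.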

I am not well versed in Java, so I am going to provide a language-agnostic answer.

What you want to do is transform matrices into results, then format them as strings, and finally write them all into the stream.

Currently you write into the stream as soon as you process each result, so when you add multiple threads to your logic you end up with race conditions on your stream.

You already figured out that only the calls to ResultGenerator.getResult() should be done in parallel, whilst the stream still needs to be accessed sequentially.

Now you only need to put this into practice. Do it in order:

  • Build a list where each item is what you need to generate a result.
  • Process this list in parallel, thus generating all results (this is a map operation). Your list of items becomes a list of results.
  • Now that you have your results, you can iterate over them sequentially to format them and write them into the stream.

I suspect Java 8 provides some tools to do all of this in a functional way, but as I said, I am not a Java guy, so I cannot provide code samples. I hope this explanation will suffice.

@edit

This sample code in F# explains what I meant:

open System

// This is a pretty long and nasty operation!
let getResult doc =
    Threading.Thread.Sleep(1000)
    doc * 10

// This is writing into stdout, but it could be a stream...
let formatAndPrint =
    printfn "Got result: %O"

[<EntryPoint>]
let main argv =
    printfn "Starting..."

    [| 1 .. 10 |] // A list with some docs to be processed
    |> Array.Parallel.map getResult // Now that's doing the trick
    |> Array.iter formatAndPrint

    0
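Since the question is about Java, here is one possible Java 8+ port of the F# sample above. It is an assumption about how the sample might be translated, not the answerer's own code. A parallel stream does the map, and collect(Collectors.toList()) keeps the results in input order. A plain loop then does the sequential writing, so the Writer is never shared between threads.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelMapThenWrite {

    // Stand-in for the "long and nasty" operation (getResult in the F# sample).
    static int getResult(int doc) {
        try {
            Thread.sleep(100); // simulate expensive work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return doc * 10;
    }

    static void run(List<Integer> docs, Writer out) throws IOException {
        // The map runs in parallel; collecting an ordered stream keeps input order.
        List<Integer> results = docs.parallelStream()
                                    .map(ParallelMapThenWrite::getResult)
                                    .collect(Collectors.toList());

        // Only this thread touches the writer, so no locking is needed.
        for (int r : results) {
            out.write("Got result: " + r + "\n");
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // The list of docs to be processed.
        List<Integer> docs = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        StringWriter sw = new StringWriter();
        run(docs, sw);
        System.out.print(sw);
    }
}
```

With docs 1 to 10, main prints "Got result: 10" through "Got result: 100" in input order. This happens because the writing starts only after the parallel map has finished.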

I'd make it synchronized. In that case, only one thread in your application can call this method at a time, so there is no messy output. If you have multiple applications running, you should consider something like file locking.

Example of a synchronized method:

public synchronized void myMethod() {
    // ...
}

Only one thread at a time can execute this method on a given instance, because a synchronized instance method locks on this.
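For the multi-application case mentioned above, here is a minimal sketch using java.nio.channels.FileChannel.lock(). It assumes every cooperating process appends through the same routine, and appendRecord is a hypothetical name. A FileLock is held on behalf of the whole JVM. A second overlapping lock() from another thread in the same JVM throws OverlappingFileLockException instead of waiting, so the sketch also synchronizes inside the JVM. Depending on the operating system, the lock may only be advisory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockedAppend {

    // FileLock only guards against other processes; threads of this JVM
    // are serialized by this monitor instead.
    private static final Object JVM_LOCK = new Object();

    static void appendRecord(Path file, String record) throws IOException {
        synchronized (JVM_LOCK) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                         StandardOpenOption.WRITE, StandardOpenOption.APPEND);
                 FileLock ignored = ch.lock()) { // waits until other processes release the file
                ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
            } // the lock is released first, then the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("locked-", ".txt");
        appendRecord(file, "first record\n");
        appendRecord(file, "second record\n");
        System.out.print(Files.readString(file));
        Files.delete(file);
    }
}
```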

You could lock a method and then unlock it when you are finished with it. By putting synchronized before a method, you make sure only one thread at a time can execute it. Synchronization adds overhead, so it should only be used when necessary.

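import java.util.concurrent.locks.ReentrantLock;

// every thread that calls run() must use this same lock instance
private final ReentrantLock lock = new ReentrantLock();

/* synchronized */
public void run() {
    lock.lock();
    try {
        System.out.print("Hello!");
    } finally {
        lock.unlock(); // released even if the guarded code throws
    }
}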

This locks the method just like synchronized does. You can use it instead of synchronized; that's why synchronized is commented out above.
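To tie this back to the question, here is a minimal sketch that guards the three writes of one record with a single shared ReentrantLock. LockedRecordWriter and writeRecord are hypothetical names. The record layout (key, frequency, results) is an assumption modelled on the question's loop. Any expensive work should happen before writeRecord is called, so that it stays outside the lock.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class LockedRecordWriter {

    private final ReentrantLock lock = new ReentrantLock(); // one lock shared by all callers
    private final Writer out;

    LockedRecordWriter(Writer out) {
        this.out = out;
    }

    // Writes the three parts of one record without interleaving from other threads.
    void writeRecord(String key, int frequency, String results) throws IOException {
        lock.lock();
        try {
            out.write(key + " " + frequency);
            out.write(" " + results);
            out.write("\n");
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StringWriter sw = new StringWriter();
        LockedRecordWriter rw = new LockedRecordWriter(sw);
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final int id = t;
            Thread th = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    try {
                        rw.writeRecord("t" + id, i, "[r" + i + "]");
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            threads.add(th);
            th.start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println(sw.toString().split("\n").length); // 400 complete records
    }
}
```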

If your code uses distinct doc and writer objects, then your method is already thread-safe, as it does not access or use instance variables.

If you are passing the same writer object to the method, you could use one of these approaches, depending on your needs:

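import java.io.BufferedWriter;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Set;

// "writeDoc" is a placeholder; the method name is missing in the question
void writeDoc(Document doc, BufferedWriter writer) throws IOException {
    Map<Sentence, Set<Matrix>> matrices = doc.getMatrix();
    for (Sentence sentence : matrices.keySet()) {
        Set<Matrix> set = doc.getMatrix(sentence);
        for (Matrix matrix : set) {
            // the expensive call stays outside the lock, so it still runs in parallel
            List<Result> results = ResultGenerator.getResult();

            // ensure that no other thread interferes while the following
            // three .write() statements are executed.
            synchronized (writer) {
                // BufferedWriter has no write(Matrix, int) overload, so the
                // matrix and its frequency are written as one string
                writer.write(matrix + " " + matrix.frequency());
                writer.write(results.toString());
                writer.write("\n");
            }
        }
    }
}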

or, without an explicit lock, using a temporary StringBuilder object and a single write() call:

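import java.io.BufferedWriter;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.Set;

// "writeDoc" is a placeholder; the method name is missing in the question
void writeDoc(Document doc, BufferedWriter writer) throws IOException {
    Map<Sentence, Set<Matrix>> matrices = doc.getMatrix();
    StringBuilder sb = new StringBuilder(); // local to this call, never shared
    for (Sentence sentence : matrices.keySet()) {
        Set<Matrix> set = doc.getMatrix(sentence);
        for (Matrix matrix : set) {
            List<Result> results = ResultGenerator.getResult();
            sb.append(matrix).append(matrix.frequency());
            sb.append(results.toString());
            sb.append("\n");
        }
    }
    // write everything at once; a single write() call on a BufferedWriter
    // is internally synchronized, so it cannot interleave with other calls
    writer.write(sb.toString());
}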
