
Concurrency and Multithreading

Original: 2010-01-03 22:46:14 6 9 multithreading/ concurrency/ process

I'm not very experienced with subjects such as concurrency and multithreading. In fact, for most of my web-development career I have never needed to touch them.

I feel like it's an important concept, especially for desktop applications and basically any other application that doesn't generate HTML :).

After reading a bit about concurrency, it seems to be better supported in languages like Go (Google's programming language), and I don't quite understand why one language would be better than others at a concept like concurrency, since it's basically about being able to fork() processes and compute stuff in parallel, right? Isn't this how programming works?

Multithreading seems to be a branch of concurrency, as it allows you to run things in parallel within the same process, although how it's implemented seems to be platform specific.

I guess my question is: why would specific languages be better at concurrency than others, and why would fork()ing processes be a better solution than just using threads?

9 Answers

Well, for one thing, multiple threads are not the same as multiple processes, so fork() really does not apply here.

Multithreading/parallel processing is hard. First you have to figure out how to actually partition the task to be done. Then you have to coordinate all of the parallel bits, which may need to talk to each other or share resources. Then you need to consolidate the results, which in some cases can be every bit as difficult as the previous two steps. I'm simplifying here, but hopefully you get the idea.
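The three steps can be sketched in Python with a deliberately easy task. This is only an illustration: `parallel_sum` and its `workers` parameter are made-up names, and summing a list is far simpler to partition and consolidate than most real workloads.

```python
import threading


def parallel_sum(numbers, workers=4):
    """Sum `numbers` by partitioning, running threads, and consolidating."""
    # 1. Partition: split the input into roughly equal chunks.
    chunk_size = max(1, (len(numbers) + workers - 1) // workers)
    chunks = [numbers[i:i + chunk_size]
              for i in range(0, len(numbers), chunk_size)]

    # 2. Coordinate: each thread writes only to its own slot,
    #    so no lock is needed for this particular layout.
    partials = [0] * len(chunks)

    def work(index, chunk):
        partials[index] = sum(chunk)

    threads = [threading.Thread(target=work, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # 3. Consolidate: combine the partial results into one answer.
    return sum(partials)
```

Here the coordination step is trivial because the chunks never touch each other's data; as soon as the parallel bits need to share state, that step is where most of the difficulty shows up.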

So your question is, why would some languages be better at it? Well, several things can make it easier:

Optimized immutable data structures. You want to stick to immutable structures whenever possible in parallel processing, because they are much easier to reason about. Some languages have better support for these, and some have various optimizations, e.g. the ability to splice collections together without any actual copying while still enforcing the immutability. You can always build your own structures like these, but it's easier if the language or framework does it for you.
Synchronization primitives and ease of using them. When different threads do share state, they need to be synchronized, and there are many different ways to accomplish this. The wider the array of sync primitives you get, the easier your task will ultimately be. Performance will take a hit if you have to sync with a critical section instead of a reader-writer lock.
Atomic transactions. Even better than a wide array of sync primitives is not having to use them at all. Database engines are very good at this; instead of you, the programmer, having to figure out exactly which resources you need to lock and when and how, you just say to the compiler or interpreter, "all of the stuff below this line needs to happen together, so make sure nobody else messes around with it while I'm using it." And the engine will figure out the locking for you. You almost never get this kind of simplicity in an abstract programming language, but the closer you can come, the better. Thread-safe objects that combine multiple common operations into one are a start.
Automatic parallelism. Let's say you have to iterate through a long list of items and transform them somehow, like multiplying 50,000 10x10 matrices. Wouldn't it be nice if you could just tell the compiler: "Hey, each operation can be done independently, so use a separate CPU core for each one"? Without having to actually implement the threading yourself? Some languages support this kind of thing; for example, the .NET team has been working on PLINQ.
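As a rough illustration of the last point, Python's standard `concurrent.futures` module gets part of the way there: you hand an executor a function and a list, and it does the thread management. This is not PLINQ, only a sketch of the same idea; `matmul`, `square`, and `square_all` are names made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def square(m):
    """Multiply a matrix by itself."""
    return matmul(m, m)


def square_all(matrices, executor_cls=ThreadPoolExecutor):
    """Apply `square` to every matrix; the executor schedules the work."""
    with executor_cls() as pool:
        # map() preserves input order and hides thread creation and joining.
        return list(pool.map(square, matrices))
```

In CPython, threads will not speed up pure-Python arithmetic like this because of the global interpreter lock; passing `ProcessPoolExecutor` as `executor_cls` is the usual way to spread such work across cores, at the cost of copying the data between processes.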

Those are just a few examples of things that can make your life easier in parallel/multithreaded applications. I'm sure that there are many more.

In languages that are not designed for concurrency, you must rely upon low-level system calls and manage a lot of things yourself. In contrast, a programming language designed for concurrency, like Erlang, will provide high-level constructs that hide the low-level details. This makes it easier to reason about the correctness of your code, and also results in more portable code.

Also, in a programming language designed for concurrency, there are typically only a handful of ways to do concurrent things, which leads to consistency. In contrast, if the programming language was not designed for concurrency, then different libraries and different programmers will do things in different ways, making it difficult to choose how to do them.

It's a bit like the difference between a programming language with automated garbage collection and one without. Without the automation, the programmer has to think a lot about implementation details.

The difference between multithreaded programming and multi-process programming (i.e., fork()) is that a multithreaded program may be more efficient because data doesn't have to be passed across process boundaries, but a multi-process approach may be more robust.

Regarding your question of why fork() instead of threading: when you use separate processes, you get automatic separation of address spaces. In multithreaded programs, it is very common for threads to communicate using their (naturally) shared memory. This is very efficient, but it is also hard to get all the synchronization between threads right, and this is why some languages are better at multithreading than others: they provide better abstractions to handle the common cases of communication between threads.
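To make the shared-memory point concrete, here is a minimal Python sketch of threads updating one shared counter. The `Counter` class and `count_in_threads` helper are hypothetical names for this example; the lock is what keeps the read-modify-write in `increment` from interleaving between threads.

```python
import threading


class Counter:
    """A counter that several threads can safely share."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the read-modify-write.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_threads(num_threads=8, increments=10_000):
    """Have `num_threads` threads each increment a shared counter."""
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Wrapping the lock inside the object is one small example of a "better abstraction": callers cannot forget to take it.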

With separate processes, you don't have these problems to the same extent. Typically, you set up communication between processes to follow some form of message-passing pattern, which is easier to get right. (Well, you can use shared memory between processes too, but that's not as common as message passing.) On Unix systems fork() has typically been very cheap, so the traditional design of concurrent programs in Unix uses processes, with pipes to communicate between them. On systems where process creation is an expensive operation, threads are often regarded as the better approach.
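The traditional Unix pattern of fork() plus a pipe might look roughly like this in Python. It is a Unix-only sketch (`os.fork` is not available on Windows), and `sum_in_child` is a made-up helper, not an established API.

```python
import os


def sum_in_child(numbers):
    """Fork a child that sums `numbers` and sends the result over a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: compute, send one message, and exit immediately
        # so it never runs the rest of the parent's program.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(sum(numbers)).encode())
            os.close(write_fd)
            status = 0
        finally:
            os._exit(status)

    # Parent process: close the unused write end, read until EOF,
    # then reap the child so it does not linger as a zombie.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return int(data.decode())
```

The only thing the two processes share is the bytes that travel through the pipe, which is why there is nothing to lock.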

I'm studying the subject (right now :D) and one thing that seems to be a very important difference between languages is their expressive power when it comes to concurrency.

For example, C++ has no native support for concurrency and relies on functions provided by the OS.

Java is a step above because it has some built-in methods, while other things are left to the OS (thread scheduling or priority, for example).

By contrast, one of the best programming languages for concurrency seems to be Ada, which in fact has a whole concurrency model built in (scheduling and priority included).

Why is this important? Because of portability!

Using a language with good expressive power for concurrency allows you to bring your concurrent program to Windows, Linux or Mac without great fears about the way it will work. For example, thread priority will be applied in the same way in your Ada program whether it runs on Windows, Linux or Mac, while it can behave really differently (ignored on some OSes and applied on others) with Java or C++.

This is how it seems to me from the course I'm taking at university right now :)

Re: why some languages are better for concurrency than others: it all depends on the tools that the language offers to the programmer. Some languages, like C++, give you low-level access to the system threads. Java has all kinds of libraries that offer constructs for concurrent programming, kind of like design patterns (see latch, barrier, et al.). Some languages make it easier than others to deal with threads. Some languages keep you from sharing state between threads, which is a major source of bugs.

And then some languages have different underlying thread models than others. Python's thread model, as I understand it, uses a single system thread and handles all the context switching itself, which is not as clean because it is only as granular as a single Python instruction.

As an analogy, it's like asking why some languages are better at handling regular expressions, or searching, or doing complex math when in the end it's all just moving bits around.

Edit: frunsi is correct, Python threads are system threads (apparently this is a common misconception). The problem I was referring to was with the GIL, or global interpreter lock, which controls thread execution. Only a single thread can run in the Python interpreter at once, and context is only switched between instructions. My knowledge of Python multithreading mainly comes from this paper: www.dabeaz.com/python/GIL.pdf. Maybe a little off topic, but a good reference nonetheless.

To fork is human, to thread is divine :D

Forking involves the kernel and creates a separate address space, meaning processes X and Y can't easily share data and need to use IPC primitives. Creating a thread allows in-process synchronization, which is FAR faster than IPC: IPC involves kernel context switches by both processes and needless thread yields (which require the kernel to wake said thread up).
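The separate address space is easy to observe from Python on a Unix system. In this sketch (the two helper names are made up for the example), a thread's change to a list is visible to the caller, while a forked child only changes its own copy.

```python
import os
import threading


def append_in_thread(items):
    """A thread shares the caller's memory, so its append is visible."""
    t = threading.Thread(target=items.append, args=("from thread",))
    t.start()
    t.join()
    return items


def append_in_child(items):
    """A forked child appends to its own copy; the parent's list is untouched."""
    pid = os.fork()
    if pid == 0:
        try:
            items.append("from child")  # changes the child's copy only
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return items
```

Calling `append_in_thread([])` gives back a list containing the new item, while `append_in_child([])` gives back an empty list: to get data out of the child you would need a pipe, a socket, or some other IPC mechanism.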

However, there are a multitude of reasons why some concurrency models are better than others. I'm just giving you the rough rule of thumb for general programming. For example, not forking might endanger logic that could have been separated out by forking (I love that word): if other logic in the process crashes it, well, said logic is going down with the process.

No language is better than another; it's all about the concepts. Doing things simultaneously with processes usually consumes more resources than doing them with threads (which can be seen as lightweight processes). Some languages come with easy-to-use libraries. Java threads are easy to use; POSIX threads (C on Unix) are a bit more complicated.

Concurrency is basically being able to fork() processes and compute stuff in parallel in the same way that memory management is basically being able to call malloc. It is part of the story, but not all of it. Being able to simplify the issues relating to concurrency is the difference between languages that are good at concurrency and those that merely can be concurrent.

The language choice depends on the application you want to build.

Do you want to create a highly scalable system, with lots of incoming "requests"? Then Erlang may be a good choice. It is known to be a good fit for "highly concurrent" application scenarios.

If you want to write a typical game and want it to use the now typically available dual- or quad-core CPUs of your audience, then you are bound to different decisions (frameworks, engines, libraries, available hardware interfaces). In this case, you will use threads and thread pools to offload processing work. Most probably you will use a kind of message queue to communicate between the threads.
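A minimal sketch of that worker-pool-plus-message-queue arrangement, using Python's standard `queue` and `threading` modules. The `run_workers` function and the "square a number" job are placeholders chosen for illustration; a real game would push much richer messages.

```python
import queue
import threading


def run_workers(jobs, num_workers=3):
    """Process `jobs` on a small pool of threads that talk via queues."""
    tasks = queue.Queue()    # messages from the main thread to the workers
    results = queue.Queue()  # messages from the workers back to the main thread

    def worker():
        while True:
            job = tasks.get()
            if job is None:  # sentinel meaning "no more work"
                break
            results.put((job, job * job))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for job in jobs:         # jobs are assumed not to contain None
        tasks.put(job)
    for _ in threads:        # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()

    # All workers have finished, so the results queue is complete.
    collected = {}
    while not results.empty():
        job, answer = results.get()
        collected[job] = answer
    return collected
```

The threads never touch each other's data directly; everything goes through the two queues, which do their own locking internally.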

In (server-side) web development you have most probably already gained some experience in concurrent programming! Maybe you were not aware of it, because the language and the given framework (maybe Apache & PHP) provided you with an environment that took the burden off your shoulders.