
Multithreading on a multi-core machine not maxing out the CPU

I am maintaining someone else's code that uses multithreading via two methods:

1: ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem)

2: Dim aThread As New Thread(AddressOf LoadCache)
   aThread.Start()

However, on a dual-core machine I am only getting 50% CPU utilization, and on a dual-core machine with hyperthreading enabled I am only getting 25% CPU utilization.

Obviously threading is extremely complicated, but this behaviour would seem to indicate that I am not understanding some simple fundamental fact?

UPDATE

Unfortunately the code is too horribly complex to post here, but for reference, here is roughly what happens. I have approximately 500 accounts whose data is loaded from the database into an in-memory cache. Each account is loaded individually, and that process first calls a long-running stored procedure, followed by manipulation and caching of the returned data. So the point of threading in this situation is that there is indeed a bottleneck hitting the database (i.e. a thread will be idle for up to 30 seconds waiting for the query to return), so we use threads to allow the others to begin processing the data they have received from Oracle.

So, the main thread executes:

ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ReadData), objUpdateItem) 

Then ReadData() proceeds to execute (exactly once):

Dim aThread As New Thread(AddressOf LoadCache)
aThread.Start()

And this is occurring in a recursive function, so QueueUserWorkItem can execute multiple times, and each execution in turn starts exactly one new thread via aThread.Start.

Hopefully that gives a decent idea of how things are happening.

So, under this scenario, should this not theoretically pin both cores, rather than maxing out at 100% on one core while the other core is essentially idle?

That code starts one thread that will go and do something. To get more than one core working you need to start more than one thread and keep them all busy. Starting a thread to do some work, and then having your main thread wait for it, won't get the task done any quicker. It is common to start a long-running task on a background thread so that the UI remains responsive, which may be what this code was intended to do, but it won't make the task finish any sooner.

@Judah Himango - I had assumed that those two lines of code were samples of how multithreading was being achieved in two different places in the program. Maybe the OP can clarify whether this is the case or whether these two lines really are in the same method. If they are part of one method then we will need to see what the two methods are actually doing.

Update:
That does sound like it should max out both cores. What do you mean by recursively calling ReadData()? If each new thread only calls ReadData at or near its end to start the next thread, then the loads run one after another rather than side by side, which could explain the behaviour you are seeing.
I am not sure that this is actually a good idea. If the stored proc takes 30 seconds to get the data then presumably it is placing a fair load on the database server. Running it 500 times in parallel is just going to make things worse. Obviously I don't know your database or data, but I would look at improving the performance of the stored proc.
If multithreading does look like the way forward, then I would have a loop on the main thread that calls ThreadPool.QueueUserWorkItem once for each account that needs loading. I would also remove the explicit thread creation and only use the thread pool. That way you are less likely to starve the local machine by creating too many threads.
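As a rough illustration of that last suggestion, here is a minimal sketch of a main-thread loop that queues one thread pool work item per account and creates no threads of its own. LoadAccount, the account count, and the Thread.Sleep that stands in for the stored procedure are all hypothetical, and the multi-line lambda and CountdownEvent assume VB 2010 / .NET 4 or later.

```vb
Imports System
Imports System.Threading

Module LoadAllAccounts
    ' Hypothetical stand-in for the real per-account work
    ' (stored procedure call, then manipulation and caching).
    Private Sub LoadAccount(ByVal state As Object)
        Dim accountId As Integer = CInt(state)
        Thread.Sleep(100) ' simulates waiting on the database
        ' ... manipulate and cache the data for accountId here ...
    End Sub

    Sub Main()
        Dim accountCount As Integer = 500 ' assumed from the question

        ' CountdownEvent lets the main thread wait until every item has finished.
        Using done As New CountdownEvent(accountCount)
            For i As Integer = 1 To accountCount
                ' One queued work item per account; the pool decides
                ' how many of them actually run at the same time.
                ThreadPool.QueueUserWorkItem(
                    Sub(state)
                        Try
                            LoadAccount(state)
                        Finally
                            done.Signal()
                        End Try
                    End Sub, i)
            Next
            done.Wait()
        End Using

        Console.WriteLine("All accounts loaded.")
    End Sub
End Module
```

Because all 500 items are queued up front, none of them depends on an earlier one finishing before it can start, which is the difference from the chained ReadData/LoadCache arrangement described in the question.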

How many threads are you spinning up? It may seem primitive (wait a few years, and you won't need to do this anymore), but your code has got to figure out an optimal number of threads to start, and spin up that many. Simply running a single thread won't make things any faster, and won't pin a physical processor, though it may be good for other reasons (a worker thread to keep your UI responsive, for instance).

In many cases, you'll want to run a number of threads equal to the number of logical cores available to you (available from Environment.ProcessorCount, I believe), but the right number may have some other basis. I've spun up a few dozen threads, each talking to a different host, when I've been bound by remote process latency, for instance.
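A minimal sketch of the "one thread per logical core" idea, assuming the work is CPU-bound: a fixed set of worker threads drains a shared queue of account IDs. Environment.ProcessorCount is the real property mentioned above; ProcessAccount, the queue, and the 500 IDs are hypothetical placeholders.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Threading

Module FixedWorkers
    Private ReadOnly pending As New Queue(Of Integer)()
    Private ReadOnly queueLock As New Object()

    ' Hypothetical CPU-bound work for one account.
    Private Sub ProcessAccount(ByVal accountId As Integer)
        Dim total As Double = 0
        For i As Integer = 1 To 1000000
            total += Math.Sqrt(i)
        Next
    End Sub

    ' Each worker takes the next account ID until the queue is empty.
    Private Sub Worker()
        Do
            Dim accountId As Integer
            SyncLock queueLock
                If pending.Count = 0 Then Return
                accountId = pending.Dequeue()
            End SyncLock
            ' The work itself runs outside the lock, so workers overlap.
            ProcessAccount(accountId)
        Loop
    End Sub

    Sub Main()
        For id As Integer = 1 To 500
            pending.Enqueue(id)
        Next

        ' One worker per logical processor.
        Dim workerCount As Integer = Environment.ProcessorCount
        Dim workers As New List(Of Thread)()
        For n As Integer = 1 To workerCount
            Dim worker As New Thread(AddressOf Worker)
            worker.Start()
            workers.Add(worker)
        Next

        For Each started As Thread In workers
            started.Join()
        Next
        Console.WriteLine("Processed with {0} worker threads.", workerCount)
    End Sub
End Module
```

If the threads spend most of their time waiting on a remote server rather than computing, as in the few-dozen-threads case above, a workerCount larger than the core count may be the better basis.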

Multi-threaded and multi-core are two different things. Doing things multi-threaded often won't offer you an enormous increase in performance, and sometimes quite the opposite. The operating system might do a few tricks to spread your CPU cycles over multiple cores, but that's where it ends.

What you are looking for is parallelism. The .NET 4.0 framework will add a lot of new features to support parallelism. Have a sneak peek here:
http://www.danielmoth.com/Blog/2009/01/parallelising-loops-in-net-4.html
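For a flavour of what a parallel loop looks like, here is a minimal sketch using Parallel.For from System.Threading.Tasks (available from .NET 4). LoadAccount and the range of 500 account IDs are hypothetical stand-ins for the question's per-account work.

```vb
Imports System
Imports System.Threading.Tasks

Module ParallelLoad
    ' Hypothetical CPU-bound stand-in for manipulating one account's data.
    Private Sub LoadAccount(ByVal accountId As Integer)
        Dim total As Double = 0
        For i As Integer = 1 To 1000000
            total += Math.Sqrt(i)
        Next
    End Sub

    Sub Main()
        ' Iterations 1..500 are spread across the available cores;
        ' the upper bound (501) is exclusive.
        Parallel.For(1, 501, Sub(accountId) LoadAccount(accountId))
        Console.WriteLine("Done.")
    End Sub
End Module
```

Parallel.For returns only after every iteration has completed, so no explicit thread creation or waiting code is needed. It is aimed mainly at CPU-bound loops; whether it suits work that mostly waits on a database is a separate question.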

The CPU behavior would indicate that the application is only utilizing one logical processor. 50% would be one proc out of 2 (proc + proc). 25% would be one logical processor out of 4 (proc + HT + proc + HT).
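The arithmetic behind those figures can be checked on any machine with a small sketch like the following. It only assumes that total CPU utilization is reported as an average over all logical processors, so a single fully busy thread accounts for 100 / N percent.

```vb
Imports System

Module OneCoreShare
    Sub Main()
        Dim logicalProcessors As Integer = Environment.ProcessorCount

        ' One fully busy thread occupies one logical processor:
        ' 2 logical processors -> 50%, 4 logical processors -> 25%.
        Dim share As Double = 100.0 / logicalProcessors

        Console.WriteLine("{0} logical processors: one busy thread shows as about {1:F1}% total CPU",
                          logicalProcessors, share)
    End Sub
End Module
```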

How many threads do you have in total, and do you have any locks in LoadCache? A SyncLock may make a multi-threaded system act as a single thread (by design). Also, if you only spool up one thread, you will only get one worker thread.
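To illustrate how a SyncLock can serialize otherwise parallel work, here is a minimal sketch that times two threads doing the same CPU-bound job, once with the lock held for the whole job and once with the lock held only briefly. BurnCpu, cacheLock, and both LoadCache variants are hypothetical; they are not the LoadCache from the question.

```vb
Imports System
Imports System.Diagnostics
Imports System.Threading

Module LockSerialization
    Private ReadOnly cacheLock As New Object()

    ' Hypothetical CPU-bound work.
    Private Sub BurnCpu()
        Dim total As Double = 0
        For i As Integer = 1 To 50000000
            total += Math.Sqrt(i)
        Next
    End Sub

    ' Lock held for the whole job: only one thread can work at a time.
    Private Sub LoadCacheSerialized()
        SyncLock cacheLock
            BurnCpu()
        End SyncLock
    End Sub

    ' Work done outside the lock: threads can run side by side.
    Private Sub LoadCacheNarrowLock()
        BurnCpu()
        SyncLock cacheLock
            ' Only the brief update of shared state would go here.
        End SyncLock
    End Sub

    ' Runs the same work on two threads and returns the elapsed milliseconds.
    Private Function TimeTwoThreads(ByVal work As ThreadStart) As Long
        Dim watch As Stopwatch = Stopwatch.StartNew()
        Dim first As New Thread(work)
        Dim second As New Thread(work)
        first.Start()
        second.Start()
        first.Join()
        second.Join()
        Return watch.ElapsedMilliseconds
    End Function

    Sub Main()
        Console.WriteLine("Lock held for whole job: {0} ms", TimeTwoThreads(AddressOf LoadCacheSerialized))
        Console.WriteLine("Lock held briefly:       {0} ms", TimeTwoThreads(AddressOf LoadCacheNarrowLock))
    End Sub
End Module
```

On a machine with at least two free cores, the first timing would be expected to come out noticeably longer than the second, since the lock forces the two jobs to run back to back.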

CPU utilization suggests that you're only using one core; this may mean that you've added threading to a portion of the code where it is not beneficial (in this case, where CPU time is not the bottleneck).

If loading the cache or reading data happens very quickly, multithreading won't provide a massive improvement in speed. Similarly, if you're encountering a different bottleneck (slow bandwidth to a server, etc.), it may not show up as CPU usage.
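One way to check whether time is going into waiting rather than computing is to compare wall-clock time with the CPU time the process actually consumed. The sketch below uses Stopwatch and Process.TotalProcessorTime; the Thread.Sleep call is a hypothetical stand-in for a slow query or network call.

```vb
Imports System
Imports System.Diagnostics
Imports System.Threading

Module WaitVersusCpu
    Sub Main()
        Dim current As Process = Process.GetCurrentProcess()
        Dim cpuBefore As TimeSpan = current.TotalProcessorTime
        Dim wall As Stopwatch = Stopwatch.StartNew()

        ' Hypothetical stand-in for a slow query: the thread waits and uses no CPU.
        Thread.Sleep(2000)

        wall.Stop()
        current.Refresh() ' discard cached process information before re-reading
        Dim cpuUsed As TimeSpan = current.TotalProcessorTime - cpuBefore

        Console.WriteLine("Wall-clock: {0} ms, CPU: {1:F0} ms",
                          wall.ElapsedMilliseconds, cpuUsed.TotalMilliseconds)
    End Sub
End Module
```

A large gap between the two numbers points to a bottleneck outside the CPU, in which case adding threads to the CPU-bound part of the code is unlikely to help much.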
