
Python Multiprocessing: Child processes working at different speeds

I'm new to Python multiprocessing. I'm trying to use a third-party web API to fetch data for multiple symbols of interest. Here is my Python code:

<!-- language:lang-py-->

def my_worker(symbol, table_name):
    while True:
        # Real-time data for the symbol, third party code which is verified
        data = webApi.getData(symbol)
        query = ('insert into ' + table_name + ' (var1, var2) values ("%s", "%s")' % (data[0], data[1]))
        # Execute query and store the data. Omitted for sake of brevity

if __name__ == "__main__":
    my_symbols = get_symbols_list() # List of symbols
    my_tables = get_tables_list()   # Corresponding list of mysql tables
    jobs = []
    for pidx in range(len(my_symbols)):
        pname = 'datarecorder_' + my_symbols[pidx]  # Naming the process for later identification
        p = multiprocessing.Process(name=pname, target=my_worker, args=(my_symbols[pidx], my_tables[pidx],))
        jobs.append(p)
        p.start()

There are approximately 50 processes created in this code.

The problem I'm facing is that when I look into the corresponding tables after a certain amount of time (say 5 minutes), the number of records in each of the tables in my_tables is drastically different (differing by multiples of ten).

Since I am using the same API, the same network connection and the same code to fetch and write data to the MySQL tables, I'm not sure what is causing this difference in the number of records. My hunch is that each of the 50 processes is getting assigned an unequal amount of RAM and other resources, and perhaps the priority is also different(?)

Can someone tell me how I can ensure that each of these processes polls the webApi a roughly equal number of times?
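(One way to address the literal question of "roughly equal number of polls" is to pace every loop iteration to a fixed interval, so a worker whose fetch happens to be fast doesn't simply spin more often than a slow one. A minimal sketch of that idea, reusing the question's webApi client and an assumed one-second cadence:)

import time

POLL_INTERVAL = 1.0  # assumed cadence in seconds; pick whatever the API allows

def my_worker(symbol, table_name):
    next_poll = time.monotonic()
    while True:
        data = webApi.getData(symbol)  # same third-party call as above
        # ... build and execute the insert, as above ...
        # Sleep until the next scheduled poll, so every worker runs at the
        # same rate regardless of how long the fetch and insert took.
        next_poll += POLL_INTERVAL
        time.sleep(max(0.0, next_poll - time.monotonic()))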

An effective way to approach such things is to start with something vastly simpler, then add stuff to it until "a problem" shows up. Otherwise it's just blind guesswork.

For example, here's something much simpler, which I'm running under Windows (like you - I'm using a current Win10 Pro) and Python 3.5.2:

import multiprocessing as mp
from time import sleep

NPROCS = 50

def worker(i, d):
    while True:
        d[i] += 1
        sleep(1)

if __name__ == "__main__":
    d = mp.Manager().dict()
    for i in range(NPROCS):
        d[i] = 0

    ps = []
    for i in range(NPROCS):
        p = mp.Process(target=worker, args=(i, d))
        ps.append(p)
        p.start()

    while True:
        sleep(3)
        print(d.values())

Here's the most recent output after about a minute of running:

[67, 67, 67, 67, 67, 67, 67, 67, 67, 67,
 67, 67, 67, 67, 67, 67, 67, 67, 67, 66,
 66, 66, 66, 66, 66, 66, 66, 66, 66, 66,
 66, 66, 66, 66, 66, 66, 66, 66, 66, 66,
 66, 66, 66, 66, 66, 66, 66, 66, 66, 66]

So I can conclude that there's nothing "inherently unfair" about process scheduling on this box. On your box? Run it and see ;-)

I can also see in Task Manager that all 50 processes are treated similarly, with (for example) the same RAM usage and priority. FYI, this box happens to have 8 logical cores (4 physical), and way more than enough RAM (16GB).
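If you'd rather check the same things programmatically than eyeball Task Manager, a short sketch run from the parent process could look like the following. It uses the third-party psutil package, which is not part of any code above - purely a suggestion:

import os
import psutil  # third-party: pip install psutil

def report_children():
    # Print resident memory and priority for every child of this process,
    # to see whether any of the 50 workers is being treated differently.
    for child in psutil.Process(os.getpid()).children():
        rss_mib = child.memory_info().rss / (1024 * 1024)
        print(child.pid, child.name(), '%.1f MiB' % rss_mib, 'priority:', child.nice())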

Now there are worlds of additional complications in what you're doing, none of which we can guess from here. For example, maybe you're running out of RAM and so some processes are greatly delayed by pagefile swapping. Or maybe the work you're doing takes much longer for some arguments than others. Or ... but, regardless, the simplest way to find out is to incrementally make a very simple program a little fancier at a time.
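For instance, if the simple program above turns out to be fair on your box, the next thing I'd measure is how long the fetch and the insert actually take per symbol. That needs nothing more than wrapping the existing calls in the question with timestamps - a sketch, assuming the same webApi client:

import time

def my_worker(symbol, table_name):
    while True:
        t0 = time.perf_counter()
        data = webApi.getData(symbol)  # same third-party call as in the question
        t1 = time.perf_counter()
        # ... build and execute the insert, as in the question ...
        t2 = time.perf_counter()
        print('%s fetch=%.3fs insert=%.3fs' % (symbol, t1 - t0, t2 - t1), flush=True)

If some symbols consistently show slower fetches or inserts, that alone would explain why their tables end up with fewer records.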
