
Is it safe to leave out the "if __name__ == '__main__'" statement for multiprocessing in Python under Unix?

I am trying to implement a flexible pipeline in Python that I have split up into several modules. Each of these modules can be used as a standalone tool, but they may also sometimes have to import functions from each other. I have placed general, simple functions that are used frequently by several of these modules into a "misc" module that is imported by all of the other modules when needed.

Now, each of these modules may want to run some functions in parallel using multiprocessing (usually calling some external tools). So I have created a general "run_parallel" function that takes a list of functions and their corresponding arguments, determines the priority of each, distributes the available cores over them accordingly, and then runs these functions in parallel using multiprocessing and starmap().

Now I think this function could nicely be placed in the "misc" module and could just be imported when any of the other functions need to run jobs in parallel. However, if I follow the (apparently) general rule to always use the if __name__ == '__main__': statement for this, that means I can't import this function and reuse it in multiple modules. I never fully understood this requirement, but it does seem to have something to do with Windows specifically? My pipeline will work ONLY under Unix.

Does that mean I MUST implement this "run_parallel" method separately for each of my modules? Or can I safely leave the statement out if my code is only meant to run under Linux/Unix environments?

EDIT: I realize now that I completely misunderstood the usage of this statement in the tutorials and usage examples for multiprocessing. I thought that, for some reason, it was also required within any function that uses something from multiprocessing (and I have always been confused about why that would be). But in these examples it was only protecting the part of the example code that calls that function, preventing it from being called automatically on every import (not preventing the function from being imported at all, as I had thought). Total misunderstanding!

When you run a script or import a module, Python executes all of the code written at module level. In the case of a function like

def foo():
    pass

"execution" only means assigning the newly compiled function object to a variable called "foo". These things do not need to be protected by an if __name__ == "__main__": block. You only need to be concerned about code that performs an action, such as code that calls foo().
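To make the difference between defining and calling concrete, here is a small sketch. The `calls` list is just a hypothetical way to record when the body of `foo` actually runs; it is not part of the question's code.

```python
calls = []  # hypothetical log: records every time foo()'s body actually runs


def foo():
    # This body runs only when foo() is called, not when the def is executed.
    calls.append("foo ran")
    return len(calls)


# The def statement above has been executed, so the name "foo" is bound...
defined = callable(foo)
# ...but the body has not run yet, so the log is still empty.
calls_after_def = list(calls)

# Calling foo() is the kind of "action" a guard would protect in a script.
result = foo()
```

After the `def`, `calls_after_def` is still empty; only the explicit call at the end adds an entry.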

The top-level script called to start a Python program is named "__main__". Modules that you import are not named "__main__", so an if __name__ == "__main__": block in them is skipped and has no effect on import. What is important is that modules be import-safe. That is, it should always be safe to import a module without it doing anything beyond initialization. The actions of a module should always be inside functions or classes that are called from other places.
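As a rough illustration of import safety, the sketch below writes a tiny stand-in for the "misc" module to a temporary file and imports it. The module name `misc_demo`, the `events` list, and the simplified `run_parallel` signature are all assumptions made for this example; the real `run_parallel` from the question is more elaborate.

```python
import importlib.util
import os
import tempfile
import textwrap

# Hypothetical stand-in for the "misc" module from the question.
MODULE_SOURCE = textwrap.dedent('''
    import multiprocessing

    events = ["module-level code ran"]

    def run_parallel(func, arg_tuples, processes=2):
        # Defining this function starts no processes; only calling it does.
        with multiprocessing.Pool(processes) as pool:
            return pool.starmap(func, arg_tuples)

    if __name__ == "__main__":
        # Skipped on import, because __name__ is then the module's own name.
        events.append("guarded block ran")
''')


def import_from_source(name, source):
    """Write source to a temporary file and import it as module `name`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        with open(path, "w") as fh:
            fh.write(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module


misc = import_from_source("misc_demo", MODULE_SOURCE)
imported_events = list(misc.events)
```

Importing the module runs its module-level code and binds `run_parallel`, but the guarded block is skipped and no worker processes are created. The function can therefore live in one shared module and be imported wherever it is needed.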

The top-level script is different: it has to actually run the program. if __name__ == "__main__": is used to make the top-level script import-safe. That doesn't matter (at least for multiprocessing) on forking systems like Unix. But Windows needs to spawn a new process and import the top-level script, and that import needs to be safe; it can't re-execute the program itself.
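Whether the guard matters depends on the multiprocessing start method rather than on the operating system as such. The default is not "fork" everywhere: macOS, for example, has defaulted to "spawn" since Python 3.8, and the default can differ between Python versions. A minimal sketch for checking this on a given machine follows; `needs_main_guard` is a hypothetical helper name, not part of the multiprocessing API.

```python
import multiprocessing as mp


def needs_main_guard(start_method):
    """Return True if child processes import the top-level script.

    "fork" copies the running parent process, while "spawn" and
    "forkserver" start from a fresh interpreter that has to import
    the main module safely.
    """
    return start_method != "fork"


# Start methods supported on this platform, e.g. including "fork" on Linux.
available = mp.get_all_start_methods()
# The start method this interpreter uses by default.
default = mp.get_start_method()
guard_required = needs_main_guard(default)
```

If `guard_required` comes out true, the top-level script must be import-safe for multiprocessing to work, even on a Unix-like system.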

Although you don't need this protection on Unix, modules should always be import-safe. And it's a good discipline for top-level scripts, too. Why limit code execution when you don't have to?

A decent recipe for scripts is

import sys

def main():
    # do all the things
    return 0

if __name__ == "__main__":
    retcode = main()
    sys.exit(retcode)
