
Run multiple Python files in all subdirectories

I have a directory containing multiple subdirectories, each holding a different scraper. How would you write a script that will cd into each of the subdirectories, run the scraper, cd back out, and then continue with the next one? What would be the best way to do this, if it is possible?

Example of how the directory looks:

- All_Scrapers (parent dir)
   - Scraper_one (sub dir folder)
       - scraper.py
   - Scraper_two (sub dir folder)
       - scraper.py
   - Scraper_three (sub dir folder)
       - scraper.py
   - all.py

All the scrapers have a main function:

if __name__ == "__main__":
    main()

One way of doing this is to walk through your directories and programmatically import the modules you need.

Assuming that the Scraper X folders are all inside the same subdirectory scrapers, and that the batch_run.py script sits in the directory containing scrapers (hence at the same path level as scrapers), the following script will do the trick:

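import os
import importlib

base_subdir = 'scrapers'

for root, subdirs, filenames in os.walk(base_subdir):
    for subdir in subdirs:
        # Skip __pycache__ and other double-underscore folders.
        if not subdir.startswith('__'):
            print(root, subdir)
            # Builds a dotted name such as scrapers.Scraper_one.scraper.
            submodule = importlib.import_module('.'.join((root, subdir, 'scraper')))
            submodule.main()
    # Only the first level holds scraper folders; deeper roots contain path
    # separators and would not form a valid module name.
    break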

EDIT

If the script is inside the base_subdir path, the code can be adapted by slightly changing how import_module() is called: the scraper folders are then top-level packages, so the base directory is left out of the module name.

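import os
import importlib

base_subdir = '.'

for root, subdirs, filenames in os.walk(base_subdir):
    for subdir in subdirs:
        if not subdir.startswith('__'):
            print(root, subdir)
            # The name is absolute (no leading dot), so no package argument is needed.
            script = importlib.import_module('.'.join((subdir, 'scraper')))
            script.main()
    # Stop after the top level so nested folders are not treated as scrapers.
    break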
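
Importing a scraper and calling its main() does not change the working directory, whereas the question asks to cd into each folder first. If the scrapers read or write files through relative paths, that difference matters. Below is a minimal sketch of one way to handle it; working_directory and run_in_dir are hypothetical helper names, not part of the answer above.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to `path`."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always return to where we started, even if the scraper raised.
        os.chdir(previous)


def run_in_dir(path, func):
    """Call `func()` with `path` as the working directory and return its result."""
    with working_directory(path):
        return func()
```

Inside either loop, the call to main() could then become something like run_in_dir(os.path.join(root, subdir), script.main). Python 3.11 and later also provide contextlib.chdir, which covers the same need.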

EDIT 2

Some explanations:

How is import_module() being used?

The import_module() line is what actually does the job. Roughly speaking, when it is used with only one argument, i.e.

alias = importlib.import_module("my_module.my_submodule")

it is equivalent to:

import my_module.my_submodule as alias

Instead, when it is used with two arguments and a name that starts with a dot, i.e.

alias = importlib.import_module(".my_submodule", "my_module")

it is roughly equivalent to:

from my_module import my_submodule as alias

This second form is convenient for relative imports (i.e. names with a leading . or ..): the second argument is the package that anchors the relative name, and it is only consulted when the name starts with a dot.
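
To see the two call styles side by side, here is a small sketch that uses the standard library's json.decoder module as a stand-in for my_module.my_submodule (that choice is only for illustration). Note the leading dot in the two-argument call; without it, the package argument is not used.

```python
import importlib
import json.decoder as direct

# One argument: an absolute, dotted module name.
absolute = importlib.import_module("json.decoder")

# Two arguments: a relative name (leading dot) plus the anchor package.
relative = importlib.import_module(".decoder", "json")

# All three names are bound to the same module object.
print(absolute is direct, relative is direct)  # prints: True True
```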

What is if not subdir.startswith('__'): doing?

When you import a module, Python compiles it to bytecode and caches the result as .pyc files under the __pycache__ directory. The aforementioned line prevents __pycache__ (actually, any directory starting with __) from being processed, while walking through the directories, as if it contained modules to import. Other kinds of filtering may be equally valid.
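
As one example of another kind of filtering, os.walk lets you prune folders by editing its directory list in place, and you can also require that a folder really contains scraper.py. The sketch below is only an illustration: scraper_dirs is a hypothetical helper, and the folder names are made up for a throwaway test tree.

```python
import os
import tempfile


def scraper_dirs(base):
    """Yield directories under `base` that contain scraper.py, skipping '__*' folders."""
    for root, dirs, files in os.walk(base):
        # Editing `dirs` in place stops os.walk from descending into those folders.
        dirs[:] = sorted(d for d in dirs if not d.startswith('__'))
        if 'scraper.py' in files:
            yield root


# Build a throwaway tree to try it on; __pycache__ gets a scraper.py too,
# to show that it is skipped anyway.
with tempfile.TemporaryDirectory() as base:
    for name in ('Scraper_one', 'Scraper_two', os.path.join('Scraper_one', '__pycache__')):
        os.makedirs(os.path.join(base, name), exist_ok=True)
        with open(os.path.join(base, name, 'scraper.py'), 'w'):
            pass
    found = [os.path.relpath(path, base) for path in scraper_dirs(base)]

print(found)  # prints: ['Scraper_one', 'Scraper_two']
```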

You may want to check the os.walk function, which traverses the directory tree, and run the script at each directory (or the main function that you can wrap the contents of the script into).

Example code would be:

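import os

for root, dirs, files in os.walk(".", topdown=False):
    # scraper_main is a placeholder for whatever runs the scraper found in root.
    if "scraper.py" in files:
        scraper_main()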
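
One possible way to fill in that placeholder, closest to the "cd in, run, cd out" idea in the question, is to start each scraper.py in its own process with the folder as its working directory. This is a sketch under assumptions: run_all_scrapers is a hypothetical name, and every scraper is assumed to be runnable as python scraper.py.

```python
import os
import subprocess
import sys


def run_all_scrapers(base="."):
    """Run every scraper.py below `base` in its own process; return {folder: exit code}."""
    results = {}
    for root, dirs, files in os.walk(base):
        # Do not descend into __pycache__ and similar folders.
        dirs[:] = [d for d in dirs if not d.startswith("__")]
        if "scraper.py" in files:
            # cwd=root plays the role of "cd into the folder"; the parent
            # process never changes its own working directory.
            completed = subprocess.run([sys.executable, "scraper.py"], cwd=root)
            results[root] = completed.returncode
    return results
```

Calling run_all_scrapers() from all.py would run the scrapers one after another; because each one runs as a script, its if __name__ == "__main__": block is what triggers main(). A non-zero exit code in the returned dictionary points at a scraper that failed.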
