
Python Multiprocessing File Read

I have a list of keywords, and I want to check whether any of them appear in a file containing more than 100,000 domain names. For faster processing, I want to use multiprocessing so that each keyword can be checked in parallel.

My code doesn't seem to perform well, since the single-process version is much faster. What is wrong? :(

import time
from multiprocessing import Pool


def multiprocessing_func(keyword):

    # File containing more than 100k domain names
    # URL: https://raw.githubusercontent.com/CERT-MZ/projects/master/Domain-squatting/domain-names.txt
    file_domains = open("domain-names.txt", "r")

    for domain in file_domains:
        if keyword in domain:
            print("similar domain identified:", domain)
            
    # Rewind the file, start from the beginning
    file_domains.seek(0)


if __name__ == '__main__':

    starttime = time.time()

    # Keywords to check
    keywords = ["google","facebook", "amazon", "microsoft", "netflix"]

    # Create a multiprocessing Pool
    pool = Pool()  

    for keyword in keywords:
        print("Checking keyword:", keyword)
        
        # Without multiprocessing pool
        #multiprocessing_func(keyword)
        
        # With multiprocessing pool
        pool.map(multiprocessing_func, keyword)

    # Total run time
    print('That took {} seconds'.format(time.time() - starttime))

Consider why this program:

import multiprocessing as mp

def work(keyword):
    print("working on", repr(keyword))

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        pool.map(work, "google")

prints

working on 'g'
working on 'o'
working on 'o'
working on 'g'
working on 'l'
working on 'e'

map() operates on a sequence, and a string is a sequence. Rather than leaving the map() call in a loop, you probably just want to call it once, passing keywords (the whole list) as the second argument.
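
Here is a minimal sketch of the corrected main block along those lines. The worker function is the asker's, with the file handle wrapped in a with block (an idiomatic touch of mine, not part of the original); the single map() call over the whole keywords list replaces the loop:

import time
from multiprocessing import Pool


def multiprocessing_func(keyword):
    # File containing more than 100k domain names
    # URL: https://raw.githubusercontent.com/CERT-MZ/projects/master/Domain-squatting/domain-names.txt
    with open("domain-names.txt", "r") as file_domains:
        for domain in file_domains:
            if keyword in domain:
                print("similar domain identified:", domain)


if __name__ == '__main__':

    starttime = time.time()

    # Keywords to check
    keywords = ["google", "facebook", "amazon", "microsoft", "netflix"]

    # One map() call over the whole list: each keyword becomes one task
    with Pool() as pool:
        pool.map(multiprocessing_func, keywords)

    # Total run time
    print('That took {} seconds'.format(time.time() - starttime))

With the list as the second argument, Pool.map hands each worker process one keyword per task instead of one character, so the five keywords are searched in parallel.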
