
P4Python: use multiple threads that request perforce information at the same time

I've been working on a "crawler" of sorts that goes through our repository and lists directories and files as it goes. For every directory it encounters, it creates a thread that does the same for that directory, and so on, recursively. Effectively, this creates a very short-lived thread for every directory in the repository. (Requesting information on a single path doesn't take long; there are just tens of thousands of them.)

The logic looks as follows:

import threading
from perforce import Perforce  # custom Perforce wrapper class
from pathlib import Path

p4 = Perforce()
p4.connect()

class Dir():
    def __init__(self, path):
        self.dirs = []
        self.files = []
        self.path = path

        self.crawlers = []

    def build_crawler(self):
        worker = Crawler(self)
        # keep a reference on this Dir so the thread can be joined later
        self.crawlers.append(worker)
        worker.start()

class Crawler(threading.Thread):
    def __init__(self, dir):
        threading.Thread.__init__(self)
        self.dir = dir

    def run(self):
        depotdirs = p4.getdepotdirs(self.dir.path)
        depotfiles = p4.getdepotfiles(self.dir.path)

        for p in depotdirs:
            if Path(p).is_dir():
                _d = Dir(p)
                self.dir.dirs.append(_d)

        for p in depotfiles:
            if Path(p).is_file():
                f = File(p) # File is like Dir, but with less stuff, just a path.
                self.dir.files.append(f)

        for d in self.dir.dirs:
            d.build_crawler()
            # wait for the child crawler so Perforce calls never overlap
            for worker in d.crawlers:
                worker.join()

Obviously this is not complete code, but it represents what I'm doing.

My question really is whether I can create an instance of this Perforce class in the __init__ method of the Crawler class, so that requests can be done separately. Right now, I have to call join() on the created threads so that they wait for completion, to avoid concurrent perforce calls.

I've tried it out, but it seems like there is a limit to how many connections you can create: I don't have a solid number, but somewhere along the line Perforce just started straight up refusing connections, which I presume is due to the number of concurrent requests.
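One way to keep the number of simultaneous connections bounded is to use a fixed-size thread pool with one connection per worker thread, rather than one thread (and one connection) per directory. The following is only a minimal sketch of that idea, not tested against a real server: `make_client` and `list_dirs` are hypothetical callables standing in for "open a Perforce connection" and "list the subdirectories of a path", and the sketch never closes the connections it opens.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def crawl(root, list_dirs, make_client, max_workers=4):
    """Walk the tree under `root` using at most `max_workers` connections.

    list_dirs(client, path) -> list of child directory paths (hypothetical).
    make_client() -> a new connection object (hypothetical); it is called
    at most once per worker thread.
    """
    local = threading.local()

    def visit(path):
        # Lazily create one client per worker thread and reuse it afterwards.
        if not hasattr(local, "client"):
            local.client = make_client()
        return path, list_dirs(local.client, path)

    tree = {}  # directory path -> list of child directory paths
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(visit, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                path, children = future.result()
                tree[path] = children
                # Queue every newly discovered directory for a later visit.
                pending |= {pool.submit(visit, child) for child in children}
    return tree
```

Because the pool never runs more than `max_workers` threads, the server never sees more than that many connections from the crawler, no matter how many directories there are.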

Really, what I'm asking is two-fold: is there a better way of creating a data model representing a repository with tens of thousands of files than the one I'm using, and is what I'm trying to do possible, and if so, how?

Any help would be greatly appreciated :)

I found out how to do this (it's infuriatingly simple, as with all simple solutions to overly complicated problems):

To build a data model that contains Dir and File classes representing a whole depot with thousands of files, just call p4.run("files", "-e", path + "\\\\..."). This returns a list of every file under path, recursively, with a single command (the -e flag excludes deleted files). From there, all you need to do is iterate over the returned paths and construct your data model.
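As a rough illustration of that last step, here is a minimal sketch that turns a flat list of depot paths into a nested Dir/File structure. It assumes the paths are plain strings in depot syntax (forward slashes); as far as I know, P4Python returns one dict per file for this command, with the path under the "depotFile" key, so you would extract those first. The `build_tree` helper is a name made up for this example, and `Dir.dirs` is a dict here (rather than the list in the question) to make child lookup cheap.

```python
class File:
    def __init__(self, path):
        self.path = path


class Dir:
    def __init__(self, path):
        self.path = path
        self.dirs = {}   # child directory name -> Dir
        self.files = []  # File objects directly inside this directory


def build_tree(root, depot_files):
    """Build a Dir/File tree from a flat list of depot paths under `root`."""
    root = root.rstrip("/")
    root_dir = Dir(root)
    for depot_file in depot_files:
        if not depot_file.startswith(root + "/"):
            continue  # ignore anything that is not under root
        # Split the part below root into directory names and the file name.
        *dir_names, _file_name = depot_file[len(root) + 1:].split("/")
        node = root_dir
        for name in dir_names:
            # Create intermediate directories the first time they are seen.
            if name not in node.dirs:
                node.dirs[name] = Dir(node.path + "/" + name)
            node = node.dirs[name]
        node.files.append(File(depot_file))
    return root_dir
```

Each path is visited once, so building the model costs time proportional to the total number of path components, with no further requests to the server.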

Hope this helps someone at some point.


 