How to use an earlier Task's output in Luigi?

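I'm writing a pipeline in which later tasks need to read the output of an earlier task so that they know which parameters to pass in their requirements.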

I've created a simplified example of the setup below.

import random
import pickle
import luigi


class WriteNumbers(luigi.Task):

    def requires(self):
        pass

    def run(self):
        # pickle a random list of ints 1-100
        numbers = [random.randint(1, 100) for _ in range(100)]
        pickle.dump(numbers, open("./numbers.pkl", 'wb'))

    def output(self):
        return luigi.LocalTarget("./numbers.pkl")


class SquareNumber(luigi.Task):
    number = luigi.IntParameter()

    def requires(self):
        pass

    def run(self):
        # given a number as the parameter, write a file containing its square
        with open("./squared_{}".format(self.number), 'w') as f:
            f.write(str(self.number ** 2))

    def output(self):
        return luigi.LocalTarget("./squared_{}".format(self.number))


class SquareAll(luigi.WrapperTask):

    def requires(self):
        yield WriteNumbers()  # require the number list to be pickled first
        numbers = pickle.load(open("./numbers.pkl", 'rb'))  # load the number list
        for n in numbers:  # square each number in the number list
            yield SquareNumber(number=n)

class CubeNumber(luigi.Task):
    number = luigi.IntParameter()

    def requires(self):
        pass

    def run(self):
        # given a number as the parameter, write a file containing its cube
        with open("./cubed_{}".format(self.number), 'w') as f:
            f.write(str(self.number ** 3))

    def output(self):
        return luigi.LocalTarget("./cubed_{}".format(self.number))


class CubeAll(luigi.WrapperTask):

    def requires(self):
        yield WriteNumbers()  # require the number list to be pickled first
        numbers = pickle.load(open("./numbers.pkl", 'rb'))  # load the number list
        for n in numbers:  # cube each number in the number list
            yield CubeNumber(number=n)

class CrunchNumbers(luigi.WrapperTask):
    def requires(self):
        yield SquareAll()
        yield CubeAll()

if __name__ == '__main__':
    luigi.run()

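When run with python luigi_example.py CrunchNumbers, the intent is that 100 random numbers are created and the list is pickled and dumped to disk. SquareAll loads that pickled list and uses it to require SquareNumber tasks with the appropriate parameters. CubeAll reads the same result file for its analogous tasks.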

The problem is that an exception is thrown at run time, because the numbers.pkl file does not exist yet.

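How can I let later tasks generate their dependencies from the output of an earlier task? I've used random numbers here to represent output that can't be known in advance: my actual application processes data from an API.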

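You are using dynamic dependencies. These need to be yielded from the run method (by which point the results of requires are available through input), so CubeAll and SquareAll should be structured like this: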

class SquareAll(luigi.WrapperTask):

    def requires(self):
        yield WriteNumbers()  # require the number list to be pickled first

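    def run(self):
        # input() mirrors requires(); here it is a one-element list of targets
        numbers_file = self.input()[0].path
        with open(numbers_file, 'rb') as f:  # pickle.load needs a file object, not a path
            numbers = pickle.load(f)  # load the number list
        for n in numbers:  # square each number in the number list
            yield SquareNumber(number=n)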
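
To see why the file can be read in run but not in requires, here is a toy model of the ordering, written with the standard library only. It is a sketch under loud assumptions: ToyTask and build are hypothetical names invented for this illustration and are not part of Luigi, and a real scheduler is more involved (for example, Luigi's documentation notes that run is restarted when a new task is yielded, so run should be idempotent). The point it shows is only that static requirements are built before run executes, and that tasks yielded from run are built as they appear.

```python
import os
import pickle
import tempfile


class ToyTask:
    """Hypothetical stand-in for a task; NOT Luigi's API."""

    def requires(self):
        return []

    def run(self):
        return None

    def complete(self):
        return False


def build(task, log):
    """Build static requirements first, then run the task.

    If run() is a generator, every yielded task is treated as a
    dynamic dependency and built before the generator is resumed.
    """
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    result = task.run()
    if result is not None:  # run() was a generator function
        for dep in result:
            build(dep, log)
    log.append(type(task).__name__)


class WriteNumbers(ToyTask):
    def __init__(self, path):
        self.path = path

    def complete(self):
        return os.path.exists(self.path)

    def run(self):
        # Fixed numbers instead of random ones, so the result is predictable
        with open(self.path, "wb") as f:
            pickle.dump([2, 3, 5], f)


class SquareNumber(ToyTask):
    def __init__(self, number, results):
        self.number = number
        self.results = results

    def complete(self):
        return self.number in self.results

    def run(self):
        self.results[self.number] = self.number ** 2


class SquareAll(ToyTask):
    def __init__(self, path, results):
        self.path = path
        self.results = results

    def requires(self):
        # Known up front: only the task that writes the number list
        return [WriteNumbers(self.path)]

    def run(self):
        # Safe to read here: the static requirement has already been built
        with open(self.path, "rb") as f:
            numbers = pickle.load(f)
        for n in numbers:
            yield SquareNumber(n, self.results)


def demo():
    results, log = {}, []
    with tempfile.TemporaryDirectory() as tmp:
        build(SquareAll(os.path.join(tmp, "numbers.pkl"), results), log)
    return results, log


results, log = demo()
print(results)
print(log)
```

In this model, reading numbers.pkl inside SquareAll.requires would fail for the same reason as in the question: requires is evaluated before WriteNumbers has had a chance to run.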
