How to use an earlier Task's output in Luigi?
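I am writing a pipeline in which later tasks need to read the output of earlier tasks in order to know which parameters to pass in their requires.
I have created a simplified example of the setup below.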
import random
import pickle
import luigi

class WriteNumbers(luigi.Task):
    def requires(self):
        pass

    def run(self):
        # pickle a list of 100 random ints between 1 and 100
        numbers = [random.randint(1, 100) for _ in range(100)]
        with open("./numbers.pkl", 'wb') as f:
            pickle.dump(numbers, f)

    def output(self):
        return luigi.LocalTarget("./numbers.pkl")

class SquareNumber(luigi.Task):
    number = luigi.IntParameter()

    def requires(self):
        pass

    def run(self):
        # given a number as the parameter, write a file containing its square
        with open("./squared_{}".format(self.number), 'w') as f:
            f.write(str(self.number ** 2))

    def output(self):
        return luigi.LocalTarget("./squared_{}".format(self.number))

class SquareAll(luigi.WrapperTask):
    def requires(self):
        yield WriteNumbers()  # require the number list to be pickled first
        with open("./numbers.pkl", 'rb') as f:
            numbers = pickle.load(f)  # load the number list
        for n in numbers:  # square each number in the number list
            yield SquareNumber(number=n)

class CubeNumber(luigi.Task):
    number = luigi.IntParameter()

    def requires(self):
        pass

    def run(self):
        # given a number as the parameter, write a file containing its cube
        with open("./cubed_{}".format(self.number), 'w') as f:
            f.write(str(self.number ** 3))

    def output(self):
        return luigi.LocalTarget("./cubed_{}".format(self.number))

class CubeAll(luigi.WrapperTask):
    def requires(self):
        yield WriteNumbers()  # require the number list to be pickled first
        with open("./numbers.pkl", 'rb') as f:
            numbers = pickle.load(f)  # load the number list
        for n in numbers:  # cube each number in the number list
            yield CubeNumber(number=n)

class CrunchNumbers(luigi.WrapperTask):
    def requires(self):
        yield SquareAll()
        yield CubeAll()


if __name__ == '__main__':
    luigi.run()
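When run with python luigi_example.py CrunchNumbers, this is meant to generate 100 random numbers, pickle the list, and dump it to disk. SquareAll loads that pickled list and uses it to require SquareNumber tasks with the appropriate parameters. CubeAll refers to the same results file for its analogous tasks.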
The problem is that an exception is thrown at run time because the numbers.pkl file does not exist yet.
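The failure can be reproduced without Luigi. The sketch below assumes, as a simplification, that dependency resolution consumes the whole requires() generator before any task has run, so every statement after the first yield executes while numbers.pkl is still missing. The names collect_requirements and the string placeholders for tasks are hypothetical stand-ins, not Luigi functions.

```python
import os
import pickle
import tempfile


def requires(numbers_path):
    # Mirrors the shape of SquareAll.requires(): the producer first,
    # then one entry per number read from the producer's output file.
    yield "WriteNumbers"
    with open(numbers_path, "rb") as f:  # runs as soon as iteration continues
        numbers = pickle.load(f)
    for n in numbers:
        yield "SquareNumber(number={})".format(n)


def collect_requirements(gen):
    # Hypothetical stand-in for dependency resolution: the generator is
    # exhausted up front, before anything it yielded has been run.
    return list(gen)


# A path in a fresh temporary directory, so the file cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), "numbers.pkl")
try:
    collect_requirements(requires(missing))
    error = None
except FileNotFoundError as exc:
    error = exc

print(type(error).__name__)  # FileNotFoundError
```

Yielding WriteNumbers first does not help here, because yielding only reports the dependency; nothing runs it before the next statement in the generator executes.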
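How can I let later tasks generate their dependencies based on the output of an earlier task? I used random numbers here to show that the output cannot be known in advance: my real application processes data from an API.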
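You are using dynamic dependencies, and these need to be yielded from the run method (by which point the result of requires is available as input), so CubeAll and SquareAll should be structured like this: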
class SquareAll(luigi.WrapperTask):
    def requires(self):
        yield WriteNumbers()  # require the number list to be pickled first

    def run(self):
        numbers_file = self.input()[0].path
        with open(numbers_file, 'rb') as f:
            numbers = pickle.load(f)  # load the number list
        for n in numbers:  # square each number in the number list
            yield SquareNumber(number=n)