Scrapy: TypeError: __init__() missing 1 required positional argument: 'settings'
Usually I know what this error means, but somehow I believe I really did pass the argument.
I am playing around with Scrapy, and inside a pipeline I figured that if I am scraping a few different sites or pages, I want each of them to output its own JSON file (with different contents, of course), so I can tell which JSON belongs to which website.
So I created a `services` folder containing a file named `pipeline.py`, and inside this `pipeline.py` I created the class below:
```python
import json
import os


class JsonWriterPipeline(object):
    """
    Write all items to a file, most likely a JSON file.
    """

    def __init__(self, filename):
        print(filename)  # this does print the filename, though
        self.file = open(filename, 'w')

    def open_spider(self, spider):
        self.file.write('[')

    def close_spider(self, spider):
        # remove the last two chars (',\n'), then add the closing bracket ']'
        self.file.seek(self.file.seek(0, os.SEEK_END) - 2)
        self.file.write(']')

    def process_item(self, item, spider):
        line = json.dumps(dict(item)) + ",\n"
        self.file.write(line)
        return item
```
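As an aside on how the `close_spider` trick works: `seek(0, os.SEEK_END)` returns the end-of-file offset, so seeking two characters back from it puts the cursor on the trailing comma, which `']'` then overwrites. A standalone sketch of that rewind (assuming plain-ASCII content and a one-character newline, as on Linux/macOS):

```python
import json
import os
import tempfile

# Simulate what the pipeline writes: '[' followed by one '<json>,\n' per item.
path = os.path.join(tempfile.mkdtemp(), 'demo.json')
with open(path, 'w') as f:
    f.write('[')
    f.write(json.dumps({"a": 1}) + ",\n")
    f.write(json.dumps({"a": 2}) + ",\n")
    # seek(0, SEEK_END) returns the end offset; stepping back 2 characters
    # lands on the final ',' so that ']' overwrites it.
    f.seek(f.seek(0, os.SEEK_END) - 2)
    f.write(']')

with open(path) as f:
    data = json.load(f)

print(data)  # [{'a': 1}, {'a': 2}]
```

Note that only the comma is overwritten; the final newline survives after `']'`, which is still valid JSON.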
Then, in the original `pipeline.py` under the root folder, I have something like this:
```python
from scrape.services.pipeline import JsonWriterPipeline

JsonWriterPipeline('testing.json')  # so I have passed the filename argument as 'testing.json'
```
But I just keep getting the error mentioned above, even though `print(filename)` prints the filename out correctly.
If I hard-code a static filename instead of passing one in, it runs perfectly, but of course I want it to be dynamic; that is why I created the class in the first place, so I can reuse it.
Does anyone have any ideas?
EDIT: As Gallaecio mentioned below, I then realized that pipelines do not accept arguments this way. Some googling told me that the way a pipeline accepts parameters is when they are passed through the command line, not inside the code itself. Thanks for any advice and comments.
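For reference, the standard Scrapy mechanism for this is the `from_crawler` classmethod: Scrapy instantiates the pipeline through it and passes in the crawler, whose settings the pipeline can read. A minimal sketch of the idea (`JSON_FILENAME` is a made-up custom setting name for illustration, and `scrapy` itself is not imported so the snippet stays self-contained):

```python
import json


class JsonWriterPipeline:
    """Writes each scraped item to a JSON-lines file named in the settings."""

    def __init__(self, filename):
        self.filename = filename

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this classmethod with the crawler, which exposes the
        # project settings. JSON_FILENAME is hypothetical; it would live in
        # settings.py or be supplied on the command line.
        return cls(filename=crawler.settings.get('JSON_FILENAME', 'output.json'))

    def open_spider(self, spider):
        self.file = open(self.filename, 'w')

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        self.file.write(json.dumps(dict(item)) + "\n")
        return item
```

With that in place, something like `scrapy crawl myspider -s JSON_FILENAME=testing.json` would feed the filename in from the command line, which matches what the googling above turned up.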
I came up with an alternative approach: instead of creating a new object and passing an argument at creation time, try something like inheritance. Example below.

Inside `service/pipeline.py`:
```python
import json
import os


class JsonWriterPipeline(object):
    """
    Write all items to a file, most likely a JSON file.
    """

    filename = 'demo.json'  # instead of passing an argument, define a class variable

    def __init__(self):
        self.file = open(self.filename, 'w+')

    def open_spider(self, spider):
        self.file.write('[')

    def close_spider(self, spider):
        # remove the last two chars (',\n'), then add the closing bracket ']'
        self.file.seek(self.file.seek(0, os.SEEK_END) - 2)
        self.file.write(']')

    def process_item(self, item, spider):
        line = json.dumps(dict(item)) + ",\n"
        self.file.write(line)
        return item
```
And in the original `pipeline.py`:
```python
from scrape.services.pipeline import JsonWriterPipeline


class JsonWriterPipelineA(JsonWriterPipeline):
    filename = 'a.json'


class JsonWriterPipelineB(JsonWriterPipeline):
    filename = 'b.json'
```
This is the alternative approach I could come up with; I hope it helps.
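The reason this works is Python's attribute lookup order: `self.filename` is resolved on the instance, then on the subclass, then on the base class, so each subclass only has to shadow the attribute and does not even need its own `__init__`. A file-free sketch of that lookup:

```python
class BasePipeline:
    filename = 'demo.json'  # default, used when a subclass does not override it

    def __init__(self):
        # self.filename is looked up on the instance, then the subclass,
        # then this base class -- subclasses just shadow the attribute.
        self.opened = self.filename


class PipelineA(BasePipeline):
    filename = 'a.json'


class PipelineB(BasePipeline):
    filename = 'b.json'


print(PipelineA().opened)  # a.json
print(PipelineB().opened)  # b.json
```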