
TypeError: __init__() missing 1 required positional argument when passing parameters to a Scrapy pipeline

I normally know what this error means, but in this case I believe I did pass the argument.

I am playing around with Scrapy. In the pipeline, I figured that if I am scraping several different sites or pages, I want each of them to output a JSON file, but a separate file per site, so I can tell which JSON belongs to which website.

So I created a services folder containing a file named pipeline.py.

Inside this pipeline.py I created the class below:

import json
import os

class JsonWriterPipeline(object):
    """
    write all items to a file, most likely json file
    """
    def __init__(self, filename):
        print(filename)  # this does print the filename
        self.file = open(filename, 'w')

    def open_spider(self, spider):
        self.file.write('[')

    def close_spider(self, spider):
        # step back over the trailing ',\n' (present only if an item was written)
        # and overwrite the comma with the closing bracket ']'
        end = self.file.seek(0, os.SEEK_END)
        if end > 1:
            self.file.seek(end - 2)
        self.file.write(']')
        self.file.close()

    def process_item(self, item, spider):
        line = json.dumps(dict(item)) + ",\n"
        self.file.write(line)
        return item

Then, in the original pipeline.py under the root folder, I have something like this:

from scrape.services.pipeline import JsonWriterPipeline



JsonWriterPipeline('testing.json')  # so I have passed the filename argument as `'testing.json'`

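But I keep getting the error in the title, even though print(filename) prints the filename correctly.

If I do not pass a filename and use a static filename instead, it works perfectly. Of course I want it to be dynamic, which is why I created a class, so that I can reuse it.

Does anyone have an idea?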
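
One likely explanation for why print(filename) works while the error still appears: the module-level call JsonWriterPipeline('testing.json') runs once at import time, and Scrapy then builds its own instance from the class listed in ITEM_PIPELINES, using a from_crawler classmethod if the class defines one and a bare constructor call otherwise. The sketch below imitates that second step. build_component is a hypothetical helper written for illustration, not Scrapy's own code, and the pipeline is a stripped-down stand-in.

```python
class JsonWriterPipeline:
    """Stripped-down stand-in for the pipeline in the question."""

    def __init__(self, filename):
        self.filename = filename


def build_component(cls, crawler=None):
    """Hypothetical helper imitating how a framework builds a component.

    It prefers a from_crawler classmethod and otherwise calls the class
    with no arguments. This is an illustration, not Scrapy's own code.
    """
    if hasattr(cls, "from_crawler"):
        return cls.from_crawler(crawler)
    return cls()


# Calling the class yourself works, which is why print(filename) showed output.
manual = JsonWriterPipeline("testing.json")

# The framework-style call passes no filename, so __init__ raises TypeError.
try:
    build_component(JsonWriterPipeline)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Under this reading, the instance created by hand is never the one Scrapy uses, so passing the argument there has no effect on the pipeline that actually runs.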

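EDIT: As Gallaecio mentioned below, I then realized that a pipeline does not take arguments this way. After some googling, it seems a pipeline can receive parameters if they are passed through the command line rather than from inside the code itself.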
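
Besides the command line, a pipeline can receive a value through a from_crawler classmethod that reads the crawler settings. Here is a minimal sketch, assuming a hypothetical setting named JSON_WRITER_FILENAME and a stand-in crawler object so that it runs without Scrapy installed. It also writes the comma before each item rather than after, which is a change from the code above and avoids seeking back at the end.

```python
import json
import os
import tempfile
from types import SimpleNamespace


class JsonWriterPipeline:
    """Write every item to a JSON file whose name comes from the settings."""

    def __init__(self, filename):
        self.filename = filename
        self.file = None
        self.first_item = True

    @classmethod
    def from_crawler(cls, crawler):
        # JSON_WRITER_FILENAME is a hypothetical setting name; it could be set in
        # settings.py or with `scrapy crawl <spider> -s JSON_WRITER_FILENAME=a.json`.
        return cls(crawler.settings.get("JSON_WRITER_FILENAME", "items.json"))

    def open_spider(self, spider):
        self.file = open(self.filename, "w")
        self.file.write("[")

    def close_spider(self, spider):
        self.file.write("]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first.
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        self.file.write(json.dumps(dict(item)))
        return item


# Stand-in for the crawler object; a plain dict supplies the .get() used above.
out_path = os.path.join(tempfile.mkdtemp(), "a.json")
fake_crawler = SimpleNamespace(settings={"JSON_WRITER_FILENAME": out_path})

pipeline = JsonWriterPipeline.from_crawler(fake_crawler)
pipeline.open_spider(spider=None)
pipeline.process_item({"title": "first"}, spider=None)
pipeline.process_item({"title": "second"}, spider=None)
pipeline.close_spider(spider=None)

with open(out_path) as fh:
    written = json.load(fh)
```

In a real project the class would be listed in ITEM_PIPELINES as usual, and Scrapy would call from_crawler itself.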

Thanks for any suggestions and comments.

I thought of an alternative: instead of creating a new object and passing the argument at creation time, try something like inheritance.

Example below.

Inside services/pipeline.py:

import json
import os


class JsonWriterPipeline(object):
    """
    write all items to a file, most likely json file
    """
    filename = 'demo.json'  # class attribute used instead of a constructor argument; subclasses override it

    def __init__(self):
        self.file = open(self.filename, 'w+')

    def open_spider(self, spider):
        self.file.write('[')

    def close_spider(self, spider):
        # step back over the trailing ',\n' (present only if an item was written)
        # and overwrite the comma with the closing bracket ']'
        end = self.file.seek(0, os.SEEK_END)
        if end > 1:
            self.file.seek(end - 2)
        self.file.write(']')
        self.file.close()

    def process_item(self, item, spider):
        line = json.dumps(dict(item)) + ",\n"
        self.file.write(line)
        return item

In the original pipeline.py:

from scrape.services.pipeline import JsonWriterPipeline

class JsonWriterPipelineA(JsonWriterPipeline):
    # overriding the class attribute is enough; the inherited __init__ reads self.filename
    filename = 'a.json'


class JsonWriterPipelineB(JsonWriterPipeline):
    filename = 'b.json'

This is another approach I could come up with; I hope it helps.
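
A related variation, sketched under assumptions rather than taken from the post: since open_spider receives the running spider, the filename can be derived from spider.name, so one pipeline class serves every spider without subclasses. The class name PerSpiderJsonWriterPipeline and the output_dir attribute are hypothetical, and the spider below is a stand-in object that only has a name.

```python
import json
import os
import tempfile
from types import SimpleNamespace


class PerSpiderJsonWriterPipeline:
    """Hypothetical variant: one output file per spider, named after spider.name."""

    # Assumed output directory; the code in the post writes to the working directory.
    output_dir = "."

    def open_spider(self, spider):
        path = os.path.join(self.output_dir, f"{spider.name}.json")
        self.file = open(path, "w")
        self.file.write("[")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first.
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        self.file.write(json.dumps(dict(item)))
        return item


# Quick check with a stand-in spider object that only has a name attribute.
PerSpiderJsonWriterPipeline.output_dir = tempfile.mkdtemp()
spider = SimpleNamespace(name="site_a")

pipeline = PerSpiderJsonWriterPipeline()
pipeline.open_spider(spider)
pipeline.process_item({"url": "https://example.com"}, spider)
pipeline.close_spider(spider)

result_path = os.path.join(PerSpiderJsonWriterPipeline.output_dir, "site_a.json")
with open(result_path) as fh:
    result = json.load(fh)
```

The trade-off compared with the inheritance approach is that the output name is tied to the spider name instead of being chosen per pipeline class.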

