
How can I add a custom method to an Intake plugin so that it returns the data source in several custom formats, not only in dask format?

I am working on an Intake plugin that reads specific JSON files from GitHub. These JSON files contain basic information about systems that we want to simulate with different simulation software, each with its own input format. We have converters from JSON to each of these formats. I would now like to add a `to_format` method to my plugin, similar to the `to_dask` method, but I keep getting `RemoteSequenceSource object has no attribute 'to_format'`. Is there a way to do this?


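from intake.container.base import RemoteSource
from intake.source import base
from latticejson.convert import to_elegant, to_madx

# read_remote_file is a helper defined elsewhere in the plugin.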

class RemoteLatticejson(RemoteSource):
    """
    A lattice json source on the server
    """

    name      = 'remote-latticejson'
    container = 'python'
    partition_access = False

    def __init__(self, org, repo, filename, parameters=None, metadata=None, **kwargs):
        # super().__init__(org, repo, filename, parameters, metadata=metadata, **kwargs)
        self._schema = None
        self.org = org
        self.repo = repo
        self.filename = filename
        self.metadata = metadata

        self._dict = None

    def _load(self):
        self._dict = read_remote_file(self.org, self.repo, self.filename)

    def _get_schema(self):
        if self._dict is None:
            self._load()

        self._dtypes = {
                'version': 'str',
                'title': 'str',
                'root': 'str',
                'elements': 'dict',
                'lattice': 'dict'
                }
        return base.Schema(
                datashape=None,
                dtype=self._dtypes,
                shape=(None, len(self._dtypes)),
                npartitions=1,
                extra_metadata={}
                )


    def _get_partition(self, i):
        # There is a single partition, so the index i is not used.
        if self._dict is None:
            self._load_metadata()
        return [self.read()]


    def read(self):
        if self._dict is None:
            self._load()

        self.metadata = {
                'version': self._dict.get('version'),
                'title': self._dict.get('title'),
                'root': self._dict.get('root')
                }

        return self._dict

    def to_madx(self):
        self._get_schema()
        return to_madx(self._dict)

    def _close(self):
        pass

There are two concepts at play here:

  • A new driver, which can freely add methods (to_X) to its implementation and expose them to the user. This is allowed, and some drivers do it, either to return particular formats or to give access to the underlying object. Note that each added method makes the already long list of methods on the source even longer, so we lightly discourage this.
  • A remote source, which is used only when the client cannot access the data directly (because it lacks a route, permission, or the right driver locally). This case is more restricted, and the transfer of data is mediated by the "container" sources. If you want new, custom behaviour for your source when transferring data through the server, you need to write your own container as well as the original driver (the driver would set container = "mycustom", and you would register the container with intake.container.register_container).
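
For the first case, a single to_format method that dispatches on a format name keeps the list of methods on the source short. The following is only a minimal sketch of that pattern and does not import Intake: LatticeSource is a stand-in for a local driver (which in a real plugin would subclass intake.source.base.DataSource rather than RemoteSource), and to_keyvalue and to_element_list are hypothetical converters standing in for to_madx and to_elegant.

```python
from typing import Any, Callable, Dict


def to_keyvalue(lattice: Dict[str, Any]) -> str:
    """Hypothetical converter: render two top-level fields as 'key = value' lines."""
    return "\n".join(f"{key} = {lattice[key]}" for key in ("title", "root"))


def to_element_list(lattice: Dict[str, Any]) -> str:
    """Hypothetical converter: one element name per line, sorted."""
    return "\n".join(sorted(lattice["elements"]))


class LatticeSource:
    """Stand-in for a local (non-remote) driver that already implements read()."""

    # Format name -> converter function; real converters would be listed here.
    converters: Dict[str, Callable[[Dict[str, Any]], str]] = {
        "keyvalue": to_keyvalue,
        "element_list": to_element_list,
    }

    def __init__(self, data: Dict[str, Any]) -> None:
        self._dict = data

    def read(self) -> Dict[str, Any]:
        return self._dict

    def to_format(self, fmt: str) -> str:
        """Convert the loaded data with the converter registered under fmt."""
        try:
            convert = self.converters[fmt]
        except KeyError:
            known = ", ".join(sorted(self.converters))
            raise ValueError(f"unknown format {fmt!r}; expected one of: {known}") from None
        return convert(self.read())


source = LatticeSource(
    {
        "version": "2.0",
        "title": "FODO",
        "root": "cell",
        "elements": {"q1": {}, "d1": {}},
        "lattice": {},
    }
)
print(source.to_format("keyvalue"))
print(source.to_format("element_list"))
```

A method like this is only available when the driver class itself is instantiated on the client. If the catalog is served remotely, the client receives the container's remote source instead, which is the second case above.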

You can see from this that Intake was not really designed for processing or writing data, but for bringing you datasets in recognised forms in the simplest way. By limiting the scope, we hoped to keep the code simple and flexible.
