Can I “detect” a slicing expression in a python class method?

I am developing an application where I have defined a "variable" object containing data in the form of a numpy array. These variables are linked to (netcdf) data files, and I would like to dynamically load the variable values when needed instead of loading all data from the sometimes huge files at the start.

The following snippet demonstrates the principle and works well, including access to data portions with slices. For example, you can write:

a = var()   # empty variable
print a.values[7]   # values have been automatically "loaded"

or even:

a = var()
a[7] = 0

However, this code still forces me to load the entire variable data at once. Netcdf (with the netCDF4 library) would allow me to directly access data slices from the file. Example:

f = netCDF4.Dataset(filename, "r")
print f.variables["a"][7]

I cannot use the netcdf variable objects directly, because my application is tied to a web service which cannot keep the netcdf file handle open between requests, and also because the variable data don't always come from netcdf files, but may originate from other sources such as OGC web services.

Is there a way to "capture" the slicing expression in the property or setter methods and use them? The idea would be to write something like:

    @property
    def values(self):
        if self._values is None:
            self._values = np.arange(10.)[slice]  # load from file ...
        return self._values

instead of the code below.

Working demo:

import numpy as np

class var(object):

    def __init__(self, values=None, metadata=None):
        if values is None:
            self._values = None
        else:
            self._values = np.array(values)
        self.metadata = metadata  # just to demonstrate that var has more than just values

    @property
    def values(self):
        if self._values is None:
            self._values = np.arange(10.)  # load from file ...
        return self._values

    @values.setter
    def values(self, values):
        self._values = values

First thought: Should I perhaps create values as a separate class and then use __getitem__? See "In python, how do I create two index slicing for my own matrix class?"

No, you cannot detect what will be done to the object after returning from .values. The result could be stored in a variable and only (much later on) be sliced, or sliced in different places, or used in its entirety, etc.

You indeed should instead return a wrapper object and hook into object.__getitem__; it would let you detect slicing and load data as needed. When slicing, Python passes in a slice() object.
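
As a minimal sketch of what __getitem__ actually receives (SliceProbe is a made-up name used only for illustration, it is not part of the question's code): the indexing expression arrives as a plain int, a slice object, or, for multi-dimensional indexing, a tuple of slice objects.

```python
class SliceProbe(object):
    """Hypothetical helper: returns whatever key Python hands to __getitem__."""

    def __getitem__(self, key):
        return key


probe = SliceProbe()
single_index = probe[7]       # the int 7
simple_slice = probe[2:5]     # slice(2, 5, None)
multi_dim = probe[1:3, ::2]   # a tuple of two slice objects
```

A wrapper can therefore forward the key unchanged to whatever backend supports the same indexing, such as a numpy array or a netCDF4 variable.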

Thanks to the guidance of Martijn Pieters and with a bit more reading, I came up with the following code as a demonstration. Note that the Reader class uses a netcdf file and the netCDF4 library. If you want to try out this code yourself, you will either need a netcdf file with variables "a" and "b", or you will have to replace Reader with something else that returns a data array or a slice from a data array.

This solution defines three classes: Reader does the actual file I/O handling, Values manages the data access part and invokes a Reader instance if no data have been stored in memory, and var is the final "variable" which in real life will contain a lot more metadata. The code contains a couple of extra print statements for educational purposes.

"""Implementation of a dynamic variable class which can read data from file when needed or
return the data values from memory if they were read already. This concept supports
slicing for both memory and file access."""

import numpy as np
import netCDF4 as nc

FILENAME = r"C:\Users\m.schultz\Downloads\data\tmp\MACC_20141224_0001.nc"
VARNAME = "a"


class Reader(object):
    """Implements the actual data access to variable values. Here reading a
    slice from a netcdf file.
    """

    def __init__(self, filename, varname):
        """Final implementation will also have to take groups into account...
        """
        self.filename = filename
        self.varname = varname

    def read(self, args=slice(None, None, None)):
        """Read a data slice. Args is a slice object or a tuple of slice
        objects (e.g. numpy.index_exp). The default corresponds to [:],
        i.e. all data will be read.
        """
        with nc.Dataset(self.filename, "r") as f:
            values = f.variables[self.varname][args]
        return values


class Values(object):

    def __init__(self, values=None, reader=None):
        """Initialize Values. You can either pass numerical (or other) values,
        preferably as numpy array, or a reader instance which will read the
        values on demand. The reader must have a read(args) method, where
        args is a tuple of slices. If no args are given, all data should be
        returned.
        """
        if values is not None:
            self._values = np.array(values)
        self.reader = reader

    def __getattr__(self, name):
        """This is only called if attribute name is not present.
        Here, the only attribute we care about is _values.
        Self.reader should always be defined.
        This method is necessary to allow access to variable.values without
        a slicing index. If only __getitem__ were defined, one would always
        have to write variable.values[:] in order to make sure that something
        is returned.
        """
        print ">>> in __getattr__, trying to access ", name
        if name == "_values":
            print ">>> calling reader and reading all values..."
            self._values = self.reader.read()
            return self._values
        # any other missing attribute is a genuine error; do not
        # silently hand back the data array for it
        raise AttributeError(name)

    def __getitem__(self, args):
        print "in __getitem__"
        if "_values" not in self.__dict__:
            values = self.reader.read(args)
            print ">>> read from file. Shape = ", values.shape
            if args == slice(None, None, None):
                self._values = values  # all data read, store in memory
            return values
        else:
            print ">>> read from memory. Shape = ", self._values[args].shape
            return self._values[args]

    def __repr__(self):
        return self._values.__repr__()

    def __str__(self):
        return self._values.__str__()


class var(object):

    def __init__(self, name=VARNAME, filename=FILENAME, values=None):
        self.name = name
        self.values = Values(values, Reader(filename, name))


if __name__ == "__main__":
    # define a variable and access all data first
    # this will read the entire array and save it in memory, so that
    # subsequent access with or without index returns data from memory
    a = var("a", filename=FILENAME)
    print "1: a.values = ", a.values
    print "2: a.values[-1] = ", a.values[-1]
    print "3: a.values = ", a.values
    # define a second variable, where we access a data slice first
    # In this case the Reader only reads the slice and no data are stored
    # in memory. The second access indexes the complete array, so Reader
    # will read everything and the data will be stored in memory.
    # The last access will then use the data from memory.
    b = var("b", filename=FILENAME)
    print "4: b.values[0:3] = ", b.values[0:3]
    print "5: b.values[:] = ", b.values[:]
    print "6: b.values[5:8] = ", b.values[5:8]
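
The caching behaviour described in the comments of the __main__ block can also be tried without a netcdf file. The following is only a reduced sketch of the same idea, not the code above: ListReader and LazyValues are hypothetical names, the data live in a plain Python list instead of a file, and only one-dimensional indexing is covered.

```python
class ListReader(object):
    """Hypothetical stand-in for Reader: serves data from a list and
    records every index expression it is asked for."""

    def __init__(self, data):
        self._data = list(data)
        self.requests = []

    def read(self, key=slice(None, None, None)):
        self.requests.append(key)
        return self._data[key]


class LazyValues(object):
    """Reduced version of the Values idea: read slices on demand and keep
    the data in memory only after a full read."""

    def __init__(self, reader):
        self._reader = reader
        self._cache = None  # filled once the complete data set has been read

    def __getitem__(self, key):
        if self._cache is not None:
            return self._cache[key]        # served from memory
        values = self._reader.read(key)    # served by the reader
        if isinstance(key, slice) and key == slice(None, None, None):
            self._cache = values           # full read: remember it
        return values


reader = ListReader(range(10))
b = LazyValues(reader)
first = b[0:3]   # the reader is asked for slice(0, 3, None) only
full = b[:]      # the reader is asked for everything; the result is cached
later = b[5:8]   # answered from the cache, the reader is not called again
```

Inspecting reader.requests afterwards shows which accesses actually reached the reader, which is a convenient way to check that a partial access does not trigger a full load.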
