Is this a good case for creating a python metaclass?

I am tasked with refactoring a bunch of old code. One thing I want to fix is that there is a class of objects with methods for direct database access, even though the objects themselves are representations of Excel worksheets. So, in my mind, there should be a separate class of objects to mediate database access.

We're using helper functions called "handlers", which take a list of parameters and return data from the database. The ExcelWorksheet class implements a method for running handlers asynchronously, and each subclass uses one or more handlers. The result can be quite hard to follow, in my opinion:

data_handler = {
    # We are always going to use facebook here so add the arg here
    "metrics": lambda *args: metrics_handler(*["facebook"] + list(args)),
    "top_content": lambda *args: content_handler(*["facebook"] + list(args)),
}

@property
def data_requirements(self):
    return [
        ["metrics", self.start, self.end, [self.mapping_id], [self.category_id], self.metrics],
        ["top_content", self.start, self.end, "total_actions", "user", [self.mapping_id], 10],
        ["top_content", self.start, self.end, "total_actions", "category", [self.category_id], 10],
        ["top_content", self.start, self.end, "comments_count", "user", [self.mapping_id], 5],
        ["top_content", self.start, self.end, "comments_count", "category", [self.category_id], 5],
        ["top_content", self.start, self.end, "shares_count", "user", [self.mapping_id], 5],
        ["top_content", self.start, self.end, "shares_count", "category", [self.category_id], 5],
    ]

@staticmethod
def _call_async(fns):
    """
    Call the fns in a thread pool asynchronously and return a
    generator of results
    """
    futures = []
    with CloseConnectionThreadPoolExecutor(max_workers=4) as executor:
        for indx, f in enumerate(fns):
            LOGGER.warning("Getting data dependency %s out of %s", indx + 1, len(fns))

            future = executor.submit(f)
            futures.append(future)

    return (i.result() for i in futures)

def require_data(self):
    """
    Returns a generator of all the required data specified by the
    worksheets in this workbook in the order specified.

    Calls self.data_handler to get back a function for the given
    key. If the requirement is a list, the first argument is the
    key and subsequent items are passed as arguments to the function
    returned.

    Data requirement format:
    [
        'resource_1',
        'resource_2',
        ['resource_3', 'arg1', 'arg2'],
        ['resource_3', 'arg3', 'arg4']
    ]
    """
    if not self.data_handler:
        raise NotImplementedError("No data_handler specified")

    # For each requirement, route to the data handler
    data_fns = []
    for r in self.data_requirements:
        # A requirement can be specified as a list where the
        # first item is the resource and the rest are
        # arguments to the function to fulfill the requirement
        if isinstance(r, list):
            key = r[0]
            args = r[1:]
            fn = self.data_handler[key](*args)
        else:
            fn = self.data_handler[r]

        data_fns.append(fn)

    return self._call_async(data_fns)
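To see what this routing does end to end, here is a stripped-down, self-contained sketch with stub handlers (the handler names and return values are hypothetical stand-ins for the real database helpers). It also shows that the lambda-plus-list-concatenation trick in `data_handler` is equivalent to `functools.partial`:

```python
from functools import partial

# Hypothetical stubs: like the real handlers, each returns a
# zero-argument callable that the thread pool would later execute.
def metrics_handler(network, start, end):
    return lambda: ("metrics", network, start, end)

def content_handler(network, start, end, metric):
    return lambda: ("top_content", network, start, end, metric)

# partial(...) pre-binds "facebook", same as the lambda + list trick
data_handler = {
    "metrics": partial(metrics_handler, "facebook"),
    "top_content": partial(content_handler, "facebook"),
}

data_requirements = [
    ["metrics", "2020-01-01", "2020-01-31"],
    ["top_content", "2020-01-01", "2020-01-31", "total_actions"],
]

# The routing loop from require_data, minus the async execution
data_fns = []
for r in data_requirements:
    key, args = r[0], r[1:]
    data_fns.append(data_handler[key](*args))

results = [fn() for fn in data_fns]
print(results[0])  # ('metrics', 'facebook', '2020-01-01', '2020-01-31')
```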

So I came to the conclusion that I should create a separate DataSource object. Another thing I don't like about this code is the way the handlers are called: it seems overly complicated and hard to understand. I would like to be able to use my DataSource object like this:

datasource.add_request_for_metrics(arg1,arg2,arg3...)
datasource.add_request_for_top_content(arg1,arg2,arg3...)
results = datasource.execute_requests()

I could define all of these methods by hand, but there is a whole bunch of handlers and more could be added in the future. So I thought about using a metaclass that would define an "add_request_for_x" method for every given handler. This is what I have come up with so far:

class DataSourceMeta(type):
    """
    Metaclass for defining a datasource class.

    data_handler_mappings - a dict that defines which handlers to use when
    looking for data. For example:
    {"metrics": metrics_handler, "time_series": time_series_handler} - class
    objects will use metrics_handler to search for the "metrics" data category
    and time_series_handler for the "time_series" data category.

    For every item in the dictionary the metaclass adds an accordingly named
    method for adding a new request. In this case the methods will be named
    "add_request_for_metrics" and "add_request_for_time_series"
    ("add_request_for_{data_category}"). The requests can then be executed by
    calling the "execute_requests" method, which returns a list of results.

    Each class using this meta should implement the following interface:
    add_request(): adds a request to be executed
    _run_requests(): defines how requests are run (e.g. sync vs async)
    _clean_up(): performs any necessary cleanup after "execute_requests"
                 (clear the list of requests?)
    """
    def __new__(cls, *args, **kwargs):
        data_handler_mappings = kwargs.get("data_handler_mappings") or {}
        newclass = super(DataSourceMeta, cls).__new__(cls, *args)
        for data_category, data_category_handler in data_handler_mappings.items():
            method_name = "add_request_for_" + data_category.lower()

            # Bind the handler via a default argument; a plain closure over
            # the loop variable would make every generated method call the
            # *last* handler in the mapping (Python's late-binding closures).
            def method(self, *args, _handler=data_category_handler, **kwargs):
                self.add_request(_handler(*args, **kwargs))

            setattr(newclass, method_name, method)

        def execute_requests(self):
            results = self._run_requests()
            self._clean_up()
            return results

        newclass.execute_requests = execute_requests
        return newclass
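For reference, here is a compact, self-contained sketch of how a class would use such a metaclass (the handler names, signatures, and method bodies are hypothetical). Note that the handler is bound via a default argument inside the loop: a plain closure over the loop variable would make every generated method call the last handler in the mapping.

```python
class DataSourceMeta(type):
    """Generates add_request_for_<category> methods from data_handler_mappings."""
    def __new__(cls, *args, **kwargs):
        mappings = kwargs.get("data_handler_mappings") or {}
        newclass = super().__new__(cls, *args)
        for category, handler in mappings.items():
            # Default argument pins the current handler for this method
            def method(self, *a, _handler=handler, **kw):
                self.add_request(_handler(*a, **kw))
            setattr(newclass, "add_request_for_" + category.lower(), method)

        def execute_requests(self):
            results = self._run_requests()
            self._clean_up()
            return results
        newclass.execute_requests = execute_requests
        return newclass

# Hypothetical handlers: each returns a zero-argument callable (a "request")
def metrics_handler(start, end):
    return lambda: ("metrics", start, end)

def content_handler(start, end):
    return lambda: ("top_content", start, end)

class FacebookDataSource(
    metaclass=DataSourceMeta,
    data_handler_mappings={"metrics": metrics_handler,
                           "top_content": content_handler},
):
    def __init__(self):
        self._requests = []

    def add_request(self, request):
        self._requests.append(request)

    def _run_requests(self):  # synchronous, for the sketch
        return [r() for r in self._requests]

    def _clean_up(self):
        self._requests.clear()

ds = FacebookDataSource()
ds.add_request_for_metrics("2020-01-01", "2020-01-31")
ds.add_request_for_top_content("2020-01-01", "2020-01-31")
print(ds.execute_requests())
# [('metrics', '2020-01-01', '2020-01-31'), ('top_content', '2020-01-01', '2020-01-31')]
```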

Is this a good approach? Is there anything I can do better? I am only a junior programmer, and this would be my first time using a metaclass in a serious project.

This answer covers the option where you avoid a metaclass.

My understanding of your question is that you want to improve the readability of your code, and also make it suitable for a production system (a "serious project").

In my opinion, the case described can avoid a metaclass, and is better off doing so in a production system, since non-dynamic classes give you better type safety. Additionally, I think a dynamically generated class of the kind you propose makes it harder to understand at a glance what a given "datasource" can request, i.e. its readability is low.

The main reason I think this is the case is that you have a dictionary of methods used to fetch data (the data_handlers). My understanding is that these methods are fixed, and you are combining them in specific ways for different databases (Excel sheets).

My suggestion in this case would be to use a series of mixins.

There is one limitation, judging from the code snippet provided: if different databases require different methods to get data (i.e. the data handlers need to be customised), this approach may not be ideal. But if only a handful of arguments are database-specific, they can be added as class variables on the mixin.

import abc

class BaseDataSourceMixin(abc.ABC):

    @abc.abstractmethod
    def _run_requests(self):
        """Defines how requests are run (e.g. sync vs async)."""
        raise NotImplementedError()

    @abc.abstractmethod
    def _clean_up(self):
        """Performs any necessary cleanup after running "execute_requests"
        (clear the list of requests?)."""
        raise NotImplementedError()

    def execute_requests(self):
        """Runs all queued requests, cleans up, and returns the results."""
        results = self._run_requests()
        self._clean_up()
        return results

class RequestMixin(abc.ABC):

    # abc.abstractproperty is deprecated; stack @property and @abstractmethod
    @property
    @abc.abstractmethod
    def data_source_name(self) -> str:
        ...

class MetricsRequest(RequestMixin):

    def add_request_for_metrics(self, *args): # can use meaningful variable names
        pass


class TopContentRequest(RequestMixin):

    def add_request_for_top_content(self, *args):
        print(self.data_source_name)  # can be used here; the subclass defines it

class FacebookDataSource(
    BaseDataSourceMixin,
    # Now when looking at the code we can just see what we are inheriting
    # and know what methods to expect in this function
    MetricsRequest,
    TopContentRequest,
):

    @property
    def data_source_name(self) -> str:
        # This as an example returns a string, but I may suggest using a NamedTuple
        # especially if there is a specific interface you are using
        # that always requires the same number/type of variables
        return "facebook" 

    def _run_requests(self):
        return

    def _clean_up(self):
        return


datasource = FacebookDataSource()
datasource.add_request_for_metrics(arg1,arg2,arg3...)
datasource.add_request_for_top_content(arg1,arg2,arg3...)
results = datasource.execute_requests()

Using it this way, we can easily get docstrings from the inherited methods, which the IDE can surface, making it easier for a developer to read and understand the code: (screenshot: VS Code tooltip showing an inherited method)
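To make the mixin sketch above concrete, here is a self-contained version with hypothetical method bodies, where the `add_request_for_*` mixin methods queue zero-argument callables on a shared request list and `execute_requests` drains it (everything except the class/method names from the sketch is an assumption):

```python
import abc

class BaseDataSourceMixin(abc.ABC):
    def __init__(self):
        self._requests = []

    def add_request(self, request):
        self._requests.append(request)

    @abc.abstractmethod
    def _run_requests(self):
        """Defines how requests are run (e.g. sync vs async)."""

    @abc.abstractmethod
    def _clean_up(self):
        """Cleanup after execute_requests (e.g. clear the request list)."""

    def execute_requests(self):
        results = self._run_requests()
        self._clean_up()
        return results

class MetricsRequest(abc.ABC):
    def add_request_for_metrics(self, start, end):
        # Hypothetical body: queue a zero-argument callable for later execution
        self.add_request(lambda: ("metrics", self.data_source_name, start, end))

class FacebookDataSource(BaseDataSourceMixin, MetricsRequest):
    @property
    def data_source_name(self) -> str:
        return "facebook"

    def _run_requests(self):  # synchronous, for the sketch
        return [r() for r in self._requests]

    def _clean_up(self):
        self._requests.clear()

ds = FacebookDataSource()
ds.add_request_for_metrics("2020-01-01", "2020-01-31")
print(ds.execute_requests())
# [('metrics', 'facebook', '2020-01-01', '2020-01-31')]
```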
