I'm working on a module that allows a user to create instances of SQLAlchemy's URL
objects specifically for connecting to MS SQL Server via pyodbc. The module needs to expose a convenient API where URL
s can be created by specifying either hostname, port and database, or a DSN, or by passing a raw ODBC connection string. The string representations of those URL
s would therefore look like the following, where the database and driver are already specified and the rest is up to the user:
"mssql+pyodbc://<username>:<password>@<host>:<port>/<database>?driver=<odbc-driver>"
"mssql+pyodbc://<username>:<password>@<dsn>"
"mssql+pyodbc://<username>:<password>@?odbc_connect=<connection-string>"
Now this seems like a good use case for the factory pattern, whereby I create a separate method/function (eg from_hostname
, from_dsn
, from_connection_string
) for each of the different ways to create a URL
. But I can think of four different implementations of that pattern, and I'm wondering which one to prefer.
(Side notes: You'll notice below, that I instantiate URL
s via the class factory method URL.create
. This is because the SQLAlchemy developers would like to keep users from instantiating URL
s via direct calls to the default constructor . Also, for simplicity's sake I'm ignoring all sorts of other useful parameters that the methods/functions should accept, such as for authentication.)
I subclass URL
adding the value for the drivername
argument of URL.create
as a class attribute/constant. Then I add my class methods.
from sqlalchemy.engine import URL
class MyURL(URL):
_DRIVERNAME = "mssql+pyodbc"
@classmethod
def from_hostname(cls, host, port, database):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": host,
"port": port,
"database": database,
"query": {"driver": "ODBC Driver 17 or SQL Server"}
}
return super().create(**parts)
@classmethod
def from_dsn(cls, dsn):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": dsn
}
return super().create(**parts)
@classmethod
def from_connection_string(cls, connection_string):
parts = {
"drivername": MyURL._DRIVERNAME,
"query": {"odbc_connect": connection_string}
}
return super().create(**parts)
Usage:
MyURL.from_hostname('host', 1234, 'db')
MyURL.from_dsn('my-dsn')
MyURL.from_connection_string('Server=MyServer;Database=MyDatabase')
MyURL
would of course inherit all methods from its parent, including MyURL.create
which allows for the instantiation of all sorts of URL
s (including non-SQL-Server ones), or MyURL.set
which allows to modify the URL
, including the drivername
part. This goes against the intention of the MyURL
class, which exists specifically to provide a few convenient methods to create URL
s for SQL Server via pyodbc only. Also, since now all those parent methods are exposed by my module I would feel obligated to document them for the user which leads to a lot of redundant documentation (I suppose I could just refer to SQLAlchemy's documentation for all other methods and attributes, or something). But the bottom-line is, all of this is somewhat undesirable.
Could it be that a parent-child relationship between URL
and MyURL
is actually not the right choice here, ie since it turns out we're not even interested in inheriting from URL
in the first place, is MyURL
semantically not a child of URL
?
The implementation of delegation is almost identical to inheritance, except we obviously remove the parent class from MyURL
and replace the call to super
with the class name.
from sqlalchemy.engine import URL
class MyURL:
_DRIVERNAME = "mssql+pyodbc"
@classmethod
def from_hostname(cls, host, port, database):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": host,
"port": port,
"database": database,
"query": {"driver": "ODBC Driver 17 or SQL Server"}
}
return URL.create(**parts)
@classmethod
def from_dsn(cls, dsn):
parts = {
"drivername": MyURL._DRIVERNAME,
"host": dsn
}
return URL.create(**parts)
@classmethod
def from_connection_string(cls, connection_string):
parts = {
"drivername": MyURL._DRIVERNAME,
"query": {"odbc_connect": connection_string}
}
return URL.create(**parts)
Usage:
MyURL.from_hostname('host', 1234, 'db')
MyURL.from_dsn('my-dsn')
MyURL.from_connection_string('Server=MyServer;Database=MyDatabase')
This approach leaves MyURL
without all the baggage from URL
, and it doesn't imply a parent-child relationship. But it doesn't necessarily feel right either.
Is it overkill to create a class that does absolutely nothing other than encapsulate a few factory methods? Or maybe this is an anti-pattern, because we create a class MyURL
even though there is not much use for instances of type MyURL
(after all, we're only looking to create instances of URL
)?
This is a pattern along the lines of SQLAlchemy's own make_url
factory function (which is essentially a wrapper around URL.create
). I can think of two ways to implement it.
The implementation of this one is pretty straightforward. It's again almost identical to inheritance and delegation, except of course the functions and attributes aren't wrapped in a class.
from sqlalchemy import URL
_DRIVERNAME = "mssql+pyodbc"
def url_from_hostname(host, port, database):
parts = {
"drivername": _DRIVERNAME,
"host": host,
"port": port,
"database": database,
"query": {"driver": "ODBC Driver 17 or SQL Server"}
}
return URL.create(**parts)
def url_from_dsn(dsn):
parts = {
"drivername": _DRIVERNAME,
"host": dsn
}
return URL.create(**parts)
def url_from_connection_string(connection_string):
parts = {
"drivername": _DRIVERNAME,
"query": {"odbc_connect": connection_string}
}
return URL.create(**parts)
Usage:
url_from_hostname('host', 1234, 'db')
url_from_dsn('my-dsn')
url_from_connection_string('Server=MyServer;Database=MyDatabase')
Does this create a somewhat "cluttered" module API? Is it again an anti-pattern to create a module API with separate functions that all do sort of the same thing, however? Shouldn't there be something that "connects" or "encapsulates" those clearly related functions (such as a class...)?
Trying to encapsulate all the different ways to create URL
s by a single function means that certain parameters are mutually exclusive ( host
, port
and database
vs dsn
vs connection_string
). This makes the implementation a little more involved. Users will almost certainly make mistakes despite all documentation efforts, so one would probably want to validate the supplied function arguments and raise an exception if the combination of arguments doesn't make any sense. Decorators, as suggested here and here , seem like an elegant way to do that. Of course, the if
- elif
logic in the url
function could also be extended to do all that, so this is really just one (and probably not the best) possible implementation.
from functools import wraps
from sqlalchemy import URL
_DRIVERNAME = "mssql+pyodbc"
class MutuallyExclusiveError(Exception):
pass
def mutually_exclusive(*args, **kwargs):
excl_args = args
def inner(f):
@wraps(f)
def wrapper(*args, **kwargs):
counter = 0
for ea in excl_args:
if any(key in kwargs for key in ea):
counter += 1
if counter > 1:
raise MutuallyExclusiveError
return f(*args, **kwargs)
return wrapper
return inner
@mutually_exclusive(
["host", "port", "database"],
["dsn"],
["connection_string"]
)
def url(host=None, port=None, database=None, dsn=None, connection_string=None):
parts = {
"drivername": _DRIVERNAME,
"host": host or dsn,
"port": port,
"database": database
}
if host:
parts["query"] = {"driver": "ODBC Driver 17 or SQL Server"}
elif connection_string:
parts["query"] = {"odbc_connect": connection_string}
return URL.create(**parts)
Usage:
url(host='host', port=1234, database='db')
url(dsn='my-dsn')
url(connection_string='Server=MyServer;Database=MyDatabase')
If the user passes positional rather than keyword arguments they will completely bypass our validation, though, so that's an issue. Moreover, using positional arguments in an efficient manner isn't even really possible for DSNs and connection strings, unless one does something weird like url(None, None, None, 'my-dsn')
. One solution would be to disable positional arguments altogether by changing the function definition to def url(*, host=None, ...):
, thereby essentially discarding positional arguments. All of the above also doesn't quite feel right.
Is it bad practice when a function won't accept positional arguments? Is the whole concept of validating input not somewhat "un-pythonic", or does this merely refer to things like type checking? Is this generally just trying to force too much into a single function?
Any thoughts on all or part of the above (specifically questions raised in italics ) would be very much appreciated.
Thanks!
I'll attempt to answer my own question as best I can. Let me first look at implementing the factory pattern via classes.
The two terms subclassing and subtyping are often mentioned within the context of inheritance. While the former implies a syntactic relationship through implementation reuse (implementation inheritance), the latter implies a semantic "is-a" relationship (interface inheritance). The two concepts are often conflated in Python, but when I ask whether MyURL
objects are or aren't URL
objects, I'm referring to the semantic relationship between the two.
Of course, when I subclass URL
in my code example above, I am indeed creating a subtype of URL
that satisfies the Liskov Substitution Principle (LSP) : I added a few methods (ie I specialized URL
), but I can still pass MyURL
instances to SQLAlchemy's create_engine
function and nothing breaks. That's because MyURL
implements the complete interface of its (generalized) superclass.
What I am really trying to achieve, though, is for MyURL
to not only add those few methods and attributes but to also only possess (or expose) a subset of the methods of its superclass, in an attempt to disable ways to create URL strings that would be incompatible with SQL Server. Others have asked about removing superclass methods in subclasses (see here and here for example), but doing so will violate the LSP as well as the "is-a" relationship between the two classes.
So I suppose that inheritance through subclassing is in fact not what I should be doing here.
Delegation is another example of implementation reuse whereby not a class "blueprint" is shared but an instance of a class. It's therefore more of a "has-a" relationship. Specifically, in my code example I'm doing an implicit delegation, since I'm not passing URL
or an instance thereof as a parameter to MyURL
's methods. URL.create
is a class method, so I can access it directly. In fact, since SQLALchemy's URL
s are themselves a subclass of (immutable) tuples I wouldn't even be able to create my specialized version after instantiating them.
Some of my confusion about what point there is to having instances of MyURL
stemmed from the fact, that I was still to much focused on a "is-a" relationship. Realizing that this is not the case makes it clearer that what MyURL
actually is is a factory class to create URLs. I could rename it to MyURLFactory
to make that distinction clearer.
I could even remove the @classmethod
decorators. To use MyURL
I would then have to instantiate it before use (although I'm not sure what the benefit of that would be):
my_url_factory = MyURLFactory()
my_url_factory.from_hostname('host', 1234, 'db')
Looking at it from this angle makes me think that this might be a good way to go about my problem. But let's also revisit the module-level factory functions.
One problem I have with this solution is that the factory functions are all very closely related and do pretty much the same thing. There's a potential for code repetition. Of course, I could avoid that by moving shared code into a private function:
_DRIVERNAME = "mssql+pyodbc"
def _make_parts_dict(*args, **kwargs):
return dict(kwargs, drivername=_DRIVERNAME)
def url_from_hostname(host, port, database):
parts = _make_parts_dict(
host=host,
port=port,
database=database,
query={"driver": "ODBC Driver 17 or SQL Server"}
)
return URL.create(**parts)
def url_from_dsn(dsn):
parts = _make_parts_dict(host=dsn)
return URL.create(**parts)
def url_from_connection_string(connection_string):
parts = _make_parts_dict(query={"odbc_connect": connection_string})
return URL.create(**parts)
The resulting URLs would be the same. Nevertheless, this still leaves me with the "cluttered" module API - but then again, implementation 2 would leave me with the same "cluttered" class API...
I could also combine 2 and 3.A by adding a _make_parts_dict
class method to the MyURLFactory
class.
I don't have too many comments on this implementation. It's probably doable, but I think 2 or (less preferred) 3.A would be much simpler to implement and maintain. The complexity of taking care of the different mutually exclusive keyword arguments doesn't seem justified considering I'm just trying to create a few URL
s. Also the lack of decent support for positional arguments bothers me.
1 A potential hack would be to keep all the references to the drivername
parameter in all the inherited methods, but change all the implementations to simply ignore it or pull the value from a class attribute.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.