sqlalchemy how to generate (many-to-many) relationships with automap_base

Question

As a background: I'm creating an ORM based on a schema of an already existing database. - This due to the fact that the python application won't be the "owner" of said database.

Now in this database there is a table called " task " and a table called " task_notBefore__task_relatedTasks " - this latter is a many-to-many relation between different entries in the " task " table.

now automap_base() has an automated detection of these relationships as described here . However this fails for my case, and no relationship is being build.

I then try to manually create the relationship:

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.ext.automap import generate_relationship
from sqlalchemy.orm import sessionmaker, interfaces, relationship
from sqlalchemy import create_engine

class DBConnection:
    def __init__(self, connection_url, **kwargs):
        self.engine = create_engine(connection_url, **kwargs)
        self._Base = automap_base()

        self._Base.prepare(self.engine, reflect=True)

        self.Task = self._Base.classes.task
        self.Order = self._Base.classes.order
        self.Poller = self._Base.classes.poller

        rel = generate_relationship(self._Base, interfaces.MANYTOMANY, relationship, 'related', self.Task, self.Task,
                                    secondary=self._Base.classes.task_notBefore__task_relatedTasks, backref='notBefore')

        self._Session = sessionmaker()
        self._Session.configure(bind=self.engine)

        self.session = self._Session()

However this still doesn't "do" anything: it doesn't add anything to the self.Task "class".

How would one do this?

Answer 1

The primary problem in this case is not just the many-to-many relationship, but the fact that it's a self-referential, many-to-many relationship. Because automap is simply translating the mapped class names to relationship names, it constructs the same name, eg task_collection , for both directions of the relationship, and the naming collision generates the error. This shortcoming of automap feels significant in that self-referential, many-to-many relationships are not uncommon.

Explicitly adding the relationships you want, using your own names, won't solve the problem because automap will still try to create the task_collection relationships. To deal with this issue, we need to override task_collection .

If you're okay with keeping the name task_collection for the forward direction of the relationship, we can simply pre-define the relationship--specifying whatever name we want for the backref . If automap finds the expected property already in place, it will assume the relationship is being overridden and not try to add it.

Here's a stripped down example, along with the an sqlite database for testing.

Sqlite Database

CREATE TABLE task (
    id INTEGER, 
    name VARCHAR,
    PRIMARY KEY (id)
);

CREATE TABLE task_task (
    tid1 INTEGER,
    tid2 INTEGER,
    FOREIGN KEY(tid1) REFERENCES task(id),
    FOREIGN KEY(tid2) REFERENCES task(id)
);

-- Some sample data
INSERT INTO task VALUES (0, 'task_0');
INSERT INTO task VALUES (1, 'task_1');
INSERT INTO task VALUES (2, 'task_2');
INSERT INTO task VALUES (3, 'task_3');
INSERT INTO task VALUES (4, 'task_4');

INSERT INTO task_task VALUES (0, 1);
INSERT INTO task_task VALUES (0, 2);

INSERT INTO task_task VALUES (2, 4);
INSERT INTO task_task VALUES (3, 4);

INSERT INTO task_task VALUES (3, 0);

Putting it into a file called setup_self.sql , we can do:

sqlite3 self.db < setup_self.sql

Python Code

from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine

from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base

DeclBase = declarative_base()

task_task = Table('task_task', DeclBase.metadata,
                  Column('tid1', Integer, ForeignKey('task.id')),
                  Column('tid2', Integer, ForeignKey('task.id')))

Base = automap_base(DeclBase)

class Task(Base):
    __tablename__ = 'task'

    task_collection = relationship('Task', 
                                   secondary=task_task, 
                                   primaryjoin='Task.id==task_task.c.tid1',
                                   secondaryjoin='Task.id==task_task.c.tid2',
                                   backref='backward')

engine = create_engine("sqlite:///self.db")

Base.prepare(engine, reflect=True)

session = Session(engine)

task_0 = session.query(Task).filter_by(name ='task_0').first()
task_4 = session.query(Task).filter_by(name ='task_4').first()

print("task_0.task_collection = {}".format([x.name for x in task_0.task_collection]))
print("task_4.backward        = {}".format([x.name for x in task_4.backward]))

Results

task_0.task_collection = ['task_1', 'task_2']
task_4.backward        = ['task_2', 'task_3']

Using a Different Name

If you want to have a name other than task_collection , you need to use automap 's function for overriding collection-relationship names:

name_for_collection_relationship(base, local_cls, referred_cls, constraint)

The arguments local_cls and referred_cls are instances of the mapped table classes. For a self-referential, many-to-many relationship, these are both the same class. We can use the arguments to build a key that allows us to identify overrides.

Here is an example implementation of this approach.

from sqlalchemy.ext.automap import automap_base, name_for_collection_relationship
from sqlalchemy.orm import Session
from sqlalchemy import create_engine

from sqlalchemy import Table, Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base


DeclBase = declarative_base()

task_task = Table('task_task', DeclBase.metadata,
                  Column('tid1', Integer, ForeignKey('task.id')),
                  Column('tid2', Integer, ForeignKey('task.id')))

Base = automap_base(DeclBase)

class Task(Base):
    __tablename__ = 'task'

    forward = relationship('Task', 
                           secondary=task_task, 
                           primaryjoin='Task.id==task_task.c.tid1',
                           secondaryjoin='Task.id==task_task.c.tid2',
                           backref='backward')


# A dictionary that maps relationship keys to a method name
OVERRIDES = {
    'Task_Task' : 'forward'
    }

def _name_for_collection_relationship(base, local_cls, referred_cls, constraint):

    # Build the key
    key = '{}_{}'.format(local_cls.__name__, referred_cls.__name__)

    # Did we have an override name?
    if key in OVERRIDES:
        # Yes, return it
        return OVERRIDES[key]

    # Default to the standard automap function
    return name_for_collection_relationship(base, local_cls, referred_cls, constraint)


engine = create_engine("sqlite:///self.db")

Base.prepare(engine, reflect=True, name_for_collection_relationship=_name_for_collection_relationship)

Note that the overriding of name_for_collection_relationship simply changes the name that automap uses for the relationship. In our case, the relationship is still being pre-defined by Task . But, the override tells automap to look for forward instead of task_collection , which it finds and therefore discontinues defining the relationship.

Other Approaches Considered

Under some circumstances, it would be nice if we could override the relationship names without having to pre-define the actual relationship. On first consideration, this should be possible using name_for_collection_relationship . However, I could not get this approach to work for self-referential, many-to-many relationships, due to a combination of two reasons.

name_for_collection_relationship and the related generate_relationship are called twice, once for each direction of the many-to-many relationship. In both cases, local_cls and referred_cls are the same, because of the self-referentiality. Moreover, the other arguments of name_for_collection_relationship are effectively equivalent. Therefore, we cannot, from the context of the function call, determine which direction we are overriding.
Here is the even-more surprising part of the problem. It appears we cannot even count on one direction happening before the other. In other words, the two calls to name_for_collection_relationship and generate_relationship are very similar. The argument that actually determines the directionality of the relationship is constraint , which is one of the two foreign-key constraints for the relationship; these constraints are loaded, from Base.metadata , into a variable called m2m_const . Herein lies the problem. The order that the constraints end up in m2m_const is nondeterministic, ie sometimes it will be one order; other times it will be the opposite (at least when using sqlite3 ). Because of this, the directionality of the relationship is nondeterministic.

On the other hand, when we pre-define the relationship, the following arguments create the necessary determinism.

primaryjoin='Task.id==task_task.c.tid1',
secondaryjoin='Task.id==task_task.c.tid2',

Of particular note, I actually tried to create a solution that simply overrode the relationship names without pre-defining it. It exhibited the described nondeterminism.

Final Thoughts

If you have a reasonable number of database tables that do not change often, I would suggest just using Declarative Base . It might be a little more work to set up, but it gives you more control.

sqlalchemy how to generate (many-to-many) relationships with automap_base

Question

1 answers

solution1
1 2018-03-21 06:08:27

Sqlite Database

Python Code

Results

Using a Different Name

Other Approaches Considered

Final Thoughts

sqlalchemy how to generate (many-to-many) relationships with automap_base

Question

1 answers

solution1 1 2018-03-21 06:08:27

Sqlite Database

Python Code

Results

Using a Different Name

Other Approaches Considered

Final Thoughts

solution1
1 2018-03-21 06:08:27