SQLAlchemy join two models and select top row
Currently I have two tables: Server and Scan.
One server can have many scans (a one-to-many relationship).
What I am trying to achieve is to select a Server and then only the first Scan associated with that Server. The following query:
query = db.session.query(models.Server, models.Scan).outerjoin(models.Server.scans).all()
outputs:
(<Server u'Testing'>, <Scan u'bbd4f805-3966-d464-b2d1-0079eb89d69708c3a05ec2812bcf'>)
(<Server u'Testing'>, <Scan u'bbd4f805-3966-d464-b2d1-0079eb89d69708c3a05ec2812bcf'>)
(<Server u'Testing'>, <Scan u'testscan'>)
(<Server u'fasd'>, <Scan u'testscan'>)
(<Server u'fdaafas'>, None)
whereas I only want one "Testing" Server and the most recent Scan.
ADDITIONAL
When I loop through my query like so:
for a in query:
    print a, a.scans.all()
The output is:
<Server u'Testing'> [<Scan u'testscan'>, <Scan u'bbd4f805-3966-d464-b2d1-0079eb89d69708c3a05ec2812bcf'>, <Scan u'bbd4f805-3966-d464-b2d1-0079eb89d69708c3a05ec2812bcf'>]
<Server u'fasd'> [<Scan u'testscan'>]
<Server u'fdaafas'> []
The output I want should equal:
<Server u'Testing'> [<Scan u'bbd4f805-3966-d464-b2d1-0079eb89d69708c3a05ec2812bcf'>]
<Server u'fasd'> [<Scan u'testscan'>]
<Server u'fdaafas'> []
You would need to add a subquery in which you select the Scan row you want to show, using some criterion. For the toy example below I assume you want the maximum value of some parameter.
I've created tables A and B; A corresponds to Server and B to Scan.
In [2]:
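from sqlalchemy import Column, ForeignKey, Integer, String
# SQLAlchemy 1.4+; older versions import declarative_base from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class A(Base):
    __tablename__ = 'A'
    pk = Column('pk', Integer, primary_key=True)
    name = Column('name', String)

class B(Base):
    __tablename__ = 'B'
    pk = Column('pk', Integer, primary_key=True)
    fk = Column('fk', Integer, ForeignKey('A.pk'))
    attr = Column('attr', Integer)
    a = relationship("A", backref='B')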
Inserted some data,
In [10]:
q = session.query(B)
print(q)
for x in q.all():
    print(x.pk, x.fk, x.attr)
q = session.query(A)
print(q)
for x in q.all():
    print(x.pk, x.name)
SELECT "B".pk AS "B_pk", "B".fk AS "B_fk", "B".attr AS "B_attr"
FROM "B"
1 1 1
2 1 2
3 2 0
4 2 4
5 1 4
SELECT "A".pk AS "A_pk", "A".name AS "A_name"
FROM "A"
1 one
2 two
And solved your problem by adding a subquery that selects the maximum value of B.attr for every B.fk, i.e. for every A.pk. (In your example it would be the maximum Scan.attr for every Server.)
In [13]:
from sqlalchemy import func
from sqlalchemy import tuple_
s = session.query(func.max(B.attr), B.fk).group_by(B.fk)
print(s)
q = session.query(A, B).outerjoin(B).filter(tuple_(B.attr, B.fk).in_(s))
print(q)
for x in q.all():
    print(x.A.pk, x.A.name, x.B.pk, x.B.attr)
SELECT max("B".attr) AS max_1, "B".fk AS "B_fk"
FROM "B" GROUP BY "B".fk
SELECT "A".pk AS "A_pk", "A".name AS "A_name", "B".pk AS "B_pk", "B".fk AS "B_fk", "B".attr AS "B_attr"
FROM "A" LEFT OUTER JOIN "B" ON "A".pk = "B".fk
WHERE ("B".attr, "B".fk) IN (SELECT max("B".attr) AS max_1, "B".fk AS "B_fk"
FROM "B" GROUP BY "B".fk)
2 two 4 4
1 one 5 4
NOTE: you don't mention which database you are using, but just in case, please note that the in_ statement with multiple columns does not work in older versions of sqlite, which only gained row-value support in version 3.15 (which is quite annoying when you try it). You could instead use one column only, something like:
s = session.query(func.max(B.attr)).group_by(B.fk)
q = session.query(A, B).outerjoin(B).filter(B.attr.in_(s))
However, depending on your data, you could get more than one B for each A. For example, if B.fk=1 has max(B.attr)=3, and B.fk=2 has max(B.attr)=4 but also a row with B.attr=3, you would get both B.attr=3 and B.attr=4 for B.fk=2.
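The duplicate-row case described above is easy to reproduce. The following is a minimal sketch that uses only Python's standard-library sqlite3 module and raw SQL rather than SQLAlchemy, so that it runs on its own. The table layout mirrors the toy table B, and the three rows are made-up data chosen to match the example in the text.

```python
import sqlite3

# In-memory database with a table shaped like the toy table B.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "B" (pk INTEGER PRIMARY KEY, fk INTEGER, attr INTEGER)')

# Made-up rows: fk=1 has max(attr)=3; fk=2 has max(attr)=4 but also attr=3.
conn.executemany(
    'INSERT INTO "B" (pk, fk, attr) VALUES (?, ?, ?)',
    [(1, 1, 3), (2, 2, 3), (3, 2, 4)],
)

# Single-column IN: the subquery yields the set {3, 4}, with no link to fk.
rows = conn.execute(
    'SELECT fk, attr FROM "B" '
    'WHERE attr IN (SELECT max(attr) FROM "B" GROUP BY fk) '
    'ORDER BY fk, attr'
).fetchall()

print(rows)  # [(1, 3), (2, 3), (2, 4)]
```

The row (2, 3) slips through because 3 is the maximum of the other group, which is exactly why the fk column has to be part of the comparison.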
However, if the attribute you are using to select the maximum is unique, it will be fine. Anyway, if you are on a database like postgres or oracle you can use in_ with multiple columns.
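If the multi-column in_ is not available, one alternative that is not part of the answer above is to join against the grouped subquery instead of filtering with IN. The sketch below shows only the SQL shape, again with the standard-library sqlite3 module rather than SQLAlchemy so it is self-contained. The data is the toy data from this answer plus an assumed third A row ('three') without any B, and the alias names m and max_attr are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "A" (pk INTEGER PRIMARY KEY, name TEXT)')
conn.execute('CREATE TABLE "B" (pk INTEGER PRIMARY KEY, fk INTEGER, attr INTEGER)')
conn.executemany('INSERT INTO "A" VALUES (?, ?)',
                 [(1, "one"), (2, "two"), (3, "three")])
conn.executemany('INSERT INTO "B" VALUES (?, ?, ?)',
                 [(1, 1, 1), (2, 1, 2), (3, 2, 0), (4, 2, 4), (5, 1, 4)])

# m holds one row per fk with its maximum attr; joining B on BOTH columns
# plays the role of the two-column IN without needing row values.
rows = conn.execute(
    'SELECT "A".pk, "A".name, "B".pk, "B".attr '
    'FROM "A" '
    'LEFT JOIN (SELECT fk, max(attr) AS max_attr FROM "B" GROUP BY fk) AS m '
    '  ON m.fk = "A".pk '
    'LEFT JOIN "B" ON "B".fk = m.fk AND "B".attr = m.max_attr '
    'ORDER BY "A".pk'
).fetchall()

for row in rows:
    print(row)
```

Because both joins are left joins, an A without any B still appears, with None in the B columns. If two B rows of the same A share the maximum attr, both are still returned, just as with the two-column IN.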
Hope it helps.
EDIT added after comments: If you also want to get the Servers without a Scan, you just need to add an or_ to your query.
In [18]:
from sqlalchemy import func
from sqlalchemy import tuple_
from sqlalchemy import or_
s = session.query(func.max(B.attr), B.fk).group_by(B.fk)
q = session.query(A, B).outerjoin(B).filter(or_(tuple_(B.attr, B.fk).in_(s), B.fk==None))
print(q)
for x in q.all():
    if x.B:
        print(x.A.pk, x.A.name, x.B.pk, x.B.attr)
    else:
        print(x.A.pk, x.A.name)
SELECT "A".pk AS "A_pk", "A".name AS "A_name", "B".pk AS "B_pk", "B".fk AS "B_fk", "B".attr AS "B_attr"
FROM "A" LEFT OUTER JOIN "B" ON "A".pk = "B".fk
WHERE ("B".attr, "B".fk) IN (SELECT max("B".attr) AS max_1, "B".fk AS "B_fk"
FROM "B" GROUP BY "B".fk) OR "B".fk IS NULL
2 two 4 4
1 one 5 4
3 three
As you see, you have to be careful with nulls. Note that outerjoin already performs a left join, which is what you needed, but because of the filter you have to explicitly say that you want the null rows as well. As usual, A is Server and B is Scan. Sorry for not using your table names; it makes this much more difficult to read.
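To see why the filter drops the rows that the left join produced, here is a small sketch at the SQL level, using the standard-library sqlite3 module instead of SQLAlchemy so it runs on its own. It uses the single-column form of the filter, which is safe for this particular toy data because each A has exactly one B with the maximum attr. The third A row ('three') is assumed to have been inserted without any B.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "A" (pk INTEGER PRIMARY KEY, name TEXT)')
conn.execute('CREATE TABLE "B" (pk INTEGER PRIMARY KEY, fk INTEGER, attr INTEGER)')
conn.executemany('INSERT INTO "A" VALUES (?, ?)',
                 [(1, "one"), (2, "two"), (3, "three")])
conn.executemany('INSERT INTO "B" VALUES (?, ?, ?)',
                 [(1, 1, 1), (2, 1, 2), (3, 2, 0), (4, 2, 4), (5, 1, 4)])

base = (
    'SELECT "A".pk, "A".name, "B".pk, "B".attr '
    'FROM "A" LEFT OUTER JOIN "B" ON "A".pk = "B".fk '
    'WHERE "B".attr IN (SELECT max("B".attr) FROM "B" GROUP BY "B".fk)'
)

# For A 'three' the joined B columns are NULL, and NULL IN (...) is not true,
# so the WHERE clause removes the row the left join had kept.
without_null = conn.execute(base + ' ORDER BY "A".pk').fetchall()

# Adding OR "B".fk IS NULL lets the unmatched A row back in.
with_null = conn.execute(base + ' OR "B".fk IS NULL ORDER BY "A".pk').fetchall()

print(without_null)  # [(1, 'one', 5, 4), (2, 'two', 4, 4)]
print(with_null)     # [(1, 'one', 5, 4), (2, 'two', 4, 4), (3, 'three', None, None)]
```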