python sqlalchemy distinct column values
I have 6 tables in my SQLite database, each with 6 columns (Date, user, NormalA, specialA, contact, remarks) and 1000+ rows.
How can I use SQLAlchemy to go through the Date column, find duplicate dates, and delete the duplicate rows?
Assuming this is your model:
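from sqlalchemy import Column, DateTime, Integer, String
# SQLAlchemy 1.4+; older versions import it from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)
    # the other columns do not matter here; only `id` and `date` are used
    # what matters is that `id` is a primary key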
Following are two ways to delete your data.
Both of them use a helper subquery:
from sqlalchemy import and_, func  # `and_` is used by the queries further down

# helper subquery: find the first row (by primary key) for each unique date
subq = (
    session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
    .group_by(MyTable.date)
    .subquery('date_min_id')
)
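To make it concrete what the helper subquery computes, here is a minimal sketch of roughly the same SQL run with Python's standard `sqlite3` module instead of SQLAlchemy. The in-memory database, the sample rows, and the use of a TEXT column for the date are assumptions made for illustration only.

```python
import sqlite3

# Throwaway in-memory database with a table shaped like the model above
# (the date is stored as TEXT here, which is an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, date TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO my_table (date, user) VALUES (?, ?)",
    [
        ("2016-12-18", "a"),  # id 1
        ("2016-12-18", "b"),  # id 2, duplicate date
        ("2015-12-18", "c"),  # id 3
        ("2016-12-18", "d"),  # id 4, duplicate date
    ],
)

# One row per distinct date, carrying the smallest id seen for that date.
first_ids = conn.execute(
    "SELECT date, MIN(id) AS min_id FROM my_table GROUP BY date ORDER BY date"
).fetchall()
print(first_ids)
```

Every row whose `id` is not the `min_id` for its date is a duplicate, which is the condition both options rely on.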
Option-1: Find duplicates, mark them for deletion and commit the transaction
# query to find all duplicates
q_duplicates = (
    session
    .query(MyTable)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
)
for x in q_duplicates:
    print("Will delete %s" % x)
    session.delete(x)
session.commit()
Option-2: Create a single SQL query which will perform the deletion on the database directly
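# ids of all duplicate rows (every row that is not the first one for its date)
sq = (
    session
    .query(MyTable.id)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
).subquery("subq")
# bulk delete; returns the number of rows matched
dq = (
    session
    .query(MyTable)
    .filter(MyTable.id.in_(sq))
).delete(synchronize_session=False)
session.commit()  # the deletion is not persisted until the transaction is committed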
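As a rough illustration of what a single-statement delete like Option-2 does at the database level, the sketch below runs a comparable `DELETE ... WHERE id IN (...)` with the standard `sqlite3` module. The exact SQL that SQLAlchemy emits may differ; the table contents and the TEXT date column are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, date TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO my_table (date, user) VALUES (?, ?)",
    [("2016-12-18", "a"), ("2016-12-18", "b"), ("2015-12-18", "c"), ("2016-12-18", "d")],
)

# Delete every row whose id is not the smallest id for its date.
delete_sql = """
DELETE FROM my_table
WHERE id IN (
    SELECT t.id
    FROM my_table AS t
    JOIN (
        SELECT date, MIN(id) AS min_id
        FROM my_table
        GROUP BY date
    ) AS date_min_id
      ON t.date = date_min_id.date AND t.id != date_min_id.min_id
)
"""
deleted = conn.execute(delete_sql).rowcount  # number of rows removed
conn.commit()

remaining = conn.execute("SELECT id, date FROM my_table ORDER BY id").fetchall()
print(deleted, remaining)
```

Compared with Option-1, no rows are loaded into Python; the database does all the work in one statement.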
Inspired by "Find duplicate values in SQL table", this might help you select duplicate dates:
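# dates that occur more than once; only the grouped column is selected,
# since selecting whole rows while grouping by one column is not portable
query = (
    session.query(MyTable.date)
    .group_by(MyTable.date)
    .having(func.count(MyTable.date) > 1)
    .all()
)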
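The grouping idea can be checked quickly against a small table. The following sketch uses the standard `sqlite3` module with made-up rows (an assumption for illustration) and also returns how many times each duplicated date occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, date TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO my_table (date, user) VALUES (?, ?)",
    [
        ("2016-12-18", "a"),
        ("2016-12-18", "b"),
        ("2015-12-18", "c"),
        ("2014-12-18", "d"),  # occurs once, so it must not be reported
        ("2015-12-18", "e"),
    ],
)

# Group by date and keep only the groups with more than one row.
duplicated = conn.execute(
    "SELECT date, COUNT(*) AS n FROM my_table "
    "GROUP BY date HAVING COUNT(*) > 1 ORDER BY date"
).fetchall()
print(duplicated)
```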
If you only want to show unique dates, distinct on is what you might need.
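Note that `DISTINCT ON` is a PostgreSQL feature; SQLite, which the question uses, only has plain `DISTINCT`, and that is enough to list unique dates. A minimal sketch with the standard `sqlite3` module follows; the table and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, date TEXT, user TEXT)")
conn.executemany(
    "INSERT INTO my_table (date, user) VALUES (?, ?)",
    [("2016-12-18", "a"), ("2016-12-18", "b"), ("2015-12-18", "c")],
)

# Each date is returned once, no matter how many rows share it.
unique_dates = [
    row[0]
    for row in conn.execute("SELECT DISTINCT date FROM my_table ORDER BY date")
]
print(unique_dates)
```

This only displays the unique values; it does not remove anything from the table.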
While I like the whole object-oriented approach of SQLAlchemy, sometimes I find it easier to use SQL directly. And since the records don't have a key, we need the row number (_ROWID_) to delete the targeted records, and I don't think the API provides it.
So first we connect to the database:
from sqlalchemy import create_engine
db = create_engine(r'sqlite:///C:\temp\example.db')
eng = db.engine
Then to list all the records:
for row in eng.execute("SELECT * FROM TableA;"):
    print(row)
And to display all the duplicated records where the dates are identical:
for row in eng.execute("""
    SELECT * FROM {table}
    WHERE {field} IN (SELECT {field} FROM {table} GROUP BY {field} HAVING COUNT(*) > 1)
    ORDER BY {field};
    """.format(table="TableA", field="Date")):
    print(row)
Now that we have identified all the duplicates, they probably need to be fixed if the other fields are different:
eng.execute("UPDATE TableA SET NormalA=18, specialA=20 WHERE Date = '2016-18-12';")
eng.execute("UPDATE TableA SET NormalA=4, specialA=8 WHERE Date = '2015-18-12';")
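The updates above embed the values directly in the SQL string. As a hedged alternative, the sketch below passes the same values as bound parameters. It uses the standard `sqlite3` module rather than `eng.execute`, which exists in SQLAlchemy 1.x but was removed in 2.0; the reduced three-column table and its rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of TableA with only the columns touched by the updates.
conn.execute("CREATE TABLE TableA (Date TEXT, NormalA INTEGER, specialA INTEGER)")
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [("2016-18-12", 1, 2), ("2016-18-12", 3, 4), ("2015-18-12", 5, 6)],
)

# (NormalA, specialA, Date) triples, bound to the ? placeholders in order.
fixes = [(18, 20, "2016-18-12"), (4, 8, "2015-18-12")]
conn.executemany(
    "UPDATE TableA SET NormalA = ?, specialA = ? WHERE Date = ?", fixes
)
conn.commit()

rows = conn.execute(
    "SELECT Date, NormalA, specialA FROM TableA ORDER BY rowid"
).fetchall()
print(rows)
```

Bound parameters only work for values; table and column names still have to be placed in the SQL text, as the `.format` calls in this answer do.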
And finally, to keep the first inserted record and delete the more recent duplicated records:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MIN(_ROWID_) FROM {table} GROUP BY {field});
    """.format(table="TableA", field="Date")).rowcount)
Or to keep the last inserted record and delete the other duplicated records:
print(eng.execute("""
    DELETE FROM {table}
    WHERE _ROWID_ NOT IN (SELECT MAX(_ROWID_) FROM {table} GROUP BY {field});
    """.format(table="TableA", field="Date")).rowcount)
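The two rowid-based DELETE statements above differ only in `MIN` versus `MAX`. The sketch below wraps them in one hypothetical helper, `dedupe`, and runs both variants with the standard `sqlite3` module on a small keyless table. The helper name, the `keep` parameter, and the sample rows are assumptions for illustration, not part of any library.

```python
import sqlite3

def make_table():
    """Build an in-memory table without a primary key, like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TableA (Date TEXT, user TEXT, NormalA INTEGER)")
    conn.executemany(
        "INSERT INTO TableA VALUES (?, ?, ?)",
        [
            ("2016-12-18", "ann", 1),  # rowid 1
            ("2016-12-18", "bob", 2),  # rowid 2
            ("2015-12-18", "cid", 3),  # rowid 3
            ("2016-12-18", "dee", 4),  # rowid 4
        ],
    )
    return conn

def dedupe(conn, table, field, keep="first"):
    """Delete duplicates of `field`, keeping the first or last inserted row.

    `table` and `field` are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    agg = {"first": "MIN", "last": "MAX"}[keep]  # rejects any other value
    sql = (
        "DELETE FROM {table} "
        "WHERE rowid NOT IN (SELECT {agg}(rowid) FROM {table} GROUP BY {field})"
    ).format(table=table, field=field, agg=agg)
    deleted = conn.execute(sql).rowcount
    conn.commit()
    return deleted

conn_first = make_table()
deleted_first = dedupe(conn_first, "TableA", "Date", keep="first")
kept_first = conn_first.execute("SELECT user FROM TableA ORDER BY rowid").fetchall()

conn_last = make_table()
deleted_last = dedupe(conn_last, "TableA", "Date", keep="last")
kept_last = conn_last.execute("SELECT user FROM TableA ORDER BY rowid").fetchall()

print(deleted_first, kept_first)
print(deleted_last, kept_last)
```

"First" and "last" here mean smallest and largest rowid, which matches insertion order only as long as rowids have not been reused or set by hand.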