
Create a smaller database from a larger one in SQLAlchemy (sqlite)

I want to create a smaller db starting from a larger one that I have been given, and I would like to do it all in python with sqlalchemy.

Here's what I have so far: I'm querying the original db and performing a join on two tables (books and anatit) to filter out the data (there are many more tables in the original db that I do not need). I also want the market_trades table to be included in the smaller db.

from sqlalchemy import create_engine, MetaData, Table, select

engine = create_engine('sqlite:///D:/backtest_dbs/20191016_MIT.db')
conn = engine.connect()
metadata = MetaData()
books = Table('books', metadata, autoload=True, autoload_with=engine)
anatit = Table('anatit', metadata, autoload=True, autoload_with=engine)
market_trades = Table('market_trades', metadata, autoload=True, autoload_with=engine)
stmt = select([books, anatit.columns.CS, anatit.columns.isin, anatit.columns.desc])
stmt = stmt.select_from(
    anatit.join(books, anatit.columns.CS == books.columns.CS)
).where(anatit.columns.desc.like('%BTP%'))
result_proxy = conn.execute(stmt)

After I execute the statement, however, I'm not sure how to proceed with the ResultProxy. The result is about 2.5 million rows, so a for loop calling .insert() row by row doesn't seem like a good idea. I first create another engine that creates the smaller db in another folder. What is the best (here "best" means Pythonic/efficient) way of creating the tables, and can we do it starting from a ResultProxy, without having to populate them afterwards? And what about the market_trades table: can I use autoload_with=engine to add it to engine_small?

engine_small = create_engine('sqlite:///D:/backtest_dbs/small/20191016_MIT_small.db')
conn = engine_small.connect()
books = Table('books', metadata, ...)  # What goes in here?
anatit = Table('anatit', metadata, ...)
market_trades = Table('market_trades', ...)  # Can I use autoload_with=engine, the larger db?
metadata.create_all(engine_small)
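For reference, here is a minimal, self-contained sketch of one pattern that would fill in those blanks. The in-memory engines and the toy `books` schema are made up for illustration, and it assumes SQLAlchemy 1.4+ (on 1.3, `select(...)` would be `select([...])` and `to_metadata()` would be `tometadata()`): copy each reflected `Table` onto a second `MetaData`, `create_all()` it on the small engine, then stream the filtered rows across in chunks with Core inserts rather than a per-row loop.

```python
import sqlalchemy as sa

# Toy stand-ins for the real .db files (paths/schemas are illustrative).
src = sa.create_engine('sqlite://')
dst = sa.create_engine('sqlite://')

meta_src = sa.MetaData()
books = sa.Table('books', meta_src,
                 sa.Column('CS', sa.Integer, primary_key=True),
                 sa.Column('price', sa.Float))
with src.begin() as conn:
    meta_src.create_all(conn)
    conn.execute(books.insert(),
                 [{'CS': i, 'price': 100.0 + i} for i in range(10)])

# Copy the table definition onto a fresh MetaData, then create it
# in the small database: this answers "what goes in here?".
meta_dst = sa.MetaData()
books_small = books.to_metadata(meta_dst)
meta_dst.create_all(dst)

# Stream the filtered rows across in chunks instead of 2.5M single
# inserts; fetchmany() keeps memory bounded.
stmt = sa.select(books).where(books.c.CS < 5)
with src.connect() as rconn, dst.begin() as wconn:
    result = rconn.execute(stmt)
    while True:
        chunk = result.fetchmany(1000)
        if not chunk:
            break
        wconn.execute(books_small.insert(),
                      [dict(row._mapping) for row in chunk])
```

The same `to_metadata()` trick applies to market_trades: reflect it from the large engine, copy it onto the small MetaData, and bulk-copy its rows unfiltered.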

I know that there are also other ways that do not use SQLAlchemy, but I thought this would be a good exercise and example, and besides I'm doing everything in Python in this particular project and I'd like it to stay that way. However, if someone thinks there are far better solutions that avoid SQLAlchemy altogether, I'm happy to listen.

For those interested: you can use pandas (especially if, as in my case, you also want to access the data directly for analysis). With on the order of 10^6 rows it takes a few seconds.

    import pandas as pd
    from sqlalchemy import create_engine, MetaData, Table, select

    engine = create_engine('sqlite:///D:/backtest_dbs/20191016_MIT.db')
    conn = engine.connect()
    metadata = MetaData()
    books = Table('books', metadata, autoload=True, autoload_with=engine)
    anatit = Table('anatit', metadata, autoload=True, autoload_with=engine)
    market_trades = Table('market_trades', metadata, autoload=True, autoload_with=engine)

    stmt = select([books, anatit.columns.CS, anatit.columns.isin, anatit.columns.desc])
    stmt = stmt.select_from(
        anatit.join(books, anatit.columns.CS == books.columns.CS)
    ).where(anatit.columns.desc.like('%BTP%'))
    result_proxy = conn.execute(stmt)
    # Pass keys() so the DataFrame keeps the column names
    # instead of defaulting to 0, 1, 2, ...
    db_small = pd.DataFrame(result_proxy.fetchall(), columns=result_proxy.keys())

    stmt2 = select([market_trades])
    result_proxy2 = conn.execute(stmt2)
    trades_df = pd.DataFrame(result_proxy2.fetchall(), columns=result_proxy2.keys())

    engine_small = create_engine('sqlite:///D:/backtest_dbs/small/20191016_MIT_small.db', echo=True)
    # index=False avoids writing the DataFrame's integer index as an extra column.
    trades_df.to_sql('market_trades', con=engine_small, index=False)
    db_small.to_sql('books', con=engine_small, index=False)

    result_proxy.close()
    result_proxy2.close()
    conn.close()

The nice part here is the .to_sql() method, which I didn't know about. You also have the small db available as a DataFrame, so you can modify or analyze it if needed.
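To make the round trip concrete, here is a small self-contained sketch of the same pandas pattern (the in-memory engines and the two-column market_trades schema are invented for the example). It shows the two `to_sql()` parameters worth knowing for a 2.5M-row copy: `chunksize`, which batches the INSERTs instead of building one giant statement, and `if_exists='replace'`, which makes reruns idempotent.

```python
import pandas as pd
import sqlalchemy as sa

# Illustrative in-memory engines standing in for the real .db files.
engine = sa.create_engine('sqlite://')
engine_small = sa.create_engine('sqlite://')

# Seed a toy market_trades table (the schema is made up for the example).
pd.DataFrame({'isin': ['IT0001', 'IT0002'], 'qty': [10, 20]}).to_sql(
    'market_trades', con=engine, index=False)

with engine.connect() as conn:
    result = conn.execute(sa.text('SELECT * FROM market_trades'))
    # keys() preserves the column names; a bare fetchall() alone
    # would give you columns named 0, 1, 2, ...
    trades_df = pd.DataFrame(result.fetchall(), columns=result.keys())

# chunksize batches the INSERTs; if_exists='replace' drops and
# recreates the target table, so running the script twice is safe.
trades_df.to_sql('market_trades', con=engine_small, index=False,
                 if_exists='replace', chunksize=1000)
```

At the millions-of-rows scale, `chunksize` also keeps the write from holding one enormous parameter list in memory at once.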

I'd still like to hear about other approaches if anyone has something to add (see the last part of the original question).
