
Pandas: Reading a view to a dataframe

Here's my scenario:

  1. I get CSVs with columns (AccountNumber, Name, Age, Address, etc.).
  2. I read the CSVs into a pandas dataframe df1.
  3. I then look up a view on SQL Server and match on AccountNumber to get the key.
  4. I read the result into a new dataframe df2 and write it to a SQL table.
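The four steps above can be sketched like this. All table, view, and column names are placeholders, the CSV is inlined via `StringIO`, and an in-memory SQLite database stands in for SQL Server (swap in your `mssql+pyodbc` connection URL):

```python
import io
import pandas as pd
from sqlalchemy import create_engine, text

# Inline CSV stands in for the real file; column names are hypothetical
csv_data = io.StringIO(
    "AccountNumber,Name,Age\n1001,Alice,30\n1002,Bob,45\n9999,Eve,22\n"
)

# In-memory SQLite stands in for SQL Server for this demo
engine = create_engine("sqlite://")

# Seed a stand-in "view" that maps AccountNumber -> AccountKey
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE my_view (AccountNumber INTEGER, AccountKey INTEGER)"
    ))
    conn.execute(text("INSERT INTO my_view VALUES (1001, 1), (1002, 2)"))

# Steps 1-2: read the CSV into df1
df1 = pd.read_csv(csv_data)

# Step 3: match AccountNumber against the view to pick up the key
keys = pd.read_sql("SELECT AccountNumber, AccountKey FROM my_view", engine)
df2 = df1.merge(keys, on="AccountNumber", how="inner")

# Step 4: write the matched rows to a SQL table
df2.to_sql("my_table", con=engine, if_exists="append", index=False)
```

Note that step 3 pulls the whole view back to Python and does the matching client-side, which is exactly the cost the answer below avoids.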

Question: If I pass too many records into the join against the SQL view, it may hurt performance. I want to avoid the Python code slowing down SQL Server. Is there another way to solve this?

Thank you in advance.

Consider a staging table that you dump into from pandas each time, replacing its contents. Then run an insert-select append query that matches accountnumber against the view. All of the matching and appending then runs on the server. Below is a sketch using SQLAlchemy:

from sqlalchemy import text   # required for raw SQL strings in SQLAlchemy 1.4+/2.0

# PANDAS DUMP TO STAGING TABLE, REPLACING EACH TIME
my_df.to_sql(name="raw_df_tmp", con=engine, if_exists="replace", index=False)

# SQL INSERT-SELECT (VIA TRANSACTION)
with engine.begin() as conn:
    sql = """INSERT INTO my_table (Col1, Col2, Col3, ...)
             SELECT Col1, Col2, Col3, ...
             FROM raw_df_tmp r
             WHERE r.accountnumber IN
                (SELECT accountnumber FROM my_view)
          """
    conn.execute(text(sql))

engine.dispose()

