
Pandas: Reading a view to a dataframe

Here's my scenario:

  1. I get CSVs with columns (AccountNumber, Name, Age, Address, etc.).
  2. I read the CSVs into a pandas dataframe df1.
  3. I then look up a view on SQL Server and match on AccountNumber to get the key.
  4. I read the result into a new dataframe df2 and write it to a SQL table.
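The four steps above can be sketched like this. All table, view, and column names are placeholders, the CSV is inlined via `StringIO`, and an in-memory SQLite database stands in for SQL Server (swap in your `mssql+pyodbc` connection URL):

```python
import io
import pandas as pd
from sqlalchemy import create_engine, text

# Inline CSV stands in for the real file; column names are hypothetical
csv_data = io.StringIO(
    "AccountNumber,Name,Age\n1001,Alice,30\n1002,Bob,45\n9999,Eve,22\n"
)

# In-memory SQLite stands in for SQL Server for this demo
engine = create_engine("sqlite://")

# Seed a stand-in "view" that maps AccountNumber -> AccountKey
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE my_view (AccountNumber INTEGER, AccountKey INTEGER)"
    ))
    conn.execute(text("INSERT INTO my_view VALUES (1001, 1), (1002, 2)"))

# Steps 1-2: read the CSV into df1
df1 = pd.read_csv(csv_data)

# Step 3: match AccountNumber against the view to pick up the key
keys = pd.read_sql("SELECT AccountNumber, AccountKey FROM my_view", engine)
df2 = df1.merge(keys, on="AccountNumber", how="inner")

# Step 4: write the matched rows to a SQL table
df2.to_sql("my_table", con=engine, if_exists="append", index=False)
```

Note that step 3 pulls the whole view back to Python and does the matching client-side, which is exactly the cost the answer below avoids.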

Question: If I pass too many records into the join against the SQL view, it may hurt performance. I want to avoid the Python code slowing down SQL Server. Is there another way to solve this?

Thank you in advance.

Consider a staging table that you dump into from pandas each time, replacing its contents. Then run an insert-select append query that matches accountnumber against the view. All of the matching and appending then runs on the server. Below is a sketch using SQLAlchemy:

from sqlalchemy import text   # required for raw SQL strings in SQLAlchemy 1.4+/2.0

# PANDAS DUMP TO STAGING TABLE, REPLACING EACH TIME
my_df.to_sql(name="raw_df_tmp", con=engine, if_exists="replace", index=False)

# SQL INSERT-SELECT (VIA TRANSACTION)
with engine.begin() as conn:
    sql = """INSERT INTO my_table (Col1, Col2, Col3, ...)
             SELECT Col1, Col2, Col3, ...
             FROM raw_df_tmp r
             WHERE r.accountnumber IN
                (SELECT accountnumber FROM my_view)
          """
    conn.execute(text(sql))

engine.dispose()

