

Update pandas DataFrame to SQL Server through SQLAlchemy engine using Python

I have an existing SQL Server database. I want to use Python to read a CSV file and, for each row whose TIMEID matches, update the column values in the SQL Server table.

If I did it in SQL Server, I would load the new CSV into a new table and then update using:

UPDATE R 
SET R.[PA]=P.[PA]
FROM [DATABASE_TABLE] AS R
INNER JOIN [NEW_CSV] AS P 
       ON R.[TIMEID] = P.[TIMEID] 
WHERE R.[TIMEID] LIKE '20180201%' -- I can live without the WHERE for now and just update everything from the CSV.
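A minimal sketch of the same staging-table idea driven from Python. Assumptions: an in-memory SQLite database stands in for SQL Server, both tables are cut down to two columns, and the sample rows are illustrative (the names `DATABASE_TABLE` and `NEW_CSV` come from the query above). Older SQLite versions lack T-SQL's `UPDATE ... FROM ... JOIN`, so the join is written as a correlated subquery; on SQL Server you would pass the SQLAlchemy engine to `to_sql` and run the `UPDATE R ... INNER JOIN` statement instead.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database

# Existing target table (small stand-in for [DATABASE_TABLE]); PA is still empty.
conn.execute("CREATE TABLE DATABASE_TABLE (TIMEID TEXT PRIMARY KEY, PA REAL)")
conn.executemany(
    "INSERT INTO DATABASE_TABLE (TIMEID, PA) VALUES (?, ?)",
    [("20180101H01", None), ("20180101H02", None), ("20180201H01", None)],
)
conn.commit()

# New CSV data, already loaded into a DataFrame (illustrative values).
new_csv = pd.DataFrame({"TIMEID": ["20180101H01", "20180201H01"], "PA": [0.4615, 0.5]})

# Step 1: load the DataFrame into a staging table, replacing it on every run.
new_csv.to_sql("NEW_CSV", conn, if_exists="replace", index=False)

# Step 2: one set-based UPDATE that copies PA for every matching TIMEID.
conn.execute(
    """
    UPDATE DATABASE_TABLE
    SET PA = (SELECT P.PA FROM NEW_CSV AS P WHERE P.TIMEID = DATABASE_TABLE.TIMEID)
    WHERE TIMEID IN (SELECT TIMEID FROM NEW_CSV)
    """
)
conn.commit()

print(conn.execute("SELECT TIMEID, PA FROM DATABASE_TABLE ORDER BY TIMEID").fetchall())
```

Rows whose TIMEID is not in the staging table keep their old values, which is why this pattern does not require dropping the target table.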

I'm pretty new to Python, so pardon me. I have succeeded in loading the CSV file into a pandas DataFrame, and I am also able to insert new rows into SQL Server, but I can't manage an update (either of existing columns or of NULL columns).

import pandas as pd 
from sqlalchemy import create_engine
engine = create_engine("BLOCKOUTFOR PASSWORD")
query="SELECT * FROM [DATABASE].[TABLE]"
df = pd.read_sql_query(query, engine)
display(df) #This is just to display the current data

    TIMEID  DATEID  HOUR    DOW FESTIVAL    PA  PB  PC  P31A    PX  PY  P_TOT
0   20180101H01 2018-01-01  01  2   N   0.4615  0.0570  0.4427  0.0153  None    None    0.9765
1   20180101H02 2018-01-01  02  2   N   0.4112  0.0516  0.4074  0.0154  None    None    0.8856

#Convert Type and Load CSV into df3
def dfReadCSV( Path, Ind):
    df =pd.read_csv(Path,dtype={'DATEID':str,'Hour':str},parse_dates= ['DATEID'])
    df1=df[Ind:]
    return df1
df3=dfReadCSV("C5Liq_2018Test.csv",0)

display(df3) # if there is a neater way to do this, it would be appreciated, but it's not critical

    Attribute   TIMEID  DATEID  Hour    DOW 20A 20DHA   21A 21DHA   30A 31A PA  PB  PC  P31A    P_TOT
0   H01 20180101H01 2018-01-01  01  1   0.2953  0.0158  0.1662  0.0412  0.4427  0.0153  0.4615  0.0570  0.4427  0.0153  0.9765
1   H02 20180101H02 2018-01-01  02  1   0.2711  0.0160  0.1401  0.0356  0.4074  0.0154  0.4112  0.0516  0.4074  0.0154  0.8856
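One possibly tidier version of the loader, as a sketch: `dtype={"Hour": str}` keeps values such as `01` as text, `parse_dates` on its own is enough to turn `DATEID` into a datetime column, and `.iloc` makes the row offset explicit. The function name `read_csv_from` and the inline sample are hypothetical; `io.StringIO` only stands in for the real file path.

```python
import io

import pandas as pd

# A two-row sample shaped like the CSV above (only a few columns kept).
csv_text = """Attribute,TIMEID,DATEID,Hour,DOW,PA
H01,20180101H01,2018-01-01,01,1,0.4615
H02,20180101H02,2018-01-01,02,1,0.4112
"""


def read_csv_from(path_or_buffer, start_row=0):
    """Read the CSV, keep Hour as text, parse DATEID as a date,
    and drop the rows before start_row."""
    df = pd.read_csv(path_or_buffer, dtype={"Hour": str}, parse_dates=["DATEID"])
    return df.iloc[start_row:]


df3 = read_csv_from(io.StringIO(csv_text))
print(df3.dtypes)
```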

#Insert Function
connStr= engine.connect().connection
cursor = connStr.cursor()

for index,row in df3.iterrows():
    cursor.execute('INSERT INTO [DATABASE].[TABLE]([TIMEID],[DATEID],[Hour],[DOW]) values (?,?,?,?)', row['TIMEID'], row['DATEID'], row['Hour'], row['DOW']) 
    connStr.commit()

cursor.close()
connStr.close()
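A sketch of the same insert done as one batch, assuming an in-memory SQLite connection in place of the raw pyodbc connection and a hypothetical table name `TBL`. `itertuples(index=False, name=None)` turns each row into a plain tuple in placeholder order, `executemany` sends them all, and there is a single commit at the end instead of one per row. With pyodbc against SQL Server, setting `cursor.fast_executemany = True` before `executemany` can speed this up further.

```python
import sqlite3

import pandas as pd

df3 = pd.DataFrame(
    {
        "TIMEID": ["20180101H01", "20180101H02"],
        "DATEID": ["2018-01-01", "2018-01-01"],
        "Hour": ["01", "02"],
        "DOW": [1, 1],
    }
)

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server connection
conn.execute("CREATE TABLE TBL (TIMEID TEXT, DATEID TEXT, Hour TEXT, DOW INTEGER)")

# One parameter tuple per row, in the same order as the ? placeholders.
params = list(df3[["TIMEID", "DATEID", "Hour", "DOW"]].itertuples(index=False, name=None))

cur = conn.cursor()
cur.executemany("INSERT INTO TBL (TIMEID, DATEID, Hour, DOW) VALUES (?, ?, ?, ?)", params)
conn.commit()  # one commit for the whole batch
cur.close()
```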

#Update Function. This is where i have problem.
connStr= engine.connect().connection
cursor = connStr.cursor()

for row in df3.iterrows():
    #sql = 'UPDATE [DATABASE].[TABLE] SET [DATEID]=? WHERE [TIMEID]=?'.format(tbl=[DATABASE].[TABLE])
   cursor.execute("UPDATE [DATABASE].[TABLE]  SET [DATEID] = ? WHERE [TIMEID] = ?", row[:,0],row[;,0])  

cursor.close()
connStr.close()

The syntax is wrong and I couldn't figure it out. Preferably, I would like an update method similar to the insert above. The data in the CSV gets updated, and I want to push that updated info into my SQL Server table.

I found a similar thread, but it has no answer either: Update MSSQL table through SQLAlchemy using dataframes

Like the thread starter there, I cannot drop the table either, because a new CSV that loads a new column of data (for example PX) might not contain the information from a previous insert (PA).

There are two ways to make the update you want:

1) Directly on the database:

# TABLE is the mapped ORM class; row comes from `for index, row in df3.iterrows():`
upd = (session.query(TABLE)
       .filter(TABLE.TIMEID == row['TIMEID'])
       .update({"DATEID": row['DATEID']})
       )
print("# of updated rows = {}".format(upd))
session.commit()

2) Load the object(s), update the value, and commit the session:

upd = (session.query(TABLE)
       .filter(TABLE.TIMEID == row['TIMEID'])
       )

# assuming there should be exactly one object for the given TIMEID
record = upd.one()
record.DATEID = row['DATEID']
session.commit()
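Both approaches can be tried end to end with a small sketch. Assumptions: SQLAlchemy 1.4 or later, an in-memory SQLite engine in place of SQL Server, and a hypothetical mapped class `Reading` standing in for `TABLE`. Approach 1 fills `DATEID` with `Query.update()`, which issues an UPDATE directly and returns the number of matched rows; approach 2 loads each object and sets `Hour`, leaving the UPDATE to the session's flush on commit.

```python
import pandas as pd
from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Reading(Base):
    """Hypothetical mapped class standing in for [DATABASE].[TABLE]."""

    __tablename__ = "readings"
    TIMEID = Column(String, primary_key=True)
    DATEID = Column(String)
    Hour = Column(String)


engine = create_engine("sqlite://")  # in-memory SQLite instead of SQL Server
Base.metadata.create_all(engine)

df3 = pd.DataFrame(
    {
        "TIMEID": ["20180101H01", "20180101H02"],
        "DATEID": ["2018-01-01", "2018-01-01"],
        "Hour": ["01", "02"],
    }
)

with Session(engine) as session:
    # Seed rows that only have a TIMEID, like an earlier partial insert.
    session.add_all([Reading(TIMEID=t) for t in df3["TIMEID"]])
    session.commit()

    # Approach 1: UPDATE directly on the database, one statement per row.
    for _, row in df3.iterrows():
        upd = (
            session.query(Reading)
            .filter(Reading.TIMEID == row["TIMEID"])
            .update({Reading.DATEID: row["DATEID"]}, synchronize_session=False)
        )
        print("# of updated rows = {}".format(upd))
    session.commit()

    # Approach 2: load each object, change an attribute, and let commit flush it.
    for _, row in df3.iterrows():
        record = session.query(Reading).filter(Reading.TIMEID == row["TIMEID"]).one()
        record.Hour = row["Hour"]
    session.commit()
```

Approach 1 avoids loading objects at all; approach 2 is convenient when you also want Python-side logic per object, at the cost of one SELECT per row.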

You can get more info.

I don't recommend SQLAlchemy for updates; it's good for batch inserts.
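For the batch-insert case, a sketch with `DataFrame.to_sql`. Assumptions: an in-memory SQLite connection as a stand-in (on SQL Server you would pass the SQLAlchemy engine) and a hypothetical `readings` table. Note that `if_exists="append"` only inserts rows; it never updates rows that already match, which is why the update case needs a different approach.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the database connection
conn.execute("CREATE TABLE readings (TIMEID TEXT PRIMARY KEY, PA REAL)")

new_rows = pd.DataFrame({"TIMEID": ["20180101H01", "20180101H02"], "PA": [0.4615, 0.4112]})

# Append all rows in one call; index=False keeps the DataFrame index out of the table.
new_rows.to_sql("readings", conn, if_exists="append", index=False)
conn.commit()
```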

For SQLAlchemy:

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://postgres:password@host:port/database')
print(engine)
select_query = "SELECT * from something.something"
df = pd.read_sql_query(select_query, engine)

After hours of searching, I found a solution:

Update function:

connStr= engine.connect().connection
cursor = connStr.cursor()

for index, row in df3.iterrows():
    cursor.execute('''UPDATE [DATABASE].[TABLE] SET [Hour] = ? WHERE [TIMEID] = ?''', (row['Hour'],row['TIMEID']))  
    connStr.commit()
    

cursor.close()
connStr.close()
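A sketch of the update loop with some error checking, assuming an in-memory SQLite connection and a hypothetical `readings` table in place of `[DATABASE].[TABLE]`. `cursor.rowcount` flags TIMEIDs that matched no row, everything is committed once at the end, and any database error rolls the whole batch back. With pyodbc you would catch `pyodbc.Error` instead of `sqlite3.Error`.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server connection
conn.execute("CREATE TABLE readings (TIMEID TEXT PRIMARY KEY, Hour TEXT)")
conn.executemany("INSERT INTO readings (TIMEID) VALUES (?)", [("20180101H01",), ("20180101H02",)])
conn.commit()

# The last TIMEID deliberately has no matching row in the table.
df3 = pd.DataFrame({"TIMEID": ["20180101H01", "20180101H02", "20180101H99"], "Hour": ["01", "02", "99"]})

sql = "UPDATE readings SET Hour = ? WHERE TIMEID = ?"
cur = conn.cursor()
missing = []
try:
    for timeid, hour in df3[["TIMEID", "Hour"]].itertuples(index=False, name=None):
        cur.execute(sql, (hour, timeid))
        if cur.rowcount == 0:  # no existing row had this TIMEID
            missing.append(timeid)
    conn.commit()  # commit once, after every row went through
except sqlite3.Error:
    conn.rollback()  # undo the partial batch on any database error
    raise
finally:
    cur.close()

print("TIMEIDs not found:", missing)
```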

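After hours of trying, it turned out to be a straightforward syntax error: `row[;,0]` is not valid Python, and each value has to be taken from the row by column name, as in `row['Hour']`.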

I would still like to hear how I could solve this using the session.query method.

I am also sure my code above could be improved with some error checking. Also, could someone explain why the loop fails without `index`, and what it means here?

for index, row in df3.iterrows():

Tired but excited.
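On the `index` question: `DataFrame.iterrows()` yields one `(index, row)` pair per row, where `index` is the row's index label and `row` is a `Series` of that row's values. `for index, row in ...` unpacks the pair; `for row in ...` binds the whole tuple to `row`, so `row['Hour']` raises a `TypeError` (and `row[:,0]` would fail the same way). If you don't need the label, the usual convention is `for _, row in df3.iterrows():`. A small demonstration with made-up data:

```python
import pandas as pd

df3 = pd.DataFrame({"TIMEID": ["20180101H01", "20180101H02"], "Hour": ["01", "02"]})

# iterrows() yields (index_label, row_series) pairs.
first = next(df3.iterrows())
print(type(first), len(first))  # a tuple with two items
print(first[0])                 # the index label: 0
print(first[1]["Hour"])         # the row's value: 01

# With unpacking, each name gets one half of the pair.
for index, row in df3.iterrows():
    print(index, row["TIMEID"], row["Hour"])

# Without unpacking, `row` is the whole tuple, so indexing it by column name fails.
error_message = None
for row in df3.iterrows():
    try:
        row["Hour"]
    except TypeError as exc:
        error_message = str(exc)
    break
print("TypeError:", error_message)
```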
