简体   繁体   中英

Failing bulk insert data from Pandas dataframe into Sybase database table using to_sql

The purpose of my below code is to get data from a restful service, normalize it, store it in dataframe with necessary columns and then finally load it in Sybase table using Pandas' to_sql .

Error:

File "C:\Program Files\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 467, in do_executemany cursor.executemany(statement, parameters) sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Sybase][ODBC Driver][Adaptive Server Enterprise]Incorrect syntax near ','.\n (102) (SQLExecDirectW)") [SQL: 'INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") VALUES (?, ?, ?, ?)'] [parameters: (('0050/TAIEX', 'TAIEX', 'TWD', 0), ('035420/KORE', 'KORE', 'KRW', 0), ('0TL/LIF', 'LIF', 'NOK', 1), ('100FTSE/LIF', 'LIF', 'GBP', 0), ('101FTSE/LIF', 'LIF', 'GBP', 0), ('10STAT/OM', 'OM', 'SEK', 0), ('10TB/KFX', 'KFX', 'KRW', 0), ('10TBA/KFX', 'KFX', 'KRW', 0)... displaying 10 of 4525 total bound parameter sets... ('ZURF/DTB', 'DTB', 'CHF', 0), ('ZX/NYCE', 'NYCE', 'USD', 0))]

Process finished with exit code 1

Code:

from sqlalchemy.engine.url import *               
from sqlalchemy.connectors.pyodbc import *             
from sqlalchemy import create_engine                       
import urllib.request as request                  
import json                         
import pandas as pd                      
from pandas.io.json import json_normalize, DataFrame      
           
response = request.urlopen('http://tfsdscsw5XX/mdsclass/CONTFUTURES--O.json')            
output=response.read()                              
data=json.loads(output)           
df=json_normalize(data)                           
df1=(df[['CONTRACT_ID','EXCHANGE_ID','CURRENCY','TRADING_CODE']])                
df2=pd.DataFrame(df1)           
print(df2)                
print(df2.CONTRACT_ID)          
            
connector =  PyODBCConnector()                 
url = make_url("sybase+pyodbc://myhost/mydatabase?driver=Adaptive Server Enterprise&port=2306")              
print(connector.create_connect_args(url))                         
engine=create_engine(url)

#it is failing here**
df2.to_sql("contract_test",engine,index=False,if_exists="append",schema="dbo")   

response.close()               

Sample of data in dataframe df2:

      CONTRACT_ID EXCHANGE_ID CURRENCY  TRADING_CODE
0      0050/TAIEX       TAIEX      TWD             0
1     035420/KORE        KORE      KRW             0
2         0TL/LIF         LIF      NOK             1
3     100FTSE/LIF         LIF      GBP             0
4     101FTSE/LIF         LIF      GBP             0

Table contract_test definition:

CREATE TABLE contract_test (
    CONTRACT_ID char(12) NOT NULL,
    EXCHANGE_ID char(12),
    CURRENCY char(4) NOT NULL,
    TRADING_CODE smallint
) 
GO

Please help as to how can this be resolved? I am stuck here.

Your issue may simply be the incompatibilities of Python database APIs. Pandas' to_sql is really running an executemany() call from pyodbc . This module is more popularly used with SQL Server especially in implementation with SQLAlchemy. However, integration with Sybase is not fully supported. As mentioned on the SQLAlchemy Sybase docs page :

Note

The Sybase dialect within SQLAlchemy is not currently supported. It is not tested within continuous integration and is likely to have many issues and caveats not currently handled. Consider using the external dialect instead.

Specifically, executemany appears to be running multiple VALUES row inserts which is supported in SQL Server but not Sybase (even though both dialects are variants of TSQL with known history of connection):

INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('0050/TAIEX', 'TAIEX', 'TWD', 0), 
       ('035420/KORE', 'KORE', 'KRW', 0), 
       ('0TL/LIF', 'LIF', 'NOK', 1), 
...

Instead, Sybase requires classic ANSI-SQL with multiple INSERT INTO calls:

INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('0050/TAIEX', 'TAIEX', 'TWD', 0) 
INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('035420/KORE', 'KORE', 'KRW', 0)
INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
VALUES ('0TL/LIF', 'LIF', 'NOK', 1)
...

To resolve, instead of Pandas' convenient to_sql method, consider a direct SQLAlchemy executemany call with parameters using list of data frame rows via DataFrame.to_numpy() . Below assumes contract_test table always exists beforehand.

engine = create_engine(url)
sql = """INSERT INTO dbo.contract_test ("CONTRACT_ID", "EXCHANGE_ID", "CURRENCY", "TRADING_CODE") 
         VALUES (?, ?, ?, ?)"""

with engine.connect() as connection:
    result = connection.execute(sql, df2.to_numpy().tolist())

If above still faces same issue, integrate a for-loop:

with engine.connect() as connection:
    for row in df2.to_numpy().tolist():
        result = connection.execute(sql, row)

The external SAP ASE (Sybase) dialect is now the recommended SQLAlchemy dialect for Sybase, and it does support fast_executemany if you use the SAP ASE ODBC driver.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM