
Write dataframe into MySQL database

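I want to write data into a MySQL database. I first read the current data from the database and then compute a new value. The new values should be written in the same row order as the data already in the database, as shown below. I do not want to overwrite existing data, and I do not want to use to_sql.

I get the following error message:

(mysql.connector.errors.DatabaseError) 1265 (01000): Data truncated for column 'log_return' at row 1 [SQL: 'INSERT INTO

The full code is below.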

import sqlalchemy as sqlal
import pandas as pd
import numpy as np

mysql_engine = sqlal.create_engine(xxx)
mysql_engine.raw_connection()

metadata = sqlal.MetaData()

product  = sqlal.Table('product', metadata,
                       sqlal.Column('ticker', sqlal.String(10), primary_key=True, nullable=False, unique=True),                   
                       sqlal.Column('isin', sqlal.String(12), nullable=True),
                       sqlal.Column('product_name', sqlal.String(80), nullable=True),
                       sqlal.Column('currency', sqlal.String(3), nullable=True),
                       sqlal.Column('market_data_source', sqlal.String(20), nullable=True),
                       sqlal.Column('trading_location', sqlal.String(20), nullable=True),
                       sqlal.Column('country', sqlal.String(20), nullable=True),
                       sqlal.Column('sector', sqlal.String(80), nullable=True)
                       )

market_price_data = sqlal.Table('market_price_data', metadata,
                                sqlal.Column('Date', sqlal.DateTime, nullable=True),
                                sqlal.Column('ticker', sqlal.String(10), sqlal.ForeignKey('product.ticker'), nullable=True), 
                                sqlal.Column('adj_close', sqlal.Float, nullable=True),
                                sqlal.Column('log_return', sqlal.Float, nullable=True)
                                ) 

metadata.create_all(mysql_engine) 

GetTimeSeriesLevels = pd.read_sql_query('SELECT Date, ticker, adj_close FROM market_price_data Order BY ticker ASC', mysql_engine)
GetTimeSeriesLevels['log_return'] = np.log(GetTimeSeriesLevels.groupby('ticker')['adj_close'].apply(lambda x: x.div(x.shift(1)))).dropna()
GetTimeSeriesLevels['log_return'].fillna('NULL', inplace=True)
insert_yahoo_data = market_price_data.insert().values(GetTimeSeriesLevels [['log_return']].to_dict('records'))
mysql_engine.execute(insert_yahoo_data)

The database currently looks like this.

Date                ticker  adj_close log_return
2016-11-21 00:00:00 AAPL    111.73    NULL  
2016-11-22 00:00:00 AAPL    111.8     NULL  
2016-11-23 00:00:00 AAPL    111.23    NULL      
2016-11-25 00:00:00 AAPL    111.79    NULL  
2016-11-28 00:00:00 AAPL    111.57    NULL  
2016-11-23 00:00:00 ACN     119.82    NULL  
2016-11-25 00:00:00 ACN     120.74    NULL  
2016-11-28 00:00:00 ACN     120.76    NULL  
2016-11-29 00:00:00 ACN     120.94    NULL  
2016-11-30 00:00:00 ACN     119.43    NULL  
...

It should look like this:

Date                ticker  adj_close log_return
2016-11-21 00:00:00 AAPL    111.73    NULL
2016-11-22 00:00:00 AAPL    111.8     0.000626
2016-11-23 00:00:00 AAPL    111.23    -0.005111
2016-11-25 00:00:00 AAPL    111.79    0.005022
2016-11-28 00:00:00 AAPL    111.57    -0.001970
2016-11-21 00:00:00 ACN     119.68    NULL
2016-11-22 00:00:00 ACN     119.48    -0.001672521
2016-11-23 00:00:00 ACN     119.82    0.002841623
2016-11-25 00:00:00 ACN     120.74    0.007648857
2016-11-28 00:00:00 ACN     120.76    0.000165631
...

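Admittedly, I do not know much of SQLAlchemy beyond raw SQL, but consider dumping the pandas dataframe to a temporary table and then joining it with the final table in an UPDATE: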

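# DUMP TO TEMP TABLE (REPLACING EACH TIME)
# Leave missing log returns as NaN (skip the fillna('NULL') step): to_sql writes NaN as SQL NULL
GetTimeSeriesLevels.to_sql(name='log_return_temp', con=mysql_engine, if_exists='replace', 
                           index=False)

# SQL UPDATE (USING TRANSACTION)
# Reuse the mysql_engine created in the question; text() marks the string as raw SQL
with mysql_engine.begin() as conn:     
    conn.execute(sqlal.text("UPDATE market_price_data f"
                            " INNER JOIN log_return_temp t"
                            " ON f.Date = t.Date"
                            " AND f.ticker = t.ticker"
                            " SET f.log_return = t.log_return;"))

mysql_engine.dispose()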
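
The temp-table route only works if log_return stays numeric. In the question's code, fillna('NULL') puts the string 'NULL' into a Float column, which is a likely cause of the "Data truncated" error. The following is a minimal sketch of computing the per-ticker log return so that missing values stay NaN; the function name add_log_return and the sample rows are assumptions for illustration, with prices taken from the tables in the question.

```python
import numpy as np
import pandas as pd


def add_log_return(prices: pd.DataFrame) -> pd.DataFrame:
    """Return a sorted copy with a per-ticker log_return column (hypothetical helper)."""
    out = prices.sort_values(["ticker", "Date"]).reset_index(drop=True)
    # Previous close within the same ticker; the first row of each ticker gets NaN
    prev_close = out.groupby("ticker")["adj_close"].shift(1)
    # Natural log of current close over previous close; NaN propagates, no string 'NULL'
    out["log_return"] = np.log(out["adj_close"] / prev_close)
    return out


# Illustrative sample only, not read from the database
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2016-11-21", "2016-11-22", "2016-11-23",
                            "2016-11-21", "2016-11-22"]),
    "ticker": ["AAPL", "AAPL", "AAPL", "ACN", "ACN"],
    "adj_close": [111.73, 111.80, 111.23, 119.68, 119.48],
})
result = add_log_return(sample)
```

Because the column remains float64, the frame can be passed to to_sql for the temp table as it is, and the first row of each ticker arrives in MySQL as NULL.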

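Alternatively, consider doing the log transformation directly in MySQL. As I understand your pandas/numpy code, you take the natural log of the ratio of the current row's adj_close to the previous row's adj_close within each ticker. MySQL can run a self-join to line up the current and previous rows, and its single-argument LOG() function returns the natural logarithm.

Below is a SELECT statement that can be dumped to a temporary table using CREATE TABLE ... AS, or converted into a more complex UPDATE query with nested SELECT statements: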

-- t1 is the current row, t2 is the previous row of the same ticker
-- (rnk is used as the alias because RANK is a reserved word in MySQL 8.0)
SELECT t1.*, LOG(t1.adj_close / t2.adj_close) AS log_return
FROM    
   (SELECT m.Date, m.ticker, m.adj_close, 
           (SELECT Count(*) FROM market_price_data sub 
            WHERE sub.Date <= m.Date AND sub.ticker = m.ticker) AS rnk
    FROM market_price_data m) AS t1

INNER JOIN 
   (SELECT m.Date, m.ticker, m.adj_close, 
           (SELECT Count(*) FROM market_price_data sub 
            WHERE sub.Date <= m.Date AND sub.ticker = m.ticker) AS rnk
    FROM market_price_data m) AS t2

-- Match each row with the row ranked immediately before it within the ticker
ON t2.rnk = (t1.rnk - 1) AND t1.ticker = t2.ticker
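
The rank-and-self-join idea can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite is not MySQL, so LOG is registered here as a natural-log function to mirror MySQL's single-argument LOG(). The sample rows and the alias rnk are illustrative.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register LOG as natural log so the query behaves like MySQL's single-argument LOG()
conn.create_function("LOG", 1, math.log)

conn.execute(
    "CREATE TABLE market_price_data "
    "(Date TEXT, ticker TEXT, adj_close REAL, log_return REAL)"
)
conn.executemany(
    "INSERT INTO market_price_data (Date, ticker, adj_close) VALUES (?, ?, ?)",
    [
        ("2016-11-21", "AAPL", 111.73),
        ("2016-11-22", "AAPL", 111.80),
        ("2016-11-23", "AAPL", 111.23),
        ("2016-11-21", "ACN", 119.68),
        ("2016-11-22", "ACN", 119.48),
    ],
)

# Rank each row by date within its ticker using a correlated count
ranked = """
    SELECT m.Date, m.ticker, m.adj_close,
           (SELECT COUNT(*) FROM market_price_data sub
            WHERE sub.Date <= m.Date AND sub.ticker = m.ticker) AS rnk
    FROM market_price_data m
"""

# t1 is the current row, t2 the row ranked immediately before it in the same ticker
query = f"""
    SELECT t1.Date, t1.ticker, LOG(t1.adj_close / t2.adj_close) AS log_return
    FROM ({ranked}) AS t1
    INNER JOIN ({ranked}) AS t2
      ON t2.rnk = t1.rnk - 1 AND t1.ticker = t2.ticker
    ORDER BY t1.ticker, t1.Date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The inner join drops the first date of each ticker because it has no previous row, which matches the NULL entries expected in the target table.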

