Write dataframe into mysql database
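I want to write data into a MySQL database. I first read the current data from the database and then calculate a new value. The new values should be written in the same order as the data in the database, as shown below. I don't want to overwrite the existing data, and I don't want to use to_sql.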
I get the following error message:
(mysql.connector.errors.DatabaseError) 1265 (01000): Data truncated for column 'log_return' at row 1 [SQL: 'INSERT INTO
The complete code is below.
import sqlalchemy as sqlal
import pandas as pd
import numpy as np
mysql_engine = sqlal.create_engine(xxx)
mysql_engine.raw_connection()
metadata = sqlal.MetaData()
product = sqlal.Table('product', metadata,
    sqlal.Column('ticker', sqlal.String(10), primary_key=True, nullable=False, unique=True),
    sqlal.Column('isin', sqlal.String(12), nullable=True),
    sqlal.Column('product_name', sqlal.String(80), nullable=True),
    sqlal.Column('currency', sqlal.String(3), nullable=True),
    sqlal.Column('market_data_source', sqlal.String(20), nullable=True),
    sqlal.Column('trading_location', sqlal.String(20), nullable=True),
    sqlal.Column('country', sqlal.String(20), nullable=True),
    sqlal.Column('sector', sqlal.String(80), nullable=True)
)
market_price_data = sqlal.Table('market_price_data', metadata,
    sqlal.Column('Date', sqlal.DateTime, nullable=True),
    sqlal.Column('ticker', sqlal.String(10), sqlal.ForeignKey('product.ticker'), nullable=True),
    sqlal.Column('adj_close', sqlal.Float, nullable=True),
    sqlal.Column('log_return', sqlal.Float, nullable=True)
)
metadata.create_all(mysql_engine)
GetTimeSeriesLevels = pd.read_sql_query('SELECT Date, ticker, adj_close FROM market_price_data Order BY ticker ASC', mysql_engine)
GetTimeSeriesLevels['log_return'] = np.log(GetTimeSeriesLevels.groupby('ticker')['adj_close'].apply(lambda x: x.div(x.shift(1)))).dropna()
GetTimeSeriesLevels['log_return'].fillna('NULL', inplace=True)
insert_yahoo_data = market_price_data.insert().values(GetTimeSeriesLevels [['log_return']].to_dict('records'))
mysql_engine.execute(insert_yahoo_data)
The database looks like this.
Date ticker adj_close log_return
2016-11-21 00:00:00 AAPL 111.73 NULL
2016-11-22 00:00:00 AAPL 111.8 NULL
2016-11-23 00:00:00 AAPL 111.23 NULL
2016-11-25 00:00:00 AAPL 111.79 NULL
2016-11-28 00:00:00 AAPL 111.57 NULL
2016-11-23 00:00:00 ACN 119.82 NULL
2016-11-25 00:00:00 ACN 120.74 NULL
2016-11-28 00:00:00 ACN 120.76 NULL
2016-11-29 00:00:00 ACN 120.94 NULL
2016-11-30 00:00:00 ACN 119.43 NULL
...
It should look like this:
Date ticker adj_close log_return
2016-11-21 00:00:00 AAPL 111.73 NULL
2016-11-22 00:00:00 AAPL 111.8 0.000626
2016-11-23 00:00:00 AAPL 111.23 -0.005111
2016-11-25 00:00:00 AAPL 111.79 0.005022
2016-11-28 00:00:00 AAPL 111.57 -0.001970
2016-11-21 00:00:00 ACN 119.68 NULL
2016-11-22 00:00:00 ACN 119.48 -0.001672521
2016-11-23 00:00:00 ACN 119.82 0.002841623
2016-11-25 00:00:00 ACN 120.74 0.007648857
2016-11-28 00:00:00 ACN 120.76 0.000165631
...
Shamefully, I don't know much beyond raw SQL with sqlalchemy. Consider dumping the pandas dataframe to a temp table and then joining it with the final table:
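# DUMP TO TEMP TABLE (REPLACING EACH TIME)
# (skip the fillna('NULL') step first so that log_return stays numeric)
GetTimeSeriesLevels.to_sql(name='log_return_temp', con=mysql_engine,
                           if_exists='replace', index=False)
# SQL UPDATE (USING TRANSACTION); the engine is named mysql_engine in the question
with mysql_engine.begin() as conn:
    # text() wraps the raw SQL string, which newer SQLAlchemy versions require
    conn.execute(sqlal.text("UPDATE market_price_data f"
                            " INNER JOIN log_return_temp t"
                            " ON f.Date = t.Date"
                            " AND f.ticker = t.ticker"
                            " SET f.log_return = t.log_return;"))
mysql_engine.dispose()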
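If to_sql is to be avoided altogether, as the question asks, one alternative to the temp-table approach is to send a parameterized UPDATE per row, keyed on Date and ticker. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database from the standard library as a stand-in for MySQL so that it runs without a server, and the sample rows are taken from the tables in the question. With a MySQL driver the connection object and placeholder style would differ. Missing values are sent as None (SQL NULL) rather than the string 'NULL', which is the likely cause of the "Data truncated" error on a float column.

```python
import sqlite3

import numpy as np
import pandas as pd

# Assumption: in-memory SQLite stands in for the MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE market_price_data "
             "(Date TEXT, ticker TEXT, adj_close REAL, log_return REAL)")
conn.executemany(
    "INSERT INTO market_price_data (Date, ticker, adj_close) VALUES (?, ?, ?)",
    [("2016-11-21", "AAPL", 111.73), ("2016-11-22", "AAPL", 111.80),
     ("2016-11-21", "ACN", 119.68), ("2016-11-22", "ACN", 119.48)],
)

# Read the existing rows, ordered so that shift(1) means "previous day".
df = pd.read_sql_query(
    "SELECT Date, ticker, adj_close FROM market_price_data ORDER BY ticker, Date",
    conn)
previous_close = df.groupby("ticker")["adj_close"].shift(1)
df["log_return"] = np.log(df["adj_close"] / previous_close)

# Convert NaN to None so the driver sends SQL NULL instead of a string.
params = [
    (None if pd.isna(r.log_return) else float(r.log_return), r.Date, r.ticker)
    for r in df.itertuples(index=False)
]

# Update the existing rows in place; nothing is inserted or replaced.
with conn:
    conn.executemany(
        "UPDATE market_price_data SET log_return = ? WHERE Date = ? AND ticker = ?",
        params)

rows = conn.execute(
    "SELECT Date, ticker, log_return FROM market_price_data ORDER BY ticker, Date"
).fetchall()
print(rows)
```

Row-by-row updates are simple but can be slow on large tables, which is where the temp table and a single UPDATE ... INNER JOIN tend to pay off.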
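Also, consider doing the log transformation directly in MySQL! As I understand it, your pandas / numpy code takes the log of the current row's adj_close divided by the previous row's adj_close. MySQL can run a self-join to line up the current and previous rows, and its LOG() function returns the natural logarithm.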
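For reference, the pandas side of this calculation can be sketched as follows. This is a minimal illustration, not the asker's exact code: the sample prices are copied from the tables in the question, and the variable names prices and previous_close are made up for the example. Using groupby(...).shift(1) keeps the result aligned with the original rows and leaves the first row of each ticker as NaN instead of a 'NULL' string.

```python
import numpy as np
import pandas as pd

# Sample prices taken from the tables in the question.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2016-11-21", "2016-11-22", "2016-11-23",
                            "2016-11-21", "2016-11-22"]),
    "ticker": ["AAPL", "AAPL", "AAPL", "ACN", "ACN"],
    "adj_close": [111.73, 111.80, 111.23, 119.68, 119.48],
})

# Sort so that shift(1) really refers to the previous trading day per ticker.
prices = prices.sort_values(["ticker", "Date"]).reset_index(drop=True)

# Previous adj_close within the same ticker; NaN on each ticker's first row.
previous_close = prices.groupby("ticker")["adj_close"].shift(1)

# Natural log of current / previous, aligned row by row with the frame.
prices["log_return"] = np.log(prices["adj_close"] / previous_close)
print(prices)
```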
Below is a SELECT statement that can be dumped to a temp table using CREATE TABLE ... AS, or converted into a more complex UPDATE query with nested SELECT statements:
-- t1 is the current row, t2 the previous row of the same ticker.
-- The alias is rnk because RANK is a reserved word in MySQL 8.0.
SELECT t1.*, LOG(t1.adj_close / t2.adj_close) AS log_return
FROM
  (SELECT m.Date, m.ticker, m.adj_close,
          (SELECT COUNT(*) FROM market_price_data sub
           WHERE sub.Date <= m.Date AND sub.ticker = m.ticker) AS rnk
   FROM market_price_data m) AS t1
INNER JOIN
  (SELECT m.Date, m.ticker, m.adj_close,
          (SELECT COUNT(*) FROM market_price_data sub
           WHERE sub.Date <= m.Date AND sub.ticker = m.ticker) AS rnk
   FROM market_price_data m) AS t2
ON t2.rnk = (t1.rnk - 1) AND t1.ticker = t2.ticker
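The ranking self-join can be tried out without a MySQL server. The sketch below is an assumption-laden stand-in: it runs the same kind of query against an in-memory SQLite database, registers a natural-log function under the name LOG to mimic MySQL's single-argument LOG(), and uses the aliases cur and prev (not from the original answer) for the current and previous rows. The sample rows come from the tables in the question.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register a natural-log LOG() so the query behaves like MySQL's LOG(x).
conn.create_function("LOG", 1, math.log)
conn.execute(
    "CREATE TABLE market_price_data (Date TEXT, ticker TEXT, adj_close REAL)")
conn.executemany(
    "INSERT INTO market_price_data VALUES (?, ?, ?)",
    [("2016-11-21", "AAPL", 111.73), ("2016-11-22", "AAPL", 111.80),
     ("2016-11-23", "AAPL", 111.23),
     ("2016-11-21", "ACN", 119.68), ("2016-11-22", "ACN", 119.48)],
)

# Each subquery numbers the rows of a ticker by date (1 = oldest).
ranked = """
    SELECT m.Date, m.ticker, m.adj_close,
           (SELECT COUNT(*) FROM market_price_data sub
            WHERE sub.Date <= m.Date AND sub.ticker = m.ticker) AS rnk
    FROM market_price_data m
"""

# Pair every row with the row ranked just before it within the same ticker.
query = f"""
    SELECT cur.Date, cur.ticker, LOG(cur.adj_close / prev.adj_close) AS log_return
    FROM ({ranked}) AS cur
    INNER JOIN ({ranked}) AS prev
      ON prev.rnk = cur.rnk - 1 AND prev.ticker = cur.ticker
    ORDER BY cur.ticker, cur.Date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because of the INNER JOIN, the first date of each ticker has no previous row and is left out of the result; its log_return simply stays NULL if the result is used in an UPDATE.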