I want to insert a pandas DataFrame into MongoDB. However, when I do so, the timestamp column (which is the index of the DataFrame) does not get inserted into MongoDB.
Below is my code, which reproduces the problem:
from datetime import datetime
import pandas as pd
from pymongo import MongoClient
client = MongoClient('localhost', 27017)
db = client.ticks
collection = db.STOCK
collection_ohlc = db.STOCK_ohlc
# Read per second ticks data from Mongo into a dataframe
results = collection.find(
{'timestamp': {'$gte': '2019-01-24T09:15:00', '$lte': '2019-01-24T09:19:59'}})
df = pd.DataFrame(list(results))
# Convert per second ticks data into a 5-minute OHLC candle
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
df.set_index('timestamp', inplace=True)
ohlc_data = df['ltp'].resample('5min').ohlc()
# Print OHLC candle dataframe
print(ohlc_data)
# Write the OHLC candle back to Mongo into a new collection STOCK_ohlc
collection_ohlc.insert_many(ohlc_data.to_dict('records'))
Here is the output of the above print(ohlc_data) statement:
                       open   high    low   close
timestamp
2019-01-24 09:15:00  286.55  286.7  285.5  285.65
The code runs fine and the OHLC values are inserted into MongoDB. However, the timestamp column is missing.
Below is the mongo shell output listing the inserted record:
> db.STOCK_ohlc.find()
{ "_id" : ObjectId("5c6abc6f4994a1bc8c3c08fd"), "open" : 286.55, "high" : 286.7, "low" : 285.5, "close" : 285.65 }
>
As we can see, the timestamp is missing from the inserted record, and an OHLC candle is useless without its timestamp.
I tried the various orient values mentioned in the pandas.DataFrame.to_dict documentation, but none of them seems to insert into MongoDB. The only orient that inserts data is records, but it omits the timestamp.
Any pointers would be of great help.
UPDATE: Here is the output of print(ohlc_data.to_dict('records')):
[{'open': 286.55, 'high': 286.7, 'low': 285.5, 'close': 285.65}]
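To see why only the records orient inserts, it may help to compare the shapes that to_dict returns. The sketch below rebuilds the one-row frame from the output above by hand (the frame construction is an assumption for illustration, not the original data load). Only 'records' yields a list of documents; the other orients shown return a single dict, which is likely why insert_many rejects them.

```python
import pandas as pd

# Hand-built stand-in for the OHLC frame printed above (illustrative only)
ohlc_data = pd.DataFrame(
    {"open": [286.55], "high": [286.7], "low": [285.5], "close": [285.65]},
    index=pd.DatetimeIndex(["2019-01-24 09:15:00"], name="timestamp"),
)

as_records = ohlc_data.to_dict("records")  # list of row dicts, index dropped
as_dict = ohlc_data.to_dict("dict")        # dict keyed by column name
as_index = ohlc_data.to_dict("index")      # dict keyed by index value

print(type(as_records).__name__, as_records)
print(type(as_dict).__name__, list(as_dict))
print(type(as_index).__name__, list(as_index))
```

Note that the index does survive in the 'dict' and 'index' orients, but only as dictionary keys, not as a field of each row.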
When you convert a pd.DataFrame with to_dict('records'), the index is skipped and only the columns are converted.
A solution is to turn the index into a column before calling to_dict():
ohlc_data.reset_index(level=0, inplace=True)
collection_ohlc.insert_many(ohlc_data.to_dict('records'))
Here is the output of ohlc_data.to_dict('records'):
[{'timestamp': Timestamp('2019-01-24 09:15:00'), 'open': 286.55, 'high': 286.7, 'low': 285.5, 'close': 285.65}]
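The whole flow can be checked without a running MongoDB. The following is a minimal sketch that uses four made-up ticks (an assumption, chosen so the candle matches the values in the question) and the non-mutating reset_index() form. The final insert is left as a comment because it needs the collection_ohlc handle from the question.

```python
from datetime import datetime

import pandas as pd

# Four illustrative ticks inside the 09:15:00 to 09:19:59 window (made-up data)
ticks = pd.DataFrame({
    "timestamp": ["2019-01-24T09:15:00", "2019-01-24T09:16:30",
                  "2019-01-24T09:18:10", "2019-01-24T09:19:59"],
    "ltp": [286.55, 286.7, 285.5, 285.65],
})
ticks["timestamp"] = pd.to_datetime(ticks["timestamp"], errors="coerce")

# Resample to a 5-minute OHLC candle; the timestamp becomes the index
ohlc_data = ticks.set_index("timestamp")["ltp"].resample("5min").ohlc()

# Move the index back into a regular column so that 'records' keeps it
records = ohlc_data.reset_index().to_dict("records")
print(records)

# pd.Timestamp subclasses datetime.datetime
print(isinstance(records[0]["timestamp"], datetime))

# With a live connection, as in the question:
# collection_ohlc.insert_many(records)
```

Using reset_index() without inplace=True leaves ohlc_data itself unchanged, which can be convenient if the indexed frame is still needed afterwards.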