Pandas + PyMongo: Write DataFrame to MongoDB

I want to insert a pandas DataFrame into MongoDB. However, when I do so, the timestamp column (which is the index column of the DataFrame) does not get inserted into MongoDB.

Below is code that reproduces the problem:

from datetime import datetime

import pandas as pd
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.ticks
collection = db.STOCK
collection_ohlc = db.STOCK_ohlc

# Read per second ticks data from Mongo into a dataframe
results = collection.find(
    {'timestamp': {'$gte': '2019-01-24T09:15:00', '$lte': '2019-01-24T09:19:59'}})
df = pd.DataFrame(list(results))

# Convert per-second ticks into 5-minute OHLC candles
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
df.set_index('timestamp', inplace=True)
ohlc_data = df['ltp'].resample('5min').ohlc()

# Print OHLC candle dataframe
print(ohlc_data)

# Write the OHLC candles back to Mongo, into a new collection STOCK_ohlc
collection_ohlc.insert_many(ohlc_data.to_dict('records'))

Here is the output of above print(ohlc_data) statement:

                       open   high    low   close
timestamp
2019-01-24 09:15:00  286.55  286.7  285.5  285.65
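To reproduce the resampling step without a running MongoDB server, here is a minimal sketch that builds a few per-second ticks and resamples them into a 5-minute OHLC candle the same way the code above does. The tick timestamps and `ltp` prices are invented for illustration, chosen so the result matches the printed candle.

```python
import pandas as pd

# Hypothetical per-second ticks, shaped like the documents read from STOCK:
# ISO-8601 timestamp strings plus a last traded price ('ltp').
ticks = pd.DataFrame({
    'timestamp': ['2019-01-24T09:15:00', '2019-01-24T09:15:01',
                  '2019-01-24T09:17:30', '2019-01-24T09:19:59'],
    'ltp': [286.55, 286.70, 285.50, 285.65],
})

# Parse the strings and make the timestamp the index, as in the question.
ticks['timestamp'] = pd.to_datetime(ticks['timestamp'], errors='coerce')
ticks.set_index('timestamp', inplace=True)

# All four ticks fall in the 09:15:00-09:19:59 bin, so this yields one candle
# whose index (named 'timestamp') holds the bin start.
ohlc_data = ticks['ltp'].resample('5min').ohlc()
print(ohlc_data)
```

Note that the timestamp lives only in the index of `ohlc_data`, not in any of its four columns.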

My code runs without errors and the OHLC values are inserted into MongoDB. However, the timestamp column is missing.

Below is the Mongo shell output listing the inserted record:

> db.STOCK_ohlc.find()
{ "_id" : ObjectId("5c6abc6f4994a1bc8c3c08fd"), "open" : 286.55, "high" : 286.7, "low" : 285.5, "close" : 285.65 }
>

As you can see, the inserted record has no timestamp, which makes it useless.

I tried the various orient values listed in the pandas.DataFrame.to_dict documentation, but none of them seem to insert anything into MongoDB. The only orient that inserts data is records, but it omits the timestamp.

Any pointers would be of great help.

UPDATE: Here is the output of print(ohlc_data.to_dict('records')):

[{'open': 286.55, 'high': 286.7, 'low': 285.5, 'close': 285.65}]

When you convert a pd.DataFrame with to_dict('records'), the index is skipped and only the columns are converted, so each row becomes a dict without the timestamp.
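To see why only records produces something insert_many can use, and why it still loses the timestamp, here is a sketch on a one-row frame shaped like ohlc_data. It rests on one assumption worth stating: insert_many iterates over its argument and expects every item to be a document (a dict). Most orients return a single dict, and iterating a dict yields only its keys (column names, or timestamps for orient='index'), which are not documents. The helper `yields_documents` is a hypothetical name used only for this check.

```python
import pandas as pd

# One-row frame shaped like ohlc_data: a DatetimeIndex named 'timestamp' plus OHLC columns.
index = pd.DatetimeIndex(['2019-01-24 09:15:00'], name='timestamp')
ohlc_data = pd.DataFrame(
    {'open': [286.55], 'high': [286.7], 'low': [285.5], 'close': [285.65]},
    index=index,
)

def yields_documents(obj):
    """Hypothetical helper: True if iterating obj yields only dicts (what insert_many needs)."""
    return all(isinstance(item, dict) for item in obj)

# Show the container type each orient returns and whether it is a sequence of documents.
for orient in ['dict', 'list', 'series', 'split', 'index', 'records']:
    converted = ohlc_data.to_dict(orient=orient)
    print(f"{orient:8} -> {type(converted).__name__}, documents: {yields_documents(converted)}")

# 'records' is the only list of dicts, and its rows carry no 'timestamp' key.
print(ohlc_data.to_dict(orient='records'))
```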

One solution is to turn the index into a regular column before calling to_dict():

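# df is the OHLC frame (ohlc_data in the question); collection is the target collection
df.reset_index(level=0, inplace=True)  # move the 'timestamp' index into a regular column
collection.insert_many(df.to_dict('records'))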

Here is the output of df.to_dict('records') after the reset:

[{'timestamp': Timestamp('2019-01-24 09:15:00'), 'open': 286.55, 'high': 286.7, 'low': 285.5, 'close': 285.65}]
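The same fix can be wrapped in a small reusable form and checked without a MongoDB server. This is a minimal sketch: `frame_to_documents` is a hypothetical helper name, and the one-row frame stands in for ohlc_data. It calls reset_index() without inplace=True, so the original frame keeps its timestamp index. The timestamps come out as pd.Timestamp, a subclass of datetime.datetime, which PyMongo should encode as a BSON date.

```python
from datetime import datetime

import pandas as pd

def frame_to_documents(frame):
    """Hypothetical helper: turn the index into a column, then emit one dict per row."""
    # reset_index() returns a new frame whose first column is the old index
    # (named 'timestamp' here); the frame passed in is left untouched.
    return frame.reset_index().to_dict(orient='records')

# One-row frame shaped like ohlc_data from the question.
index = pd.DatetimeIndex(['2019-01-24 09:15:00'], name='timestamp')
ohlc_data = pd.DataFrame(
    {'open': [286.55], 'high': [286.7], 'low': [285.5], 'close': [285.65]},
    index=index,
)

docs = frame_to_documents(ohlc_data)
print(docs)
# pd.Timestamp subclasses datetime.datetime, so this prints True.
print(isinstance(docs[0]['timestamp'], datetime))

# With a live server this would become:
# collection_ohlc.insert_many(frame_to_documents(ohlc_data))
```

Because the candle timestamp is then stored as a BSON date rather than an ISO string like the ticks in STOCK, later range queries on STOCK_ohlc should pass datetime bounds to $gte/$lte; MongoDB comparison operators do not match a string bound against a date field.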
