
Bucket pattern for time-series data in MongoDB with Python (PyMongo)

I want to create time-based buckets, specifically one per hour (or longer if needed). I read about the bucket pattern at https://docs.mongodb.com/manual/tutorial/model-time-data/#example, but I don't know what code to use with Python and PyMongo. My dataset consists of 11 files covering 2010-2020, about 1.5 million rows in total, and a document looks like this:

_id:ObjectId("603fb0b7142a0cbb439ae2e1")
    id1:3758
    id6:2
    id7:-79.09
    id8:35.97
    id9:5.5
    id10:0
    id11:-99999
    id12:0
    id13:-9999
    c14:"U"
    id15:0
    id16:99
    id17:0
    id18:-99
    id19:-9999
    id20:33
    id21:0
    id22:-99
    id23:0
    timestamp1:2010-01-01T00:05:00.000+00:00
    timestamp2:2009-12-31T19:05:00.000+00:00

All the attributes change every 5 minutes except id1, which remains the same. This is what I have tried (after processing the files and converting them into DataFrames):

files =  os.listdir('sampl/')
sorted_files =  sorted(files)

for file in sorted_files:
    df = process_file(file)
    #df.reset_index(inplace=True)  # Reset Index
    data_dict = df.to_dict('records')  # Convert to dictionary

    mycol1.update_many(
        {'nsamples': {'$lt': 12}},
        {
            '$push': {'samples': data_dict },
            '$min': {'first': df['timestamp1']},
            '$max': {'last': df['timestamp1']},
            '$inc': {'nsamples': 1}
        },
        upsert=True
    )

Output: bson.errors.InvalidDocument: cannot encode object: id1 id6 id7... id23 timestamp1 timestamp2

Any help would be appreciated! Thanks in advance!

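Here is how to insert the data using the bucket pattern in MongoDB. Each row is upserted on its own, so every value passed to the update is a single value or record rather than a whole DataFrame column: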

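# sorted_files, process_file and mycol1 are defined as in the question.
for file in sorted_files:
    df = process_file(file)
    # Upsert one row at a time so each pushed value is a single record.
    for _, row in df.iterrows():
        data_dict = row.to_dict()
        # id1 is constant in this dataset, so it identifies the series.
        id1 = 3758
        # update_one: each row goes into exactly one bucket; the upsert opens a
        # new bucket once no bucket with fewer than 12 samples is left.
        mycol1.update_one(
            {"id1": id1, "nsamples": {"$lt": 12}},
            {
                "$push": {"id24": data_dict},
                "$min": {"first": data_dict["timestamp1"]},
                "$max": {"last": data_dict["timestamp1"]},
                "$inc": {"nsamples": 1},
            },
            upsert=True,
        )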
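Note that the buckets above are count-based: 12 samples at 5-minute spacing cover one hour only when no rows are missing. If the buckets should follow clock hours instead, one possible variant is to key each bucket on the timestamp truncated to the hour. The following is only a sketch under that assumption, not part of the original answer: the helper name hour_bucket_update and the fields bucket_start, bucket_end and samples are hypothetical, and the helper only builds the filter and update documents so that it can be checked without a running server.

```python
from datetime import datetime, timedelta, timezone


def hour_bucket_update(sample, ts_field="timestamp1", key_field="id1"):
    """Build the (filter, update) pair that files one sample into its hourly bucket.

    Field names other than timestamp1 and id1 are illustrative assumptions.
    """
    ts = sample[ts_field]
    # Truncate the timestamp to the start of its clock hour.
    bucket_start = ts.replace(minute=0, second=0, microsecond=0)
    bucket_filter = {key_field: sample[key_field], "bucket_start": bucket_start}
    bucket_update = {
        "$push": {"samples": sample},
        "$min": {"first": ts},
        "$max": {"last": ts},
        "$inc": {"nsamples": 1},
        # Written only when the upsert creates the bucket document.
        "$setOnInsert": {"bucket_end": bucket_start + timedelta(hours=1)},
    }
    return bucket_filter, bucket_update


sample = {
    "id1": 3758,
    "id9": 5.5,
    "timestamp1": datetime(2010, 1, 1, 0, 5, tzinfo=timezone.utc),
}
flt, upd = hour_bucket_update(sample)
# With a live collection (mycol1 in the post) this would be applied as:
#     mycol1.update_one(flt, upd, upsert=True)
```

Because the filter matches on bucket_start rather than on nsamples, a gap in the data leaves that hour's bucket short instead of shifting later samples into it. For buckets longer than an hour, the truncation step would need to change accordingly.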

