
MongoDB - Query to get most recent price quote for each item

I'm trying to follow the examples here on StackOverflow.

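We have a database with millions of stock tickers and prices. A common need is to get the last (most recent) row for each ticker. In the big database, we have a compound index on ticker and the createdDateTime fields used below.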
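
As a rough illustration of the kind of compound index described above, here is how one could be declared with PyMongo. This is a sketch under assumptions: the field names `ticker` and `timestampIsoDateTime` are taken from the test rows later in this post, the helper names `ticker_time_index_keys` and `ensure_ticker_time_index` are made up for the example, and nothing here shows that this is the best index for the production workload.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def ticker_time_index_keys(time_field="timestampIsoDateTime"):
    """Key spec for a compound index: ticker first, then the time field."""
    return [("ticker", ASCENDING), (time_field, ASCENDING)]


def ensure_ticker_time_index(collection, time_field="timestampIsoDateTime"):
    """Ask MongoDB to create the compound index on the given collection.

    `collection` is expected to be a pymongo Collection (assumption).
    create_index returns the name of the index.
    """
    return collection.create_index(ticker_time_index_keys(time_field))
```

With the script below, this would be called as `ensure_ticker_time_index(dbCollection)`. Putting `ticker` first keeps each ticker's rows adjacent in the index, ordered by time, which is the order a per-ticker "latest row" query wants to read them in.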

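So this question has two parts:

  1. What would be the best index to run this efficiently, minimizing IO and run time?
  2. The aggregate query in the script below returns 0 rows with the test data it contains. It should return two rows, one per ticker, each with the latest time for that ticker.
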
import pprint
import datetime as datetime1
from time import time

import configHandler  # the poster's own config helper (not shown)
from pymongo import MongoClient

startTime = time()
startDateNowFmt = datetime1.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
(config_dict, config_user_dict) = configHandler.getConfigVariables()

print("Start TestMongoDBQuerySpeedTuningAggregate, DateTime=" + str(startDateNowFmt))

cluster = MongoClient(config_dict['MONGODB_CONNECTION_STRING'])
db = cluster[config_dict['MONGODB_CLUSTER']]
dbCollectionName = "TestQuotesAggregate"
dbCollection = db[dbCollectionName]

row1 = {'ticker': 'Test1',
        'timestampIsoDateTime': '2020-09-29T15:31:15',
        'createdDateTimeYear': 2020,
        'createdDateTimeMonth': 9,
        'createdDateTimeDay': 29,
        'createdDateTimeHour': 15,
        'createdDateTimeMinute': 31,
        'todaysChangePerc': -11,
        'minuteClose': 100}

row2 = {'ticker': 'Test1',
        'timestampIsoDateTime': '2020-09-29T15:32:15',
        'createdDateTimeYear': 2020,
        'createdDateTimeMonth': 9,
        'createdDateTimeDay': 29,
        'createdDateTimeHour': 15,
        'createdDateTimeMinute': 32,
        'todaysChangePerc': -11.1,
        'minuteClose': 99}

row3 = {'ticker': 'Test2',
        'timestampIsoDateTime': '2020-09-29T15:31:15',
        'createdDateTimeYear': 2020,
        'createdDateTimeMonth': 9,
        'createdDateTimeDay': 29,
        'createdDateTimeHour': 15,
        'createdDateTimeMinute': 31,
        'todaysChangePerc': -12,
        'minuteClose': 200}

row4 = {'ticker': 'Test2',
        'timestampIsoDateTime': '2020-09-29T15:32:15',
        'createdDateTimeYear': 2020,
        'createdDateTimeMonth': 9,
        'createdDateTimeDay': 29,
        'createdDateTimeHour': 15,
        'createdDateTimeMinute': 32,
        'todaysChangePerc': -12.1,
        'minuteClose': 195}

doInsert = False   # only need to do this first time

if doInsert:
    insert_rows = [row1, row2, row3, row4]
    dbCollection.insert_many(insert_rows)



print("Before aggregation - show the data we have to work with")
docs1 = dbCollection.find({})
for doc in docs1:
    print(doc['ticker'], doc['minuteClose'], doc['todaysChangePerc'])

# {"todaysChangePerc": {'$lt': -10}},
docs = dbCollection.aggregate([
            {'$match': {
                      '$and': [
                               {'todaysChangePerc': {'$lt': -10}},
                               {'createdDateTimeYear': 2020},
                               {'createdDateTimeMonth': 9},
                               {'createdDateTimeDay': 29},
                               {'createdDateTimeHour': 15},
                               {'createdDateTimeMinute': {"$gt": 49}}
                      ]
            }},
            {'$group': {
                    '_id': '$ticker',
                    'temp_data': {'$last': '$createdDateTimeIsoDateTime'},
                    'minuteClose': {'$last': '$minuteClose'},
                    'todaysChangePerc': {'$last': '$todaysChangePerc'}
            }},
            {'$project': {
                     'ticker_id': '$_id',
                     'minuteClose': '$minuteClose',
                     'todaysChangePerc': '$todaysChangePerc'
            }},
            {'$sort': {
                    'timestampIsoDateTime': -1
            }}
        ])


# pprint.pprint(docs.explain())

# pprint.pprint(docs)
countDocs = 0
print("After aggregation - show the data we have to work with")
for doc in docs:
    print(doc['ticker'], doc['minuteClose'], doc['todaysChangePerc'])
    countDocs += 1


endTime = time()
# print("StartTime=" + str(startTime) + " EndTime=" + str(endTime))
elapsedTime = endTime - startTime
endDateNowFmt = datetime1.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print("\n")
print("Count Docs:", countDocs)
print("Server Start DateTime=" + str(startDateNowFmt))
print("Server End   DateTime=" + str(endDateNowFmt))
print("ElapsedTime=" + str(elapsedTime) + " seconds")
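
One way to see why the pipeline above returns nothing is to apply the same `$match` conditions to the four test rows in plain Python, without MongoDB. The sketch below is only an emulation: `make_row` and `matches` are hypothetical helpers, not part of PyMongo, and the rows are trimmed to the fields the filter reads.

```python
def make_row(ticker, minute, change):
    """Build a test row with the fields the $match stage looks at."""
    return {
        "ticker": ticker,
        "createdDateTimeYear": 2020,
        "createdDateTimeMonth": 9,
        "createdDateTimeDay": 29,
        "createdDateTimeHour": 15,
        "createdDateTimeMinute": minute,
        "todaysChangePerc": change,
    }


# The four test rows from the script (minutes 31 and 32 only).
rows = [
    make_row("Test1", 31, -11),
    make_row("Test1", 32, -11.1),
    make_row("Test2", 31, -12),
    make_row("Test2", 32, -12.1),
]


def matches(row, min_minute):
    """Plain-Python equivalent of the $and conditions in $match."""
    return (
        row["todaysChangePerc"] < -10
        and row["createdDateTimeYear"] == 2020
        and row["createdDateTimeMonth"] == 9
        and row["createdDateTimeDay"] == 29
        and row["createdDateTimeHour"] == 15
        and row["createdDateTimeMinute"] > min_minute
    )


matched_49 = [r for r in rows if matches(r, 49)]  # filter as written above
matched_1 = [r for r in rows if matches(r, 1)]    # a looser minute bound
print(len(matched_49), len(matched_1))  # prints: 0 4
```

With `createdDateTimeMinute > 49`, none of the test rows (minutes 31 and 32) survive the `$match`, so `$group` has nothing to work on.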

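In my rush to create a reproducible sample, I made a couple of mistakes. The fixes:

  1. I added createdDateTimeIsoDateTime to each of my four test rows.

  2. In the final print, I had to use ticker_id instead of ticker; I was getting a KeyError.

  3. The corrected code below also changes the createdDateTimeMinute filter from $gt: 49 to $gt: 1. The test rows have minutes 31 and 32, so the original filter excluded all of them.

Here is the corrected code. I'm still doing some testing and QA to make sure it is right: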

import pprint
import datetime as datetime1
from time import time

import configHandler  # the poster's own config helper (not shown)
from pymongo import MongoClient

##########################################################################################
startTime = time()
startDateNowFmt = datetime1.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
(config_dict, config_user_dict) = configHandler.getConfigVariables()


print("Start TestMongoDBQuerySpeedTuningAggregate, DateTime=" + str(startDateNowFmt))

# print ("ConnectionString:"  + config_dict['MONGODB_CONNECTION_STRING'])
cluster = MongoClient(config_dict['MONGODB_CONNECTION_STRING'])
db = cluster[config_dict['MONGODB_CLUSTER']]
dbCollectionName = "TestQuotesAggregate"
# dbCollectionName = "ProdPolygonIOQuotes"
dbCollection = db[dbCollectionName]

doInsert = True    # only need to do this first time

if doInsert:

    row1 = {'ticker': 'Test1',
            'timestampIsoDateTime': '2020-09-29T15:31:15',
            'createdDateTimeIsoDateTime': '2020-09-29T15:31:15',
            'createdDateTimeYear': 2020,
            'createdDateTimeMonth': 9,
            'createdDateTimeDay': 29,
            'createdDateTimeHour': 15,
            'createdDateTimeMinute': 31,
            'todaysChangePerc': -11,
            'minuteClose': 100}

    row2 = {'ticker': 'Test1',
            'timestampIsoDateTime': '2020-09-29T15:32:15',
            'createdDateTimeIsoDateTime': '2020-09-29T15:32:15',
            'createdDateTimeYear': 2020,
            'createdDateTimeMonth': 9,
            'createdDateTimeDay': 29,
            'createdDateTimeHour': 15,
            'createdDateTimeMinute': 32,
            'todaysChangePerc': -11.1,
            'minuteClose': 99}

    row3 = {'ticker': 'Test2',
            'timestampIsoDateTime': '2020-09-29T15:31:15',
            'createdDateTimeIsoDateTime': '2020-09-29T15:31:15',
            'createdDateTimeYear': 2020,
            'createdDateTimeMonth': 9,
            'createdDateTimeDay': 29,
            'createdDateTimeHour': 15,
            'createdDateTimeMinute': 31,
            'todaysChangePerc': -12,
            'minuteClose': 200}

    row4 = {'ticker': 'Test2',
            'timestampIsoDateTime': '2020-09-29T15:32:15',
            'createdDateTimeIsoDateTime': '2020-09-29T15:32:15',
            'createdDateTimeYear': 2020,
            'createdDateTimeMonth': 9,
            'createdDateTimeDay': 29,
            'createdDateTimeHour': 15,
            'createdDateTimeMinute': 32,
            'todaysChangePerc': -12.1,
            'minuteClose': 195}

    insert_rows = [row1, row2, row3, row4]
    dbCollection.insert_many(insert_rows)



print("Before aggregation - show the data we have to work with")
docs1 = dbCollection.find({})
for doc in docs1:
    print(doc['ticker'], doc['createdDateTimeIsoDateTime'], doc['minuteClose'], doc['todaysChangePerc'])

docs = dbCollection.aggregate([
            {'$match': {
                      '$and': [
                               {'todaysChangePerc': {'$lt': -10}},
                               {'createdDateTimeYear': 2020},
                               {'createdDateTimeMonth': 9},
                               {'createdDateTimeDay': 29},
                               {'createdDateTimeHour': 15},
                               {'createdDateTimeMinute': {"$gt": 1}}
                      ]
            }},
            {'$group': {
                    '_id': '$ticker',
                    'temp_data': {'$last': '$createdDateTimeIsoDateTime'},
                    'minuteClose': {'$last': '$minuteClose'},
                    'todaysChangePerc': {'$last': '$todaysChangePerc'}
            }},
            {'$project': {
                     'ticker_id': '$_id',
                     'minuteClose': '$minuteClose',
                     'todaysChangePerc': '$todaysChangePerc'
            }},
            {'$sort': {
                    'timestampIsoDateTime': -1
            }}
        ])


#pprint.pprint(docs.explain())

# pprint.pprint(docs)
countDocs = 0
print("After aggregation - show the data we have to work with")
for doc in docs:
    print(doc['ticker_id'], doc['minuteClose'], doc['todaysChangePerc'])
    countDocs += 1


endTime = time()
# print("StartTime=" + str(startTime) + " EndTime=" + str(endTime))
elapsedTime = endTime - startTime
endDateNowFmt = datetime1.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print("\n")
print("Count Docs:", countDocs)
print("Server Start DateTime=" + str(startDateNowFmt))
print("Server End   DateTime=" + str(endDateNowFmt))
print("ElapsedTime=" + str(elapsedTime) + " seconds")

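A second improvement: can someone verify that my new $sort between $match and $group is correct?

For $last to work, the rows need to be sorted before the $last operator is applied. At the end, I can sort the final list by ticker.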

docs = dbCollection.aggregate([
            {'$match': {
                      '$and': [
                               {'todaysChangePerc': {'$lt': -10}},
                               {'createdDateTimeYear': 2020},
                               {'createdDateTimeMonth': 9},
                               {'createdDateTimeDay': 29},
                               {'createdDateTimeHour': 15},
                               {'createdDateTimeMinute': {"$gt": 1}}
                      ]
            }},
            {'$sort': {
                    'ticker': 1,
                    'timestampIsoDateTime': 1
            }},
            {'$group': {
                    '_id': '$ticker',
                    'temp_data': {'$last': '$createdDateTimeIsoDateTime'},
                    'minuteClose': {'$last': '$minuteClose'},
                    'todaysChangePerc': {'$last': '$todaysChangePerc'},
                    'timestampIsoDateTime': {'$last': '$timestampIsoDateTime'}
            }},
            {'$project': {
                     'ticker': '$_id',
                     'minuteClose': '$minuteClose',
                     'todaysChangePerc': '$todaysChangePerc',
                     'timestampIsoDateTime': '$timestampIsoDateTime'
            }},
            {'$sort': {
                    'ticker': 1
            }}
        ])
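
To sanity-check the reasoning about `$sort` and `$last`, the same steps can be emulated in plain Python on the four test rows. This is only a sketch of the logic, not how MongoDB executes the pipeline. `latest_per_ticker` is a hypothetical helper, and it assumes the timestamps are ISO-8601 strings in one consistent format, so that string order matches chronological order.

```python
def latest_per_ticker(rows):
    """Emulate $sort {ticker, timestampIsoDateTime}, then $group with $last."""
    ordered = sorted(rows, key=lambda r: (r["ticker"], r["timestampIsoDateTime"]))
    latest = {}
    for row in ordered:
        # Later rows overwrite earlier ones, so the last row per ticker wins.
        latest[row["ticker"]] = row
    # Final $sort by ticker.
    return [latest[ticker] for ticker in sorted(latest)]


# The four test rows, deliberately out of order.
rows = [
    {"ticker": "Test2", "timestampIsoDateTime": "2020-09-29T15:32:15",
     "minuteClose": 195, "todaysChangePerc": -12.1},
    {"ticker": "Test1", "timestampIsoDateTime": "2020-09-29T15:31:15",
     "minuteClose": 100, "todaysChangePerc": -11},
    {"ticker": "Test2", "timestampIsoDateTime": "2020-09-29T15:31:15",
     "minuteClose": 200, "todaysChangePerc": -12},
    {"ticker": "Test1", "timestampIsoDateTime": "2020-09-29T15:32:15",
     "minuteClose": 99, "todaysChangePerc": -11.1},
]

result = latest_per_ticker(rows)
for row in result:
    print(row["ticker"], row["timestampIsoDateTime"], row["minuteClose"])
# Test1 2020-09-29T15:32:15 99
# Test2 2020-09-29T15:32:15 195
```

Without the `sorted()` call, the row kept for each ticker depends on the order the rows happen to arrive in, which is the situation the `$sort` stage before `$group` is meant to avoid.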
