
MongoDB + Python - very slow simple query

I have an open source energy monitor ( http://openenergymonitor.org ) which logs the power usage of my house every five seconds, so I thought this would be a perfect application to play with MongoDB. I have a Flask Python application running in Apache using MongoEngine to interface with MongoDB.

Now I am running all of this on a Raspberry Pi, so I'm not expecting incredible performance, but a simple query is taking around 20 seconds, which seems slow even for this limited hardware.

I have the following model:

class Reading(db.Document):
    # Defaults are given as callables so they are evaluated when each
    # document is created, not just once at class-definition time.
    created_at = db.DateTimeField(default=datetime.datetime.now, required=True)
    created_at_year = db.IntField(default=lambda: datetime.datetime.now().year, required=True)
    created_at_month = db.IntField(default=lambda: datetime.datetime.now().month, required=True)
    created_at_day = db.IntField(default=lambda: datetime.datetime.now().day, required=True)
    created_at_hour = db.IntField(default=lambda: datetime.datetime.now().hour, required=True)
    battery = db.IntField()
    power = db.IntField()
    meta = {
        'indexes': ['created_at_year', 'created_at_month', 'created_at_day', 'created_at_hour']
    }

I currently have around 36,000 readings stored from the last couple of days. The following code runs super quick:

def get_readings_count():
    count = '<p>Count: %d</p>' % Reading.objects.count()
    return count

def get_last_24_readings_as_json():
    readings = Reading.objects.order_by('-id')[:24]
    result = "["
    for reading in reversed(readings):
        result += str(reading.power) + ","
    result = result[:-1]
    result += "]"
    return result
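
As an aside, the hand-built JSON string above could be replaced with the standard library's json module; a minimal sketch against the same model:

import json

def get_last_24_readings_as_json():
    readings = Reading.objects.order_by('-id')[:24]
    # json.dumps handles the brackets, commas and the empty-list case
    return json.dumps([r.power for r in reversed(list(readings))])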

But doing a simple filter:

def get_today_readings_count():
    todaycount = '<p>Today: %d</p>' % Reading.objects(created_at_year=2014, created_at_month=1, created_at_day=28).count()
    return todaycount

Takes around 20 seconds - there are around 11,000 readings for today.

Shall I give up expecting anything more of my Pi, or is there some tuning I can do to get more performance from MongoDB?

Mongo 2.1.1 on Debian Wheezy

Update 29/1/2014:

In response to an answer below, here are the results of getIndexes() and explain():

> db.reading.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "created_at_year" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_year_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_month" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_month_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_day" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_day_1",
        "background" : false,
        "dropDups" : false
    },
    {
        "v" : 1,
        "key" : {
            "created_at_hour" : 1
        },
        "ns" : "sensor_network.reading",
        "name" : "created_at_hour_1",
        "background" : false,
        "dropDups" : false
    }
]

> db.reading.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28 }).explain()
{
    "cursor" : "BtreeCursor created_at_day_1",
    "isMultiKey" : false,
    "n" : 15689,
    "nscannedObjects" : 15994,
    "nscanned" : 15994,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 5,
    "nChunkSkips" : 0,
    "millis" : 25511,
    "indexBounds" : {
        "created_at_day" : [
            [
                28,
                28
            ]
        ]
    },
    "server" : "raspberrypi:27017"
}

Update 4 Feb

Okay, so I deleted the indexes, set a new one on created_at, deleted all the records and left it a day to collect new data. I've just run a query for today's data and it took longer (48 seconds):

> db.reading.find({'created_at': {'$gte':ISODate("2014-02-04")}}).explain()
{
    "cursor" : "BtreeCursor created_at_1",
    "isMultiKey" : false,
    "n" : 14189,
    "nscannedObjects" : 14189,
    "nscanned" : 14189,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 9,
    "nChunkSkips" : 0,
    "millis" : 48653,
    "indexBounds" : {
        "created_at" : [
            [
                ISODate("2014-02-04T00:00:00Z"),
                ISODate("292278995-12-2147483314T07:12:56.808Z")
            ]
        ]
    },
    "server" : "raspberrypi:27017"
}

That's with only 16,177 records in the database and only one index. There's around 111MB of free memory, so there shouldn't be an issue with the index fitting in memory. I guess I'm going to have to write this off as the Pi not being powerful enough for this job.

Are you sure that your index is created? Could you provide the output of getIndexes() for your collection,

e.g.: db.my_collection.getIndexes()

and the explain plan of your query:

db.my_collection.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28 }).explain()

PS: of course I must agree with @Aesthete about the fact that you store much more than you need to...

29/1/2014 update

Perfect! As you can see, you have four different indexes, when you could create ONE compound index that includes all of them.

Defining

db.my_collection.ensureIndex({created_at_year: 1, created_at_month: 1, created_at_day: 1, created_at_hour: 1 })

will provide you a single, more precise index that enables you to query for:

  • year
  • year and month
  • year, month and day
  • year, month, day and hour

This will make your queries (with the four keys) much faster, because all your criteria will be met in the index data!

Please note that the order of keys in ensureIndex() is crucial; that order actually defines the above-mentioned list of queries!

Also note that if all you need is these 4 fields, then if you specify a correct projection (one that also excludes _id, since _id is not part of the index),
e.g.:
db.my_collection.find({created_at_year: 2014, created_at_month: 1, created_at_day: 28}, { created_at_year: 1, created_at_month: 1, created_at_day: 1, _id: 0 })

then only the index will be used (a covered query), which is the maximum performance!
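
For reference, a minimal pymongo sketch of the same idea (database and collection names taken from the question's explain output; pymongo 3+ assumed):

from pymongo import ASCENDING, MongoClient

client = MongoClient()  # assumes mongod on localhost:27017
readings = client.sensor_network.reading

# One compound index replaces the four single-field indexes
readings.create_index([
    ("created_at_year", ASCENDING),
    ("created_at_month", ASCENDING),
    ("created_at_day", ASCENDING),
    ("created_at_hour", ASCENDING),
])

# Projection excludes _id so the query can be answered from the index alone
cursor = readings.find(
    {"created_at_year": 2014, "created_at_month": 1, "created_at_day": 28},
    {"created_at_year": 1, "created_at_month": 1, "created_at_day": 1, "_id": 0},
)
print(sum(1 for _ in cursor))  # count matches without relying on cursor.count()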

It may be related to the fact that you save the date 5 times. Save it once (ie keep created_at), and then, if you want the month, day, etc. in your view, just convert the created_at value and display only the month, day, etc.
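
For example, a minimal sketch (using the Reading model from the question) of deriving the date parts at display time:

reading = Reading.objects.order_by('-created_at').first()
created = reading.created_at  # a datetime.datetime instance
print(created.year, created.month, created.day, created.hour)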

I wonder if the indexes don't fit in your Raspberry Pi's memory. Since MongoDB can only use one index per query, and it seems to use only the created_at_day index, you could try dropping the indexes and replacing them with a single index on the created_at timestamp. Then you could reduce the size of your documents by getting rid of the created_at_* fields.
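
A minimal pymongo sketch of that index swap (database and collection names from the question's explain output; note that drop_indexes() removes every secondary index):

from pymongo import MongoClient

readings = MongoClient().sensor_network.reading
readings.drop_indexes()              # drops everything except the mandatory _id index
readings.create_index("created_at")  # single index on the timestamp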

You can easily extract the day, month, year etc. from an ISO date in a map-reduce function, or with the aggregation framework date operators.
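
For instance, a sketch of counting and averaging one day's readings with the aggregation framework date operators (assumes MongoDB 2.2+ and pymongo 3+; names taken from the question):

from datetime import datetime
from pymongo import MongoClient

readings = MongoClient().sensor_network.reading

pipeline = [
    # Match on the indexed timestamp first, so the created_at index can be used
    {"$match": {"created_at": {"$gte": datetime(2014, 1, 29),
                               "$lt": datetime(2014, 1, 30)}}},
    # Derive the date parts on the fly instead of storing them
    {"$project": {"year": {"$year": "$created_at"},
                  "month": {"$month": "$created_at"},
                  "day": {"$dayOfMonth": "$created_at"},
                  "power": 1}},
    {"$group": {"_id": None, "count": {"$sum": 1}, "avg_power": {"$avg": "$power"}}},
]

for doc in readings.aggregate(pipeline):  # pymongo 3+ returns a cursor
    print(doc)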

The query for today then becomes something like this:

db.reading.find({'created_at':{'$gte':ISODate("2014-01-29"), '$lt':ISODate("2014-01-30")}})
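
With the question's MongoEngine model, the equivalent range query would be something like:

import datetime

start = datetime.datetime(2014, 1, 29)
end = start + datetime.timedelta(days=1)
today_count = Reading.objects(created_at__gte=start, created_at__lt=end).count()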

I think it's interesting that you chose a database advertised as suitable for BIG data to run on your embedded device. I'm curious how it will work out. I have a similar gadget, and used BerkeleyDB for storing the readings. Don't forget that MongoDB on a 32 bit OS has a maximum size of 2GB for the entire database.
