
GAE Datastore performance vs SQLite

I'm seeing terrible performance using the GAE datastore on both the dev server and the production server. I have the following simplified model:

class Team(db.Model):
    name = db.StringProperty()
    # + 1 other property
    # home_games from Game
    # away_games from Game

class Game(db.Model):
    date = db.DateProperty()
    year = db.IntegerProperty()
    home_team = db.ReferenceProperty(Team, collection_name='home_games')
    away_team = db.ReferenceProperty(Team, collection_name='away_games')
    # + 4 other properties
    # results from TeamResults

class TeamResults(db.Model):
    game = db.ReferenceProperty(Game, collection_name='results')
    location = db.StringProperty(choices=('home', 'away'))
    score = db.IntegerProperty()
    # + 17 other properties

I only have one index, on Game's year and date. Inserting a small dataset of 478 teams and 786 games took about 50 seconds. A simple query:

games = Game.all()
games.filter('year =', 2000)
games.order('date')

for game in games:
    for result in game.results:  # each back-reference access issues its own query
        # do something with the result

took about 45 seconds.

I'm moving from SQLite-based data storage, where the same query on a much larger dataset takes a fraction of a second. Is my data just modeled poorly? Is the Datastore just this slow?

Edit 1

To give a little more background, I'm inserting data from a user-uploaded file. The file is uploaded into the blobstore, then I use csv.reader to parse it. This happens periodically, and queries are run from cron jobs.

Your problem is that you insert these records one by one.

You need to use batch inserts; see https://developers.google.com/appengine/docs/python/tools/uploadingdata

Or you may want to insert a list of records, as described in the documentation:

https://developers.google.com/appengine/docs/python/datastore/entities#Batch_Operations
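As a rough sketch of what batch insertion could look like here (the entity-building code is a hypothetical stand-in for however you construct entities from the parsed CSV rows), a single `db.put()` call with a list replaces one RPC per entity with one RPC per batch. The datastore caps a batch put at 500 entities, so large uploads need chunking:

```python
def chunks(items, size=500):
    """Yield successive slices of at most `size` items from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage inside the upload handler (requires the App Engine SDK):
#
#   teams = [Team(name=row[0]) for row in rows]   # rows from csv.reader
#   for batch in chunks(teams):
#       db.put(batch)   # one RPC per batch of 500, not one per entity
```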

I don't see any evidence that you're using indexed=False on any of your properties. Each indexed property costs two additional writes per entity write (one for the ascending index, one for the descending one). Those add up quickly.
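A minimal sketch of what that change might look like on TeamResults, assuming location and score are never used in a filter or sort order (which properties can safely be unindexed depends entirely on your actual queries):

```python
# Hypothetical revision: unindexed properties skip the two per-write
# index updates, cutting write cost (requires the App Engine SDK).
from google.appengine.ext import db

class TeamResults(db.Model):
    game = db.ReferenceProperty(Game, collection_name='results')
    location = db.StringProperty(choices=('home', 'away'), indexed=False)
    score = db.IntegerProperty(indexed=False)
    # ... the other 17 properties, also indexed=False unless queried on
```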

You do not need the bulk loader, because you already uploaded the CSV, but you can use batch insert. See these tips: http://googleappengine.blogspot.nl/2009/06/10-things-you-probably-didnt-know-about.html and look for tip 5, "You can batch put, get and delete operations for efficiency".
