How to read and write a large number (~1 million) of entities of a model in GAE Python?

My ndb model:

from google.appengine.ext import ndb

class X(ndb.Model):
    # Stored under the short datastore name "a"; an unindexed list of JSON dicts.
    Y = ndb.JsonProperty("a", repeated=True, indexed=False)
    # Maximum list length is 10.
    # Example of a list stored in Y above:
    # Y = [
    #     {"n": "name_p__of_around_100_chars", "s": number_p__between_0_and_100, "a": "address_p__of_200_chars"},
    #     {"n": "name_q__of_around_100_chars", "s": number_q__between_0_and_100, "a": "address_q__of_200_chars"},
    # ]

I need to read the entities of model "X", update their property "Y", and write them back to NDB.

My First Approach
Read all entities using ndb.get_multi(key_list).
This approach failed because it hit the memory limit at ndb.get_multi():

Exceeded soft private memory limit of 512 MB with 623 MB after servicing 1 requests total

Has anybody done this before?
What is the best way to do it?

I am doing this inside a TaskQueue Push queue to avoid any request timeouts.

WHAT SOLVED MY PROBLEM
Thanks everyone. I optimised my algorithm (which was too messy earlier) and got rid of the memory issue. All of your suggestions were very informative, but the real problem was my bad algorithm, so I am not in a position to mark any answer as accepted here.

I am leaving this question here (not deleting it, even though the problem was in my own code) so that somebody else can get good pointers on memory issues in GAE Python.

Thanks Dmitry Sadovnychyi, Dan Cornilescu and Tim Hoffman.

You could split your key_list into smaller pieces and iterate through them.
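
A minimal sketch of that batching idea, assuming the keys fit in memory even though the entities do not. The names chunked, update_in_batches, fetch, update and store are hypothetical; the read and write functions are passed in so the sketch runs anywhere, and on App Engine they would presumably be ndb.get_multi and ndb.put_multi. The batch size of 500 is an arbitrary placeholder.

```python
def chunked(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def update_in_batches(key_list, fetch, update, store, batch_size=500):
    """Read, modify and write entities one batch at a time.

    fetch(keys) returns a list of entities (None for missing keys),
    update(entity) mutates one entity in place, and store(entities)
    writes a batch back. All three are supplied by the caller
    (assumption: ndb.get_multi and ndb.put_multi on App Engine).
    Returns the number of entities written.
    """
    processed = 0
    for keys in chunked(key_list, batch_size):
        # Only one batch of entities is referenced at any time.
        entities = [e for e in fetch(keys) if e is not None]
        for entity in entities:
            update(entity)
        store(entities)
        processed += len(entities)
    return processed
```

With ndb the call might look like update_in_batches(key_list, ndb.get_multi, my_update, ndb.put_multi), where my_update is your own function. ndb's in-context cache may still hold on to fetched entities for the duration of the request, so passing use_cache=False to the ndb calls could be worth trying if memory keeps growing.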

Watch out: TaskQueue tasks also have a time limit, so you are not avoiding every timeout. You may need to split the overall iteration further into smaller chunks.
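
One way to respect that limit, sketched under assumptions: process batches until a time budget is used up, then return the position to resume from so the caller can enqueue a follow-up task. The function name run_with_deadline, the process_batch callable and the 540-second default are hypothetical placeholders, not values from the original posts; the real budget should be chosen to fit the deadline of your own queue.

```python
import time


def run_with_deadline(key_list, process_batch, start=0, batch_size=500,
                      budget_seconds=540, clock=time.time):
    """Process batches of keys until the time budget is spent.

    process_batch(keys) handles one slice of key_list.
    Returns the index to resume from, or None when every key is done.
    """
    deadline = clock() + budget_seconds
    position = start
    while position < len(key_list):
        if clock() >= deadline:
            # Out of time: the caller should enqueue a new task
            # that continues from this position.
            return position
        process_batch(key_list[position:position + batch_size])
        position += batch_size
    return None
```

A task handler could call this and, whenever the result is not None, add another task to the push queue carrying that position as its start parameter.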

I'm thinking this could make good use of the Pipeline API to address scalability - you might want to take a look at this article: https://blog.svpino.com/2015/05/19/the-google-app-engine-pipeline-api
