
App engine - Uploading a large file and parsing data from it onto the datastore

I have a file that contains ~16,000 lines of information on entities. The user uploads the file through an HTML upload form, and the system then handles it by reading the file line by line, creating entities, and put()'ing them onto the datastore.

I'm limited by the 30-second request time limit. I have tried a lot of different workarounds using the Task Queue, forced HTML redirecting, etc., and nothing has worked for me.

I am using forced HTML redirecting to delete all data, and this works, albeit VERY slowly. (See the 4th answer here: Delete all data for a kind in Google App Engine.)

I can't seem to apply this approach to my uploading problem, since my method has to be a POST method. Is there a solution? Sample code would be much appreciated, since I'm very new to web development in general.

To solve a similar problem, I stored the dataset in a model with a single TextProperty, then spawned a task queue task (sketched after the list below) that:

  1. Fetches a dataset from the datastore, if there are any left.

  2. Checks whether the length of the dataset is <= N, where N is some small number of entities you can put() without a timeout. I used 5. If so, writes the individual entities, deletes the dataset record, and spawns a new copy of the task.

  3. If the dataset is bigger than N, splits it into N parts in the same format, writes those to the datastore, deletes the original entity, and spawns a new copy of the task.
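A minimal sketch of that task on the old Python runtime, assuming webapp2 and the db API; the kind names Dataset and LineEntity, the /process_dataset URL, and N = 5 are illustrative placeholders, not code from the original answer:

    import webapp2
    from google.appengine.api import taskqueue
    from google.appengine.ext import db

    N = 5  # entities you can safely put() in one task; the answer used 5

    class Dataset(db.Model):
        lines = db.TextProperty()  # an uploaded file, or a piece of one

    class LineEntity(db.Model):
        data = db.StringProperty()  # hypothetical final kind, one per line

    class ProcessDatasetTask(webapp2.RequestHandler):
        def post(self):
            dataset = Dataset.all().get()  # fetch a dataset, if any are left
            if dataset is None:
                return  # nothing left to do; the task chain ends here
            lines = dataset.lines.splitlines()
            if len(lines) <= N:
                # Small enough: write the individual entities.
                db.put([LineEntity(data=line) for line in lines])
            else:
                # Too big: split into N parts in the same format.
                size = (len(lines) + N - 1) // N  # ceiling division
                db.put([Dataset(lines='\n'.join(lines[i:i + size]))
                        for i in range(0, len(lines), size)])
            dataset.delete()  # delete the original record
            taskqueue.add(url='/process_dataset')  # spawn a new copy of the task

    app = webapp2.WSGIApplication([('/process_dataset', ProcessDatasetTask)])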

If you're doing this to bulk load data, why not use the bulk loader?
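The bulk loader ships with the SDK and runs from the command line; a rough invocation (the config file, CSV file, kind, and app ID below are placeholders, and the exact flags may differ between SDK versions):

    appcfg.py upload_data --config_file=bulkloader.yaml \
        --filename=entities.csv --kind=Entity \
        --url=http://your-app-id.appspot.com/_ah/remote_api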

If you need the interface to be accessible to non-admin users then, as suggested, you need to break the file up into decent-sized chunks (by taking blocks of n lines each), put them into the datastore, and start a task to deal with each of them.
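A hedged sketch of that upload handler, reusing the Dataset model from the earlier sketch; the 'file' form-field name, the chunk size, and the /process_chunk task URL are assumptions:

    import webapp2
    from google.appengine.api import taskqueue
    from google.appengine.ext import db

    CHUNK_LINES = 500  # lines per chunk; tune so each task finishes in time

    class Dataset(db.Model):
        lines = db.TextProperty()  # same model as in the earlier sketch

    class UploadHandler(webapp2.RequestHandler):
        def post(self):
            # Contents of the <input type="file" name="file"> form field.
            body = self.request.get('file')
            lines = body.splitlines()
            for i in range(0, len(lines), CHUNK_LINES):
                # Store each block of n lines as its own record...
                chunk = Dataset(lines='\n'.join(lines[i:i + CHUNK_LINES]))
                chunk.put()
                # ...and start a task to deal with that chunk; the handler at
                # /process_chunk would db.get() the key and put() the entities.
                taskqueue.add(url='/process_chunk',
                              params={'key': str(chunk.key())})
            self.response.out.write('Upload accepted; processing continues '
                                    'in the background.')

    app = webapp2.WSGIApplication([('/upload', UploadHandler)])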
