How do I exclude columns when exporting App Engine data?

I'm planning to do some data mining on my Django app, which uses App Engine for storing data. However, one of my tables stores images in two of its columns, which makes it gigabytes in size, so it's far too slow to download every time I want to analyse new data. For data mining, I only care about the plain-text columns in that table. How do I exclude the image columns when exporting the data to a CSV file?

I'm aware that the csv connector in bulkloader.yaml has a "column_list" option that restricts the export to certain columns, but it looks like it still downloads the entire table row and only filters out the columns afterwards, when it converts App Engine's intermediate sqlite3 data file to CSV.

For reference, I'm using the method described here to download my data: http://code.google.com/appengine/docs/python/tools/uploadingdata.html , but I'm open to other solutions, preferably ones that let me automate this data export every few days.

You can't. The App Engine datastore API, and the underlying GQL, only do two sorts of SELECT queries: __key__ only, and all fields. There's no way of getting a subset of fields.

Kind of late here, but all I did in a similar situation was delete the unwanted property from the automatically generated bulkloader.yaml file.

Here is an example, based on the one in the Google documentation, that excludes the "account" property from the CSV file. I use it for things like blobs and it works fine there too:

property_map:
- property: __key__
  external_name: key
  export_transform: transform.key_id_or_name_as_string
# START DELETE
- property: account
  external_name: account
  # Type: Key Stats: 119 properties of this type in this kind.
  import_transform: transform.create_foreign_key('TODO: fill in Kind name')
  export_transform: transform.key_id_or_name_as_string
# END DELETE
- property: invite_nonce
  external_name: invite_nonce
  # Type: String Stats: 19 properties of this type in this kind.
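
If the export is regenerated regularly, the manual deletion above could be scripted. The following is only a rough sketch, not part of the bulkloader tooling: drop_properties is a hypothetical helper that does plain text filtering on the property_map layout shown above, and it assumes every entry starts with a "- property: NAME" line and that its remaining lines are indented more deeply than the dash.

```python
import re

# Matches the first line of a property_map entry, e.g. "- property: account".
ENTRY = re.compile(r"^([ \t]*)- property:\s*(\S+)")


def drop_properties(yaml_text, unwanted):
    """Return yaml_text without the property_map entries named in `unwanted`.

    Hypothetical helper: plain text filtering, not a YAML parser.
    """
    kept = []
    skip_indent = None  # indentation of the entry being dropped, or None
    for line in yaml_text.splitlines(keepends=True):
        if skip_indent is not None:
            indent = len(line) - len(line.lstrip())
            if line.strip() and indent <= skip_indent:
                skip_indent = None  # a sibling or parent line ends the entry
            else:
                continue  # still inside the dropped entry
        match = ENTRY.match(line)
        if match and match.group(2) in unwanted:
            skip_indent = len(match.group(1))
            continue
        kept.append(line)
    return "".join(kept)


sample = (
    "property_map:\n"
    "- property: __key__\n"
    "  external_name: key\n"
    "- property: account\n"
    "  external_name: account\n"
    "  # Type: Key\n"
    "- property: invite_nonce\n"
    "  external_name: invite_nonce\n"
)
print(drop_properties(sample, {"account"}))
```

Filtering the text keeps the generated comments and ordering intact, which a parse-and-dump round trip through a YAML library generally would not.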

As you've observed, the bulkloader downloads the entire record using remote_api, then outputs only the fields you care about to the CSV. If you want to download only selected fields, you'll have to write your own code to do this on the server side, possibly by using the new Files API in a mapreduce to write a file that you can then download.
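
As a rough illustration of that server-side idea, the core step is writing only the wanted fields for each entity. The sketch below is plain Python 3 and is not App Engine code: write_selected_fields is a hypothetical helper, the entities are stand-in dicts, and the "photo" field name is an assumption. In a real app the rows would come from a datastore query inside your own handler or mapreduce job, and the output would go to whatever file or response you later download.

```python
import csv
import io


def write_selected_fields(entities, fields, out):
    """Write only `fields` from each entity (a dict here) as CSV rows to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)  # header row
    for entity in entities:
        # Fields missing from an entity are written as empty strings.
        writer.writerow([entity.get(name, "") for name in fields])


# Hypothetical stand-ins for entities read on the server side.
rows = [
    {"key": "1", "invite_nonce": "abc", "photo": b"\x89PNG..."},
    {"key": "2", "invite_nonce": "def", "photo": b"\x89PNG..."},
]

buffer = io.StringIO()
write_selected_fields(rows, ["key", "invite_nonce"], buffer)
print(buffer.getvalue())
```

The server still reads each full entity, but the image columns never leave it, so the file you download contains only the text columns.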

