
Load Google Datastore Backups from Data Storage to Google BigQuery

Our requirement is to back up Google Datastore programmatically and load those backups into Google BigQuery for further analysis. We successfully automated the backups using the following approach:

        Queue queue = QueueFactory.getQueue("datastoreBackupQueue");

        /*
         * Create a task which is equivalent to the backup URL mentioned in
         * above cron.xml, using new queue which has Datastore admin enabled
         */
        TaskOptions taskOptions = TaskOptions.Builder.withUrl("/_ah/datastore_admin/backup.create")
                .method(TaskOptions.Method.GET).param("name", "").param("filesystem", "gs")
                .param("gs_bucket_name",
                        "db-backup" + "/" + TimeUtils.parseDateToString(new Date(), "yyyy/MMM/dd"))
                .param("queue", queue.getQueueName());

        /*
         * Get list of dynamic entity kind names from the datastore based on
         * the kinds present in the datastore at the start of backup
         */
        List<String> entityNames = getEntityNamesForBackup();
        for (String entityName : entityNames) {
            taskOptions.param("kind", entityName);
        }

        /* Add this task to above queue */
        queue.add(taskOptions);

I can then import this backup into Google BigQuery manually, but how can we automate this process?
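(For context, the TimeUtils.parseDateToString call in the snippet above just formats the date segment of the bucket path; a minimal equivalent using SimpleDateFormat would be the following — the class and method names here are my own:)

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class BackupPath {
    /**
     * Builds the dated bucket path used by the backup task,
     * e.g. "db-backup/2020/Jan/05".
     */
    public static String bucketPath(String bucket, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy/MMM/dd", Locale.ENGLISH);
        return bucket + "/" + fmt.format(date);
    }
}
```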

I have also gone through most of the documentation, but nothing helped: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage#loading_data_from_google_cloud_storage

I solved this myself. Below is the solution in Java: the following code picks up the backup files from Google Cloud Storage and loads them into Google BigQuery.

        AppIdentityCredential bqCredential = new AppIdentityCredential(
                Collections.singleton(BigqueryScopes.BIGQUERY));

        AppIdentityCredential dsCredential = new AppIdentityCredential(
                Collections.singleton(StorageScopes.CLOUD_PLATFORM));

        Storage storage = new Storage(HTTP_TRANSPORT, JSON_FACTORY, dsCredential);
        Objects list = storage.objects().list(bucket).setPrefix(prefix).setFields("items/name").execute();

        if (list == null) {
            Log.severe(BackupDBController.class, "BackupToBigQueryController",
                    "List from Google Cloud Storage was null", null);
        } else if (list.isEmpty()) {
            Log.severe(BackupDBController.class, "BackupToBigQueryController",
                    "List from Google Cloud Storage was empty", null);
        } else {

            for (String kind : getEntityNamesForBackup()) {
                Job job = new Job();
                JobConfiguration config = new JobConfiguration();
                JobConfigurationLoad loadConfig = new JobConfigurationLoad();

                String url = "";
                for (StorageObject obj : list.getItems()) {
                    String currentUrl = obj.getName();
                    if (currentUrl.contains(kind + ".backup_info")) {
                        url = currentUrl;
                        break;
                    }
                }

                if (StringUtils.isStringEmpty(url)) {
                    continue;
                } else {
                    url = "gs://"+bucket+"/" + url;
                }

                List<String> gsUrls = new ArrayList<>();
                gsUrls.add(url);

                loadConfig.setSourceUris(gsUrls);
                loadConfig.setSourceFormat("DATASTORE_BACKUP");
                loadConfig.setAllowQuotedNewlines(true);

                TableReference table = new TableReference();
                table.setProjectId(projectId);
                table.setDatasetId(datasetId);
                table.setTableId(kind);
                loadConfig.setDestinationTable(table);

                config.setLoad(loadConfig);
                job.setConfiguration(config);

                Bigquery bigquery = new Bigquery.Builder(HTTP_TRANSPORT, JSON_FACTORY, bqCredential)
                        .setApplicationName("BigQuery-Service-Accounts/0.1").setHttpRequestInitializer(bqCredential)
                        .build();
                Insert insert = bigquery.jobs().insert(projectId, job);

                JobReference jobRef = insert.execute().getJobReference();
                Log.info(BackupDBController.class, "BackupToBigQueryController",
                        "Submitted BigQuery load job " + jobRef.getJobId(), null);
            }
        }
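The inner loop that locates the .backup_info file for each kind can also be pulled out into a small self-contained helper, which makes the URI construction easy to unit test (the class and method names below are my own):

```java
import java.util.List;

public class BackupInfoLocator {
    /**
     * Finds the GCS URI of the backup_info file for the given kind among
     * the listed object names, mirroring the matching loop above.
     * Returns null when no matching object exists.
     */
    public static String findBackupInfoUri(String bucket, String kind,
            List<String> objectNames) {
        for (String name : objectNames) {
            if (name.contains(kind + ".backup_info")) {
                return "gs://" + bucket + "/" + name;
            }
        }
        return null;
    }
}
```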

Please let me know if anyone has a better approach.

Regarding the Google Cloud Storage loading-data article you mentioned in your question, it describes some programmatic examples of importing from GCS using the command line, Node.js, or Python.

You can also automate importing data located on Cloud Storage into BigQuery by running the following command in a script:

$ gcloud alpha bigquery import SOURCE DESTINATION_TABLE

More information on this command is available in the linked article.

As of last week there is a proper, automated way to do this. The most important part is gcloud beta datastore export.

I created a short script around that: https://github.com/chees/datastore2bigquery

You can adjust it to fit your situation.

