
Converting a Cloud Storage file into a BigQuery table/dataset to query later

I've been at this for a couple of days now, but either the examples don't work or I have an environment issue, so I'm hoping someone can help.

Here's what I've tried that works:

  • I have Eclipse Neon on a Mac.
  • I installed all the Google API SDKs available from the guide below, as well as from here: https://developers.google.com/eclipse/docs/install-eclipse-4.6
  • I successfully followed this Quickstart: https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-eclipse

So I have an Eclipse Dataflow project that can read from and write to Cloud Storage. Awesome stuff.

GOAL/ISSUE

I now want to take the data from GCS and convert it into a BigQuery dataset. For each file I'd like to create a new dataset. Those details aren't really important for this question (although solving them would be a nice cherry on top), because where I'm stuck is a simple Hello World example of BigQuery with Eclipse, or even with the CLI tools. Any working examples would be appreciated, or a nudge toward existing documentation that has working samples. Again, maybe the problem is how I'm compiling with Eclipse or Maven, but I don't see any way to load a Google API based project.
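The per-file naming part of this goal is mostly string handling and needs no Google library. Below is a minimal sketch in plain Java, assuming one BigQuery ID per GCS file; `GcsFileIds`, `parseGcsUri` and `bigQueryIdFor` are hypothetical names, and the sample URI reuses the question's placeholder bucket. It conservatively keeps only ASCII letters, digits and underscores, which BigQuery accepts in both dataset and table IDs; the 1024-character cap is an assumption worth checking against the current naming rules.

```java
/**
 * Hypothetical helper (not part of any Google library): derives a BigQuery
 * dataset or table ID from a Cloud Storage object, one ID per file.
 */
public class GcsFileIds {

    /** Splits "gs://bucket/path/file.csv" into {bucket, objectName}. */
    public static String[] parseGcsUri(String uri) {
        final String prefix = "gs://";
        if (uri == null || !uri.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a gs:// URI: " + uri);
        }
        String rest = uri.substring(prefix.length());
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("Expected gs://bucket/object, got: " + uri);
        }
        return new String[] {rest.substring(0, slash), rest.substring(slash + 1)};
    }

    /**
     * Uses the file's base name without its extension, replaces every
     * character other than ASCII letters, digits and underscores with '_',
     * and caps the result at 1024 characters (assumed limit).
     */
    public static String bigQueryIdFor(String objectName) {
        String base = objectName.substring(objectName.lastIndexOf('/') + 1);
        int dot = base.lastIndexOf('.');
        if (dot > 0) {
            base = base.substring(0, dot); // drop ".csv", ".json", ...
        }
        String id = base.replaceAll("[^A-Za-z0-9_]", "_");
        if (id.isEmpty()) {
            throw new IllegalArgumentException("No usable name in: " + objectName);
        }
        return id.length() > 1024 ? id.substring(0, 1024) : id;
    }

    public static void main(String[] args) {
        String[] parts = parseGcsUri("gs://myuniquebucket/exports/sales-2016.11.csv");
        System.out.println("bucket = " + parts[0]);
        System.out.println("object = " + parts[1]);
        System.out.println("id     = " + bigQueryIdFor(parts[1]));
    }
}
```

The resulting ID would then be handed to whatever actually creates the dataset, table or load job (for example the BigQuery client library or the `bq` tool); that part is not shown here.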

ECLIPSE

In Eclipse I tried several things just to test the BigQuery API:

  1. I created a new Java project by loading the google-cloud-java-master repository from https://github.com/GoogleCloudPlatform/google-cloud-java

[Eclipse screenshot]

  2. I then tried to load the examples from the "google-cloud-examples" directory. This seems to have an issue: when I right-click the CreateTableAndLoadData class and choose Run, I get the error "Selection does not contain Main type", which makes me feel stupid because there is a main method there.

GOOGLE EXAMPLES from java-docs-samples

I then moved on to trying Google's own examples, which are actually in a different repo called java-docs-samples: https://github.com/GoogleCloudPlatform/java-docs-samples/blob/master/bigquery/cloud-client/src/main/java/com/example/bigquery/SimpleApp.java

Google doc: https://cloud.google.com/bigquery/create-simple-app-api#bigquery-simple-app-query-java

I loaded these into Eclipse but got a ton of compilation errors. The main ones:

Either com.google.cloud couldn't be resolved, or the package com.google.cloud.examples.bigquery.snippets couldn't be resolved. I could not get any of it to compile. The class names were a bit different, so I tried changing them using Eclipse's auto-suggest, but in the end BigQueryOptions.getDefaultInstance() was reported as an undefined method.

CLI gcloud

In the CLI I tried compiling and running google-cloud-examples/...CreateTableAndLoadData.java with Maven:

mvn -X compile exec:java \
-Dexec.mainClass=com.google.cloud.examples.bigquery.snippets.CreateTableAndLoadData \
-Dexec.args="--project=myuniqueproject \
--stagingLocation=gs://myuniquebucket/staging/ \
--runner=BlockingDataflowPipelineRunner"

but I get this error:

[ERROR] Failed to execute goal on project google-cloud-examples: Could not resolve dependencies for project com.google.cloud:google-cloud-examples:jar:0.8.2-alpha-SNAPSHOT: The following artifacts could not be resolved: com.google.cloud:google-cloud:jar:0.8.2-alpha-SNAPSHOT, com.google.cloud:google-cloud-nio:jar:0.8.2-alpha-SNAPSHOT: Could not find artifact com.google.cloud:google-cloud:jar:0.8.2-alpha-SNAPSHOT -> [Help 1]

ANSWER

"I now want to take the data from GCS, convert it into a BigQuery Dataset"

I'm going to assume you meant "table" and not "dataset". A dataset is a collection of BigQuery tables.
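To make the distinction concrete: every BigQuery table is addressed through its project and dataset. The sketch below is a hypothetical plain-Java value type (`TableRef` is not a Google class) that prints the two common spellings of a table reference: `project.dataset.table` in standard SQL, and `project:dataset.table` as used by the `bq` command-line tool. The project, dataset and table names are placeholders.

```java
/**
 * Hypothetical value type illustrating BigQuery's hierarchy:
 * a project contains datasets, and a dataset contains tables.
 */
public final class TableRef {
    private final String project;
    private final String dataset;
    private final String table;

    public TableRef(String project, String dataset, String table) {
        this.project = project;
        this.dataset = dataset;
        this.table = table;
    }

    /** Standard SQL form; the backticks allow project IDs containing hyphens. */
    public String standardSql() {
        return "`" + project + "." + dataset + "." + table + "`";
    }

    /** Form used by the bq command-line tool. */
    public String bqCli() {
        return project + ":" + dataset + "." + table;
    }

    public static void main(String[] args) {
        // One dataset can hold many tables, e.g. one table per GCS file.
        TableRef november = new TableRef("myuniqueproject", "gcs_imports", "sales_2016_11");
        TableRef december = new TableRef("myuniqueproject", "gcs_imports", "sales_2016_12");
        System.out.println(november.standardSql());
        System.out.println(december.bqCli());
    }
}
```

Read this way, a single dataset such as the placeholder `gcs_imports` could hold one table per GCS file, rather than creating a new dataset for each file.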

You could save yourself a lot of work and simply use a federated (external) data source, so that BigQuery reads the file(s) directly from GCS at query time instead of loading them first.

More info -> https://cloud.google.com/bigquery/external-data-sources
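As one way to declare such a federated source, current BigQuery supports a standard SQL `CREATE EXTERNAL TABLE` statement with `format` and `uris` options. The sketch below only builds that statement as a string in plain Java; `ExternalTableDdl` and `createExternalTableDdl` are hypothetical names, the dataset, table and bucket are placeholders, and leaving out the column list assumes BigQuery can auto-detect the schema for your file format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: builds BigQuery standard SQL DDL for an external
 * (federated) table whose data stays in Cloud Storage.
 */
public class ExternalTableDdl {

    /** Quotes a value as a standard SQL string literal. */
    private static String sqlString(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static String createExternalTableDdl(
            String dataset, String table, String format, List<String> uris) {
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("At least one gs:// URI is required");
        }
        String uriList = uris.stream()
                .map(ExternalTableDdl::sqlString)
                .collect(Collectors.joining(", "));
        // No column list: assumes BigQuery auto-detects the schema for this format.
        return "CREATE OR REPLACE EXTERNAL TABLE `" + dataset + "." + table + "`\n"
                + "OPTIONS (format = " + sqlString(format) + ", uris = [" + uriList + "])";
    }

    public static void main(String[] args) {
        System.out.println(createExternalTableDdl(
                "gcs_imports", "sales", "CSV",
                Arrays.asList("gs://myuniquebucket/exports/*.csv")));
    }
}
```

The generated statement could then be run in the BigQuery console or passed to `bq query --use_legacy_sql=false`; afterwards the table is queried like any other table while the data itself stays in GCS.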
