
How to load data from Cloud Storage into BigQuery using Java

I want to upload data from Google Cloud Storage to a table in BigQuery. Here is the code I use to create the load job:

public class LoadStorageToBigQuery {

// ///////////////////////
// USER GENERATED VALUES: you must fill in values specific to your
// application.
//
// Visit the Google API Console to create a Project and generate an
// OAuth 2.0 Client ID and Secret (http://code.google.com/apis/console).
// Then, add the Project ID below, and update the clientsecrets.json file
// with your client_id and client_secret
//
// ///////////////////////
private static final String PROJECT_ID = "gavd.com:compute";
private static final String CLIENTSECRETS_LOCATION = "/client_secrets.json";
private static final String RESOURCE_PATH =
        ("E:/Work On/ads/Cloud/Dev/Source/BigQueryDemo02" + CLIENTSECRETS_LOCATION)
                .replace('/', File.separatorChar);
static GoogleClientSecrets clientSecrets = loadClientSecrets();

// Static variables for API scope, callback URI, and HTTP/JSON functions
private static final List<String> SCOPES = Arrays
        .asList("https://www.googleapis.com/auth/bigquery");
private static final String REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob";
private static final HttpTransport TRANSPORT = new NetHttpTransport();
private static final JsonFactory JSON_FACTORY = new JacksonFactory();

private static GoogleAuthorizationCodeFlow flow = null;

/**
 * @param args
 * @throws IOException
 * @throws InterruptedException
 */
public static void main(String[] args) throws IOException,
        InterruptedException {
    System.out.println(CLIENTSECRETS_LOCATION);
    // Create a new BigQuery client authorized via OAuth 2.0 protocol
    Bigquery bigquery = createAuthorizedClient();

    // Print out available datasets to the console
    listDatasets(bigquery, "publicdata");
    JobReference jobRef = startQuery(bigquery, PROJECT_ID);
    System.out.println("Job ID = " + jobRef);
    checkQueryResults(bigquery, PROJECT_ID, jobRef);

}

/**
 * Creates an authorized BigQuery client service using the OAuth 2.0
 * protocol
 * 
 * This method first creates a BigQuery authorization URL, then prompts the
 * user to visit this URL in a web browser to authorize access. The
 * application will wait for the user to paste the resulting authorization
 * code at the command line prompt.
 * 
 * @return an authorized BigQuery client
 * @throws IOException
 */
public static Bigquery createAuthorizedClient() throws IOException {

    String authorizeUrl = new GoogleAuthorizationCodeRequestUrl(
            clientSecrets, REDIRECT_URI, SCOPES).setState("").build();

    System.out
            .println("Paste this URL into a web browser to authorize BigQuery Access:\n"
                    + authorizeUrl);

    System.out.println("... and type the code you received here: ");
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String authorizationCode = in.readLine();

    // Exchange the auth code for an access token and refresh token
    Credential credential = exchangeCode(authorizationCode);

    return Bigquery.builder(TRANSPORT, JSON_FACTORY)
            .setHttpRequestInitializer(credential)
            .setApplicationName("Your User Agent Here").build();
}

/**
 * Display all BigQuery Datasets associated with a Project
 * 
 * @param bigquery
 *            an authorized BigQuery client
 * @param projectId
 *            a string containing the current project ID
 * @throws IOException
 */
public static void listDatasets(Bigquery bigquery, String projectId)
        throws IOException {
    Datasets.List datasetRequest = bigquery.datasets().list(projectId);
    DatasetList datasetList = datasetRequest.execute();
    if (datasetList.getDatasets() != null) {
        List<DatasetList.Datasets> datasets = datasetList.getDatasets();
        System.out.println("Available datasets\n----------------");
        System.out.println( " B = " + datasets.toString());
        for (DatasetList.Datasets dataset : datasets) {
            System.out.format("%s\n", dataset.getDatasetReference()
                    .getDatasetId());
        }
    }
}


public static JobReference startQuery(Bigquery bigquery, String projectId) throws IOException {

    Job job = new Job();
    JobConfiguration config = new JobConfiguration();
    JobConfigurationLoad loadConfig = new JobConfigurationLoad();
    config.setLoad(loadConfig);

    job.setConfiguration(config);

    // Set where you are importing from (i.e. the Google Cloud Storage paths).
    List<String> sources = new ArrayList<String>();
    sources.add("gs://gms_cloud_project/bigquery_data/06_13_2014/namesbystate.csv");
    loadConfig.setSourceUris(sources);
    //state:STRING,sex:STRING,year:INTEGER,name:STRING,occurrence:INTEGER
    // Describe the resulting table you are importing to:
    TableReference tableRef = new TableReference();
    tableRef.setDatasetId("gimasys_database");
    tableRef.setTableId("table_test");
    tableRef.setProjectId(projectId);
    loadConfig.setDestinationTable(tableRef);

    List<TableFieldSchema> fields = new ArrayList<TableFieldSchema>();
    TableFieldSchema fieldState = new TableFieldSchema();
    fieldState.setName("state");
    fieldState.setType("STRING");
    TableFieldSchema fieldSex = new TableFieldSchema();
    fieldSex.setName("sex");
    fieldSex.setType("STRING");
    TableFieldSchema fieldName = new TableFieldSchema();
    fieldName.setName("name");
    fieldName.setType("STRING");
    TableFieldSchema fieldYear = new TableFieldSchema();
    fieldYear.setName("year");
    fieldYear.setType("INTEGER");
    TableFieldSchema fieldOccur = new TableFieldSchema();
    fieldOccur.setName("occurrence");
    fieldOccur.setType("INTEGER");
    fields.add(fieldState);
    fields.add(fieldSex);
    fields.add(fieldName);
    fields.add(fieldYear);
    fields.add(fieldOccur);
    TableSchema schema = new TableSchema();
    schema.setFields(fields);
    loadConfig.setSchema(schema);

    // Also set custom delimiter or header rows to skip here....
    // [not shown].

    Insert insert = bigquery.jobs().insert(projectId, job);
    insert.setProjectId(projectId);
    JobReference jobRef =  insert.execute().getJobReference();

    // ... see rest of codelab for waiting for job to complete.
    return jobRef;
    //return jobId;
}

/**
 * Polls the status of a BigQuery job, returns Job reference if "Done"
 * 
 * @param bigquery
 *            an authorized BigQuery client
 * @param projectId
 *            a string containing the current project ID
 * @param jobId
 *            a reference to an inserted query Job
 * @return a reference to the completed Job
 * @throws IOException
 * @throws InterruptedException
 */
private static Job checkQueryResults(Bigquery bigquery, String projectId,
        JobReference jobId) throws IOException, InterruptedException {
    // Variables to keep track of total query time
    long startTime = System.currentTimeMillis();
    long elapsedTime;

    while (true) {
        Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId())
                .execute();
        elapsedTime = System.currentTimeMillis() - startTime;
        System.out.format("Job status (%dms) %s: %s\n", elapsedTime,
                jobId.getJobId(), pollJob.getStatus().getState());
        if (pollJob.getStatus().getState().equals("DONE")) {
            return pollJob;
        }
        // Pause execution for one second before polling the job status again,
        // to reduce unnecessary calls to the BigQuery API and lower overall
        // application bandwidth.
        Thread.sleep(1000);
    }
}

/**
 * Helper to load client ID/Secret from file.
 * 
 * @return a GoogleClientSecrets object based on a clientsecrets.json
 */
private static GoogleClientSecrets loadClientSecrets() {
    try {
        System.out.println("A");
        System.out.println(CLIENTSECRETS_LOCATION);
        GoogleClientSecrets clientSecrets = GoogleClientSecrets.load(new JacksonFactory(),
                    new FileInputStream(new File(
                        RESOURCE_PATH)));
        return clientSecrets;
    } catch (Exception e) {
        System.out.println("Could not load client_secrets file");
        e.printStackTrace();
    }
    return clientSecrets;
}

/**
 * Exchange the authorization code for OAuth 2.0 credentials.
 * 
 * @return an authorized Google Auth flow
 */
static Credential exchangeCode(String authorizationCode) throws IOException {
    GoogleAuthorizationCodeFlow flow = getFlow();
    GoogleTokenResponse response = flow.newTokenRequest(authorizationCode)
            .setRedirectUri(REDIRECT_URI).execute();
    return flow.createAndStoreCredential(response, null);
}

/**
 * Build an authorization flow and store it as a static class attribute.
 * 
 * @return a Google Auth flow object
 */
static GoogleAuthorizationCodeFlow getFlow() {
    if (flow == null) {
        HttpTransport httpTransport = new NetHttpTransport();
        JacksonFactory jsonFactory = new JacksonFactory();

        flow = new GoogleAuthorizationCodeFlow.Builder(httpTransport,
                jsonFactory, clientSecrets, SCOPES)
                .setAccessType("offline").setApprovalPrompt("force")
                .build();
    }
    return flow;
}

}

The job status reports "DONE" and the table is created, but when I check my project in the Google Cloud console I see no error, issue, or exception, yet no data is uploaded to my table (table size 0B). I also tried creating the table first without any data and then loading data into it, but that did not work either.

Job ID = {"jobId":"job_MqfuhuAU1Ms0GIOSbiePFGlc6TE","projectId":"ads.com:compute"}
Job status (451ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (2561ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (6812ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (8273ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (9695ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (11146ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (12466ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (13948ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (15392ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (16796ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: PENDING
Job status (18296ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: RUNNING
Job status (19755ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: RUNNING
Job status (21587ms) job_fOtciwR1pfytbkeMaQ9RvvH18qc: DONE

I would appreciate any help.

Thanks

pollJob.getStatus().getState().equals("DONE") tells you when the job is done, but it does not give you the exit code.

You should check the error result explicitly with pollJob.getStatus().getErrorResult():

    while (true) {
        Job pollJob = getBigQuery().jobs().get(projectId, jobId.getJobId()).execute();
        elapsedTime = System.currentTimeMillis() - startTime;
        if (pollJob.getStatus().getErrorResult() != null) {
            // The job ended with an error.
            throw new RuntimeException(String.format("Job %s ended with error %s",
                    jobId.getJobId(), pollJob.getStatus().getErrorResult().getMessage()));
        }
        System.out.format("Job status (%dms) %s: %s\n", elapsedTime,
                jobId.getJobId(), pollJob.getStatus().getState());

        if (pollJob.getStatus().getState().equals("DONE")) {
            break;
        }
        Thread.sleep(5000);
    }

So the problem seems to be that the code declares the schema as "string,string,string,integer,integer", but the file's columns come as "string,string,integer,string,integer" — hence BigQuery can't convert the string in column 4 to an integer.
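A quick way to see the mismatch is to check each value of a sample row against the declared column types, one column at a time. The sketch below is plain Java with no BigQuery classes, and the sample row is made up to match the namesbystate.csv layout (state, sex, year, name, occurrence); it only illustrates the ordering problem.

```java
public class SchemaOrderCheck {

    // Returns the 1-based index of the first column whose value cannot be
    // coerced to its declared type, or -1 if the whole row matches.
    static int firstMismatch(String[] types, String[] row) {
        for (int i = 0; i < types.length; i++) {
            if (types[i].equals("INTEGER")) {
                try {
                    Long.parseLong(row[i]);
                } catch (NumberFormatException e) {
                    return i + 1;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical row in the file's actual order: state, sex, year, name, occurrence
        String[] row = "AK,F,1910,Mary,14".split(",");

        // Schema order used in the question: name declared before year
        String[] questionOrder = {"STRING", "STRING", "STRING", "INTEGER", "INTEGER"};
        // Schema order matching the file
        String[] fileOrder = {"STRING", "STRING", "INTEGER", "STRING", "INTEGER"};

        // "Mary" lands in an INTEGER column -> first bad column is 4
        System.out.println(firstMismatch(questionOrder, row));
        // Every value parses under its declared type -> -1
        System.out.println(firstMismatch(fileOrder, row));
    }
}
```

The fix in the question's startQuery method is correspondingly simple: add fieldYear to the fields list before fieldName, so the TableSchema matches the column order of the CSV.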

To get the exact error message, run bq show -j job_yourjobid.
