
Load data from MySQL to BigQuery using Dataflow

I want to load data from MySQL to BigQuery using Cloud Dataflow. Can anyone share an article or work experience about loading data from MySQL to BigQuery using Cloud Dataflow in Python?

Thank you

You can use apache_beam.io.jdbc to read from your MySQL database, and the BigQuery I/O to write to BigQuery.

Beam knowledge is expected, so I recommend looking at the Apache Beam Programming Guide first.
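
For illustration, here is a minimal pipeline sketch. The table users(id, name), the project id, bucket, host, and credentials below are placeholders, not values from your setup. Note that ReadFromJdbc is a cross-language transform, so a Java runtime and the MySQL JDBC driver must be available when the pipeline is expanded:

import apache_beam as beam
from apache_beam.io.jdbc import ReadFromJdbc
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',          # use 'DirectRunner' to test locally
    project='MY_PROJECT',             # placeholder project id
    region='us-central1',
    temp_location='gs://MY_BUCKET/tmp',
)

with beam.Pipeline(options=options) as p:
    (
        p
        # ReadFromJdbc is a cross-language transform: it needs a Java
        # runtime to expand, plus the MySQL JDBC driver on the classpath.
        | 'ReadFromMySQL' >> ReadFromJdbc(
            table_name='users',
            driver_class_name='com.mysql.cj.jdbc.Driver',
            jdbc_url='jdbc:mysql://HOST:3306/DATABASE_NAME',
            username='USERNAME',
            password='PASSWORD',
        )
        # JDBC rows arrive as NamedTuple-like objects; the BigQuery sink
        # expects dictionaries keyed by column name.
        | 'RowToDict' >> beam.Map(lambda row: row._asdict())
        | 'WriteToBigQuery' >> beam.io.WriteToBigQuery(
            'MY_PROJECT:mydataset.mytable',
            schema='id:INTEGER,name:STRING',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
        )
    )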

If you are looking for something pre-built, we have the Google-provided JDBC to BigQuery template, which is open-source (here), but it is written in Java.

If you only want to copy data from MySQL to BigQuery, you can first export your MySQL data to Cloud Storage, then load this file into a BigQuery table.

I think there is no need to use Dataflow in this case, because you don't have complex transformations or business logic; it's only a copy.

Export the MySQL data to Cloud Storage via a SQL query and the gcloud CLI:

# The delimiter flags take hex ASCII codes: 22 = double quote, 5C = backslash,
# 2C = comma, 0A = newline. --offload runs the export on a temporary instance
# so it does not impact the primary instance.
gcloud sql export csv INSTANCE_NAME gs://BUCKET_NAME/FILE_NAME \
--database=DATABASE_NAME \
--offload \
--query=SELECT_QUERY \
--quote="22" \
--escape="5C" \
--fields-terminated-by="2C" \
--lines-terminated-by="0A"

Load the CSV file into a BigQuery table via the bq CLI:

bq load \
  --source_format=CSV \
  mydataset.mytable \
  gs://mybucket/mydata.csv \
  ./myschema.json

./myschema.json is a local JSON file describing the BigQuery table schema.
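
If you prefer to stay in Python rather than shell for this step, the same load can be done with the google-cloud-bigquery client library. A minimal sketch, assuming hypothetical id/name columns and placeholder project, bucket, and table names; the inline schema here plays the role of ./myschema.json:

from google.cloud import bigquery

client = bigquery.Client(project='MY_PROJECT')  # placeholder project id

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    schema=[  # equivalent to the contents of ./myschema.json
        bigquery.SchemaField('id', 'INTEGER'),
        bigquery.SchemaField('name', 'STRING'),
    ],
)

load_job = client.load_table_from_uri(
    'gs://mybucket/mydata.csv',
    'MY_PROJECT.mydataset.mytable',
    job_config=job_config,
)
load_job.result()  # block until the load job completes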
