Ingest streaming data from an API to BigQuery in Google Cloud
I want to ingest data from an API into BigQuery as a stream.
I guess the best option is to use Cloud Dataflow to ingest this data into BigQuery, but I don't know how to extract the data from the API: https://developer.tomtom.com/traffic-api
Can I extract the data in the same Dataflow pipeline, or do I have to create an instance, extract the data from there to Cloud Pub/Sub, and then use Dataflow to move the data to BigQuery?
My assumption is that you have an API from which you want to send data to BigQuery. Since you cannot stream directly from the API, you have to hit it on a batch interval; it can be hourly or per minute, depending on the API's rate limits.
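The polling job described above can be sketched as a simple loop that fetches the API on a fixed interval and flattens the response into row dictionaries. This is a minimal sketch: the endpoint path, query parameters, and response field names below are placeholders, not the real TomTom contract, so check https://developer.tomtom.com/traffic-api for the actual schema and authentication.

```python
# Sketch of a polling job that pulls traffic data on a batch interval.
# Endpoint, parameters and response fields are illustrative placeholders.
import json
import time
import urllib.request

API_KEY = "YOUR_TOMTOM_API_KEY"                       # assumption: key-based auth
BASE_URL = "https://api.tomtom.com/traffic/services"  # placeholder base URL
POLL_SECONDS = 60                                     # batch interval; tune to rate limits


def build_url(base: str, key: str) -> str:
    """Compose the request URL (hypothetical endpoint path)."""
    return f"{base}/flowSegmentData?key={key}"


def to_rows(payload: dict) -> list:
    """Flatten one API response into BigQuery-ready row dicts.
    Field names here are illustrative only."""
    segment = payload.get("flowSegmentData", {})
    return [{
        "current_speed": segment.get("currentSpeed"),
        "free_flow_speed": segment.get("freeFlowSpeed"),
        "fetched_at": int(time.time()),
    }]


if __name__ == "__main__":
    # Simple poll loop; a production job would add retries, backoff and logging.
    while True:
        with urllib.request.urlopen(build_url(BASE_URL, API_KEY)) as resp:
            rows = to_rows(json.load(resp))
        print(rows)  # replace with a Pub/Sub publish or a BigQuery insert
        time.sleep(POLL_SECONDS)
```

Keeping `build_url` and `to_rows` as pure functions makes the job easy to unit-test without hitting the live API.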
You can have a job that reads data from this API and pumps it into Pub/Sub, then use a Dataflow pipeline to pump the data into BigQuery. Or you can use the job to pump data directly into BigQuery. It depends on your data volume, backup strategy, and business requirements.
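The two options above can be sketched with the Google Cloud client libraries. This is a hedged sketch, not a full implementation: the project, topic, and table names are placeholders, and both branches assume `google-cloud-pubsub` / `google-cloud-bigquery` are installed and application default credentials are configured.

```python
# Two delivery options for rows polled from the API:
#   1) publish to Pub/Sub and let a Dataflow pipeline write to BigQuery, or
#   2) stream straight into BigQuery from the same job.
# PROJECT, TOPIC and TABLE are placeholders for your own resources.
import json

PROJECT = "my-project"             # placeholder GCP project id
TOPIC = "traffic-events"           # placeholder Pub/Sub topic
TABLE = "my-project.traffic.flow"  # placeholder BigQuery table


def encode_for_pubsub(row: dict) -> bytes:
    """Pub/Sub message payloads are bytes; JSON-encode each row."""
    return json.dumps(row, sort_keys=True).encode("utf-8")


def publish_rows(rows: list) -> None:
    """Option 1: push rows into Pub/Sub; Dataflow consumes the topic
    and writes to BigQuery."""
    from google.cloud import pubsub_v1
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT, TOPIC)
    for row in rows:
        publisher.publish(topic_path, data=encode_for_pubsub(row))


def insert_rows_directly(rows: list) -> None:
    """Option 2: stream rows straight into BigQuery from the job itself."""
    from google.cloud import bigquery
    client = bigquery.Client(project=PROJECT)
    errors = client.insert_rows_json(TABLE, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

Option 1 adds a buffer and replay capability (useful at higher volumes or when you need backup), while option 2 keeps the architecture simpler, which matches the trade-off described above.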