BigQuery not returning accurate results

Question

I'm using Google App Engine, and occasionally when I send a query to BigQuery via the JSON API, I get incorrect results. The problem is usually confined to a single table within BigQuery (I create a new table for every batch job). When I run into this issue in production, I log the query I submitted and try running it via the BigQuery dashboard, where it runs longer than expected but returns the expected results.

There is nothing in the response indicating an issue. jobComplete comes back as True, but I see no rows, just the jobReference, schema, and totalRows = 0.

In such situations, is it appropriate to make a separate call to get the job results, even though I expect the current call to return them?

Relevant Code:

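import httplib2
from apiclient.discovery import build
from google.appengine.api import memcache
from oauth2client.appengine import AppAssertionCredentials

# httplib2.Http takes a cache as its first argument; here it is App Engine memcache.
http = httplib2.Http(memcache)
self.credentials = AppAssertionCredentials(scope='https://www.googleapis.com/auth/bigquery')
self.http = self.credentials.authorize(http=http)
self.service = build('bigquery','v2',http=self.http)
jobs = self.service.jobs()
result = jobs.query(projectId=settings.GOOGLE_APIS_PROJECT_ID,
                    body={'query': query}).execute()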

Response:

{u'totalRows': u'0', u'kind': u'bigquery#queryResponse', u'jobComplete': True, u'jobReference': {u'projectId': u'<REMOVED>', u'jobId': u'<REMOVED>'}, u'schema': {u'fields': [<REMOVED>]}}

No matter how many times I re-run the query in production, the same results are returned. (Could this be due to the caching done via memcache, with an incorrect result being cached as the response?)

Answer 1

The issue was a mix of the following:

The shared http object is NOT thread-safe (https://developers.google.com/api-client-library/python/guide/thread_safety). Although most examples of using BigQuery on GAE use a shared httplib2 object, this is incorrect usage. Only the credentials store is thread-safe and can be shared.
There is a 10s timeout on queries on BigQuery: by default, the query() call waits only about 10 seconds (its timeoutMs setting) for the query to complete.

I was making multiple calls to BigQuery in parallel using a shared http object and task queues, and the queries were taking over 10s to complete. This is why responses got mixed up between calls and the results were not as expected. E.g., I sometimes received the discovery response to my query request.
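To illustrate the first point, here is a minimal sketch of giving each thread its own client instead of sharing one http object. The names make_service, get_service, and factory are hypothetical helpers, not part of any library, and the sketch assumes that only the credentials object is shared, as described above. It also leaves out the memcache cache that the question passes to httplib2.Http.

```python
import threading

# Each thread keeps its own client here; nothing stored on this object is shared.
_thread_state = threading.local()


def make_service(credentials):
    """Build a BigQuery client around a brand-new httplib2.Http object."""
    # Imported lazily so the module still loads where the libraries are absent.
    import httplib2
    from apiclient.discovery import build

    http = credentials.authorize(httplib2.Http())
    return build('bigquery', 'v2', http=http)


def get_service(credentials, factory=make_service):
    """Return the calling thread's own client, creating it on first use.

    `factory` is a hypothetical hook so the builder can be swapped out in tests.
    """
    if not hasattr(_thread_state, 'service'):
        _thread_state.service = factory(credentials)
    return _thread_state.service
```

Each request handler or task would then call get_service(credentials) rather than reading a client from a shared attribute, so two parallel calls never read from the same connection.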

The Fix:

Rewrite my BigQuery client code so that it does not share the httplib2 object between calls, and decouple my process so that it submits BigQuery jobs to run queries instead of using the query() call. There is a lot more overhead in managing the calls, checking statuses, and receiving results, but at least it works now and the responses make sense.
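For anyone wanting to see roughly what the job-based flow looks like, here is a sketch under some assumptions. It uses the jobs().insert, jobs().get, and jobs().getQueryResults methods of the BigQuery v2 API; the function name run_query_job and the poll_seconds and max_polls parameters are made up for illustration, and this is not my exact production code. The service argument is expected to be a client that is not shared between threads.

```python
import time


def run_query_job(service, project_id, query, poll_seconds=1.0, max_polls=600):
    """Submit a query as a job, wait for it to finish, and return its rows."""
    jobs = service.jobs()
    body = {'configuration': {'query': {'query': query}}}
    job = jobs.insert(projectId=project_id, body=body).execute()
    job_id = job['jobReference']['jobId']

    # Poll the job status until BigQuery reports the state DONE.
    for _ in range(max_polls):
        status = jobs.get(projectId=project_id, jobId=job_id).execute()['status']
        if status['state'] == 'DONE':
            break
        time.sleep(poll_seconds)
    else:
        raise RuntimeError('job %s did not finish in time' % job_id)

    # A finished job can still have failed; errorResult is set in that case.
    if 'errorResult' in status:
        raise RuntimeError('job %s failed: %s' % (job_id, status['errorResult']))

    # Fetch every result page; 'rows' is absent when the result set is empty.
    rows = []
    page_token = None
    while True:
        kwargs = {'projectId': project_id, 'jobId': job_id}
        if page_token:
            kwargs['pageToken'] = page_token
        page = jobs.getQueryResults(**kwargs).execute()
        rows.extend(page.get('rows', []))
        page_token = page.get('pageToken')
        if not page_token:
            return rows
```

The point of this shape is that an empty result is only trusted after the job has reported DONE without an errorResult, which is the extra status checking mentioned above.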

Answer 1
1 · Accepted · 2012-12-19 21:20:36
