Google BigQuery query in Python works when using result(), but permission issue when using to_dataframe()

I ran into a problem after upgrading my pip packages: my BigQuery connector, which returns query results, suddenly stopped working with the error message shown below the code.

from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    'path/to/file',
    scopes=[
        'https://www.googleapis.com/auth/cloud-platform',
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/bigquery',
    ],
)

client = bigquery.Client(credentials=credentials)
data = client.query('select * from dataset.table').to_dataframe()

PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission

But if you switch the code to

data = client.query('select * from dataset.table').result()

(to_dataframe -> result), you receive the data as a RowIterator and can read it properly.

The same script, using to_dataframe with the same credentials, was working on the server. I therefore set my google-cloud-bigquery package to the same version as on the server, 2.28.0, but that still did not help.

I could not find any advice on this error anywhere, so I want to share what I found in case any of you face the same thing.

Resolution

Along with the google-cloud-bigquery package, I also had the google-cloud-bigquery-storage package installed. Once I uninstalled it using

pip uninstall google-cloud-bigquery-storage

everything started working again! Unfortunately, the error message does not point to this cause, so it took some time to figure out :)

There are different ways of retrieving data from BigQuery. For larger result sets, the BQ Storage API is considered more efficient than the other options:

The BigQuery Storage Read API provides a third option that represents an improvement over prior options. When you use the Storage Read API, structured data is sent over the wire in a binary serialization format. This allows for additional parallelism among multiple consumers for a set of results.

The Python BQ library determines internally whether it can use the BQ Storage API. The result method uses the traditional tabledata.list method, whereas the to_dataframe method uses the BQ Storage API if the corresponding package (google-cloud-bigquery-storage) is installed.

However, using the BQ Storage API requires the bigquery.readSessionUser role, or more precisely the bigquery.readsessions.create permission, which seems to be missing in your case.

By uninstalling google-cloud-bigquery-storage, you made the google-cloud-bigquery package fall back to the list method. Uninstalling the package therefore worked around the missing permission.
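To see which path an environment is likely to take, it can help to check whether the optional storage package is importable at all. The snippet below is only a minimal sketch: the helper names module_available and bqstorage_installed are made up for illustration, and it only checks importability. It does not reproduce the library's own decision logic, which may also depend on versions and arguments.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the module `name` can be found in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # find_spec imports parent packages first; a missing parent
        # (for example no google.cloud at all) raises ModuleNotFoundError.
        return False


def bqstorage_installed() -> bool:
    """Hypothetical helper: is google-cloud-bigquery-storage importable?"""
    return module_available("google.cloud.bigquery_storage")


if __name__ == "__main__":
    print("google-cloud-bigquery-storage installed:", bqstorage_installed())
```

If this prints True on your machine and False on the server, that difference alone could explain why the same script and credentials behave differently in the two places.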

See the BQ Python library documentation for details.

Just set create_bqstorage_client=False:

from google.cloud import bigquery

client = bigquery.Client()
query = 'select * from dataset.table'  # same query as in the question
query_job = client.query(query)
# Skip the BigQuery Storage API so that no read session has to be created
df = query_job.result().to_dataframe(create_bqstorage_client=False)
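If the same code has to run under accounts with and without the read-session permission, one option might be to try the default path first and retry without the Storage API only when permission is denied. This is a rough sketch, not something taken from the library documentation: to_dataframe_with_fallback and its denied_errors parameter are hypothetical names, and I have not verified the retry against every library version.

```python
def to_dataframe_with_fallback(rows, denied_errors=None):
    """Convert query results to a DataFrame, retrying without the Storage API.

    rows: the RowIterator returned by query_job.result().
    denied_errors: exception classes treated as "permission denied"
        (hypothetical parameter; defaults to
        google.api_core.exceptions.PermissionDenied).
    """
    if denied_errors is None:
        # Imported lazily so the helper can be defined without the Google libraries.
        from google.api_core.exceptions import PermissionDenied
        denied_errors = (PermissionDenied,)
    try:
        # Default behaviour: the Storage API is used when its package is installed.
        return rows.to_dataframe()
    except denied_errors:
        # Most likely bigquery.readsessions.create is missing: skip the Storage API.
        return rows.to_dataframe(create_bqstorage_client=False)
```

Usage would be df = to_dataframe_with_fallback(client.query(query).result()). Passing create_bqstorage_client=False directly, as above, is simpler when you already know the permission is missing.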
