
How are Path Expressions used in BigQuery

BigQuery documentation describes path expressions, which look like this:

foo.bar
foo.bar/25
foo/bar:25
foo/bar/25-31
/foo/bar
/25/foo/bar

But it doesn't say a lot about how and where these path expressions are used. It only briefly mentions:

A path expression describes how to navigate to an object in a graph of objects.

  • But what is this graph of objects?
  • How would you use this syntax with a graph of objects?
  • What's the meaning of a path expression like foo/bar/25-31?

My question is: what are these Path Expressions the official documentation describes?

I've searched through BigQuery docs but haven't managed to find any other mention of these path expressions. Is this syntax actually part of BigQuery SQL at all?

What I've found out so far

There is an existing question that asks roughly the same thing, though it is more about a specific detail of the path expression syntax. For some reason it's downvoted, and none of its answers are correct.

Anyway, the answers there propose a few hypotheses as to what path expressions are:

It's not a syntax for referencing tables

BigQuery Legacy SQL uses a syntax for referencing tables that's similar to path expressions:

SELECT state, year FROM [bigquery-public-data:samples.natality]

But that syntax is only valid in BigQuery Legacy SQL. In the new Google Standard SQL it produces a syntax error. There is separate documentation for table path syntax, which is different from path expression syntax.

It's not JSONPath syntax

JSONPath syntax is documented elsewhere and looks like:

SELECT JSON_QUERY(json_text, '$.class.students[0]')

It's not a syntax for accessing JSON object graph

There's a separate JSON subscript operator syntax, which looks like so:

SELECT json_value.class.students[0]['name']

My current hypothesis

My best guess is that BigQuery doesn't actually support such syntax, and the description in the docs is a mistake.

But please, prove me wrong. I'd really like to know because I'm trying to write a parser for BigQuery SQL, and to do so, I need to understand the whole syntax that BigQuery allows.

I believe that a "path expression" is the combination of identifiers that points to specific objects/tables/columns/etc. So `project.dataset.table.struct.column` is a path expression comprising 5 identifiers. I also think that alias.column within the context of a query is a path expression with 2 identifiers (although the alias is probably expanded behind the scenes).
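To make the "combination of identifiers" reading concrete, here is a minimal Python sketch that splits a dotted path into its identifiers. The function name split_path and the identifier rules (bare identifiers made of letters, digits and underscores, with a backtick-quoted identifier kept as a single component) are assumptions made for illustration, not BigQuery's actual lexer rules.

```python
import re

# One path component: a backtick-quoted identifier or a bare identifier.
# Assumption for this sketch: bare identifiers use letters, digits and
# underscores only, and quoted identifiers contain no backticks or escapes.
_COMPONENT = re.compile(r"`([^`]+)`|([A-Za-z_][A-Za-z0-9_]*)")


def split_path(path: str) -> list[str]:
    """Split a dot-separated path expression into its identifiers."""
    parts: list[str] = []
    pos = 0
    while True:
        match = _COMPONENT.match(path, pos)
        if match is None:
            raise ValueError(f"expected an identifier at offset {pos} in {path!r}")
        quoted, bare = match.groups()
        parts.append(quoted if quoted is not None else bare)
        pos = match.end()
        if pos == len(path):
            return parts
        if path[pos] != ".":
            raise ValueError(f"expected '.' at offset {pos} in {path!r}")
        pos += 1  # skip the dot and read the next component
```

Under these assumptions, split_path("project.dataset.table.struct.column") yields five identifiers and split_path("alias.column") yields two.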

If you scroll up a bit in your link, there is a section with some examples of valid path expressions, which also happens to be right after the identifiers section.

With this in mind, I think a JSON path expression is a certain type of path expression, as parsing JSON requires a specific set of identifiers to get to a specific data element.

As for the "graph" terminology, perhaps BQ parses the query and accesses data using a graph methodology behind the scenes; I can't really say. I would guess "path expressions" probably makes more sense to the team working on BigQuery than to users of BigQuery. I don't think there is any special syntax for you to "use" path expressions.

If you are writing a parser, maybe take some inspiration from this ZetaSQL parser, which has several references to path expressions.

Looks like this syntax comes from the ZetaSQL parser, which includes the exact same documentation. BigQuery most likely uses ZetaSQL internally as its parser (ZetaSQL supports all of BigQuery syntax and they're both from Google).

According to the ZetaSQL grammar, a path expression beginning with / and containing : and - can be used for referencing tables in the FROM clause. It looks like / and : are simply part of identifier names, just as - is part of identifier names in BigQuery.

But the support for the : and / characters in ZetaSQL path expressions can be toggled on or off, and it seems that in BigQuery it's been toggled off. BigQuery doesn't allow : and / characters in table names, not even when they're quoted.

ZetaSQL can also toggle support for - in identifier names; BigQuery does allow those.
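For anyone writing their own parser, the toggles can be modeled roughly like this. It is only a sketch of the idea: PathOptions, its two flag names and is_valid_path are hypothetical names invented here, and the character rules are a deliberate simplification rather than ZetaSQL's real grammar.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PathOptions:
    # Hypothetical switches, named for this sketch only.
    allow_slash_paths: bool = False  # permit '/' and ':' inside a path
    allow_dashes: bool = True        # permit '-' inside a path


def is_valid_path(text: str, opts: PathOptions) -> bool:
    """Check a path against a simplified, option-dependent character set."""
    first = "A-Za-z_"
    rest = "A-Za-z0-9_."
    if opts.allow_slash_paths:
        first += "/"  # slash paths may start with '/'
        rest += "/:"
    if opts.allow_dashes:
        rest += r"\-"  # escaped so it is never read as a range
    return re.fullmatch(f"[{first}][{rest}]*", text) is not None


# Assumed settings that mirror the behaviour described above.
BIGQUERY_LIKE = PathOptions(allow_slash_paths=False, allow_dashes=True)
ALL_ENABLED = PathOptions(allow_slash_paths=True, allow_dashes=True)
```

With BIGQUERY_LIKE, foo.bar and my-project.dataset.table are accepted while foo/bar:25 is rejected; with ALL_ENABLED, every example from the documentation listing passes.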

My conclusion: it's a ZetaSQL parser feature, the documentation of which has been mistakenly copy-pasted to BigQuery documentation.

Thanks to rtenha for pointing out the ZetaSQL parser, of which I wasn't aware before.
