How to declare a variable inside a BigQuery UDF body?

I am trying to create a UDF with a while loop on BigQuery, but I cannot find any syntax guidelines in the documentation that address this case specifically, or that address declaring variables inside the UDF body.

Context: I'm trying to build a function to apply title case to a string.

I tried:

CREATE OR REPLACE FUNCTION mydataset.title_case(word STRING) as (
    DECLARE i INT64;
    SET i = ARRAY_LENGTH(SPLIT(word, " "));
    ...
);

However, it doesn't like the DECLARE or SET in the UDF body. What's the right syntax for this?

Regarding your question about how to use DECLARE and SET with a UDF: you have to declare and set the variable at the beginning of your script, outside the function. Then you pass it as an argument to your UDF. The syntax would be:

DECLARE x INT64;
-- Assumes `word` is a column of the table and that the subquery returns at most one row.
SET x = (SELECT ARRAY_LENGTH(SPLIT(word, " ")) FROM `project_id.dataset.table`);

CREATE OR REPLACE FUNCTION mydataset.title_case(word STRING, x INT64) as (
#your function...
);

Notice that the variable is set from a value in a table, using SELECT. It is then passed as an argument to the UDF.

In addition, I was able to create a JavaScript UDF that applies title case to a string without SET and DECLARE, using only JavaScript's built-in methods. You can use it as follows:

CREATE TEMP FUNCTION title_case(str String)
RETURNS string
LANGUAGE js AS """
  str = str.split(' ');
  for(var i = 0; i < str.length; i++){
    str[i] = str[i].charAt(0).toUpperCase() + str[i].slice(1); 
  }
  return str.join(' ');
""";

WITH data AS (
SELECT "jack sparrow" AS name
)

SELECT title_case(name) as new_name FROM data

and the output:

Row new_name    
1   Jack Sparrow
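
The body of the JavaScript UDF is ordinary JavaScript, so it can be tried outside BigQuery first. The following is a minimal sketch and not part of the original answer: the function name `titleCase` and the null guard are assumptions added here for illustration (a SQL NULL reaches a JavaScript UDF as `null`, and calling `split` on it would throw).

```javascript
// Hypothetical standalone version of the UDF body, for local testing with Node.js.
function titleCase(str) {
  // Guard added for illustration: return null for null/undefined input.
  if (str === null || str === undefined) {
    return null;
  }
  const words = str.split(' ');
  for (let i = 0; i < words.length; i++) {
    // Uppercase the first character of each space-separated word; leave the rest untouched.
    words[i] = words[i].charAt(0).toUpperCase() + words[i].slice(1);
  }
  return words.join(' ');
}

console.log(titleCase('jack sparrow')); // Jack Sparrow
```

Once the logic behaves as expected, the loop and the return statement can be pasted back between the triple quotes of the CREATE TEMP FUNCTION statement.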

Context: I'm trying to build a function to apply title case to a string.

Instead of answering the question directly, I would rather address what I believe drove the question to be asked in the first place.

From my experience here on SO, OPs frequently ask questions that literally ask for help going in the wrong direction. In many cases it is a sad experience, because you understand that you are not really helping such a person but rather doing the opposite. I am guilty of being part of this many times, because it is not always clear what the real use case is, so there are not many options other than to answer the exact question as asked.

I think in this case the question gives a good hint of the real purpose / use case, so, as I already said, I want to answer that (the use case).

You don't really need a loop in most cases. You should rather try to achieve things the SQL way: set-based!

So, the hint is in the statement below:

Context: I'm trying to build a function to apply title case to a string.

A simple way to implement a title case function is as below:

#standardSQL
CREATE TEMP FUNCTION TitleCase(text STRING) AS ((
  SELECT STRING_AGG(UPPER(SUBSTR(part, 1, 1)) || SUBSTR(part, 2), ' ' ORDER BY OFFSET)
  FROM UNNEST(SPLIT(text, ' ')) part WITH OFFSET
));
SELECT text, 
  TitleCase(text) transformed_text
FROM `project.dataset.table`

You can test the above with dummy data, as in the example below:

#standardSQL
CREATE TEMP FUNCTION TitleCase(text STRING) AS ((
  SELECT STRING_AGG(UPPER(SUBSTR(part, 1, 1)) || SUBSTR(part, 2), ' ' ORDER BY OFFSET)
  FROM UNNEST(SPLIT(text, ' ')) part WITH OFFSET
));
WITH `project.dataset.table` AS (
  SELECT 1 id, "google cloud platform" AS text UNION ALL
  SELECT 2, "o'brian"
)
SELECT text, 
  TitleCase(text) transformed_text
FROM `project.dataset.table`

with output as below:

Row text                        transformed_text     
1   google cloud platform       Google Cloud Platform    
2   o'brian                     O'brian  

As you can see, your initial approach of using a space as the delimiter to split the text is not the best way: O'brian didn't get its b capitalized.

To address this, you can use the approach below:

#standardSQL
CREATE TEMP FUNCTION TitleCase(text STRING) AS ((
  SELECT STRING_AGG(char, '' ORDER BY OFFSET)
  FROM (
    SELECT IF(REGEXP_CONTAINS(LAG(char) OVER(ORDER BY OFFSET), r'\w'), char, UPPER(char)) char, OFFSET
    FROM UNNEST(SPLIT(text, '')) char WITH OFFSET
    )
));
SELECT text, 
  TitleCase(text) transformed_text
FROM `project.dataset.table`

Now, when applied to the same dummy data, the result is more appropriate:

Row text                        transformed_text     
1   google cloud platform       Google Cloud Platform    
2   o'brian                     O'Brian    

Note: the above is just one (or rather two) examples of how to avoid inefficient cursor-based processing and instead do everything in one set-based pass.
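
For comparison with the JavaScript UDF shown earlier on this page, the same rule (uppercase every character that is not preceded by a word character) can be sketched in plain JavaScript. This is an illustration under assumptions, not part of the original answer: the name `titleCaseByBoundary` is hypothetical, and the function handles one string at a time rather than replacing the set-based SQL.

```javascript
// Hypothetical helper mirroring the LAG-based SQL rule above.
function titleCaseByBoundary(text) {
  if (text === null || text === undefined) {
    return null;
  }
  let previous = null; // plays the role of LAG(char): no previous character at the start
  let result = '';
  for (const ch of text) {
    // Keep the character as-is only when the previous character is a word character.
    const afterWordChar = previous !== null && /\w/.test(previous);
    result += afterWordChar ? ch : ch.toUpperCase();
    previous = ch;
  }
  return result;
}

console.log(titleCaseByBoundary('google cloud platform')); // Google Cloud Platform
console.log(titleCaseByBoundary("o'brian"));               // O'Brian
```

Like `\w` in BigQuery's RE2 regular expressions, JavaScript's `\w` matches only ASCII letters, digits and underscore, so a letter that follows an accented character is also capitalized.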

For people arriving here to find out how to declare and set a variable inside a function (as the question title indicates): you cannot do so with DECLARE and SET, but it is also not necessary to declare the variable outside the function (which is not possible with permanent functions). You can use a WITH clause instead.

Say you want func1(phrase_in) to return results from table1 where the phrase value has the same number of space-separated words as phrase_in. This might be attempted as:

CREATE OR REPLACE TABLE FUNCTION mydataset.func1(phrase_in STRING) as (
    DECLARE phrase_len INT64;
    SET phrase_len = ARRAY_LENGTH(SPLIT(phrase_in, " "));
    
    SELECT phrase, date, user
    FROM `mydataset.table1`
    WHERE ARRAY_LENGTH(SPLIT(phrase, " ")) = phrase_len
);

This will raise an error, but the desired result is possible with:

CREATE OR REPLACE TABLE FUNCTION mydataset.func1(phrase_in STRING) as (
    WITH phrase_len AS (
        SELECT ARRAY_LENGTH(SPLIT(phrase_in, " ")) x
    )
    
    SELECT phrase, date, user
    FROM `mydataset.table1`
    WHERE ARRAY_LENGTH(SPLIT(phrase, " ")) = (SELECT x FROM phrase_len)
);

This is obviously overkill for such a simple example, but I have used this approach when the phrase_len value is not a simple function of the input but is computed with SELECT statements from other tables, and is perhaps reused several times within the UDF (hence wanting to declare it, to avoid writing the same subquery multiple times).
