How to declare variable inside BigQuery UDF body?

I am trying to create a UDF with a while loop in BigQuery, but I can't find any syntax guidelines in the documentation that address this case specifically, or that address declaring variables inside the UDF body.

Context: I'm trying to build a function to apply title case to a string.

I tried:

CREATE OR REPLACE FUNCTION mydataset.title_case(word STRING) as (
    DECLARE i INT64;
    SET i = ARRAY_LENGTH(SPLIT(word, " "));
    ...
);

However it doesn't like the DECLARE or SET in the UDF body. What's the right syntax for this?

Regarding your question about how to use DECLARE and SET with a UDF: you have to declare and set the variable at the beginning of your script, outside the function. Then you pass it as an argument to your UDF. The syntax would be:

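-- x holds an array length, so it is declared as INT64 to match the UDF parameter
DECLARE x INT64;
-- the scalar subquery must return exactly one row
SET x = (SELECT ARRAY_LENGTH(SPLIT(word, " ")) FROM `project_id.dataset.table`);

CREATE OR REPLACE FUNCTION mydataset.title_case(word STRING, x INT64) as (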
#your function...
);

Notice that the variable is set from a value in a table, using SELECT. It is then passed as an argument to the UDF.

In addition, I was able to create a JavaScript UDF that applies title case to a string without SET and DECLARE, using only JavaScript's built-in methods. You can use it as follows:

CREATE TEMP FUNCTION title_case(str String)
RETURNS string
LANGUAGE js AS """
  str = str.split(' ');
  for(var i = 0; i < str.length; i++){
    str[i] = str[i].charAt(0).toUpperCase() + str[i].slice(1); 
  }
  return str.join(' ');
""";

WITH data AS (
SELECT "jack sparrow" AS name
)

SELECT title_case(name) as new_name FROM data

and the output is:

Row new_name    
1   Jack Sparrow
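
Since the body of a JavaScript UDF is ordinary JavaScript, the same logic can be tried locally (for example in Node.js) before it is pasted between the triple quotes. The sketch below is a standalone variant of the function above, not the answer's exact code. The null guard is an addition, based on the assumption that a SQL NULL reaches the function as a JavaScript null, in which case calling split on it would throw.

```javascript
// Standalone sketch of the UDF body, with a guard for NULL input.
// The guard is an assumption-driven addition, not part of the original answer.
function titleCase(str) {
  if (str === null || str === undefined) {
    return null; // pass NULL through instead of throwing on str.split
  }
  var words = str.split(' ');
  for (var i = 0; i < words.length; i++) {
    // Upper-case the first character and keep the rest of the word unchanged.
    words[i] = words[i].charAt(0).toUpperCase() + words[i].slice(1);
  }
  return words.join(' ');
}

console.log(titleCase('jack sparrow')); // Jack Sparrow
```

Only the statements inside the function would go into the UDF definition; the wrapper function and the console.log call are there just for local testing.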

Context: I'm trying to build a function to apply title case to a string.

Instead of answering the question directly, I would rather address what I believe drove it to be asked in the first place.

In my experience here on SO, OPs frequently ask, quite literally, for help going in the wrong direction. In many cases this is a sad experience, because you understand that you are not really helping such a person but rather the opposite. I am guilty of being part of this many times, because it is not always clear what the real use case is, so there are not many options other than to answer the exact question as asked.

I think in this case the question contains a good hint of its real purpose / use case, so, as I already said, I want to answer that (the use case).

You don't really need a loop in most cases. You should rather try to achieve things the SQL way: set-based!

So, the hint is in the statement below:

Context: I'm trying to build a function to apply title case to a string.

A simple way to implement a title case function is as below:

#standardSQL
CREATE TEMP FUNCTION TitleCase(text STRING) AS ((
  SELECT STRING_AGG(UPPER(SUBSTR(part, 1, 1)) || SUBSTR(part, 2), ' ' ORDER BY OFFSET)
  FROM UNNEST(SPLIT(text, ' ')) part WITH OFFSET
));
SELECT text, 
  TitleCase(text) transformed_text
FROM `project.dataset.table`

You can test the above with dummy data, as in the example below:

#standardSQL
CREATE TEMP FUNCTION TitleCase(text STRING) AS ((
  SELECT STRING_AGG(UPPER(SUBSTR(part, 1, 1)) || SUBSTR(part, 2), ' ' ORDER BY OFFSET)
  FROM UNNEST(SPLIT(text, ' ')) part WITH OFFSET
));
WITH `project.dataset.table` AS (
  SELECT 1 id, "google cloud platform" AS text UNION ALL
  SELECT 2, "o'brian"
)
SELECT text, 
  TitleCase(text) transformed_text
FROM `project.dataset.table`

with output as below:

Row text                        transformed_text     
1   google cloud platform       Google Cloud Platform    
2   o'brian                     O'brian  

As you can see, your initial approach of using a space as the delimiter to split the text is not the best way: o'brian didn't get its b capitalized.

To address this, you can use the approach below:

#standardSQL
CREATE TEMP FUNCTION TitleCase(text STRING) AS ((
  SELECT STRING_AGG(char, '' ORDER BY OFFSET)
  FROM (
    SELECT IF(REGEXP_CONTAINS(LAG(char) OVER(ORDER BY OFFSET), r'\w'), char, UPPER(char)) char, OFFSET
    FROM UNNEST(SPLIT(text, '')) char WITH OFFSET
    )
));
SELECT text, 
  TitleCase(text) transformed_text
FROM `project.dataset.table`

Now, when applied to the same dummy data, the result is more appropriate:

Row text                        transformed_text     
1   google cloud platform       Google Cloud Platform    
2   o'brian                     O'Brian    

Note: the above is just one (or rather two) examples of how to avoid ineffective cursor-based processing and instead do it all in one (set-based) pass.
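
For readers who prefer the JavaScript UDF route, the rule used by the second TitleCase function (upper-case every character that is not preceded by a word character) can be sketched in plain JavaScript as well. This is an illustrative equivalent under assumptions, not code from the answer: the name titleCaseByBoundary is hypothetical, and toUpperCase may differ from BigQuery's UPPER for some non-ASCII characters.

```javascript
// Sketch: upper-case each character whose previous character is not a word
// character (\w), mirroring the LAG + REGEXP_CONTAINS logic of the SQL version.
// titleCaseByBoundary is a hypothetical name used only for this illustration.
function titleCaseByBoundary(text) {
  if (text === null || text === undefined) {
    return null; // keep NULL as NULL
  }
  let result = '';
  let previous = null; // no previous character at the start, like LAG returning NULL
  for (const ch of text) {
    const afterWordChar = previous !== null && /\w/.test(previous);
    result += afterWordChar ? ch : ch.toUpperCase();
    previous = ch;
  }
  return result;
}

console.log(titleCaseByBoundary('google cloud platform')); // Google Cloud Platform
console.log(titleCaseByBoundary("o'brian"));               // O'Brian
```

Unlike the set-based SQL version, this walks the string one character at a time, but it does so inside a single function call per row, so no SQL-level loop or DECLARE/SET is involved.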

For people arriving here to find out how to declare and set a variable inside a function (as the question title indicates): you cannot do so using DECLARE and SET. However, it is not necessary to declare the variable outside the function (which is not possible with permanent functions anyway); the same effect can be achieved with a WITH clause.

Say you want func1(phrase_in) to return the rows from table1 whose phrase value has the same length (number of space-separated words) as phrase_in. This might be attempted as:

CREATE OR REPLACE TABLE FUNCTION mydataset.func1(phrase_in STRING) as (
    DECLARE phrase_len INT64;
    SET phrase_len = ARRAY_LENGTH(SPLIT(phrase_in, " "));
    
    SELECT phrase, date, user
    FROM `mydataset.table1`
    WHERE ARRAY_LENGTH(SPLIT(phrase, " ")) = phrase_len
);

This will raise an error, but the desired result is possible with

CREATE OR REPLACE TABLE FUNCTION mydataset.func1(phrase_in STRING) as (
    WITH phrase_len AS (
        SELECT ARRAY_LENGTH(SPLIT(phrase_in, " ")) x
    )
    
    SELECT phrase, date, user
    FROM `mydataset.table1`
    WHERE ARRAY_LENGTH(SPLIT(phrase, " ")) = (SELECT x FROM phrase_len)
);

This is obviously overkill for such a simple example, but I have used this approach when the phrase_len variable is not calculated by a simple function of the input variable but is calculated using SELECT statements from other tables, and is maybe re-used several times within the UDF (hence wanting to declare to avoid making the same sub-query multiple times).
