
Extract multiple matching strings from a multiline string in BigQuery

I have two fields: key, and a multiline string field named description, like this:

key  description
1    [multiline string as shown below]
2    [multiline string as shown below]
3    [multiline string as shown below]

Example of the description field:

Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here

What I ultimately want is one output row per "System" instance for each key:

key  system
1    string that I needA
1    string that I needB
1    string that I needC

I tried using a regular expression but struggled to find the next line break, so I tried nesting substring calls instead, e.g.

SELECT
*,
-- Text after the next 'System: ' marker, searched within system1_desc
SUBSTR(system1_desc, INSTR(system1_desc, 'System: ')+8) AS system2_desc,
FROM (
    SELECT
    *,
    -- Text after the first 'System: ' marker in description
    SUBSTR(description, INSTR(description, 'System: ')+8) AS system1_desc,
    FROM (
        SELECT
        CASE WHEN INSTR(description, 'System: ') > 0 THEN 1 ELSE 0 END AS contains_system,
        description,
        FROM my_table
    )
)

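and then later using CHR(10) to find and strip the rest of the line. This quickly becomes unmaintainable, and because I don't know how many System strings there are, I would have to account for more cases than I can anticipate.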

Is there a function that can extract the System strings as in the expected output above, or extract them into an array that I can then cross join against?

Try this:

with mytable as (
select 
    1 as key, 
    """Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here |
""" as description
)
select key, system
from mytable, unnest(REGEXP_EXTRACT_ALL(description, r"System: (.*)\n")) as system 
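
The query above only matches a System line that is followed by a newline, so a System entry on the very last line of a description without a trailing newline would be missed. A minimal sketch of a variant, assuming RE2's multi-line flag `(?m)` so that `^` and `$` match at line boundaries; the shortened sample data and the `pos` alias are made up for illustration:

```sql
-- Sketch only: sample row is abbreviated, and `pos` is an illustrative alias.
WITH mytable AS (
  SELECT
    1 AS key,
    'Some data: 12345\nSystem: string that I needA\nSome other line here\nSystem: string that I needB' AS description
)
SELECT
  key,
  system
FROM mytable,
  -- (?m) lets ^ and $ match at the start and end of each line;
  -- "." does not match a newline by default, so (.*) stops at the line end.
  UNNEST(REGEXP_EXTRACT_ALL(description, r'(?m)^System: (.*)$')) AS system
  WITH OFFSET AS pos
-- Without an ORDER BY the row order is not guaranteed.
ORDER BY key, pos
```

With this sample row, both "string that I needA" and "string that I needB" should come back for key 1, even though the second one has no newline after it. If the descriptions use Windows line endings, the captured text may still carry a trailing carriage return that needs trimming.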


Try this:

with mytable as
(
    select """
Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here
""" as descp
),
cte as 
(
    select split(descp,'\n') as ls 
    from mytable
)
select trim(replace(str, "System:", '')) as system from cte, unnest(ls) as str
where starts_with(str, "System:")
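
The split-based query above drops the key column, which the question's expected output needs. A sketch of the same idea that carries `key` through and strips only the leading prefix, assuming a table shaped like the question's (`key`, `description`); the shortened sample data and the `line` and `line_no` aliases are illustrative:

```sql
-- Sketch only: sample row is abbreviated; `line` and `line_no` are illustrative aliases.
WITH mytable AS (
  SELECT
    1 AS key,
    'Some data: 12345\nSystem: string that I needA\nSome other line here\nSystem: string that I needB' AS description
)
SELECT
  key,
  -- Drop the 7-character 'System:' prefix by position, then trim the leading space.
  TRIM(SUBSTR(line, LENGTH('System:') + 1)) AS system
FROM mytable,
  UNNEST(SPLIT(description, '\n')) AS line WITH OFFSET AS line_no
WHERE STARTS_WITH(line, 'System:')
-- line_no preserves the original order of lines within each description.
ORDER BY key, line_no
```

Using SUBSTR on the prefix length rather than REPLACE means a later occurrence of "System:" inside the same line is left untouched.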

