Extract multiple matching strings from a multiline string in BigQuery
I have two fields: key, and a multiline string field named description, as follows:
key | description |
---|---|
1 | [multiline string as per below] |
2 | [multiline string as per below] |
3 | [multiline string as per below] |
Example of the description field:
Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here |
What I ultimately want is to output every "System" instance for each key:
key | system |
---|---|
1 | string that I needA |
1 | string that I needB |
1 | string that I needC |
I tried using regular expressions but struggled to match up to the next newline character, so I tried nesting subqueries instead, e.g.
SELECT
*,
SUBSTR(system1_desc, INSTR(system1_desc, 'System: ')+8) AS system2_desc,
FROM (
SELECT
*,
SUBSTR(description, INSTR(description, 'System: ')+8) AS system1_desc,
FROM (
SELECT
CASE WHEN INSTR(description, 'System: ') > 0 THEN 1 ELSE 0 END AS contains_system,
description,
FROM my_table
)
)
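and then later using CHR(10) to find and remove the remainder of each string. This quickly becomes unsustainable, though: without knowing how many System strings there are, I would have to account for far more cases than I expect.
Is there a function that can extract the System strings as in the expected output above, or extract them into an array that I can then cross join on?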
Try this:
with mytable as (
select
1 as key,
"""Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here |
""" as description
)
select key, system
from mytable, unnest(REGEXP_EXTRACT_ALL(description, r"System: (.*)\n")) as system
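Two edge cases may be worth covering with this approach. The pattern above requires a newline after each match, so a "System:" line at the very end of a description with no trailing newline would be skipped; and the comma (cross) join drops any key whose description has no "System:" line at all. The following is only a sketch under those assumptions, with hypothetical sample rows and a hypothetical alias `sys`. It relies on `.` not matching a newline by default in BigQuery's RE2 regex engine, so `(.*)` stops at the end of the line on its own:

```sql
-- Hypothetical sample rows (not from the question):
--   key 2 has no "System:" line, key 3 ends without a trailing newline.
with mytable as (
  select 1 as key,
         "Some data: 12345\nSystem: string that I needA\nSome other line here\nSystem: string that I needB\n" as description
  union all
  select 2, "Some data: qwerty\nNo matching line here\n"
  union all
  select 3, "System: string that I needC"
)
select key, sys as system
from mytable
-- LEFT JOIN keeps keys whose array of matches is empty (system is NULL for them).
-- No "\n" in the pattern: "." does not match a newline, so the capture ends at end of line.
left join unnest(REGEXP_EXTRACT_ALL(description, r"System: (.*)")) as sys
order by key, system
```

If keys without any match should be left out, switching back to the comma join is enough.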
Try this:
with mytable as
(
select """
Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here
""" as descp
),
cte as
(
select split(descp,'\n') as ls
from mytable
)
select trim(replace(str, "System:", '')) as system from cte,unnest(ls) as str
where starts_with(str,"System:" )
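The split-based query above returns only the extracted strings, while the question asks for one row per key and System string. A minimal sketch of the same idea that carries the key through is shown below. The sample row and the alias `line` are hypothetical, and it assumes each "System:" line starts at the beginning of the line:

```sql
-- Hypothetical sample row, shortened from the question's example.
with mytable as (
  select 1 as key,
         "Some data: 12345\nSystem: string that I needA\nSome other line here\nSystem: string that I needB" as description
)
select
  key,
  -- Drop the "System:" prefix by position, then trim the space that follows it.
  trim(substr(line, length("System:") + 1)) as system
from mytable, unnest(split(description, "\n")) as line
where starts_with(line, "System:")
```

Using `substr` rather than `replace` removes only the leading prefix, so a value that itself contains the text "System:" is left intact.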