简体   繁体   中英

Extract multiple matching strings from a multiline string in Bigquery

I have two fields, key and a multiline string field called description that looks like this:

Key Description
1 [multiline string as below]
2 [multiline string as below]
3 [multiline string as below]

Example of 'description' field:

Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here |

What I am ultimately trying to do is output each instance of 'system' to the key:

Key System
1 string that I needA
1 string that I needB
1 string that I needC

I have tried to use regex but struggled to find the next line break, so instead tried doing a nested from, eg

SELECT
*,
SUBSTR(system2, INSTR(system2, 'System: ')+8) AS system2_desc,
FROM (
    SELECT
    *,
    SUBSTR(description, INSTR(description, 'System: ')+8) AS system1_desc,
    FROM (
        SELECT
        CASE WHEN INSTR(description, 'System: ') > 0 THEN 1 ELSE 0 END AS contains_system,
        description,
        FROM my_table
    )
)

and then later find and remove using CHR(10) but very quickly this would be unsustainable and without knowing the number of System strings, I'd have to account for way more than I might expect.

Is there a function that can extract the System strings either as expected output above or in array that I can then do a cross join on?

Try this one:

with mytable as (
select 
    1 as key, 
    """Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here |
""" as description
)
select key, system
from mytable, unnest(REGEXP_EXTRACT_ALL(description, r"System: (.*)\n")) as system 

在此处输入图像描述

Try this

with mytable as
(
    select """
Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here
""" as descp
),
cte as 
(
    select split(descp,'\n') as ls 
    from mytable
)
select replace(str, "System:", '') from cte,unnest(ls) as str
where starts_with(str,"System:" )

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM