简体   繁体   中英

Extract a specific word within a string

I have a table called HEADER , within the table is a specific field called SUMMARY .

Summary contains an overview of an incident like this fake example:

TYPE: CUSTOMER 1
Address: ,123 S 3456 E,,FAIRYLAND,UT 12345-6789 USA
SITE: 1234-NEVADA-46789
Device Name: ROUTER
INCIDENT TYPE: Down
INCIDENT START TIME: 2022-02-10T08:32:25Z 

What I am trying, and failing to do is extract only "Down" after the "INCIDENT TYPE:" line and add this as a new field call "INC_TYPE"

Can someone please help?

You may use REGEXP_SUBSTR with a capture group:

SELECT HEADER.*, REGEXP_SUBSTR(SUMMARY, 'INCIDENT TYPE: (.*) INCIDENT START TIME:', 1, 1, NULL, 1) AS INC_TYPE
FROM HEADER;

If you have a new line character after the incident type then you can just use a simple capture group:

select h.*,
  regexp_substr(h.summary, 'INCIDENT TYPE: (.*)', 1, 1, null, 1) as inc_type
from header h

Based on your example that is probably what you need; but if the values might all be on one line then you can restrict the match based on the following field (assuming that is always there), and including newlines in the match with the 'n' flag:

select h.*,
  regexp_substr(h.summary, 'INCIDENT TYPE: (.*)INCIDENT START TIME:', 1, 1, 'n', 1) as inc_type
from header h

... though that will leave the newline character or space as part of the inc_type , so you would probably want to trim that off:

select h.*,
  rtrim(
    regexp_substr(h.summary, 'INCIDENT TYPE: (.*)INCIDENT START TIME:', 1, 1, 'n', 1),
    chr(32)||chr(13)||chr(10) -- trim trailing space, newline, carriage return
  ) as inc_type
from header h

db<>fiddle

You can use REGEXP_SUBSTR with the multi-line flag to ensure that you match from the start of each line:

SELECT REGEXP_SUBSTR(summary, '^INCIDENT TYPE:\s*(.*)', 1, 1, 'm', 1)
         AS inc_type
FROM   header;

Which, for the sample data:

CREATE TABLE header (summary) AS
SELECT 'TYPE: CUSTOMER 1
Address: ,123 S 3456 E,,FAIRYLAND,UT 12345-6789 USA
SITE: 1234-NEVADA-46789
Device Name: ROUTER
INCIDENT TYPE: Down
INCIDENT START TIME: 2022-02-10T08:32:25Z' FROM DUAL;

Outputs:

INC_TYPE
Down

and:

SELECT REGEXP_SUBSTR(summary, '^TYPE:\s*(.*)', 1, 1, 'im', 1) AS type,
       REGEXP_SUBSTR(summary, '^ADDRESS:\s*(.*)', 1, 1, 'im', 1) AS address,
       REGEXP_SUBSTR(summary, '^SITE:\s*(.*)', 1, 1, 'im', 1) AS site,
       REGEXP_SUBSTR(summary, '^DEVICE NAME:\s*(.*)', 1, 1, 'im', 1)
         AS device_name,
       REGEXP_SUBSTR(summary, '^INCIDENT TYPE:\s*(.*)', 1, 1, 'im', 1)
         AS inc_type,
       TO_TIMESTAMP_TZ(
         REGEXP_SUBSTR(summary, '^INCIDENT START TIME:\s*(.*)', 1, 1, 'im', 1),
         'YYYY-MM-DD"T"HH24:MI:SSTZR'
       ) AS incident_start_time
FROM   header;

Outputs:

TYPE ADDRESS SITE DEVICE_NAME INC_TYPE INCIDENT_START_TIME
CUSTOMER 1 ,123 S 3456 E,,FAIRYLAND,UT 12345-6789 USA 1234-NEVADA-46789 ROUTER Down 2022-02-10 08:32:25.000000000 +00:00

db<>fiddle here

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM