简体   繁体   中英

Teradata SQL select string of multiple capital letters in a text field

Any help would be much appreciated on figuring out how to identify Acronyms within a text field that has mixed upper and lower case letters.
For example, we might have " we used the BBQ sauce on the Chicken " I need my query to SELECT "BBQ" and nothing else in the cell. There could be multiple capitalized string per row The output should include the uppcase string.

Any ideas are much appreciated!!

This is going to be kind of ugly. I tried to use REGEXP_SPLIT_TO_TABLE to just pull out the all caps words, but couldn't make it work.

I would do it by first using strtok_split_to_table , so each word will end up in it's own row.

First, some dummy data:

create volatile table vt 
(id integer,
col1 varchar(20))
on commit preserve rows;

insert into vt
values (1,'foo BAR');

insert into vt
values (2,'fooBAR');

insert into vt
values(3,'blah FOO FOO blah');

We can use strtok_split_to_table on this:

select
t.*
from table
(strtok_split_to_table(vt.id ,vt.col1,' ')
returns
(tok_key integer 
,tok_num INTEGER
,tok_value VARCHAR(30)
)) AS t

That will split each value into separate rows, using a space as a delimiter.

Finally, we can compare each of those values to that value in upper case:

select
vt.id,
vt.col1,
tok_key,
tok_num,
tok_value,
case when upper(t.tok_value) = t.tok_value (CASESPECIFIC) then tok_value else '0' end
from
(
select
t.*
from table
(strtok_split_to_table(vt.id ,vt.col1,' ')
returns
(tok_key integer 
,tok_num INTEGER
,tok_value VARCHAR(30)
)) AS t
) t
inner join vt
    on t.tok_key = vt.id
order by id,tok_num

Taking our lovely sample data, you'll get:

+----+-------------------+---------+---------+-----------+-------------+
| id |       col1        | tok_key | tok_num | tok_value | TEST_OUTPUT |
+----+-------------------+---------+---------+-----------+-------------+
|  1 | foo BAR           |       1 |       1 | foo       | 0           |
|  1 | foo BAR           |       1 |       2 | BAR       | BAR         |
|  2 | fooBAR            |       2 |       1 | fooBAR    | 0           |
|  3 | blah FOO FOO blah |       3 |       1 | blah      | 0           |
|  3 | blah FOO FOO blah |       3 |       2 | FOO       | FOO         |
|  3 | blah FOO FOO blah |       3 |       3 | FOO       | FOO         |
|  3 | blah FOO FOO blah |       3 |       4 | blah      | 0           |
+----+-------------------+---------+---------+-----------+-------------+

Defining acronyms as all uppercase words with 2 to 5 characters with a '\\b[AZ]{2,5}\\b' regex:

WITH cte AS
 ( -- using @Andrew's Volatile Table 
   SELECT * 
   FROM vt
   -- only rows containing acronyms
   WHERE RegExp_Similar(col1, '.*\b[A-Z]{2,5}\b.*') = 1 
 )
SELECT
   outkey,
   tokenNum,
   CAST(RegExp_Substr(Token, '[A-Z]*') AS VARCHAR(5)) AS acronym -- 1st uppercase word 
   --,token
FROM TABLE
    ( RegExp_Split_To_Table
        ( cte.id,
          cte.col1,

             -- split before an acronym, might include additional characters after
             -- [^A-Z]*? = any number of non uppercase letters (removed)
             -- (?= ) = negative lookahead, i.e. check, but don't remove
          '[^A-Z]*?(?=\b[A-Z]{2,5}\b)',

          '' -- defaults to case sensitive
        ) RETURNS
            ( outKey INT,
              TokenNum INT,
              Token VARCHAR(30000) -- adjust to match the size of your input column 
            )
    ) AS t
WHERE acronym <> ''

I am not 100% sure what are you trying to do but I thing you have many options. Ie:

Option 1) check if the acronym (like BBQ) exist in the string (basic syntax)

SELECT CHARINDEX ('BBQ',@string)

In this case you would need a table of all know acronyms you want to check for and then loop through each of them to see if there is a match for your string and then return the acronym.

DECLARE @string VARCHAR(100)
SET @string = 'we used the BBQ sauce on the Chicken'

create table : [acrs]
--+--- acronym-----+
--+    BBQ         +
--+    IBM         +
--+    AMD         +
--+    ETC         +
--+----------------+

SELECT acronym FROM [acrs] WHERE CHARINDEX ([acronym], @string ) > 0)

This should return : 'BBQ'

Option 2) load up all the upper case characters into a temp table etc. for further logic and processing. I think you could use something like this...

DECLARE @string VARCHAR(100)
SET @string = 'we used the BBQ sauce on the Chicken'

-- make table of all Upper case letters and process individually
;WITH cte_loop(position, acrn)
 AS (
        SELECT 1, SUBSTRING(@string, 1, 1)
        UNION ALL
        SELECT position + 1, SUBSTRING(@string, position + 1, 1)
        FROM cte_loop
        WHERE position < LEN(@string) 
 )
SELECT position, acrn, ascii(acrn) AS [ascii]
FROM cte_loop
WHERE ascii(acrn) > 64 AND ascii(acrn) < 91 -- see the ASCII table for all codes

This would return table like this:

在此处输入图片说明

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM