
How to convert a Hive query with regex to Oracle

I have this text:

Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples

I just want to get the part after 'Process explanation', without including 'Final activity...'.

So like this:

The bottle is then melted to form liquid glass.

This is the current Hive query, which I want to convert to Oracle:

SELECT REGEXP_EXTRACT(
               'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples',
               '.*(process[ \t]*(explanation)?[ \t]*:[ \t]*)(.*?)([ \t]*;[ \t]*final[ \t]+activity[ \t]+for[ \t]+manager.*$|$)',
               3) as extracted
FROM my_table

If those substrings are exactly as you described, there's a pretty simple option: the SUBSTR and INSTR functions.

with test (col) as
  (select 'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' from dual)
select substr(col,
              -- start right after 'Process explanation' and the ':' that follows it
              instr(col, 'Process explanation') + length('Process explanation') + 1,
              -- length up to 'Final activity', excluding that ':' and the ';'
              instr(col, 'Final activity') - instr(col, 'Process explanation') -
                length('Process explanation') - 2
             ) result
from test;

RESULT
----------------------------------------------
The bottle is then melted to form liquid glass
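One caveat with the SUBSTR + INSTR approach: if a row has no 'Final activity' part, INSTR returns 0, the computed length becomes negative, and Oracle's SUBSTR returns NULL. Below is a minimal sketch of a variant that falls back to the end of the string in that case. It assumes every row contains the exact, case-sensitive marker 'Process explanation:' and that the trailing part, when present, always starts with ';Final activity'; the second test row is made up for illustration.

```sql
-- Sketch: SUBSTR + INSTR that falls back to the end of the string when
-- ';Final activity' is missing. Assumes every row contains the exact,
-- case-sensitive marker 'Process explanation:'.
with test (col) as
  (select 'Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' from dual
   union all
   -- hypothetical row without the 'Final activity' part
   select 'Process explanation:The bottle is then melted to form liquid glass' from dual)
select substr(col,
              -- first character after the 'Process explanation:' marker
              instr(col, 'Process explanation:') + length('Process explanation:'),
              -- end position is the ';' before 'Final activity', or one past the
              -- last character when INSTR returns 0 (NULLIF turns 0 into NULL)
              nvl(nullif(instr(col, ';Final activity'), 0), length(col) + 1)
                - instr(col, 'Process explanation:')
                - length('Process explanation:')
             ) result
from test;
```

Both rows should return "The bottle is then melted to form liquid glass". If the marker itself can be missing, you would additionally need a CASE on instr(col, 'Process explanation:') > 0.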

I've come up with something like this:

with strings as
(SELECT '1Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' str FROM DUAL
union all
SELECT '2Process explanation:The bottle is then melted to form liquid glass;' str FROM DUAL
union all
SELECT '3Process :The bottle is then melted to form liquid glass' str FROM DUAL
union all
SELECT '4Process explanation: plasma gasification combined with centrifugal activity' str FROM DUAL
union all
SELECT '5Final activity for manager:Labeling of previous samples' str FROM DUAL
)
SELECT str
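, REGEXP_SUBSTR(
               str,
           '(.*process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*)([A-Za-z0-9 ]*)([[:blank:]]*;[[:blank:]]*final[[:blank:]]*activity[[:blank:]]*for[[:blank:]]*manager.*$)?',
           1,    -- start searching at position 1
           1,    -- first occurrence
           'i',  -- case-insensitive, so 'process' also matches 'Process'
           3)    -- return the third parenthesized subexpression ([A-Za-z0-9 ]*)
                as extracted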
FROM strings

Resulting in:

STR | EXTRACTED
1Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples | The bottle is then melted to form liquid glass
2Process explanation:The bottle is then melted to form liquid glass; | The bottle is then melted to form liquid glass
3Process:The bottle is then melted to form liquid glass | The bottle is then melted to form liquid glass
4Process explanation: plasma gasification combined with centrifugal activity | plasma gasification combined with centrifugal activity
5Final activity for manager:Labeling of previous samples | -

This assumes it's OK to match the [[:blank:]] character class instead of your space-and-tab list [ \t]. Edit: I modified the regexp a bit because, with the last group possibly being empty, '.*' kept capturing the entire line.
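Note that the ([A-Za-z0-9 ]*) group stops at the first character that is not a letter, digit or space, so an explanation containing a comma, period or hyphen would be cut short. If you can assume the explanation itself never contains a semicolon, a negated bracket expression [^;]* avoids both that restriction and the lazy quantifier. A minimal sketch under that assumption (row 6 is a hypothetical example added to show punctuation):

```sql
-- Sketch: capture everything after 'Process [explanation] :' up to the first ';'
-- or the end of the string. Assumes the explanation text contains no ';'.
with strings as
(SELECT '1Process explanation:The bottle is then melted to form liquid glass;Final activity for manager:Labeling of previous samples' str FROM DUAL
union all
SELECT '3Process :The bottle is then melted to form liquid glass' str FROM DUAL
union all
SELECT '5Final activity for manager:Labeling of previous samples' str FROM DUAL
union all
-- hypothetical row with punctuation inside the explanation
SELECT '6Process explanation:Heat, then cool slowly;Final activity for manager:Inspect' str FROM DUAL
)
SELECT str
, REGEXP_SUBSTR(
      str,
      'process[[:blank:]]*(explanation)?[[:blank:]]*:[[:blank:]]*([^;]*)',
      1,    -- start at position 1
      1,    -- first occurrence
      'i',  -- case-insensitive
      2)    -- subexpression 2 is ([^;]*); (explanation) is subexpression 1
    as extracted
FROM strings;
```

Row 6 should keep the comma ("Heat, then cool slowly") and row 5 should still return NULL. Unlike the Hive pattern, this stops at any ';', not only at '; Final activity for manager', and it picks the first 'process ...:' occurrence rather than the last one that the leading greedy '.*' selects.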

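Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.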

 
© 2020-2024 STACKOOM.COM