
Split and compare two Strings in Oracle SQL

I have a table with a three-column structure, as shown below:

+------------------------+------------------------------+--------------+
| left                   | right                        | pattern      |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Kiki Cola 50 ml bottle       |              |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | 50 ml Kiki Cola bottle       |              |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Kiki Cola 50 ml              |              |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Kiki Cola Light bottle 50 ml |              |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Coca Cola 50 ml bottle       |              |
+------------------------+------------------------------+--------------+

Now I want to run an Oracle SQL query that gives me the edit pattern between the left and right strings. The result should look like this:

+------------------------+------------------------------+--------------+
| left                   | right                        | pattern      |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Kiki Cola 50 ml bottle       | SAME         |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | 50 ml Kiki Cola bottle       | SWAPPED      |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Kiki Cola 50 ml              | CONTAINED_IN |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Kiki Cola Light bottle 50 ml | CONTAINS     |
+------------------------+------------------------------+--------------+
| Kiki Cola 50 ml bottle | Coca Cola 50 ml bottle       | NOT_SAME     |
+------------------------+------------------------------+--------------+ 

My attempts with REGEX_SPLIT and CONNECT BY were not successful. Do you have any idea how to solve this?

You can create a collection data type:

CREATE TYPE stringlist IS TABLE OF VARCHAR2(200);

Then split the strings into collections of words and compare those collections:

SELECT left,
       right,
       CASE
       -- identical strings
       WHEN left = right THEN 'same'
       -- same words, different order
       WHEN left_words = right_words THEN 'swapped'
       -- every word of left also occurs in right
       WHEN left_words SUBMULTISET OF right_words THEN 'contains'
       -- every word of right also occurs in left
       WHEN right_words SUBMULTISET OF left_words THEN 'contained in'
       ELSE 'not_same'
       END AS pattern
FROM   (
  SELECT left,
         right,
         -- split left on spaces and collect the words into a stringlist
         ( SELECT CAST(
                    COLLECT( REGEXP_SUBSTR( left, '[^ ]+', 1, LEVEL ) )
                    AS stringlist
                  )
           FROM   DUAL
           CONNECT BY
                  LEVEL <= REGEXP_COUNT( left, '[^ ]+' )
         ) AS left_words,
         -- same for right
         ( SELECT CAST(
                    COLLECT( REGEXP_SUBSTR( right, '[^ ]+', 1, LEVEL ) )
                    AS stringlist
                  )
           FROM   DUAL
           CONNECT BY
                  LEVEL <= REGEXP_COUNT( right, '[^ ]+' )
         ) AS right_words
  FROM   test_data t
);
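
To see why the comparisons in the CASE expression line up with the patterns, here is a small sketch that can be run on its own. It assumes the stringlist type from above has already been created; the literal word lists are made-up examples, not rows from the table. Nested tables are compared as multisets, so the order of the elements should not matter for = or SUBMULTISET OF.

```sql
-- Step 1: split a single string into words, one row per word.
SELECT LEVEL AS word_no,
       REGEXP_SUBSTR( 'Kiki Cola 50 ml bottle', '[^ ]+', 1, LEVEL ) AS word
FROM   DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( 'Kiki Cola 50 ml bottle', '[^ ]+' );

-- Step 2: compare two collections regardless of element order.
-- same_words is expected to be 'equal', is_submultiset to be 'yes'.
SELECT CASE
       WHEN stringlist( '50', 'ml', 'Kiki' ) = stringlist( 'Kiki', '50', 'ml' )
       THEN 'equal' ELSE 'different'
       END AS same_words,
       CASE
       WHEN stringlist( 'Kiki', 'Cola' ) SUBMULTISET OF stringlist( 'Kiki', 'Cola', 'Light' )
       THEN 'yes' ELSE 'no'
       END AS is_submultiset
FROM   DUAL;
```

Because these are multiset comparisons, repeated words count: a string containing a word twice is not treated as equal to one containing it once.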

So, for your test data:

CREATE TABLE test_data ( left, right ) AS
SELECT 'Kiki Cola 50 ml bottle', 'Kiki Cola 50 ml bottle' FROM DUAL UNION ALL
SELECT 'Kiki Cola 50 ml bottle', '50 ml Kiki Cola bottle' FROM DUAL UNION ALL
SELECT 'Kiki Cola 50 ml bottle', 'Kiki Cola 50 ml' FROM DUAL UNION ALL
SELECT 'Kiki Cola 50 ml bottle', 'Kiki Cola Light 50 ml bottle' FROM DUAL UNION ALL
SELECT 'Kiki Cola 50 ml bottle', 'Coca Cola 50 ml bottle' FROM DUAL;

The query outputs:

LEFT                   | RIGHT                        | PATTERN
:--------------------- | :--------------------------- | :-----------
Kiki Cola 50 ml bottle | Kiki Cola 50 ml bottle       | same
Kiki Cola 50 ml bottle | 50 ml Kiki Cola bottle       | swapped
Kiki Cola 50 ml bottle | Kiki Cola 50 ml              | contained in
Kiki Cola 50 ml bottle | Kiki Cola Light 50 ml bottle | contains
Kiki Cola 50 ml bottle | Coca Cola 50 ml bottle       | not_same

db<>fiddle here

with t( lt, rt) as (
    select 'Kiki Cola 50 ml bottle', 'Kiki Cola 50 ml bottle' from dual union all
    select 'Kiki Cola 50 ml bottle', '50 ml Kiki Cola bottle' from dual union all
    select 'Kiki Cola 50 ml bottle', 'Kiki Cola 50 ml' from dual union all
    select 'Kiki Cola 50 ml bottle', 'Kiki Cola Light bottle 50 ml' from dual union all
    select 'Kiki Cola 50 ml bottle', 'Coca Cola 50 ml bottle' from dual ),
  q as (select rownum rn, lt, rt, 
               '"'||replace(lt, ' ', '", "')||'"' ltx, 
               '"'||replace(rt, ' ', '", "')||'"' rtx from t )
select rn, lt, rt,
       case when lt = rt           then 'same'
            when fl = 0 and fr = 0 then 'swapped'
            when fl = 1 and fr = 0 then 'contains'
            when fl = 0 and fr = 1 then 'contained in'
            else 'not same'
       end pattern
  from (
    -- fl = 1: at least one right word has no matching left word
    -- fr = 1: at least one left word has no matching right word
    select coalesce(l.rn, r.rn) rn,
           max(case when l.rn is null then 1 else 0 end) fl,
           max(case when r.rn is null then 1 else 0 end) fr
      from      (select rn, trim(column_value) lw from q, xmltable(ltx)) l
      full join (select rn, trim(column_value) rw from q, xmltable(rtx)) r 
        on l.rn = r.rn and l.lw = r.rw
      group by coalesce(l.rn, r.rn))
  join q using (rn);

Result:

    RN LT                     RT                           PATTERN
------ ---------------------- ---------------------------- ------------
     1 Kiki Cola 50 ml bottle Kiki Cola 50 ml bottle       same
     2 Kiki Cola 50 ml bottle 50 ml Kiki Cola bottle       swapped
     3 Kiki Cola 50 ml bottle Kiki Cola 50 ml              contained in
     4 Kiki Cola 50 ml bottle Kiki Cola Light bottle 50 ml contains
     5 Kiki Cola 50 ml bottle Coca Cola 50 ml bottle       not same

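Split the strings into words (here the XML way; connect by or a function would also work), compare them with a full join, flag the nulls, group by row, and use case when to present the pattern.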
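
The splitting step on its own can be sketched as follows. The replace call in q turns every space into `", "` and wraps the result in double quotes, so the string becomes an XQuery sequence of quoted literals, and xmltable returns one item of that sequence per row. The literal below is what ltx would hold for the first test row; it is written out by hand here for illustration.

```sql
-- One row per word: Kiki, Cola, 50, ml, bottle
SELECT TRIM( column_value ) AS word
FROM   XMLTABLE( '"Kiki", "Cola", "50", "ml", "bottle"' );
```

This relies on the words being safe to embed in an XQuery string literal; a word containing a double quote or an ampersand would presumably need escaping first.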

