
Select specific portion of string using Regex match

Please consider the table below. I am trying to retrieve only the EUR amount from the Tax strings. The records vary in length, but the floating-point numbers are always present.

OrderID    SKU      Price    Tax
****       ****     ****     [<TV<standard#21.0#false#21.36#EUR>VT>]
****       ****     ****     [<TV<standard#21.0#false#7.21#EUR>VT>]
****       ****     ****     [<TV<standard#17.0#false#5.17#EUR>VT>]

I wrote a regular expression that matches what I need: \d+\W\d+ matches both float values within the string. In Oracle SQL I can simply get the second occurrence with a query like:

SELECT REGEXP_SUBSTR(column, '\d+\W\d+',1,2) FROM table

Using the above approach I retrieve 21.36, 7.21 and 5.17 for those three records.

How can I achieve this with SQL Server?

Regex would be the obvious tool of choice here, but SQL Server does not have much native regex support. Here is a pure SQL Server solution making use of PATINDEX and CHARINDEX. It is a bit verbose, but gets the job done:

SELECT
    SUBSTRING(Tax,
              CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax) + 3) + 1,
              CHARINDEX('#', Tax, CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax) + 3) + 1) -
              CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax) + 3) - 1)
FROM yourTable;
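The nested expression above can be hard to follow, so here is a sketch of the same arithmetic with each position given a name through CROSS APPLY. The table variable @t and the aliases rate_end, hash_before and hash_after are hypothetical names introduced only for this illustration; the sample rows are the three from the question.

```sql
-- Hypothetical table variable holding the three sample rows from the question.
DECLARE @t TABLE (Tax VARCHAR(100));
INSERT INTO @t (Tax) VALUES
('[<TV<standard#21.0#false#21.36#EUR>VT>]'),
('[<TV<standard#21.0#false#7.21#EUR>VT>]'),
('[<TV<standard#17.0#false#5.17#EUR>VT>]');

SELECT t.Tax,
       -- Take everything strictly between the two '#' characters around the amount.
       SUBSTRING(t.Tax,
                 p2.hash_before + 1,
                 p3.hash_after - p2.hash_before - 1) AS eur_amount
FROM @t AS t
-- Position of the first digit that is directly followed by '#'
-- (the last digit of the first number, e.g. the 0 in 21.0).
CROSS APPLY (SELECT PATINDEX('%[0-9]#%', t.Tax) AS rate_end) AS p1
-- Search from rate_end + 3 (same offset as the query above) so the '#'
-- right after the first number is skipped; this finds the '#' before the amount.
CROSS APPLY (SELECT CHARINDEX('#', t.Tax, p1.rate_end + 3) AS hash_before) AS p2
-- The next '#' after that one closes the amount.
CROSS APPLY (SELECT CHARINDEX('#', t.Tax, p2.hash_before + 1) AS hash_after) AS p3;
```

For the first sample row, rate_end is 18, hash_before is 25 and hash_after is 31, so SUBSTRING reads 5 characters starting at position 26. Note that this relies on the layout shown in the question, where the amount is the token following the third '#'.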


Please try the following solution.

The approach uses XML to tokenize the Tax column. It produces XML like the following for each row:

<root>
  <r>[&lt;TV&lt;standard</r>
  <r>21.0</r>
  <r>false</r>
  <r>21.36</r>
  <r>EUR&gt;VT&gt;]</r>
</root>

The 4th r element is the monetary value in question.

SQL

-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tax VARCHAR(MAX));
INSERT INTO @tbl (Tax) VALUES
('[<TV<standard#21.0#false#21.36#EUR>VT>]'),
('[<TV<standard#21.0#false#7.21#EUR>VT>]'),
('[<TV<standard#17.0#false#5.17#EUR>VT>]');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = '#';

SELECT t.*
    , c.value('(/root/r[4]/text())[1]', 'DECIMAL(10,2)') AS result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
        REPLACE(tax, @separator, ']]></r><r><![CDATA[') + 
        ']]></r></root>' AS XML)) AS t1(c);

Output

+----+-----------------------------------------+--------+
| ID |                   Tax                   | result |
+----+-----------------------------------------+--------+
|  1 | [<TV<standard#21.0#false#21.36#EUR>VT>] |  21.36 |
|  2 | [<TV<standard#21.0#false#7.21#EUR>VT>]  |   7.21 |
|  3 | [<TV<standard#17.0#false#5.17#EUR>VT>]  |   5.17 |
+----+-----------------------------------------+--------+
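To see what the tokenization step builds before the 4th element is read, the same expression can be run against a single value. This is only a sketch for inspection: the variable names @tax and @xml are assumptions made for the example, and the input is the first sample string from the question.

```sql
-- One sample value from the question, tokenized outside of any table.
DECLARE @tax VARCHAR(MAX) = '[<TV<standard#21.0#false#21.36#EUR>VT>]';
DECLARE @separator CHAR(1) = '#';

-- Wrap every '#'-separated token in <r><![CDATA[...]]></r>; CDATA keeps the
-- literal '<' and '>' characters in the data from breaking the XML.
-- TRY_CAST returns NULL instead of raising an error if the result is not valid XML.
DECLARE @xml XML = TRY_CAST('<root><r><![CDATA[' +
        REPLACE(@tax, @separator, ']]></r><r><![CDATA[') +
        ']]></r></root>' AS XML);

SELECT @xml AS tokenized,                                              -- the intermediate XML
       @xml.value('(/root/r[4]/text())[1]', 'DECIMAL(10,2)') AS result; -- 4th token as a number
```

The tokenized column should show the five r elements listed earlier, with the angle brackets of the original string stored as escaped text.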
