
Extract substring with a specific pattern in MySQL

I have a text field which looks like:

option[A]sum[A]g3et[B]

I want to get the text inside the [ ], without duplicates. That is, I want to get:

A
B

Brackets are never nested, so a case like [ [ ] ] cannot occur.

I know this is a horrible way to store data in a database, but I cannot change how the data is saved. I just need to extract one very specific piece of information from this column, one time.

I tried:

SELECT substring_index(substring_index(sentence, '[', -1),']', 1)
FROM (SELECT 'THIS[A] IS A TEST' AS sentence) temp;

This gives me A, but it does not work when there are several [ ].

I thought of using a regex, but I don't know how many [ ] there are.

How do I do that?

This is not a job for the database, but it is possible:

CREATE TABLE tab(id INT, col VARCHAR(100));           
INSERT INTO tab(id, col) 
VALUES (1, 'option[A]sum[A]g3et[B]'), (2, '[Cosi]sum[A]g3et[ZZZZ]');      

SELECT DISTINCT *
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val
    FROM tab t 
    CROSS JOIN 
    (
     SELECT a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
ORDER BY ID;


Note:

Depending on the maximum length of col, you may need to generate more numbers in the CROSS JOIN section. As written, it generates the numbers 1 to 100, so at most 100 pieces per row are split out.


How it works:

  1. Generate a number table with CROSS JOIN
  2. Split the string using ] as the delimiter
  3. RIGHT(val, LENGTH(val) - LOCATE('[', val)) removes everything up to and including the [
  4. Filter out empty records
  5. Keep only DISTINCT values
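
To trace these steps outside the database, here is a minimal Python sketch of the same logic. It is only an illustration, not part of the SQL answer: the function name extract_bracketed is made up, and str.split stands in for the number table plus SUBSTRING_INDEX.

```python
def extract_bracketed(col):
    """Mirror the five steps of the query for a single column value."""
    # Steps 1-2: the number table lets SQL pick the n-th piece; here
    # str.split yields every piece delimited by ']' directly.
    pieces = col.split("]")
    # Step 3: drop everything up to and including the first '[',
    # like RIGHT(val, LENGTH(val) - LOCATE('[', val)).
    vals = [p[p.find("[") + 1:] for p in pieces]
    # Step 4: filter out empty records.
    vals = [v for v in vals if v != ""]
    # Step 5: keep distinct values, preserving first-seen order.
    return list(dict.fromkeys(vals))


print(extract_bracketed("option[A]sum[A]g3et[B]"))  # ['A', 'B']
print(extract_bracketed("[Cosi]sum[A]g3et[ZZZZ]"))  # ['Cosi', 'A', 'ZZZZ']
```

One thing the sketch makes visible: if the text continues after the last ], that trailing piece contains no [ and is kept as a value, because step 3 then removes nothing. The sample strings all end with ], so they are not affected.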

Innermost query:

╔════╦══════════╗
║ id ║   val    ║
╠════╬══════════╣
║  1 ║ option[A ║
║  1 ║ sum[A    ║
║  1 ║ g3et[B   ║
║  1 ║          ║
╚════╩══════════╝

Second subquery:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ A   ║
║  1 ║ B   ║
║  1 ║     ║
╚════╩═════╝

And the outermost query:

╔════╦═════╗
║ id ║ val ║
╠════╬═════╣
║  1 ║ A   ║
║  1 ║ B   ║
╚════╩═════╝

I need the result of the query per row, not combined.

Then simply add a filter on id:

WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  AND t.id = ?

EDIT 2:

See http://sqlfiddle.com/#!9/8ee95/1. Your query works only partially for my data. I also changed the type to longtext.

You want to parse JSON in MySQL. As I said before, parse it and get the values in the application layer. This answer is for demo/toy purposes only and will have very low performance.
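
As a rough idea of what application-layer parsing could look like, here is a minimal Python sketch. It assumes the simple bracket format from the question, not the real data in the linked fiddle, and the function name bracket_values is hypothetical. For actual JSON, a JSON parser would be the better tool.

```python
import re


def bracket_values(text):
    """Return the distinct non-empty values found between '[' and ']'."""
    # Brackets are never nested, so "anything except ']'" is enough.
    found = re.findall(r"\[([^\]]*)\]", text)
    # Drop empty "[]" pairs and duplicates, keeping first-seen order.
    return list(dict.fromkeys(v for v in found if v))


print(bracket_values("option[A]sum[A]g3et[B]"))  # ['A', 'B']
```

Because the regex finds every match in one pass, there is no need to know in advance how many [ ] the text contains.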

If you still insist on an SQL solution:

SELECT id, val,s.n
FROM (
  SELECT id, RIGHT(val, LENGTH(val) - LOCATE('[', val)) AS val,n
  FROM
  (
    SELECT id, SUBSTRING_INDEX(SUBSTRING_INDEX(t.col, ']', n.n), ']', -1) AS val, n.n
    FROM (SELECT id, REPLACE(col, '[]','') as col FROM tab) t
    CROSS JOIN 
    (
     SELECT e.N * 10000 + d.N * 1000 + c.N * 100 + a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
      ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e

    ) n
    WHERE n.n <= 1 + (LENGTH(t.col) - LENGTH(REPLACE(t.col, ']', '')))
  ) sub
) s
WHERE val <> ''
GROUP BY id, val
HAVING n <> MAX(n)
ORDER BY id,n;


Output:

╔═════╦═════════════╦════╗
║ id  ║    val      ║ n  ║
╠═════╬═════════════╬════╣
║  1  ║ CE31285LV4  ║  1 ║
║  1  ║ D32E        ║  3 ║
║  1  ║ GTX750      ║  5 ║
║  1  ║ M256S       ║  7 ║
║  1  ║ H2X1T       ║  9 ║
║  1  ║ FMLANE4U4   ║ 11 ║
╚═════╩═════════════╩════╝

EDIT 3:

What exactly is done there? Why do you need n?

The CROSS JOIN and the entire subquery are only a tally table. That is all. If MySQL had a function to generate a number sequence (like generate_series) or a prepopulated number table, there would be no need for the CROSS JOIN.
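
The arithmetic behind the tally table can be checked with a short Python sketch. The helper name tally is made up for illustration. It combines every choice of digits 0-9 the way the cross-joined derived tables do, for example a.N + b.N * 10 + 1 for two tables.

```python
from itertools import product


def tally(width):
    """Numbers produced by cross-joining `width` digit tables (0-9)."""
    return sorted(
        # Each digit is weighted by its power of ten, then 1 is added
        # so the sequence starts at 1 instead of 0.
        sum(d * 10 ** i for i, d in enumerate(combo)) + 1
        for combo in product(range(10), repeat=width)
    )


two = tally(2)
print(two[0], two[-1], len(two))  # 1 100 100
```

With two tables the numbers run from 1 to 100. The five tables used in the second query extend the range to 100000.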

The number table is needed for SUBSTRING_INDEX:

SUBSTRING_INDEX(str,delim,count)

Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. SUBSTRING_INDEX() performs a case-sensitive match when searching for delim.
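
To see why a number n is needed, here is an approximate Python model of SUBSTRING_INDEX. It is a sketch written for this explanation and not MySQL's implementation. It is meant for single-character delimiters such as ], and edge cases may differ from the server.

```python
def substring_index(s, delim, count):
    """Approximate model of MySQL SUBSTRING_INDEX(str, delim, count)."""
    # Guard: Python's split rejects an empty delimiter; the model
    # returns an empty string for that case and for count = 0.
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Everything to the right of the count-th delimiter from the end.
    return delim.join(parts[count:])


col = "option[A]sum[A]g3et[B]"
# The nested call isolates the n-th piece, one piece per value of n.
for n in range(1, 5):
    print(n, repr(substring_index(substring_index(col, "]", n), "]", -1)))
# 1 'option[A'
# 2 'sum[A'
# 3 'g3et[B'
# 4 ''
```

Each value of n supplied by the number table selects one piece, which is how a single row is turned into several rows. The original attempt in the question used fixed counts, so it could only ever return one piece.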
