MySQL - search for patterns

I'm trying to figure out if someone has an elegant way to look for patterns in data stored in a varchar field where the value is not known, meaning I can't use LIKE. For example, say a table called test looked like this:

id, str

and the data looked like this:

1, YUUUY
2, DDDMM
3, MMMMT
4, XMXMX

and I want to do a select that will return anything where the value of str has a pattern that matches the pattern ABABA. ABABA here shows a pattern and not literal letters. So the only one that matches this pattern would be id = 4. Is there a regular expression that I can use to pattern match like this? To make sure I'm clear regarding the patterns:

The pattern for id=1 is ABBBA.  
The pattern for id=2 is AAABB.  
The pattern for id=3 is AAAAB.

When running the query, all I will know is the pattern to search for.

Alternatively, if it makes it easier, I can have the table set up like:

id,c1,c2,c3,c4,c5

and the data would look like this:

1,Y,U,U,U,Y
2,D,D,D,M,M
3,M,M,M,M,T
4,X,M,X,M,X

Not sure if that makes it easier, but I think regexp is out the window if the data is set up like that.

Unfortunately, it doesn't look like MySQL supports regex groups. I was hoping you could do something like this to match ABBBA for example:

([A-Z])([A-Z])\2\2\1

Example here: http://regexr.com/3d8gu

It looks like there is a MySQL plugin that might support it:

https://github.com/mysqludf/lib_mysqludf_preg
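
For reference, here is a small sketch of how the backreference regex above behaves in an engine that does support groups. Python's `re` module is used purely for illustration and is not part of the original question. Note that the regex as written does not force the two letters to differ; the `(?!\1)` lookahead in the second variant is an added assumption to enforce that, and the helper name `matches` is hypothetical.

```python
import re

# The regex from above: it also accepts strings like "YYYYY",
# because nothing forces group 2 to differ from group 1.
LOOSE_ABBBA = re.compile(r"([A-Z])([A-Z])\2\2\1")

# Assumed variant: (?!\1) rejects a second letter equal to the first.
STRICT_ABBBA = re.compile(r"([A-Z])(?!\1)([A-Z])\2\2\1")


def matches(regex, s):
    """Return True if the whole string matches the compiled regex."""
    return regex.fullmatch(s) is not None


# Sample rows from the question (id -> str).
rows = {1: "YUUUY", 2: "DDDMM", 3: "MMMMT", 4: "XMXMX"}
print([i for i, s in rows.items() if matches(STRICT_ABBBA, s)])  # [1]
```

Each distinct pattern would need its own regex built this way, so this only helps where the regex engine supports backreferences.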

Here is a real hacky way to do it.

ABBBA (or YUUUY, etc):

SELECT id, str FROM test WHERE
  substring(str,1,1) = substring(str,5,1) AND
  substring(str,2,1) = substring(str,3,1) AND
  substring(str,3,1) = substring(str,4,1) AND
  substring(str,1,1) != substring(str,2,1);

AAABB (or DDDMM, etc):

SELECT id, str FROM test WHERE
  substring(str,1,1) = substring(str,2,1) AND
  substring(str,2,1) = substring(str,3,1) AND
  substring(str,4,1) = substring(str,5,1) AND
  substring(str,3,1) != substring(str,4,1);

AAAAB (or MMMMT, etc):

SELECT id, str FROM test WHERE
  substring(str,1,1) = substring(str,2,1) AND
  substring(str,2,1) = substring(str,3,1) AND
  substring(str,3,1) = substring(str,4,1) AND
  substring(str,4,1) != substring(str,5,1);

You get the picture...

It would be similar if you separated the data into different columns. Instead of comparing substrings you would just compare the columns.
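
Since only the pattern is known at query time, the hand-written comparisons above could be generated from the pattern string. The following is a rough sketch, not a definitive implementation: the function name `pattern_to_where` and its `column` parameter are hypothetical, and it assumes the table `test` and column `str` from the question. It emits an equality for every repeated pattern letter and an inequality between the first occurrences of different letters.

```python
def pattern_to_where(pattern, column="str"):
    """Build a MySQL predicate matching `pattern` (e.g. "ABABA") on `column`.

    Positions sharing a pattern letter must be equal; positions holding the
    first occurrences of different letters must differ.
    `column` is interpolated as-is, so it must be a trusted identifier.
    """
    first_seen = {}  # pattern letter -> 1-based position of its first occurrence
    conditions = [f"CHAR_LENGTH({column}) = {len(pattern)}"]
    for pos, letter in enumerate(pattern, start=1):
        here = f"SUBSTRING({column},{pos},1)"
        if letter in first_seen:
            # Repeated letter: must equal the character at its first position.
            conditions.append(
                f"{here} = SUBSTRING({column},{first_seen[letter]},1)"
            )
        else:
            # New letter: must differ from every letter seen so far.
            for other_pos in first_seen.values():
                conditions.append(f"{here} != SUBSTRING({column},{other_pos},1)")
            first_seen[letter] = pos
    return " AND ".join(conditions)


# Assumed table and column names taken from the question.
print("SELECT id, str FROM test WHERE " + pattern_to_where("ABABA") + ";")
```

The generated predicate still cannot use an index on the column, so it has the same full-scan cost as the hand-written queries.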

There is no regular expression support in MySQL to do that kind of pattern matching, no.

SQL wasn't specifically designed for pattern matching of strings (or patterns of values in separate columns).

But... we could come up with something workable, even if it's not a regular expression and it's not elegant.

Assuming we don't have a custom built user-defined function, and we want to use native MySQL functions and expressions...

And assuming that the patterns we are looking for are guaranteed to consist of only two distinct characters...

And assuming that we're looking at exactly five character positions...

And assuming that the pattern string we're matching to will always begin with the letter 'A', and the "other" letter in the pattern will always be 'B'...

It wouldn't be overly ugly to do something like this:

SELECT t.id
     , t.str
  FROM mytable t
 WHERE CONCAT('A'
        ,IF(MID(t.str,2,1)=MID(t.str,1,1),'A','B')
        ,IF(MID(t.str,3,1)=MID(t.str,1,1),'A','B')
        ,IF(MID(t.str,4,1)=MID(t.str,1,1),'A','B')
        ,IF(MID(t.str,5,1)=MID(t.str,1,1),'A','B')
       ) = 'ABBBA'

The first character in the string is automatically converted to an 'A'.

If the second character matches the first character, then it's also an 'A'; otherwise it's a 'B'.

We do the same thing for the third, fourth and fifth characters.

Concatenate the 'A' and 'B' characters into a single string, and we can now perform an equality comparison to a pattern string, consisting of 'A' and 'B', starting with an 'A'.

But that is going to fall apart if the stated assumptions aren't true: if str is less than five characters in length, or if it contains more than two distinct characters. For example, this would see str=XYYZX as matching pattern ABBBA. The first character is an automatic match to A, and the fifth character matches the first, so it's an A. All of the other characters don't match the first, so they are 'B', even though they aren't the same as each other.

And so on.

We could add some additional checks.

For example, to guarantee that str is exactly five characters in length...

AND CHAR_LENGTH(t.str)=5

Note that the default collation in MySQL is case insensitive. That means a str value of MmmmM would be converted to 'AAAAA', not 'ABBBA'. And a str value of MmmKk would match 'AAABB'.
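
To see how the two caveats above (more than two distinct characters, and case sensitivity) play out, here is a minimal Python sketch that relabels each character by order of first appearance instead of comparing everything to the first character. It is an illustration outside MySQL, not the method used in the query above; the function name `canonical_pattern` and the `case_sensitive` flag are hypothetical, and lowercasing only roughly mimics a case-insensitive collation.

```python
def canonical_pattern(s, case_sensitive=True):
    """Relabel characters by order of first appearance: "XMXMX" -> "ABABA"."""
    if not case_sensitive:
        # Rough stand-in for a case-insensitive collation (an assumption).
        s = s.lower()
    labels = {}  # source character -> assigned pattern letter
    out = []
    for ch in s:
        if ch not in labels:
            # Assumes at most 26 distinct characters per string.
            labels[ch] = chr(ord("A") + len(labels))
        out.append(labels[ch])
    return "".join(out)


print(canonical_pattern("XMXMX"))                        # ABABA
print(canonical_pattern("XYYZX"))                        # ABBCA, not ABBBA
print(canonical_pattern("MmmmM"))                        # ABBBA
print(canonical_pattern("MmmmM", case_sensitive=False))  # AAAAA
```

With this encoding, XYYZX no longer collides with ABBBA, and the case flag makes the collation effect explicit. One possible design, if the application controls inserts, would be to store this canonical form in an extra column and compare it to the search pattern with plain equality.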
