How to sort alphanumeric data in MySQL?

Firstly I want to point out that I have tried almost everything. I have been trying for the last 8 hours to get my list in order, and I have applied a dozen solutions found here.

Here is an SQL Fiddle with the sample data. I have found a page that manages to sort my list in the right order, which is:

1
2
2.B3
5
9
10 A-1
10 A-3
10 B-4
10 B-5
11
12
B3-43
B3-44
B3 - 48
B3 - 49
Basztowa 3
Basztowa 4
Basztowa 5
Basztowa 7
Basztowa 9
D.1
D.2
D.10
D.11
D.12
Kabaty ul. Pod lipą 4

But I am not able to reproduce this using MySQL.

I would appreciate any help, as I have run out of ideas. I am considering using PHP to sort my list, but as far as I know a DBMS is optimized for this kind of operation, so if possible I would like to avoid doing this in PHP.

@UPDATE

Thanks to @Jakumi I have created two functions that help me solve my problem.

You need to create a column to store your values in a sort-friendly format (zeropadded_name), then create triggers on insert and update that fill zeropadded_name whenever name changes, and that's all! Now just order by zeropadded_name and enjoy!

Helper functions

  1. regex_replace - its task is to help us sanitize the value by getting rid of all non-alphanumeric characters (it replaces every character that matches the pattern with the given replacement).
  2. lpad_numbers - pads every number in our string. It's a bit ugly, as I don't know MySQL functions well, but hey, it works, and quite fast.

Example:

SELECT lpad_numbers(regex_replace('[^a-zA-Z0-9]', ' ', 'B3 - A-5'));
#B0003A0005

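-- Replaces every character of `original` that matches `pattern` with `replacement`.
-- The match is tested one character at a time, so only single-character patterns work.
DROP FUNCTION IF EXISTS regex_replace;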
CREATE FUNCTION `regex_replace`(
  pattern     VARCHAR(1000)
              CHARSET utf8
              COLLATE utf8_polish_ci,
  replacement VARCHAR(1000)
              CHARSET utf8
              COLLATE utf8_polish_ci,
  original    VARCHAR(1000)
              CHARSET utf8
              COLLATE utf8_polish_ci
) RETURNS varchar(1000) CHARSET utf8
    DETERMINISTIC
BEGIN
    DECLARE temp VARCHAR(1000)
    CHARSET utf8
    COLLATE utf8_polish_ci;
    DECLARE ch VARCHAR(1)
    CHARSET utf8
    COLLATE utf8_polish_ci;
    DECLARE i INT;
    SET i = 1;
    SET temp = '';
    IF original REGEXP pattern
    THEN
      loop_label: LOOP
        IF i > CHAR_LENGTH(original)
        THEN
          LEAVE loop_label;
        END IF;
        SET ch = SUBSTRING(original, i, 1);
        IF NOT ch REGEXP pattern
        THEN
          SET temp = CONCAT(temp, ch);
        ELSE
          SET temp = CONCAT(temp, replacement);
        END IF;
        SET i = i + 1;
      END LOOP;
    ELSE
      SET temp = original;
    END IF;
    RETURN temp;
  END;

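-- Left-pads every run of digits in `str` to 4 characters and drops spaces.
DROP FUNCTION IF EXISTS lpad_numbers;
CREATE FUNCTION `lpad_numbers`(str VARCHAR(256)) RETURNS varchar(256) CHARSET utf8 COLLATE utf8_polish_ci
    DETERMINISTIC
BEGIN
    DECLARE i, len SMALLINT DEFAULT 1;
    DECLARE ret VARCHAR(256) DEFAULT '';
    DECLARE num VARCHAR(256) DEFAULT '';
    DECLARE c CHAR(1);

    IF str IS NULL
    THEN
      RETURN '';
    END IF;

    SET len = CHAR_LENGTH(str);
    WHILE i <= len DO
      SET c = MID(str, i, 1);
      IF c BETWEEN '0' AND '9'
      THEN
        -- Collect the whole run of digits and stop before the first non-digit,
        -- so that the following character is not swallowed into the padded number.
        SET num = '';
        WHILE i <= len AND (MID(str, i, 1) BETWEEN '0' AND '9') DO
          SET num = CONCAT(num, MID(str, i, 1));
          SET i = i + 1;
        END WHILE;
        -- LPAD shortens values longer than 4 characters, hence the
        -- assumption that every number is below 10000.
        SET ret = CONCAT(ret, LPAD(num, 4, '0'));
      ELSE
        -- Spaces (the separators left by regex_replace) are dropped.
        IF c <> ' ' THEN
          SET ret = CONCAT(ret, c);
        END IF;
        SET i = i + 1;
      END IF;
    END WHILE;
    RETURN ret;
  END;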
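
The column-plus-trigger setup described above can be tried end to end without a MySQL server. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module instead of MySQL, and `zeropad` is a hypothetical Python stand-in for `lpad_numbers(regex_replace('[^a-zA-Z0-9]', ' ', name))`, not the stored functions themselves. SQLite cannot modify NEW in a trigger, so the triggers here run an UPDATE after the row is written.

```python
import re
import sqlite3


def zeropad(name, width=4):
    """Hypothetical stand-in for lpad_numbers(regex_replace(...)) from the post."""
    if name is None:
        return ""
    # Step 1: replace every non-alphanumeric character with a space.
    spaced = re.sub(r"[^a-zA-Z0-9]", " ", name)
    # Step 2: left-pad every run of digits. Note: zfill never shortens a long
    # number, whereas MySQL's LPAD would cut it down to `width` characters.
    padded = re.sub(r"[0-9]+", lambda m: m.group().zfill(width), spaced)
    # Step 3: drop the separator spaces.
    return padded.replace(" ", "")


conn = sqlite3.connect(":memory:")
conn.create_function("zeropad", 1, zeropad)
conn.executescript(
    """
    CREATE TABLE realestate (name TEXT, zeropadded_name TEXT);

    CREATE TRIGGER realestate_ai AFTER INSERT ON realestate
    BEGIN
        UPDATE realestate SET zeropadded_name = zeropad(NEW.name)
        WHERE rowid = NEW.rowid;
    END;

    CREATE TRIGGER realestate_au AFTER UPDATE OF name ON realestate
    BEGIN
        UPDATE realestate SET zeropadded_name = zeropad(NEW.name)
        WHERE rowid = NEW.rowid;
    END;
    """
)

names = ["D.10", "B3 - 48", "10 A-1", "2", "D.2", "B3-43", "Basztowa 3", "2.B3", "1"]
conn.executemany("INSERT INTO realestate (name) VALUES (?)", [(n,) for n in names])

ordered = [
    row[0]
    for row in conn.execute("SELECT name FROM realestate ORDER BY zeropadded_name")
]
print(zeropad("B3 - A-5"))
print(ordered)
```

In MySQL the same idea would be a pair of BEFORE INSERT / BEFORE UPDATE triggers that assign NEW.zeropadded_name directly.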

splitting according to underlying structure

Technically, the MySQL sorting mechanism works correctly, but your strings are formatted in the wrong way. The underlying structure of your data is something like the following (the Original column is kept for ease of association with the example):

alpha1   num1 alpha2 num2 ...   Original      
            1                   1             
            2                   2             
            2      B    3       2.B3          
            5                   5             
            9                   9             
           10      A    1       10 A-1        
           10      A    3       10 A-3        
           10      B    4       10 B-4        
           10      B    5       10 B-5        
           11                   11            
           12                   12            
B           3          43       B3-43         
B           3          44       B3-44         
B           3          48       B3 - 48       
B           3          49       B3 - 49       
Basztowa    3                   Basztowa 3    
Basztowa    4                   Basztowa 4    
Basztowa    5                   Basztowa 5    
Basztowa    7                   Basztowa 7    
Basztowa    9                   Basztowa 9    
D           1                   D.1           
D           2                   D.2           
D          10                   D.10          
D          11                   D.11          
D          12                   D.12          

If you sorted them now with ORDER BY alpha1, num1, alpha2, num2, they would be sorted as you want them. But the already "formatted" version (the Original column) cannot be sorted easily, because the parts that should be sorted alphabetically and the parts that should be sorted numerically are mixed together.

zeropadding

There is a somewhat less extensive alternative that needs only one extra column. You assume that no number ever reaches, let's say, 10000, and replace every number (not digit!) with a zero-padded version, so 10 A-1 would become 0010A0001 (which is 0010, A and 0001, obviously). I don't see this being done on the fly in an ORDER BY clause, though.

For this example, the zeropadded version (assumption: every number < 10000):
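
To make the split concrete, here is a rough Python sketch (Python rather than SQL, purely for illustration) that breaks a name into the four columns of the table above and sorts by the resulting tuple. The function name `split_key` and the use of -1 and the empty string for missing slots are assumptions of this sketch, not part of the answer.

```python
import re


def split_key(name):
    """Split a name into (alpha1, num1, alpha2, num2), as in the table above."""
    tokens = re.findall(r"[A-Za-z]+|[0-9]+", name)
    key = []
    # Fill the four slots in order: letters, number, letters, number.
    for want_number in (False, True, False, True):
        if tokens and tokens[0].isdigit() == want_number:
            token = tokens.pop(0)
            key.append(int(token) if want_number else token)
        else:
            # Slot is empty for this name; -1 and "" sort before any real value.
            key.append(-1 if want_number else "")
    return tuple(key)


names = ["D.10", "B3 - 48", "10 A-1", "2", "Basztowa 4", "D.2", "B3-43", "2.B3", "11"]
ordered = sorted(names, key=split_key)
print(split_key("2.B3"))
print(ordered)
```

Sorting by the tuple plays the role of ORDER BY alpha1, num1, alpha2, num2: letters are compared as text and numbers as integers, each in its own column.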

Original      Zeropadded 
1             0001       
2             0002       
2.B3          0002B0003  
5             0005       
9             0009       
10 A-1        0010A0001  
10 A-3        0010A0003  
10 B-4        0010B0004  
10 B-5        0010B0005  
11            0011       
12            0012       
B3-43         B00030043  
B3-44         B00030044  
B3 - 48       B00030048  
B3 - 49       B00030049  
Basztowa 3    Basztowa0003
Basztowa 4    Basztowa0004
Basztowa 5    Basztowa0005
Basztowa 7    Basztowa0007
Basztowa 9    Basztowa0009
D.1           D0001      
D.2           D0002      
D.10          D0010      
D.11          D0011      
D.12          D0012      

This would be sortable the way you want with ORDER BY zeropadded.

So in the end, you probably have to sort in PHP or create more columns that help you sort, by reformatting/sanitizing/splitting your input.
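
If the sorting is moved into application code, the fixed padding width is no longer needed, because numbers can be compared as integers. The sketch below is in Python rather than PHP and only illustrates the idea; `natural_key` is a hypothetical helper, and tagging tokens with 0 (number) or 1 (letters) is an assumption made so that numbers sort before letters.

```python
import re


def natural_key(name):
    """Turn a name into a list of tagged tokens: (0, number) or (1, letters)."""
    key = []
    # Only ASCII letters and digits are kept; everything else acts as a separator.
    for token in re.findall(r"[A-Za-z]+|[0-9]+", name):
        if token.isdigit():
            key.append((0, int(token)))
        else:
            key.append((1, token))
    return key


# The target order from the question.
target = [
    "1", "2", "2.B3", "5", "9",
    "10 A-1", "10 A-3", "10 B-4", "10 B-5", "11", "12",
    "B3-43", "B3-44", "B3 - 48", "B3 - 49",
    "Basztowa 3", "Basztowa 4", "Basztowa 5", "Basztowa 7", "Basztowa 9",
    "D.1", "D.2", "D.10", "D.11", "D.12",
    "Kabaty ul. Pod lipą 4",
]

# Start from the reversed list and check that the key restores the target order.
ordered = sorted(reversed(target), key=natural_key)
print(ordered == target)
```

The trade-off is the one mentioned in the question: the ordering then happens outside the database, so it cannot be used directly in an ORDER BY.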

update

zeropadding explained (simplified)

The main idea behind zeropadding is that the natural format of numbers is different from their format in the computer. In the computer the number 2 is effectively the sequence of digits 0..0002 (so the leading zeros are included); similarly, 10 is 0..0010. When the computer compares numbers, it goes from left to right until it finds different digits:

0...0002
0...0010
======!.    (the ! marks the point where the first digit is different)

And then it determines which digit is bigger or smaller. In this case 0 < 1, and therefore 2 < 10. (Of course the computer uses binary, but that doesn't change the idea.)

Now, a string is technically a sequence of characters. String comparison works slightly differently. When two strings are compared, they are not (left) padded, so the first character of each string is really the first character and not padding (like a space, for example). So technically the string A10 is the sequence of characters A, 1 and 0. And since string comparison is used, it is "smaller" than A2, because string comparison doesn't see the numbers as numbers but as characters (that are digits):

A10
A2
=!     (the ! marks the point where the first character is different)

and because 1 < 2 as characters, A10 < A2. Now to circumvent this problem, we force the numbers in the string into the same format they would have in a numerical comparison, by padding them to the same length, which aligns the digits according to their place value:

A0010
A0002
===!.  (the ! marks the point where the first character is different)

Now it's effectively the same comparison you would expect for numbers. However, you have to make some assumption about the maximal length of the numbers, so that you can choose the padding width appropriately. Without that assumption, a number longer than the padding width would be compared incorrectly.
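
The three comparisons above, and what happens when the length assumption is violated, can be checked with a few lines of Python. This is an illustration only: `pad_numbers` is a hypothetical helper, and Python compares strings by code point, which is not necessarily how a given MySQL collation compares them.

```python
import re


def pad_numbers(text, width):
    """Left-pad every run of digits in text with zeros to the given width."""
    # zfill leaves numbers longer than `width` untouched; MySQL's LPAD would
    # shorten them instead. Either way the ordering breaks for such numbers.
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), text)


# Plain string comparison: '1' < '2', so A10 sorts before A2.
unpadded_a10_first = "A10" < "A2"

# After padding to width 4 the digits line up by place value: A0002 < A0010.
padded_a10_first = pad_numbers("A10", 4) < pad_numbers("A2", 4)

# Width 4 is too small for 10000, so A10000 wrongly sorts before A9999.
overflow_wrong = pad_numbers("A10000", 4) < pad_numbers("A9999", 4)

print(unpadded_a10_first, padded_a10_first, overflow_wrong)
```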

The only (logical) point that remains: when one of the compared strings has a letter where the other has a digit, what does the padding change? The answer is: nothing. We don't change digits into letters, and digits sort before letters, so everything stays in the same order in that case.

The effect of zeropadding is: we adjust the "number" comparison in strings to behave like a real numeric comparison by aligning the digit characters according to their place value.

SELECT name FROM realestate ORDER BY name ASC;

This should sort your list alphanumerically... I don't see the issue.
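
For reference, a plain string sort does not produce the order asked for in the question. The short Python check below illustrates this with a handful of the sample names; it assumes code-point comparison, which for these ASCII values should agree with an ordinary ORDER BY name, though the exact result in MySQL depends on the column's collation.

```python
names = ["1", "2", "10 A-1", "D.2", "D.10"]

# Character-by-character comparison: "10 A-1" lands before "2",
# and "D.10" lands before "D.2".
plain = sorted(names)
print(plain)
```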

EDIT: OK, I still don't know if I really understood what the goal of this question is (is it for a contest?), but I can submit this "twisted" query (that I hope I will never use in my career):

SELECT name FROM realestate
ORDER BY IF(SUBSTRING(name, 1, 2) REGEXP '[A-Z]', 100000, CAST(name AS UNSIGNED)) ASC,
SUBSTRING(name, 1, 2) ASC,
CAST(SUBSTRING(name FROM LOCATE('.', name)+1) AS UNSIGNED) ASC,
REPLACE(name, ' ', '') ASC;

Maybe someone can find an easier way, because I admit my answer is a bit complicated. BUT, Kamil's and Jakumi's solutions are much more tricky and complicated.
