
MySQL COLLATE latin1_german1_ci not working with ORDER BY

I have a MySQL database where I need to perform a search on a VARCHAR column. All data is encoded in latin1. Sometimes these columns contain Western accented characters (for me almost always French). Using the default collation (latin1_swedish_ci) has always worked fine for me. But now I have a problem with some data containing umlauts: if I search for "nusserhof" I want MySQL to return "nüsserhof", but it doesn't. Changing the collation to latin1_german1_ci solves the problem in the simplest sense. For instance, this query works, returning all rows containing the word "nüsserhof":

select * from mytable where mycolumn like '%nusserhof%' collate latin1_german1_ci;

But if I add an ORDER BY clause, it no longer works. This doesn't return any rows containing the word "nüsserhof":

select * from mytable where mycolumn like '%nusserhof%' order by mycolumn collate latin1_german1_ci;

Surprisingly, I can't find anything about this here or through Google. Is this expected behavior? As a workaround I'm just dropping the ORDER BY and sorting the results in PHP after the SELECT. But it seems like I should be able to get it to work.

Is this expected behavior?

Yes, it is.

In Swedish, the glyph ü represents the letter tyskt y ("German Y"), and thus under latin1_swedish_ci it is a variation of the letter y rather than u. If, applying that collation, you were to search where mycolumn like '%nysserhof%', your record containing nüsserhof would be returned.

In German, the glyph ü represents an accented variation (specifically an umlaut) of the base glyph u, and thus under latin1_german1_ci it is a variation of the letter u as expected. That is why you obtain the desired results when running your search under this collation.
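You can check this difference directly, without touching any table data. This is a minimal sketch: it assumes your client sends 'ü' correctly encoded in the connection character set, and it uses CONVERT(... USING latin1) so that the latin1 collations can legally be applied to the literals. The column aliases are just illustrative names.

```sql
-- Compare 'nüsserhof' with the y and u spellings under two latin1 collations.
-- CONVERT(... USING latin1) turns each literal (sent in the connection's
-- character set) into a latin1 string, so latin1_* collations are valid on it.
-- The explicit COLLATE on the left operand decides the comparison collation.
SELECT
  CONVERT('nüsserhof' USING latin1) COLLATE latin1_swedish_ci
    = CONVERT('nysserhof' USING latin1) AS swedish_matches_y,
  CONVERT('nüsserhof' USING latin1) COLLATE latin1_swedish_ci
    = CONVERT('nusserhof' USING latin1) AS swedish_matches_u,
  CONVERT('nüsserhof' USING latin1) COLLATE latin1_german1_ci
    = CONVERT('nusserhof' USING latin1) AS german1_matches_u;
```

Each column is 1 when the two spellings compare equal under that collation and 0 otherwise.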

It is because of local differences of this sort that we must choose appropriate collations for our data: no single collation can always be appropriate in the general case.

The problem that you observe when applying ORDER BY results from a misunderstanding of the COLLATE keyword. It is not a clause of the SELECT command (which would instruct MySQL to use that collation for all comparisons within the command). Rather, it attaches to the immediately preceding operand (a column, string literal, or expression) and instructs MySQL to use that explicit collation for that operand only.
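Adding parentheses to the two queries from the question makes that binding visible. Like the original queries, this sketch assumes the connection character set is latin1 (for example, after SET NAMES latin1). Otherwise MySQL rejects a latin1 collation on the string literal.

```sql
-- Query 1: COLLATE binds to the literal, so the LIKE comparison
-- is evaluated under latin1_german1_ci.
SELECT * FROM mytable
WHERE mycolumn LIKE ('%nusserhof%' COLLATE latin1_german1_ci);

-- Query 2: COLLATE binds to mycolumn inside ORDER BY only, so it affects
-- sorting; the WHERE comparison still uses the column's own collation.
SELECT * FROM mytable
WHERE mycolumn LIKE '%nusserhof%'
ORDER BY (mycolumn COLLATE latin1_german1_ci);
```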

That is, in your first case, the explicit latin1_german1_ci collation is applied to the '%nusserhof%' string literal, giving it a coercibility of 0. The collation of mycolumn (which is presumably latin1_swedish_ci) has a coercibility of 2. Since the former has the lower value, it takes precedence when evaluating the expression.
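MySQL's COLLATION() and COERCIBILITY() functions let you inspect these values yourself. The sketch below runs against the question's mytable/mycolumn. The _latin1 introducer is an addition that makes the literal a latin1 string regardless of the connection character set, which is safe here because '%nusserhof%' is plain ASCII.

```sql
SELECT
  COLLATION(mycolumn)    AS column_collation,    -- presumably latin1_swedish_ci
  COERCIBILITY(mycolumn) AS column_coercibility, -- a column is "implicit" (2)
  COERCIBILITY(_latin1'%nusserhof%' COLLATE latin1_german1_ci)
                         AS literal_coercibility -- an explicit COLLATE clause is 0
FROM mytable
LIMIT 1;
```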

In your second case, the explicit latin1_german1_ci collation is applied to mycolumn within the ORDER BY clause. The sorted results will therefore place 'nüsserhof' between 'nu' and 'nv' instead of between 'ny' and 'nz'. However, the explicit collation no longer applies to the filter expression within the WHERE clause, so the column's default collation (latin1_swedish_ci) applies there, and 'nüsserhof' does not match '%nusserhof%'.
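If you want to keep the column's current collation and still filter and sort the German way, you can apply COLLATE to the column in both clauses. This is a sketch using the question's table and column names. Putting the explicit collation on the column, rather than on the literal, lets the ASCII search string be converted as needed, even when the connection character set is not latin1.

```sql
-- Explicit collation on the column in both places:
-- WHERE filters under latin1_german1_ci (ü matches u) ...
SELECT *
FROM mytable
WHERE mycolumn COLLATE latin1_german1_ci LIKE '%nusserhof%'
-- ... and ORDER BY sorts under the same collation.
ORDER BY mycolumn COLLATE latin1_german1_ci;
```

A leading-wildcard LIKE cannot use an index on mycolumn anyway, so the explicit collation in WHERE should not make this query any slower.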

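If the data in mycolumn is all in the German language, you can simply change the column's default collation and no longer worry about specifying explicit collations within your SQL commands. Replace <type> with the column's full existing definition, because MODIFY redefines the column and drops any attributes you omit, such as NOT NULL or DEFAULT: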

ALTER TABLE mytable MODIFY mycolumn <type> COLLATE latin1_german1_ci;
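As a concrete sketch, here is the same change with a hypothetical definition filled in. VARCHAR(255) NOT NULL is an assumption, so substitute mycolumn's real type and attributes. The sketch also includes a check of the result and the original query without any COLLATE clauses.

```sql
-- Hypothetical column definition: replace VARCHAR(255) NOT NULL with the
-- column's actual type and attributes, or they will be lost.
ALTER TABLE mytable
  MODIFY mycolumn VARCHAR(255) CHARACTER SET latin1
                  COLLATE latin1_german1_ci NOT NULL;

-- Confirm the column's new collation.
SELECT COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'mytable'
  AND COLUMN_NAME = 'mycolumn';

-- The original query now filters and sorts under latin1_german1_ci.
SELECT * FROM mytable
WHERE mycolumn LIKE '%nusserhof%'
ORDER BY mycolumn;
```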
