Reading the “new features” topic “Case insensitive database” for Oracle 12.2 it sounds as if you are back in the last century where data was stored case insensitive. But guess we are all on one page: that’s not what we want to have.
In reality this feature is not to store data indepentent but to search and sort data independent of upper or lower case or accents. That’s what “collation” stands for.
If you have ever developed an application using names, firstnames, etc. you know the problem of the “correct” syntax and therefore the need for conversions with comparing text strings. The easiest way is the “UPPER” or “LOWER” function in SQL like this:
SQL> SELECT persid, vorname, nachname FROM mitarbeiter WHERE upper(nachname) = 'MÜLLER' ORDER BY nachname, vorname; PERSID VORNAME NACHNAME ---------- -------------------- -------------------- 100242 Gerd müller 100229 Andreas Müller 100483 Carla Müller 101000 Günther Müller
That looks fine except for the sorting order. “Gerd müller” is listed before “Andreas Müller” because of a typo using a lower “m” instead of the upper “M”. That’s something you know and not a big issue. Serveral applications handle this using NLS_SORT as an session variable like this:
SQL> ALTER SESSION set NLS_SORT='XGERMAN_CI'; SQL> SELECT persid, vorname, nachname FROM mitarbeiter WHERE upper(nachname) = 'MÜLLER' ORDER BY nachname, vorname; PERSID VORNAME NACHNAME ---------- -------------------- -------------------- 100229 Andreas Müller 100483 Carla Müller 100242 Gerd müller 101000 Günther Müller
Looks fine, does it?
Not always: Due to the function “UPPER” an available index on “NACHNAME” will be ignored. And the same happens with the NLS-parameter as well. Internally the NLS_SORT is handled as a function and it is not easy to identify those changes to a session with products like Toad or SQL-Developer as the “explain plan” is using the current session and not the original one.
Back to the topic: The parameter “XGERMAN_CI” says that the german sort should be used case insensitive (CI).
Column Level Collation
The collation can be handelt on a per column basis in Oracle 12.2 per default like in the following example:
SQL> ALTER TABLE mitarbeiter MODIFY nachname COLLATE XGERMAN_AI; SQL> ALTER TABLE mitarbeiter MODIFY vorname COLLATE XGERMAN_AI; SQL> SELECT persid, vorname, nachname FROM mitarbeiter WHERE nachname like 'Müller' ORDER BY nachname, vorname; PERSID VORNAME NACHNAME ---------- -------------------- -------------------- 100229 Andreas Müller 100720 Anne Muller 100025 Blaine Muller 100483 Carla Müller 100594 Elysee Muller 100242 Gerd müller 101000 Günther Müller
In this example we were using “AI” (accent insensitive) instead of “CI” which leads into the fact that accents are ignored as well. So it doesn’t matter if I search for “Muller” or “Müller” both names are retrieved. But surely the application should know about that.
But what about the special german character “ß”? Now the difference between “GERMAN” and “XGERMAN” comes into play: with “GERMAN” the “ß” is a discrede character (sorted after the “s”) while with “XGERMAN” the “ß” is translated into “ss” like the following example explains:
SQL> SELECT persid, vorname, nachname FROM mitarbeiter WHERE nachname = 'Weiss' ORDER BY nachname, vorname; PERSID VORNAME NACHNAME ---------- -------------------- -------------------- 100645 Randi Weiß 100949 Selina Weiss
And what about indexes?
All indexes created before the change to the column do not know the new function saying that they will be ignored after the change. But for indexes newly created the function will be integrated. So there is no need to manually add function based indexes any longer.
Alternatively you can add a default collation for a table or a schema (ending up in the “case insensitive database”). But that only changes the behaviour for newly created fields (table) or newly created tables (schema). May be I can cover that in a different blog.
At this point it’s time to thank Tim Hall as his Blog “Column-Level Collation and Case-Insensitive Database in Oracle Database 12c Release 2 (12.2)” inspired me to test the functionality in more detail. And the blog shows that british colleagues do like german breweries.
The sorting of Oracle using XGERMAN_CI or just GERMAN isn’t correct, as it doesn’t respect umlauts. Umlauts are used just as a tiebreaker if there isn’t another difference between the sorted values.
Therefore, sorting by familynames ‘Pohl’ and ‘Pöhl’, ‘Pöhl’ as to be sorted later (higher), but if you include the first names like ‘Cindy’ and ‘Karl’, using familyname||firstname (as opposed to familyname, firstname, Cindy comes before Karl.
Is there a way to get this sorting correctly?