Oracle 12.2 Collation »

Reading the “new features” topic “Case insensitive database” for Oracle 12.2 it sounds as if you are back in the last century where data was stored case insensitive. But guess we are all on one page: that’s not what we want to have.

In reality this feature is not to store data indepentent but to search and sort data independent of upper or lower case or accents. That’s what “collation” stands for.

If you have ever developed an application using names, firstnames, etc. you know the problem of the “correct” syntax and therefore the need for conversions with comparing text strings. The easiest way is the “UPPER” or “LOWER” function in SQL like this:

SQL> SELECT persid, vorname, nachname
       FROM mitarbeiter  
      WHERE upper(nachname) = 'MÜLLER'
      ORDER BY nachname, vorname;

    PERSID VORNAME              NACHNAME 
---------- -------------------- --------------------
    100242 Gerd                 müller     
    100229 Andreas              Müller
    100483 Carla                Müller
    101000 Günther              Müller

That looks fine except for the sorting order. “Gerd müller” is listed before “Andreas Müller” because of a typo using a lower “m” instead of the upper “M”. That’s something you know and not a big issue. Serveral applications handle this using NLS_SORT as an session variable like this:

SQL> ALTER SESSION set NLS_SORT='XGERMAN_CI';

SQL> SELECT persid, vorname, nachname
       FROM mitarbeiter
      WHERE upper(nachname) = 'MÜLLER'
      ORDER BY nachname, vorname;

    PERSID VORNAME              NACHNAME
---------- -------------------- --------------------
    100229 Andreas              Müller
    100483 Carla                Müller 
    100242 Gerd                 müller 
    101000 Günther              Müller

Looks fine, does it?

Not always: Due to the function “UPPER” an available index on “NACHNAME” will be ignored. And the same happens with the NLS-parameter as well. Internally the NLS_SORT is handled as a function and it is not easy to identify those changes to a session with products like Toad or SQL-Developer as the “explain plan” is using the current session and not the original one.

Back to the topic: The parameter “XGERMAN_CI” says that the german sort should be used case insensitive (CI).

Column Level Collation

The collation can be handelt on a per column basis in Oracle 12.2 per default like in the following example:

SQL> ALTER TABLE mitarbeiter MODIFY nachname COLLATE XGERMAN_AI;
SQL> ALTER TABLE mitarbeiter MODIFY vorname COLLATE XGERMAN_AI;
SQL> SELECT persid, vorname, nachname
       FROM mitarbeiter
      WHERE nachname like 'Müller'
      ORDER BY nachname, vorname;

    PERSID VORNAME              NACHNAME
---------- -------------------- --------------------
    100229 Andreas              Müller
    100720 Anne                 Muller
    100025 Blaine               Muller
    100483 Carla                Müller
    100594 Elysee               Muller
    100242 Gerd                 müller
    101000 Günther              Müller

In this example we were using “AI” (accent insensitive) instead of “CI” which leads into the fact that accents are ignored as well. So it doesn’t matter if I search for “Muller” or “Müller” both names are retrieved. But surely the application should know about that.

But what about the special german character “ß”? Now the difference between “GERMAN” and “XGERMAN” comes into play: with “GERMAN” the “ß” is a discrede character (sorted after the “s”) while with “XGERMAN” the “ß” is translated into “ss” like the following example explains:

SQL> SELECT persid, vorname, nachname
       FROM mitarbeiter
      WHERE nachname = 'Weiss'
      ORDER BY nachname, vorname;
 
    PERSID VORNAME              NACHNAME
---------- -------------------- --------------------
    100645 Randi                Weiß
    100949 Selina               Weiss

And what about indexes?

All indexes created before the change to the column do not know the new function saying that they will be ignored after the change. But for indexes newly created the function will be integrated. So there is no need to manually add function based indexes any longer.

Alternatively you can add a default collation for a table or a schema (ending up in the “case insensitive database”). But that only changes the behaviour for newly created fields (table) or newly created tables (schema). May be I can cover that in a different blog.

At this point it’s time to thank Tim Hall as his Blog “Column-Level Collation and Case-Insensitive Database in Oracle Database 12c Release 2 (12.2)” inspired me to test the functionality in more detail. And the blog shows that british colleagues do like german breweries.

One Comment

Holger Jakobs • 21. November 2023 at 12:54

The sorting of Oracle using XGERMAN_CI or just GERMAN isn’t correct, as it doesn’t respect umlauts. Umlauts are used just as a tiebreaker if there isn’t another difference between the sorted values.

Therefore, sorting by familynames ‘Pohl’ and ‘Pöhl’, ‘Pöhl’ as to be sorted later (higher), but if you include the first names like ‘Cindy’ and ‘Karl’, using familyname||firstname (as opposed to familyname, firstname, Cindy comes before Karl.

Is there a way to get this sorting correctly?

Oracle 12.2 Collation

Column Level Collation

And what about indexes?

Teilen

Weitere interessante Artikel entdecken

One Comment