As mentioned in my blog post “Character Set Conversion with CSALTER – Does that actually work?”, it is in principle possible to change the character set of the database with CSSCAN and CSALTER. However, that does not help if you have stored characters beyond US7ASCII in the database, as is quite common in Europe, because only the data dictionary is adjusted, not the data in the tables.
So there is nothing for it but to export the data from the WE8 database and import it into a new Unicode database. As the character set for the target system I again use AL32UTF8, that is, a character set with at least one byte per character (UTF8) and at most four bytes (AL32). For the German special characters this means that an umlaut (Ä, Ö, Ü) and the “ß” each need 2 bytes; the euro sign “€” even needs 3.
Attempt 1: Export/Import with DataPump
Since my test databases are Oracle 11g Release 2 (11.2.0.2) on my favorite operating system, OEL6, I will of course use DataPump. On older systems it works similarly with the classic exp/imp.
First the schemas (DEMO and DEMOSTAMM) are exported on the source system and afterwards imported into the target system. The directory object “DATA_PUMP_DIR” used here is, by the way, created automatically in Oracle 11g Release 2 when the database is created and points to the directory $ORACLE_BASE/admin/<ORACLE_SID>/dpdump.
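If you want to double-check where this directory object points on your own system, a quick look into the data dictionary is enough:

SELECT directory_name, directory_path
FROM   dba_directories
WHERE  directory_name = 'DATA_PUMP_DIR';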
Export Parameter File:
DUMPFILE="demoexpwe8.dmp" LOGFILE="exp_demoexpwe8.log" DIRECTORY=DATA_PUMP_DIR CONTENT=ALL SCHEMAS=('DEMOSTAMM', 'DEMO')
Import Parameter File:
DUMPFILE="demoexpwe8.dmp" LOGFILE="imp_demoimputf.log" DIRECTORY=DATA_PUMP_DIR STREAMS_CONFIGURATION=n TABLE_EXISTS_ACTION=APPEND SKIP_UNUSABLE_INDEXES=y CONTENT=ALL PARTITION_OPTIONS=none SCHEMAS=('DEMOSTAMM', 'DEMO')
The log of the import now looks like this:
...
Master table "SYSTEM"."SYS_IMPORT_SCHEMA_01" successfully loaded/unloaded
Starting "SYSTEM"."SYS_IMPORT_SCHEMA_01":  system/******** parfile=demoimputf.dat
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/SEQUENCE/SEQUENCE
Processing object type SCHEMA_EXPORT/TABLE/TABLE
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
. . imported "DEMO"."POSITIONEN"                727 KB    30267 rows
ORA-02374: conversion error loading table "DEMO"."AUFTRAEGE"
ORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)
ORA-02372: data for row: AUFSTATUS : 0X'DC'
ORA-02374: conversion error loading table "DEMO"."AUFTRAEGE"
ORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)
ORA-02372: data for row: AUFSTATUS : 0X'DC'
...
. . imported "DEMO"."AUFTRAEGE"               299.3 KB     9934 out of 10000 rows
. . imported "DEMOSTAMM"."NACHNAMEN"          177.4 KB    11552 rows
...
ORA-02374: conversion error loading table "DEMO"."STATUS"
ORA-12899: value too large for column STATUSID (actual: 2, maximum: 1)
ORA-02372: data for row: STATUSID : 0X'DC'
. . imported "DEMO"."STATUS"                  6.093 KB        5 out of 6 rows
...
ORA-39083: Object type REF_CONSTRAINT failed to create with error:
ORA-02298: cannot validate (DEMO.FK_POSITIONEN_AUFTRAEGE) - parent keys not found
Failing sql is:
ALTER TABLE "DEMO"."POSITIONEN" ADD CONSTRAINT "FK_POSITIONEN_AUFTRAEGE" FOREIGN KEY ("AUFID") REFERENCES "DEMO"."AUFTRAEGE" ("AUFID") ON DELETE CASCADE ENABLE
Job "SYSTEM"."SYS_IMPORT_SCHEMA_01" completed with 2 error(s) at 14:58:03
What happened?
Well, if you have a look at the error messages, you can see that 0X'DC' is the code of the umlaut “Ü”. The message “ORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)” gives the reason: the order status column was defined with a maximum length of 1 byte, and the umlaut, which needed only one byte in the WE8 character set, no longer fits because it now takes two bytes.
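You can easily reproduce this with DUMP and LENGTHB; the results in the comments are what I would expect, assuming the source database uses WE8ISO8859P1:

SELECT DUMP('Ü', 1016) AS bytes, LENGTHB('Ü') AS byte_len FROM dual;
-- WE8 database:      Typ=96 Len=1 CharacterSet=WE8ISO8859P1: dc    -> 1 byte
-- AL32UTF8 database: Typ=96 Len=2 CharacterSet=AL32UTF8: c3,9c     -> 2 bytes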
So this attempt failed, too!
Solution:
The tables must be adjusted to the new character lengths. You could simply enlarge all columns accordingly, for example turning a CHAR(1) into a CHAR(3). But since version 9i, Oracle also offers the option to define the length of a character column not in bytes (“BYTE”) but in characters (“CHAR”). This is called length semantics.
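Instead of relying on a session setting, the semantics can also be given explicitly per column. A small sketch of what the STATUS table from the example could look like with CHAR semantics:

CREATE TABLE status (
  statusid         CHAR(1 CHAR),
  kurzbeschreibung VARCHAR2(10 CHAR),
  beschreibung     VARCHAR2(255 CHAR)
);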
Unfortunately, DESCRIBE in SQL*Plus does not show the length semantics, so you have to resort to the data dictionary views.
An example:
In a database with NLS_LENGTH_SEMANTICS=BYTE:
SELECT column_name, data_type, data_length,
       decode(char_used,'B','BYTE','C','CHAR') Semantic
FROM   user_tab_columns
WHERE  table_name = 'STATUS';

COLUMN_NAME                    DATA_TYPE  DATA_LENGTH SEMANTIC
------------------------------ ---------- ----------- ----------
STATUSID                       CHAR                 1 BYTE
KURZBESCHREIBUNG               VARCHAR2            10 BYTE
BESCHREIBUNG                   VARCHAR2           255 BYTE
In another database with NLS_LENGTH_SEMANTICS=CHAR:
COLUMN_NAME                    DATA_TYPE  DATA_LENGTH SEMANTIC
------------------------------ ---------- ----------- ----------
STATUSID                       CHAR                 4 CHAR
KURZBESCHREIBUNG               VARCHAR2            40 CHAR
BESCHREIBUNG                   VARCHAR2          1020 CHAR
The issue is that DataPump, and the classic import/export as well, carries the length semantics along. That means if the tables were created with BYTE semantics, they keep it in the new database, regardless of the length semantics set at the system level (ALTER SYSTEM) or at the session level (ALTER SESSION).
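For completeness, this is how the length semantics default is set for a session or for the whole instance; it only affects columns created afterwards, not existing or imported definitions:

ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;
-- instance-wide, assuming an spfile is in use:
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS = CHAR SCOPE = BOTH;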
So the next step is to extract the table definitions; note that the length semantics are not extracted along with them. With the function dbms_metadata.get_ddl these definitions can be generated as SQL commands. It is then enough to set the length semantics on the target system to “CHAR” and create the tables.
Example of a SQL*Plus script:
set heading off
set feedback off
set trimspool ON
set pages 0
set linesize 1000
set long 2000000
spool create_tables.sql
PROMPT ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR
prompt /
--
-- no old storage clauses (MINEXTENTS etc.)
--
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE', FALSE);
--
-- append ";" as terminator to each command
--
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'SQLTERMINATOR', TRUE);
--
-- keep segment attributes like PCTFREE etc. (the default)
--
EXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES', TRUE);

SELECT DBMS_METADATA.GET_DDL ('TABLE', T.table_name, T.owner)
FROM   dba_tables T
WHERE  owner IN ('DEMO','DEMOSTAMM');

spool off
The generated SQL*Plus script create_tables.sql now looks like this:
ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR
/
...
  CREATE TABLE "DEMO"."STATUS"
   (    "STATUSID" CHAR(1) CONSTRAINT "SYS_C003083" NOT NULL ENABLE,
        "KURZBESCHREIBUNG" VARCHAR2(10),
        "BESCHREIBUNG" VARCHAR2(255),
         CONSTRAINT "PK_STATUS" PRIMARY KEY ("STATUSID")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255
  TABLESPACE "USERS" ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  TABLESPACE "USERS" ;
...
The ALTER SESSION was included right at the top, so the script can be run directly on the target system.
Afterwards a DataPump import can follow; just take care that the parameter TABLE_EXISTS_ACTION is set to APPEND or TRUNCATE and not to SKIP, so the data is actually inserted. Additionally, in this case the schemas have to be created first, for example as sketched below.
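A minimal sketch for creating the two schemas; passwords, tablespace, and grants are of course just placeholders:

CREATE USER demostamm IDENTIFIED BY demostamm
  DEFAULT TABLESPACE users QUOTA UNLIMITED ON users;
CREATE USER demo IDENTIFIED BY demo
  DEFAULT TABLESPACE users QUOTA UNLIMITED ON users;
GRANT CREATE SESSION, CREATE TABLE, CREATE SEQUENCE TO demostamm, demo;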
But unfortunately this import fails, too!
The reason is that when the objects are created, the constraints (above all the foreign keys) are created and enabled as well. A subsequent import then runs the risk that the insert order does not match the master-detail relationships, which leads to “ORA-02291: integrity constraint violated - parent key not found” errors.
So to perform the final import, all foreign key constraints must be disabled.
The best way is again a small SQL*Plus script that creates two further scripts: disable_cons.sql and enable_cons.sql.
set heading off
set feedback off
set trimspool ON
set pages 0
set linesize 1000

spool disable_cons.sql
SELECT 'ALTER TABLE '||owner||'.'||table_name||
       ' DISABLE CONSTRAINT '||constraint_name||';'
FROM   all_constraints
WHERE  owner IN ('DEMO','DEMOSTAMM')
AND    constraint_type='R';
spool off

spool enable_cons.sql
SELECT 'ALTER TABLE '||owner||'.'||table_name||
       ' ENABLE CONSTRAINT '||constraint_name||';'
FROM   all_constraints
WHERE  owner IN ('DEMO','DEMOSTAMM')
AND    constraint_type='R';
spool off
With this, the data import into the target database can finally be performed: run disable_cons.sql before the import and enable_cons.sql afterwards.
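Put together, the final run on the target system could look roughly like this; the connect identifier utf8db is only a placeholder:

sqlplus system@utf8db @create_tables.sql
sqlplus system@utf8db @disable_cons.sql
impdp system parfile=demoimputf.dat
sqlplus system@utf8db @enable_cons.sql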
This trial and error should of course only be done on a test system that contains the same data as production. Test data, or encrypted or invalid data, entails the risk that errors due to character lengths go undetected. In the example it was still very obvious, since a CHAR(1 BYTE) literally cries out for a change.
It becomes more difficult when the length of a column is several hundred bytes, or even worse, when the maximum size of VARCHAR2 is reached: this limit is still 4000 bytes, not 4000 characters. It is a hard limit that can only be exceeded by converting the column to a CLOB.
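Oracle does not allow a VARCHAR2 column to be altered directly into a CLOB, so such a change usually means adding a new column, copying the data, and renaming. A rough sketch; the column BEMERKUNG is purely hypothetical:

ALTER TABLE demo.auftraege ADD (bemerkung_clob CLOB);
UPDATE demo.auftraege SET bemerkung_clob = bemerkung;   -- copy the data
ALTER TABLE demo.auftraege DROP COLUMN bemerkung;
ALTER TABLE demo.auftraege RENAME COLUMN bemerkung_clob TO bemerkung;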
CLOB = UCS-2
One more word about CLOBs: when changing to AL32UTF8, CLOB data is always stored in UCS-2, which means every character takes 2 (!) bytes. That is why you should think about the impact a switch to Unicode would have on your tables with CLOB columns.
This is the advice from the Oracle documentation (Globalization Support Guide, Chapter 6):
Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8. This means that the storage space required for an English document doubles when the data is converted. Storage for an Asian language document in a CLOB column requires less storage space than the same document in a LONG column using UTF8, typically around 30% less, depending on the contents of the document.
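To get a first overview of which tables are affected at all, a simple query against the data dictionary is a good starting point:

SELECT owner, table_name, column_name
FROM   dba_tab_columns
WHERE  owner IN ('DEMO','DEMOSTAMM')
AND    data_type = 'CLOB'
ORDER  BY owner, table_name;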
In the next blog post I will show some ways to simplify this procedure (e.g. by using my favorite tool, Toad) and how to minimize the application downtime.