{"id":215,"date":"2012-02-01T15:39:20","date_gmt":"2012-02-01T14:39:20","guid":{"rendered":"https:\/\/www.carajandb.com\/2012\/02\/01\/switch-from-we8-character-set-to-unicode-en\/"},"modified":"2018-02-09T16:43:31","modified_gmt":"2018-02-09T15:43:31","slug":"switch-from-we8-character-set-to-unicode-en","status":"publish","type":"post","link":"https:\/\/carajandb.com\/en\/2012\/02\/01\/switch-from-we8-character-set-to-unicode-en\/","title":{"rendered":"Switch from WE8 Character Set to Unicode"},"content":{"rendered":"<p>As mentioned in my blog \u201cCharacter Set Conversion with CSALTER \u2013 Does that actually work?\u201d it is however possible to change the character set of the database with <em>CSSCAN<\/em> or <em>CSALTER<\/em>, but that does not help if you saved US7ASCII characters in the database, as it is quite usual in Europe, because only the data dictionary is being adjusted but not the data in the tables.<br \/>\n<!--more--><br \/>\nSo is nothing for it but download the data from the WE8 database and import it to a new Unicode database. As character set for the target system I again use <em>AL32UTF8<\/em>, that means a character set with at least one Byte (<em>UTF8<\/em>) and 4 Byte maximum (<em>AL32<\/em>). For the German characters this means that an Umlaut (<em>\u00c4, \u00d6, \u00dc<\/em>) needs 2 Byte and the \u201c<em>\u00df<\/em>\u201d even 3 Byte.<\/p>\n<h2>1. Try: Export\/Import in DataPump<\/h2>\n<p>As my test databases are Oracle 11g Release 2 (11.2.0.2) in my favorite operating system OEL6, I will use DataPump, of course. With older systems it works similarly with the classic exp\/imp.<\/p>\n<p>First the schemas (<em>DEMO<\/em> and <em>DEMOSTAMM<\/em>) on the source system are exported and afterwards imported to the target system. The used directory object \u201c<samp>DATA_PUMP_DIR<\/samp>\u201d is by the way created automatically in Oracle 11g Release 2 when installing the database and points to the directory<samp><em>$ORACLE_BASE\/admin\/&lt;ORACLE_SID&gt;\/dpdump<\/em><\/samp>.<\/p>\n<h3>Export Parameter File:<\/h3>\n<pre>\r\nDUMPFILE=\"demoexpwe8.dmp\"\r\nLOGFILE=\"exp_demoexpwe8.log\"\r\nDIRECTORY=DATA_PUMP_DIR\r\nCONTENT=ALL SCHEMAS=('DEMOSTAMM', 'DEMO')\r\n<\/pre>\n<h3>Import Parameter File:<\/h3>\n<pre>\r\nDUMPFILE=\"demoexpwe8.dmp\"\r\nLOGFILE=\"imp_demoimputf.log\"\r\nDIRECTORY=DATA_PUMP_DIR\r\nSTREAMS_CONFIGURATION=n\r\nTABLE_EXISTS_ACTION=APPEND\r\nSKIP_UNUSABLE_INDEXES=y\r\nCONTENT=ALL\r\nPARTITION_OPTIONS=none\r\nSCHEMAS=('DEMOSTAMM', 'DEMO')\r\n<\/pre>\n<p>The log of the import now looks like this:<\/p>\n<pre>\r\n...\r\nMaster table \"SYSTEM\".\"SYS_IMPORT_SCHEMA_01\" successfully loaded\/unloaded\r\nStarting \"SYSTEM\".\"SYS_IMPORT_SCHEMA_01\":  system\/******** parfile=demoimputf.dat\r\nProcessing object type SCHEMA_EXPORT\/USER\r\nProcessing object type SCHEMA_EXPORT\/SYSTEM_GRANT\r\nProcessing object type SCHEMA_EXPORT\/ROLE_GRANT\r\nProcessing object type SCHEMA_EXPORT\/DEFAULT_ROLE\r\nProcessing object type SCHEMA_EXPORT\/PRE_SCHEMA\/PROCACT_SCHEMA\r\nProcessing object type SCHEMA_EXPORT\/SEQUENCE\/SEQUENCE\r\nProcessing object type SCHEMA_EXPORT\/TABLE\/TABLE\r\nProcessing object type SCHEMA_EXPORT\/TABLE\/TABLE_DATA\r\n. . imported \"DEMO\".\"POSITIONEN\"                           727 KB   30267 rows\r\nORA-02374: conversion error loading table \"DEMO\".\"AUFTRAEGE\"\r\nORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)\r\nORA-02372: data for row: AUFSTATUS : 0X'DC'\r\nORA-02374: conversion error loading table \"DEMO\".\"AUFTRAEGE\"\r\nORA-12899: value too large for column AUFSTATUS (actual: 2, maximum: 1)\r\nORA-02372: data for row: AUFSTATUS : 0X'DC'\r\n...\r\n. . imported \"DEMO\".\"AUFTRAEGE\"                          299.3 KB    9934 out of 10000 rows\r\n. . imported \"DEMOSTAMM\".\"NACHNAMEN\"                     177.4 KB   11552 rows\r\n...\r\nORA-02374: conversion error loading table \"DEMO\".\"STATUS\"\r\nORA-12899: value too large for column STATUSID (actual: 2, maximum: 1)\r\nORA-02372: data for row: STATUSID : 0X'DC'\r\n. . imported \"DEMO\".\"STATUS\"                             6.093 KB       5 out of 6 rows\r\n...\r\nORA-39083: Object type REF_CONSTRAINT failed to create with error:\r\nORA-02298: cannot validate (DEMO.FK_POSITIONEN_AUFTRAEGE) - parent keys not\r\nfound\r\nFailing sql is: ALTER TABLE \"DEMO\".\"POSITIONEN\" ADD CONSTRAINT \"FK_POSITIONEN_AUFTRAEGE\" FOREIGN KEY (\"AUFID\") REFERENCES\r\n\"DEMO\" .\"AUFTRAEGE\" (\"AUFID\") ON DELETE CASCADE ENABLE\r\nJob \"SYSTEM\".\"SYS_IMPORT_SCHEMA_01\" completed with 2 error(s) at 14:58:03\r\n<\/pre>\n<p>What happened?<\/p>\n<p>Well: If you have a look at the error message you see that \u201c<em>0X DC<\/em>\u201d for the Umlaut \u201c<em>\u00dc<\/em>\u201d stands in the Unicode character set. The error message <strong><em>\u201cORA-12899: value too large for column ORDSTATUS (actual 2, maximum: 1)\u201d<\/em> <\/strong>gives the reason: the order status column was defined to a maximum length of 1 Byte \u2013 and the Umlaut which only need one Byte in the WE8 character set, does not fit anymore as it now takes two Byte.<\/p>\n<p>By that this try failed, too!<\/p>\n<h2>Solution:<\/h2>\n<p>The tables must be adjusted according to the new character\u2019s length. Therefore you could prolong all arrays accordingly, so make a <em>CHAR(1)<\/em> to <em>CHAR(3)<\/em> for example (for the \u201c<em>\u00df<\/em>\u201d). But since version 9i Oracle offers the possibility to define the length of a character field not in \u201c<em>BYTE<\/em>\u201d anymore but in \u201c<em>CHAR<\/em>\u201d. You call that the length semantic.<\/p>\n<p>Unfortunately the <em>DESCRIBE<\/em> in SQL*Pus does not mean a length scematic, so you have to evade to the directory views.<\/p>\n<p>An example:<\/p>\n<p><samp><em>NLS_LENGTH_SEMANTIC=BYTE<\/em><\/samp><\/p>\n<pre>\r\nSELECT column_name,\r\n       data_type,\r\n       data_length,\r\n       decode(char_used,'B','BYTE','C','CHAR') Semantic\r\nFROM   user_tab_columns\r\nWHERE  table_name = 'STATUS'\r\nCOLUMN_NAME                    DATA_TYPE  DATA_LENGTH SEMANTIC\r\n------------------------------ ---------- ----------- ----------\r\nSTATUSID                       CHAR                 1 BYTE\r\nKURZBESCHREIBUNG               VARCHAR2            10 BYTE\r\nBESCHREIBUNG                   VARCHAR2           255 BYTE\r\n<\/pre>\n<p>In another database with<samp>NLS_LENGTH_SEMANTIC=CHAR<\/samp><\/p>\n<pre>\r\nCOLUMN_NAME                    DATA_TYPE  DATA_LENGTH SEMANTIC\r\n------------------------------ ---------- ----------- ----------\r\nSTATUSID                       CHAR                 4 CHAR\r\nKURZBESCHREIBUNG               VARCHAR2            40 CHAR\r\nBESCHREIBUNG                   VARCHAR2          1020 CHAR\r\n<\/pre>\n<p>The issue about this is that DataPump and also the classic import\/export \u201ctake\u201d the semantic. That means if the tables were created with <em>BYTE<\/em> they will keep this in the new database, independent from the length semantic that can be set on the system level (<em>ALTER SYSTEM<\/em>) or on the session level (<em>ALTER SESSION<\/em>).<\/p>\n<p>So next the table definition must be extracted. Here you should be aware that the length semantic is not being extracted with it. With the function <em>dbms_metadata.get_ddl<\/em> these definitions can be created as a SQL command. Now it is enough to set the length semantic on the target system to \u201c<em>CHAR<\/em>\u201d and create the table.<\/p>\n<h3>Example for a SQL*Plus Script:<\/h3>\n<pre>\r\nset heading off\r\nset feedback off\r\nset trimspool ON\r\nset pages 0\r\nset linesize 1000\r\nset long 2000000\r\nspool create_tables.sql\r\nPROMPT ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR\r\nprompt \/\r\n--\r\n-- no old storage clauses (MINEXTENTS etc.)\r\n--\r\nEXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'STORAGE',FALSE);\r\n--\r\n-- \";\" as space holder between commands\r\n--\r\nEXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'SQLTERMINATOR',TRUE);\r\n--\r\n-- take over segment attributes like PCTFREE, etc. (set Default)\r\n--\r\nEXECUTE DBMS_METADATA.SET_TRANSFORM_PARAM (DBMS_METADATA.SESSION_TRANSFORM, 'SEGMENT_ATTRIBUTES',TRUE);\r\nSELECT DBMS_METADATA.GET_DDL ('TABLE' ,T.table_name,T.owner)\r\nFROM dba_tables T\r\nWHERE owner IN ('DEMO','DEMOSTAMM');\r\nspool off\r\n<\/pre>\n<p>The created SL*Plus script<samp><em>create_table.sql<\/em><\/samp>now looks like this:<\/p>\n<pre>\r\nALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR\r\n\/\r\n...\r\nCREATE TABLE \"DEMO\".\"STATUS\"\r\n(  \"STATUSID\" CHAR(1) CONSTRAINT \"SYS_C003083\" NOT NULL ENABLE,\r\n   \"SHORTDESCRIPTION\" VARCHAR2(10),\r\n   \"DESCRIPTION\" VARCHAR2(255),\r\n      CONSTRAINT \"PK_STATUS\" PRIMARY KEY (\"STATUSID\")\r\nUSING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255\r\nTABLESPACE \"USERS\"  ENABLE\r\n) SEGMENT CREATION IMMEDIATE\r\nPCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING\r\nTABLESPACE \"USERS\" ;\r\n...\r\n<\/pre>\n<p>The &#8220;<em>ALTER SESSION<\/em>&#8221; was built in right away. So the script can be run on the target system directly.<\/p>\n<p>Afterwards a DataPump import can follow, just take care to have the parameter <em>TABLE_EXIST_ACTION<\/em> set to <em>APPEND<\/em> or <em>TRUNCATE<\/em> and not to <em>SKIP<\/em>, so the data is actually inserted. Additionally in this case the schemas have to be created first.<\/p>\n<p>But unfortunately this import fails, too!<\/p>\n<p>The reason for this is, that when creating the objects the constraints (mainly foreign keys) are created and activated. An import afterwards entails the risk that the order of inserting does not meet the master to detail relation and by that <em>\u201cORA-02291 integrity constraint violated \u2013 parent key not found\u201d<\/em> errors occur.<\/p>\n<p>So to perform the final import, all foreign key constraints must be <em>disabled<\/em>.<\/p>\n<p>You can best make yourself a little SQL*Plus script again that creates two more scripts: <em>disable_cons.sql<\/em> and <em>enable_cons.sql<\/em>.<\/p>\n<pre>\r\nset heading off\r\nset feedback off\r\nset trimspool ON\r\nset pages 0\r\nset linesize 1000\r\nspool disable_cons.sql\r\nSELECT 'ALTER TABLE '||owner||'.'||table_name||' DISABLE CONSTRAINT '||constraint_name||';'\r\n  FROM all_constraints\r\n WHERE owner IN ('DEMO','DEMOSTAMM')\r\n   AND constraint_type='R';\r\nspool off\r\nspool enable_cons.sql\r\nSELECT 'ALTER TABLE '||owner||'.'||table_name||' ENABLE CONSTRAINT '||constraint_name||';'\r\n  FROM all_constraints\r\n WHERE owner IN ('DEMO','DEMOSTAMM')\r\n   AND constraint_type='R';\r\nspool off\r\n<\/pre>\n<p>With this now the data import to the target database can finally follow.<\/p>\n<p>This trial and error can only be performed on a test system, of course, which has an identical data stock to the production. Test data, encoded or invalid data here entail the risk that errors due to character\u2019s length are not being detected. In the example this was still very obvious as <em>CHAR(1 BYTE)<\/em> literally cries for a change.<\/p>\n<p>It becomes more difficult when the length of a field is set on several 100 BYTE or even worse, when the maximum size of VARCHAR2 is reached. Because this is still on 4000 BYTE and not on 4000 CHAR. That is a hard limit which can only be crossed by changing into a CLOB field.<\/p>\n<h2>CLOB = UCS-2<\/h2>\n<p>One more word about CLOB. When changing to AL32UFT8, for CLOB the UCS-2 character set is used always, that means every character becomes 2 (!) Byte long. That is why you should think about the impacts a switch to Unicode would have on to your tables with CLOB fields.<\/p>\n<p>This is the advice from the Oracle documentation (Globalization Support Guide, Chapter 6):<br \/>\n<em>Data in CLOB columns is stored in a format that is compatible with UCS-2 when the database character set is multibyte, such as UTF8 or AL32UTF8. This means that the storage space required for an English document doubles when the data is converted. Storage for an Asian language document in a CLOB column requires less storage space than the same document in a LONG column using UTF8, typically around 30% less, depending on the contents of the document.<\/em><br \/>\nIn the next blog I am going to show you possibilities of simplifying this procedure (i.e. by using my favorite tool Toad) and how to minimize the application downtime.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As mentioned in my blog \u201cCharacter Set Conversion with CSALTER \u2013 Does that actually work?\u201d it is however possible to change the character set of the database with CSSCAN or CSALTER, but that does not help if you saved US7ASCII characters in the database, as it is quite usual in Europe, because only the data dictionary is being adjusted but not the data in the tables.<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_crdt_document":"","_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[],"tags":[24,59,22,23,54],"class_list":["post-215","post","type-post","status-publish","format-standard","hentry","tag-migration","tag-nls_length_semantics","tag-oracle","tag-oracle-11g","tag-unicode"],"acf":{"occupation":"Write the Occupation of the Person","person_can-be-speaker":true,"person_is-team":true,"person_related-user-account":null,"person_ordering-number":50,"publication_speakers":null,"publication_content-language":["de"],"publication_files":null,"publication_event":"","publication_date":null},"spectra_custom_meta":{"_fgj2wp_old_id":["215"],"_yoast_wpseo_metadesc":["Unicode Migration"],"_yoast_wpseo_metakeywords":["Oracle 11g, Migration, Unicode, NLS_LENGTH_SEMANTICS"],"layout_show-author-box":["1"],"_layout_show-author-box":["field_5a64ee9cc6490"],"_edit_lock":["1518191048:3"],"_edit_last":["3"],"_vc_post_settings":["a:1:{s:10:\"vc_grid_id\";a:0:{}}"],"_yoast_wpseo_content_score":["60"],"layout_header":["light"],"_layout_header":["field_5a48111185e5f"],"layout_background":[""],"_layout_background":["field_5a481193c0995"],"layout_hide_services":["0"],"_layout_hide_services":["field_5a4811c5c0996"],"layout_show-categories-tags":["1"],"_layout_show-categories-tags":["field_5a64eee4c6491"],"blogpost_related-publications":[""],"_blogpost_related-publications":["field_5a480debdac62"],"blogpost_related-files":[""],"_blogpost_related-files":["field_5a480f377db29"],"_yoast_wpseo_primary_category":[""],"rank_math_primary_category":[""],"rank_math_description":["Unicode Migration"],"rank_math_news_sitemap_robots":["index"],"rank_math_robots":["a:1:{i:0;s:5:\"index\";}"],"astra_style_timestamp_css":["1773758492"],"rank_math_internal_links_processed":["1"],"wpil_sync_report3":["1"],"wpil_links_inbound_internal_count":["0"],"wpil_links_inbound_internal_count_data":["eJxLtDKwqq4FAAZPAf4="],"wpil_links_outbound_internal_count":["0"],"wpil_links_outbound_internal_count_data":["eJxLtDKwqq4FAAZPAf4="],"wpil_links_outbound_external_count":["0"],"wpil_links_outbound_external_count_data":["eJxLtDKwqq4FAAZPAf4="],"wpil_sync_report2_time":["2024-08-27T11:47:31+00:00"],"_uag_css_file_name":["uag-css-215.css"]},"uagb_featured_image_src":{"full":false,"thumbnail":false,"medium":false,"medium_large":false,"large":false,"1536x1536":false,"2048x2048":false},"uagb_author_info":{"display_name":"Johannes Ahrends","author_link":"https:\/\/carajandb.com\/en\/author\/9aa6cdb2095bd409\/"},"uagb_comment_info":0,"uagb_excerpt":"As mentioned in my blog \u201cCharacter Set Conversion with CSALTER \u2013 Does that actually work?\u201d it is however possible to change the character set of the database with CSSCAN or CSALTER, but that does not help if you saved US7ASCII characters in the database, as it is quite usual in Europe, because only the data&hellip;","_links":{"self":[{"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/posts\/215","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/comments?post=215"}],"version-history":[{"count":0,"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/posts\/215\/revisions"}],"wp:attachment":[{"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/media?parent=215"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/categories?post=215"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/carajandb.com\/en\/wp-json\/wp\/v2\/tags?post=215"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}