Article

How to convert an existing database to Unicode UTF-8?

Information

 
Article Number: 000001296
Environment
Product: Progress
Version: 9.x
Product: OpenEdge
Version: 10.x, 11.x
OS: All Supported Operating Systems
Question/Problem Description
How to convert an existing database to Unicode UTF-8?
How to convert an existing database to Unicode?
How to convert an existing database to UTF-8 with dump and load?
Can an existing database be converted to UTF-8 without dump and load?
Can binary dump and binary load be used for different codepages?
Resolution
An existing database can be converted to UTF-8 in two ways: without a dump and load (Option 1) or with a dump and load (Option 2).

Before starting either process, note that PROUTIL is not aware of the database code page. Specify -cpinternal on every PROUTIL command line that modifies data, so that the command executes correctly and does not introduce data corruption resulting from code page differences. See Article 000032161, "Inconsistent behaviour of PROUTIL when -cpinternal is not specified".
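
The rule above can be made harder to forget with a small wrapper. The following is a minimal dry-run sketch for a UNIX-style shell, not a supported tool: proutil_cmd is a hypothetical helper name, it only prints the command line it would run, and the code page to pass is left to the caller (the example uses UTF-8, as in the idxbuild commands in this article).

```shell
# Dry-run sketch: build a PROUTIL command line that always carries -cpinternal.
# proutil_cmd is a hypothetical helper name; it prints the command, it runs nothing.
proutil_cmd() {
    # Require a code page, a database name and at least a qualifier.
    if [ "$#" -lt 3 ]; then
        echo "usage: proutil_cmd <codepage> <dbname> <qualifier> [args...]" >&2
        return 1
    fi
    codepage=$1   # value to pass as -cpinternal (chosen by the caller)
    dbname=$2     # database name
    shift 2       # what remains is the -C qualifier and its operands
    printf '%s\n' "proutil $dbname -C $* -cpinternal $codepage"
}

proutil_cmd UTF-8 mydb idxbuild ALL
```

Because the helper refuses to build a command without an explicit code page, a missing -cpinternal shows up as a usage error before anything touches the database.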
 

OPTION 1: To convert an existing database to UTF-8 without dump and load:

1.  To customize word rules, compile a new version of the word-break table for UTF-8 to a rule number <N>.

$   proutil <dbname> -C wbreak-compiler <DLC>\prolang\convmap\utf8-bas.wbt <N>

Where:
<N> - is a number between 1 and 255.
<DLC> - Use an absolute path to DLC instead of the environment variable in the command line. 


In OpenEdge 10.1A and later, this step is not strictly necessary: a default UTF-8 word-rule file named proword.254 is provided in the installation directory and can be used instead.

2.  Either place the newly created file proword.<N> in the install directory (DLC) or define the environment variable (available since Progress Version 9.0A):

PROWD<N>=<file-directory>\proword.<N> 

3.  Convert the database to UTF-8:

$   proutil <dbname>  -C convchar convert UTF-8

4.  Apply the new word-rules to the database:

$   proutil <dbname> -C word-rules <N>

Example:

$  proutil <dbname> -C word-rules 254

5.  Use the Data Administration tool to load the file: <DLC>\prolang\utf\_tran.df in order to change the database collation.

If the database was created before OpenEdge 10.1A (for example, in 10.0B) and _tran.df is loaded with OpenEdge 10.1A or later, first update the database schema to 10.1A or later so that _tran.df can be loaded for the database collation:

$   proutil <database> -C updateschema

6.  Rebuild all indexes:

$   proutil <database> -C idxbuild ALL -cpinternal UTF-8
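
The Option 1 steps can be laid out as a dry-run checklist. The following is a minimal sketch for a UNIX-style shell: option1_plan and its arguments are hypothetical names, forward slashes are assumed as the path separator, and rule number 254 is assumed to mean the supplied default file (OpenEdge 10.1A or later), so the compile step is skipped for it. Placing proword.<N> or defining PROWD<N> (step 2) is not covered. Nothing is executed; the commands are only printed in order.

```shell
# Dry-run sketch of Option 1: prints the commands in order, runs nothing.
# option1_plan and its arguments are hypothetical names for illustration.
option1_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: option1_plan <dbname> <dlc-path> <N>" >&2
        return 1
    fi
    db=$1    # database name
    dlc=$2   # absolute path to the installation directory (DLC)
    n=$3     # word-rule number
    # Assumption: 254 is the default rule file shipped with 10.1A and later,
    # so only other rule numbers need the word-break compile step.
    if [ "$n" != 254 ]; then
        printf '%s\n' "proutil $db -C wbreak-compiler $dlc/prolang/convmap/utf8-bas.wbt $n"
    fi
    printf '%s\n' "proutil $db -C convchar convert UTF-8"
    printf '%s\n' "proutil $db -C word-rules $n"
    printf '%s\n' "# only for databases created before 10.1A: proutil $db -C updateschema"
    printf '%s\n' "# manual step: load $dlc/prolang/utf/_tran.df with the Data Administration tool"
    printf '%s\n' "proutil $db -C idxbuild ALL -cpinternal UTF-8"
}

option1_plan mydb /opt/dlc 254
```

The two lines starting with # are reminders rather than commands: the schema update applies only to older databases, and loading _tran.df is done interactively.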


OPTION 2: Using Dump and Load to convert the Database Collation:

A.  To convert an existing database to UTF-8 using Data Administration dump and load (ASCII):

Once Steps 3 and 4 below have been completed, they do not need to be repeated for subsequent UTF-8 databases on the same system.

1.  ASCII Dump the existing database using the Data Administration tool.

2.  Create a new empty UTF-8 database. The empty database under prolang\utf has the basic collation by default.

$   prodb <new_database> <DLC>\prolang\utf\empty.db

3.  Compile a new version of the word-break table for UTF-8 to a rule number <N>.

$   proutil <dbname> -C wbreak-compiler <DLC>\prolang\convmap\utf8-bas.wbt <N>

Where:
<N> - is a number between 1 and 255.
<DLC> - Use an absolute path to DLC instead of the environment variable in the command line. 

In OpenEdge 10.1A and later, this step is not strictly necessary if word rules have not been customized: a default UTF-8 word-rule file named proword.254 is provided in the installation directory and can be used instead.

4.  Either place the newly created file proword.<N> in the install directory (DLC) or define the environment variable (available since Progress Version 9.0A):

PROWD<N>=<file-directory>\proword.<N>

5.  Apply the new word-rules to the database:

$   proutil <dbname> -C word-rules <N>

Example:

$  proutil <dbname> -C word-rules 254

6.  Load the database using the Data Administration tool. 

7.  If the indexes were loaded as INACTIVE in the definition file, rebuild them after the database has been converted and the data loads have completed:

$   proutil <database> -C idxbuild ALL -cpinternal UTF-8
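
Where the environment variable from step 4 is used instead of copying proword.<N> into the install directory, the variable name depends on <N>. The following is a minimal sketch for a UNIX-style shell (on Windows the variable would be defined differently): set_prowd is a hypothetical helper, and the directory is whatever location the compiled file was placed in. It checks that <N> is between 1 and 255, as required above, and exports PROWD<N> for the current session.

```shell
# Sketch: validate <N> and export PROWD<N> for the current shell session.
# set_prowd is a hypothetical helper; the directory is the caller's choice.
set_prowd() {
    if [ "$#" -ne 2 ]; then
        echo "usage: set_prowd <N> <file-directory>" >&2
        return 1
    fi
    n=$1
    dir=$2
    # <N> must consist of digits only.
    case $n in
        ''|*[!0-9]*) echo "N must be numeric: $n" >&2; return 1 ;;
    esac
    # <N> must be a number between 1 and 255.
    if [ "$n" -lt 1 ] || [ "$n" -gt 255 ]; then
        echo "N must be between 1 and 255: $n" >&2
        return 1
    fi
    export "PROWD$n=$dir/proword.$n"
}

set_prowd 254 /opt/wordrules
echo "$PROWD254"
```

The variable has to be defined in the same environment that later runs the word-rules command, otherwise the file will not be found through it.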


B.  To convert an existing database to UTF-8 using binary dump and load:

When a binary dump and load strategy is used:

  • Prior to OpenEdge 10.0A, binary dump does not record the code page of the text written to the dump file (.bd). Use the (ASCII) Data Dictionary Dump Table Contents plus either the Data Dictionary Load Table Contents or the bulkload utility instead.
  • In OpenEdge 10.0A and later, binary dump records the code page of the text written to the dump file (.bd). The file can only be loaded into a database that uses the same code page, which rules out data corruption from code page differences during the load. Loading into a database with a different code page fails with error 10855:

Code page of .bd file (<name>) does not match code page of database(<name>). (10855)
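
The same-code-page rule can be expressed as a simple pre-check. The following is a minimal sketch for a UNIX-style shell: codepages_match is a hypothetical helper, both code page names are supplied by the caller (how they are obtained is not shown here), and comparing the names without regard to case is an assumption of this sketch.

```shell
# Sketch: compare the code page of a .bd file with that of the target database.
# Both names come from the caller; case-insensitive comparison is an assumption.
codepages_match() {
    bd_cp=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    db_cp=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    # Succeeds only when both names are present and identical.
    [ -n "$bd_cp" ] && [ "$bd_cp" = "$db_cp" ]
}

if codepages_match "iso8859-1" "UTF-8"; then
    echo "code pages match: the binary load can proceed"
else
    echo "code pages differ: load into a database with the dump's code page first"
fi
```

This is why the steps below load the .bd files into a database with the original code page and convert that copy to UTF-8 afterwards.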
 

1.  Create an empty database with the same code page as the database from which the data was binary dumped (for example, iso8859-1):

$   prodb <new_database> <DLC>\prolang\ame\empty.db

2. Load the schema definition (<database>.df) files into the empty database through the Data Dictionary.

4. Load the binary dump files into the database copy: <new_database>.
This step includes all steps associated with a binary load, including rebuilding indexes.

$   proutil <dbname> -C load <dumped-file>.bd -i
$   proutil <dbname> -C idxbuild all -TM 31 -TB 32 -SG 65 -T /tmp

5. Convert the database copy to UTF-8.

$   proutil <dbname>  -C convchar convert UTF-8

6. Assign word rules to the database copy. 

Either place the newly created file proword.<N> in the install directory (DLC) or define the environment variable (available since Progress Version 9.0A):   PROWD<N>=<file-directory>\proword.<N>

$   proutil <dbname> -C word-rules <N>

Example:

$  proutil <dbname> -C word-rules 254

7.  Use the Data Administration tool to load the file: <DLC>\prolang\utf\_tran.df in order to change the database collation.  
Otherwise the collation is not changed: it remains the collation of the originating database (for example, ISO basic) rather than the UTF-8 basic collation.

If the database was created before OpenEdge 10.1A (for example, in 10.0B) and _tran.df is loaded with OpenEdge 10.1A or later, first update the database schema to 10.1A or later so that _tran.df can be loaded for the database collation:

$   proutil <database> -C updateschema

8. Rebuild all database indexes:

$   proutil <database> -C idxbuild ALL -cpinternal UTF-8
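
Step 4 above usually involves one .bd file per dumped table, so the load command is repeated for each file before the indexes are rebuilt. The following is a minimal dry-run sketch for a UNIX-style shell: binary_load_plan is a hypothetical helper, the dump directory is an assumption, and the idxbuild tuning parameters shown in step 4 are left out because they are site-specific. Nothing is executed; the commands are only printed.

```shell
# Dry-run sketch for step 4: one binary load per .bd file, then an index rebuild.
# binary_load_plan is a hypothetical helper; it prints commands, it runs nothing.
binary_load_plan() {
    if [ "$#" -ne 2 ]; then
        echo "usage: binary_load_plan <dbname> <dump-directory>" >&2
        return 1
    fi
    db=$1
    dumpdir=$2
    for bd in "$dumpdir"/*.bd; do
        # Skip the unexpanded pattern when the directory holds no .bd files.
        [ -e "$bd" ] || continue
        printf '%s\n' "proutil $db -C load $bd -i"
    done
    printf '%s\n' "proutil $db -C idxbuild all"
}

binary_load_plan copydb ./dump
```

Reviewing the printed list before running it makes it easy to confirm that every dumped table has a matching load command.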

Last Modified Date: 2/6/2018 5:22 PM