Unicode full case mappings

About full case mappings

Traditionally, STRING provides as_lower and as_upper for casing Latin-1 strings. The Gobo kernel library adds full Unicode versions of these in class UC_TYPE, together with a routine that produces a title-cased string. (Title casing is not the same as upper-casing the first character and lower-casing the rest, because some Unicode characters have a title-case form that differs from their upper-case form.) However, all of these routines assume that a string keeps its length under case mapping, and this is not always true. For example, the German word "Straße" (street) traditionally upper-cases to "STRASSE", because "ß" maps to the two characters "SS". The upper-cased version is one character longer than the original.
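Gobo itself is an Eiffel library. The two effects described above can still be illustrated in a few lines of Python, because Python's `str.upper` and `str.title` also apply the Unicode full case mappings. The following is a minimal sketch of that behaviour only. It does not use or imitate the Gobo API, and the helper names `full_upper` and `title_first` are hypothetical. It shows the "ß" to "SS" expansion changing the string length, and a character whose title-case form differs from its upper-case form.

```python
# Illustration in Python, not Gobo: str.upper() and str.title() apply the
# Unicode full case mappings, so the result may differ in length.


def full_upper(s: str) -> str:
    """Upper-case s; a single character may map to several characters."""
    return s.upper()


def title_first(s: str) -> str:
    """Hypothetical helper: title-case the first character, lower-case the rest."""
    return s[:1].title() + s[1:].lower()


word = "Stra\u00dfe"  # "Straße", 6 characters
upper = full_upper(word)
print(word, len(word), "->", upper, len(upper))  # Straße 6 -> STRASSE 7

# U+01C6 (dž) upper-cases to U+01C4 (DŽ) but title-cases to U+01C5 (Dž),
# so "upper-case the first character" is not the same as title casing.
dz = "\u01c6"
print(f"U+{ord(dz.upper()):04X} U+{ord(title_first(dz)):04X}")  # U+01C4 U+01C5
```

A length-preserving routine has no way to return "STRASSE" for "Straße", which is why code relying on such routines cannot produce the full mapping.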

Except in legacy applications that expect case mapping to preserve string length, the traditional casing routines should not be used. Use the full case mapping routines provided by this library instead, to get correct behaviour. Note that these are the default mappings. The Unicode Character Database also defines locale-dependent and context-dependent mappings (the final form of Greek sigma is an example of the latter). These additional forms will be provided in the future if there is demand for them.
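The following Python sketch illustrates the difference between a default mapping and a contextual one, using final sigma. It is not part of Gobo, and the helper `lower_with_final_sigma` is hypothetical. Lower-casing each character on its own, as a default mapping does, always turns capital sigma (Σ) into σ. The contextual rule instead produces ς at the end of a word. The sketch assumes a simplified condition, "preceded by a letter and not followed by one". The real Unicode Final_Sigma condition is stated in terms of cased and case-ignorable characters.

```python
# Hypothetical helper (not part of Gobo): a simplified Final_Sigma rule
# layered on top of per-character default lower-casing.

CAPITAL_SIGMA = "\u03a3"  # Σ
SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς


def default_lower(s: str) -> str:
    """Context-free lower-casing: each character is mapped on its own."""
    return "".join(ch.lower() for ch in s)


def lower_with_final_sigma(s: str) -> str:
    """Lower-case s, using final sigma for a capital sigma that ends a word.

    Simplification: "ends a word" means preceded by a letter and not
    followed by a letter; the real rule also skips case-ignorable
    characters such as apostrophes and uses "cased" rather than "letter".
    """
    out = []
    for i, ch in enumerate(s):
        if ch == CAPITAL_SIGMA:
            before = i > 0 and s[i - 1].isalpha()
            after = i + 1 < len(s) and s[i + 1].isalpha()
            out.append(FINAL_SIGMA if before and not after else SMALL_SIGMA)
        else:
            out.append(ch.lower())  # default mapping for everything else
    return "".join(out)


word = "\u039f\u0394\u039f\u03a3"  # "ΟΔΟΣ"
print(default_lower(word))           # οδοσ  (no context: σ at the end)
print(lower_with_final_sigma(word))  # οδος  (contextual: ς at the end)
```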

How to access the full case mapping routines provided by the library

The full case mapping routines provided by the library are declared in the deferred class ST_UNICODE_CASE_MAPPING_INTERFACE. To gain access to them, you can either inherit from ST_UNICODE_FULL_CASE_MAPPING, which uses data from the latest available version of Unicode, or inherit from a class for a particular version of Unicode (e.g. ST_UNICODE_V5000_FULL_CASE_MAPPING for Unicode version 5.0.0).

Alternatively, you can use these classes as a client. To do so, inherit from ST_IMPORTED_UNICODE_FULL_CASE_MAPPING or ST_IMPORTED_UNICODE_V500_FULL_CASE_MAPPING, which provide the features case_mapping and case_mapping_v500 respectively, and call the case mapping routines through those features.

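The client method is necessary if you want to use full case mappings from two different versions of Unicode within the same class. Each version is then reached through its own feature (such as case_mapping_v500), rather than through identically named routines inherited from two version-specific classes.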

Copyright © 2007, Colin Adams
Last Updated: 4 November 2007