LTI Procedures for Coding Ligatures in Unicode

Jan. 6, 2011

Proper encoding of ligature symbols in Unicode is a recurring issue. LTI's practice is to follow the standard, MARC-21 Specifications for Record Structure, Character Sets, and Exchange Media (Dec. 2007).

According to the standard, two cases of diacritic encoding require special treatment when MARC-8 character encoding is converted to Unicode. Both involve combining diacritics that span two characters: (1) the ligature and (2) the double tilde.

The problem is an apparent discrepancy between what LC's standard says and what LC actually does. For both of the above diacritics, the standard calls for placing a single character representing the entire diacritic (rather than two halves) between the letters modified. Note 1 of the standard addresses the ligature:

"The Ligature that spans two characters is constructed of two halves in MARC-8: EB (Ligature, first half) and EC (Ligature, second half). The preferred Unicode/UTF-8 mapping is to the single character Ligature that spans two characters, U+0361. The single character Ligature is encoded between the two characters to be spanned. The two half Ligatures in Unicode, to which the Ligature has been mapped since 1996, are indicated in the mapping as alternatives, but their use is not recommended. It is expected that font support for the single character Ligature mark will be more easily obtained than for the two halves." (LTI emphasis)
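
To make the two encodings concrete, the short Python sketch below builds the same ligated letter pair both ways and lists the code points in each. U+0361 comes from the note quoted above. The half marks U+FE20 and U+FE21 are not named in the note; they are assumed here to be the "two half Ligatures" it refers to, based on their Unicode names. The sample letters "ts" and the helper name describe are illustrative only.

```python
import unicodedata

# Preferred form: one combining mark between the two spanned letters.
preferred = "t\u0361s"

# Alternative (not recommended) form: a half mark after each letter.
# Assumption: U+FE20 / U+FE21 are the Unicode half ligatures the note mentions.
alternative = "t\uFE20s\uFE21"

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, sample in (("preferred", preferred), ("alternative", alternative)):
    print(label)
    for line in describe(sample):
        print("  " + line)
```

The preferred form uses three code points and the alternative uses four, so the two are not interchangeable in string comparison or searching even though they are meant to display alike.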

Because LTI follows the standard, the ligature is coded using the single character ligature between the two spanned characters. It appears that LC and Voyager have not yet implemented the use of the single character ligature mark.
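
For records that still carry the half marks, a conversion to the single-character form might look like the following minimal sketch. It is not LTI's actual conversion code. It assumes the simple pattern of one letter, a left half mark, one letter, and a right half mark, with no other combining marks in between; anything that does not fit that pattern is left unchanged. The function name halves_to_single is hypothetical, and the double tilde code points (U+FE22, U+FE23, U+0360) are taken from Unicode's character names rather than from the note quoted above.

```python
import re

# (left half, right half, single spanning mark)
# Ligature: U+FE20 + U+FE21 -> U+0361 (per Note 1 of the standard).
# Double tilde: U+FE22 + U+FE23 -> U+0360 (assumed by analogy).
PAIRS = [
    ("\uFE20", "\uFE21", "\u0361"),
    ("\uFE22", "\uFE23", "\u0360"),
]

def halves_to_single(text):
    """Rewrite letter+left-half+letter+right-half as letter+single-mark+letter."""
    for left, right, single in PAIRS:
        pattern = re.compile(
            "(.)" + re.escape(left) + "(.)" + re.escape(right), re.DOTALL
        )
        # The single mark goes between the two spanned characters.
        text = pattern.sub(
            lambda m, s=single: m.group(1) + s + m.group(2), text
        )
    return text

converted = halves_to_single("t\uFE20s\uFE21")
print([f"U+{ord(ch):04X}" for ch in converted])
```

An unmatched half mark, such as a left half with no right half, is deliberately left in place so that malformed input can be reviewed rather than silently altered.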