Portable Layout Services: Context-dependent and Directional Text
Copyright © 1997 The Open Group
LO_LTYPE Locale Category
This chapter describes the LO_LTYPE category, which
is in many respects an extension to the existing LC_CTYPE locale
category.
LO_LTYPE can be implemented as part of the layout object, or its
keywords may be added in the future to the existing locale categories.
The keywords define character classifications, mappings and
character attributes.
These are used by some of the reordering and shaping algorithms
embedded in the m_transform_layout() and m_wtransform_layout() functions.
In the following descriptions there are references to lists of such
characters in appropriate standards. These lists are quoted for
illustration only and do not imply dependence on a specific encoding
scheme.
Character Classifications Related to Directionality
- left_to_right
Left-to-right directionality. For example, the letters A, B, C, D ... Z have
a left-to-right directionality.
- right_to_left
Right-to-left directionality. For example, the letters of the Hebrew
alphabet have a right-to-left directionality.
- num_terminator
Numeric terminators required
by the directional algorithms. For example, in complex-text languages the
dollar sign or plus sign could be identified by the directional algorithm as
numeric terminators.
- num_separator
Separators of numerals of the portable character set (but not of national
numerals). The term "numerals of the portable character set" refers to
numbers represented with the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
which are part of the POSIX portable character set. An example of
num_separator is the period, which can be used to separate numbers
represented with Arabic digits (0, 1, 2 ... 9), but is not used with Hindi
numbers, because the value zero is represented in Hindi digits by a
dot that could be confused with a period.
- common_num_separator
Number separators both for the numerals of the
portable character set and for national numerals. For example, a colon can
be identified by a directional algorithm as a number separator both for
Arabic numbers and for Hindi numbers.
- segment_separator
Characters to be identified by a directional algorithm
as segment separator characters. A segment is a portion of text, in general
shorter than one line, embedded within a wider text that has a different
directionality.
- block_separator
Characters to be identified by a directional algorithm as
block separator characters. A block is a larger part of text (one or more
paragraphs) with a distinct directionality that may differ from the
directionality of other parts of text in a document.
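The difference between num_separator and common_num_separator can be sketched in a few lines of Python. This is an illustration only: the class table below is a hypothetical subset built from the examples in the text, the function name is invented, and the rule that a separator needs digits of the same kind on both sides is an assumption made for the sketch, not a statement of the directional algorithm.

```python
# Hypothetical subset of LO_LTYPE class membership, taken from the
# examples in the text. A real locale defines these sets itself.
PORTABLE_DIGITS = set("0123456789")
# Arabic-Indic ("Hindi") digits U+0660..U+0669, used here as the
# national digits of an assumed Arabic locale.
NATIONAL_DIGITS = {chr(0x0660 + i) for i in range(10)}

CLASSES = {
    "num_terminator": {"$", "+"},
    "num_separator": {"."},
    "common_num_separator": {":"},
}


def is_number_separator(sep, left, right):
    """Return True if sep may separate the digits left and right.

    Assumption for this sketch: both neighbours must be digits of the
    same kind (both portable or both national).
    """
    both_portable = left in PORTABLE_DIGITS and right in PORTABLE_DIGITS
    both_national = left in NATIONAL_DIGITS and right in NATIONAL_DIGITS
    if sep in CLASSES["common_num_separator"]:
        return both_portable or both_national
    if sep in CLASSES["num_separator"]:
        return both_portable
    return False
```

With this table the period separates 1 and 2 but not their national counterparts, while the colon separates both kinds of digits.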
Character Classifications of Control Characters
- direction_control
Characters to be classified as direction control characters,
such as those listed in the ISO/IEC 10646 standard.
Examples of direction control are:
Start Directed String (SDS) and Start Reversed String (SRS).
- sym_swap_layout
Characters to be classified as symmetrical swap layout characters, such as
those listed in the ISO/IEC 10646 standard. Examples of symmetrical swap layout characters
are INHIBIT SYMMETRIC SWAPPING and ACTIVATE SYMMETRIC SWAPPING.
- char_shape_selector
Characters to be classified as character shaping selectors, such as those
listed in the ISO/IEC 10646 standard. Examples of character shaping selectors are INHIBIT
ARABIC FORM SHAPING and ACTIVATE ARABIC FORM SHAPING.
- num_shape_selector
Characters to be classified as numeric shaping selectors, such as those listed in
the ISO/IEC 10646 standard. Examples of numeric shaping selectors are NATIONAL DIGIT SHAPES
and NOMINAL DIGIT SHAPES.
Character Classifications of National Numbers
- national_number
Characters to be classified as national numbers. Examples are Hindi
numerals used in Arabic countries in Arabic script, Thai numerals used in
Thai script, Chinese numerals used in Chinese vertical script, Bengali
numerals used in Bengali script.
Character Classifications of Composite Graphic Symbols
- non_spacing
Characters used to form composite graphic symbols, such as a
character representing a diacritical mark in the ISO/IEC 6429 standard,
or tone-marks, upper-vowels and lower-vowels in Thai.
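As a rough illustration of how a non_spacing class could be used, the following Python sketch attaches each non-spacing character to the character before it to form one composite graphic symbol. The Unicode general category Mn from the standard unicodedata module is used only as a stand-in for the locale's non_spacing class, and the function names are hypothetical.

```python
import unicodedata


def is_non_spacing(ch):
    # Stand-in for the locale's non_spacing class: Unicode general
    # category Mn (nonspacing mark). A real locale lists its own members.
    return unicodedata.category(ch) == "Mn"


def composite_symbols(text):
    """Group each base character with the non-spacing characters after it."""
    groups = []
    for ch in text:
        if groups and is_non_spacing(ch):
            groups[-1] += ch   # attach the mark to the preceding symbol
        else:
            groups.append(ch)  # start a new graphic symbol
    return groups
```

For example, the Thai sequence KO KAI, SARA I (an upper vowel), MAI EK (a tone mark) forms a single group under this sketch.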
Mapping Keywords
- tosymmetric
This operand consists of character pairs, separated by semicolons. The
characters in each pair are separated by a comma, and the pair is surrounded
by parentheses. The first character of each pair is to be swapped with the
second one. Symmetric characters are listed in the ISO/IEC 10646 standard. Examples are
RIGHT and LEFT PARENTHESIS, GREATER THAN and LESS THAN signs, and so on.
- tonational
This maps to national digits. The operand consists of character pairs,
separated by semicolons. The characters in each pair are separated
by a comma, and the pair is surrounded by parentheses. The first character
in the pair represents a nominal digit, while the second represents a
national digit. A nominal digit is one that belongs to the set of
digits in the portable character set.
- todigit
This operand consists of character pairs, separated by semicolons. The
characters in each pair are separated by a comma and the pair is surrounded
by parentheses. The first character in the pair represents a national digit,
while the second represents a nominal digit.
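The operand syntax shared by the three mapping keywords (parenthesised, comma-separated pairs joined by semicolons) can be parsed with a short Python sketch. The symbolic names in angle brackets, such as <zero> and <arabic-zero>, are hypothetical placeholders chosen for the example, and parse_pairs is an invented helper, not part of the layout services interface.

```python
import re

# One pair: "(<first>,<second>)". Characters are written as symbolic
# names in angle brackets; the names used below are hypothetical.
PAIR = re.compile(r"\(\s*<([^>]+)>\s*,\s*<([^>]+)>\s*\)")


def parse_pairs(operand):
    """Parse '(<a>,<b>);(<c>,<d>)' into a list of (first, second) tuples."""
    pairs = []
    for item in operand.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        match = PAIR.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed pair: {item!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


# tonational: nominal digit -> national digit
tonational = dict(parse_pairs("(<zero>,<arabic-zero>);(<one>,<arabic-one>)"))
# todigit: national digit -> nominal digit
todigit = dict(parse_pairs("(<arabic-zero>,<zero>);(<arabic-one>,<one>)"))
# tosymmetric: the first character of the pair is swapped with the second
tosymmetric = dict(parse_pairs("(<left-parenthesis>,<right-parenthesis>)"))
```

In this sketch todigit is simply the inverse of tonational, so mapping a nominal digit to its national form and back returns the original name.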
Character Classification Related to Character Connectivity
- Normal_connect
Characters that connect both to the left and to the right.
This applies, for example, to many Arabic characters.
- R_connect
Characters that connect only to characters to their right
and not to the left.
In Arabic, for example, this includes characters such as
reh, dal, waw, all the lamalefs and the alef maksoura.
- No_connect
Characters that connect neither to the characters on their left
nor to those on their right, and that cannot be overridden.
Examples are all the Latin characters, the box characters
and the punctuation marks.
- No_connect-space
These are neither left nor right connectors, but they may be
overridden if a neighbouring character needs to be expanded.
For example, in Arabic these are the space, RSP and tail.
- Vowel_connect
All the connectable vowels.
Vowels do not influence connectivity, but they need special
consideration in scripts such as Arabic.
- Special1
Characters that need special handling.
In Arabic, the Lam character when followed by Alef will form the
ligature LamAlef, provided that no character of a class
other than the Vowel class falls in between.
- Special2
Characters that need special handling.
In Arabic, the Alef character when preceded by Lam will form the ligature
LamAlef.
- Special3
Characters that need special handling.
In Arabic, these are Seen, Sheen, Sad and Dad when displayed on two cells.
Special3 may differ for scripts of different languages.
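The LamAlef rule described for Special1 and Special2 can be sketched as a scan over text in logical order. The class table below is a small hypothetical excerpt (a real locale assigns every character to a class), the treatment of FATHA as a connectable vowel is an assumption of the example, and find_lamalef is an invented name.

```python
LAM = "\u0644"    # ARABIC LETTER LAM
ALEF = "\u0627"   # ARABIC LETTER ALEF
FATHA = "\u064e"  # ARABIC FATHA, assumed here to be a connectable vowel
BEH = "\u0628"    # ARABIC LETTER BEH
DAL = "\u062f"    # ARABIC LETTER DAL

# Hypothetical excerpt of a connectivity class table.
CONNECT_CLASS = {
    LAM: "Special1",
    ALEF: "Special2",
    FATHA: "Vowel_connect",
    BEH: "Normal_connect",
    DAL: "R_connect",
    " ": "No_connect-space",
}


def connect_class(ch):
    # Anything not listed is treated as No_connect in this sketch.
    return CONNECT_CLASS.get(ch, "No_connect")


def find_lamalef(text):
    """Return (lam_index, alef_index) pairs that would form LamAlef.

    A Special1 character pairs with the next Special2 character when
    only Vowel_connect characters lie between them.
    """
    found = []
    for i, ch in enumerate(text):
        if connect_class(ch) != "Special1":
            continue
        j = i + 1
        while j < len(text) and connect_class(text[j]) == "Vowel_connect":
            j += 1  # vowels do not block the ligature
        if j < len(text) and connect_class(text[j]) == "Special2":
            found.append((i, j))
    return found
```

A Lam followed directly by Alef, or by a vowel and then Alef, is reported as a ligature candidate; a Lam followed by Beh and then Alef is not.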