Portable Layout Services: Context-dependent and Directional Text

Copyright © 1997 The Open Group

LO_LTYPE Locale Category

This chapter describes the LO_LTYPE category, which is in many respects an extension to the existing LC_CTYPE locale category.

LO_LTYPE can be implemented as part of the layout object, or its keywords may be added in the future to the existing locale categories.

The keywords define character classifications, mappings and character attributes. These are used by some of the reordering and shaping algorithms embedded in the m_transform_layout() and m_wtransform_layout() functions. In the following descriptions there are references to lists of such characters in appropriate standards. These lists are quoted for illustration only and do not imply dependence on a specific encoding scheme.

Character Classifications Related to Directionality

left_to_right

Left-to-right directionality. For example, the letters A, B, C, D ... Z have a left-to-right directionality.

right_to_left

Right-to-left directionality. For example the letters of the Hebrew alphabet have a right-to-left directionality.

num_terminator

Numeric terminators required by the directional algorithms. For example, in complex-text languages the dollar sign or plus sign could be identified by the directional algorithm as numeric terminators.

num_separator

Separators of numerals of the portable character set (but not of national numerals). The term "numerals of the portable character set" indicates numbers represented with the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, which are part of the POSIX portable character set. An example of num_separator is the period, which can be used to separate numbers represented with Arabic digits (0, 1, 2 ... 9), but is not used with Hindi numbers (to avoid confusion with the value zero, which is represented in Hindi digits by a period-like dot).

common_num_separator

Number separators both for the numerals of the portable character set and for national numerals. For example, a colon can be identified by a directional algorithm as a number separator both for Arabic numbers and for Hindi numbers.

segment_separator

Characters to be identified by a directional algorithm as segment separator characters. A segment is a portion of text, in general shorter than one line, embedded within a wider text that has a different directionality.

block_separator

Characters to be identified by a directional algorithm as block separator characters. A block is a larger part of text (one or more paragraphs) with a distinct directionality that may differ from the directionality of other parts of text in a document.
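The directionality classes above can be pictured as named sets of characters that a directional algorithm consults. The following is a minimal sketch under stated assumptions: the table name LO_LTYPE_CLASSES, the helper functions, and the choice of tab and newline as segment and block separators are all hypothetical, and the member characters are taken only from the examples in the text rather than from any real locale definition.

```python
# Hypothetical in-memory form of the LO_LTYPE directionality classes.
# Member characters are illustrative assumptions, not a real locale.
HEBREW_LETTERS = {chr(cp) for cp in range(0x05D0, 0x05EB)}  # alef .. tav

LO_LTYPE_CLASSES = {
    "left_to_right": set("ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
    "right_to_left": HEBREW_LETTERS,
    "num_terminator": {"$", "+"},
    "num_separator": {"."},
    "common_num_separator": {":"},
    "segment_separator": {"\t"},  # assumption, not stated in the text
    "block_separator": {"\n"},    # assumption, not stated in the text
}


def classify(ch):
    """Return the sorted names of every class that lists ch."""
    return sorted(
        name for name, members in LO_LTYPE_CLASSES.items() if ch in members
    )


def separates_numerals(ch, national):
    """Tell whether ch may separate numerals.

    num_separator applies only to numerals of the portable character set;
    common_num_separator applies to those and to national numerals.
    """
    if ch in LO_LTYPE_CLASSES["common_num_separator"]:
        return True
    return (not national) and ch in LO_LTYPE_CLASSES["num_separator"]
```

With this table, the period separates portable-character-set numerals but not national numerals, while the colon separates both.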

Character Classifications of Control Characters

direction_control

Characters to be classified as direction control characters, such as those listed in the ISO/IEC 10646 standard. Examples of direction control are: Start Directed String (SDS) and Start Reversed String (SRS).

sym_swap_layout

Characters to be classified as symmetrical swap layout characters, such as those listed in the ISO/IEC 10646 standard. Examples of symmetrical swap layout characters are INHIBIT SYMMETRIC SWAPPING and ACTIVATE SYMMETRIC SWAPPING.

char_shape_selector

Characters to be classified as character shaping selectors, such as those listed in the ISO/IEC 10646 standard. Examples of character shaping selectors are INHIBIT ARABIC FORM SHAPING and ACTIVATE ARABIC FORM SHAPING.

num_shape_selector

Characters classified as numeric shaping selectors, such as those listed in the ISO/IEC 10646 standard. Examples of numeric shaping selectors are NATIONAL DIGIT SHAPES and NOMINAL DIGIT SHAPES.
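As an illustration of how the control character classes might be consumed, the sketch below scans a string, removes the control characters, and records the last setting seen for each class. The code points used are the ISO/IEC 10646 characters whose names match the examples above; grouping them under these keywords, the function name scan_controls, and the decision to strip the controls from the visible text are assumptions made for the example only.

```python
# Each class maps a control character to the setting it selects.
# True means "activate" (or "national digits" for num_shape_selector).
CONTROL_CLASSES = {
    "sym_swap_layout": {
        "\u206a": False,  # INHIBIT SYMMETRIC SWAPPING
        "\u206b": True,   # ACTIVATE SYMMETRIC SWAPPING
    },
    "char_shape_selector": {
        "\u206c": False,  # INHIBIT ARABIC FORM SHAPING
        "\u206d": True,   # ACTIVATE ARABIC FORM SHAPING
    },
    "num_shape_selector": {
        "\u206e": True,   # NATIONAL DIGIT SHAPES
        "\u206f": False,  # NOMINAL DIGIT SHAPES
    },
}


def scan_controls(text):
    """Return (text without controls, last setting seen per class)."""
    state = {}
    visible = []
    for ch in text:
        for keyword, members in CONTROL_CLASSES.items():
            if ch in members:
                state[keyword] = members[ch]
                break
        else:
            # Not a control character in any class: keep it.
            visible.append(ch)
    return "".join(visible), state
```

A real implementation would apply each setting to the text that follows the control character rather than only reporting the final state; the sketch keeps the bookkeeping minimal.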

Character Classifications of National Numbers

national_number

Characters to be classified as national numbers. Examples are Hindi numerals used in Arabic countries in Arabic script, Thai numerals used in Thai script, Chinese numerals used in Chinese vertical script, and Bengali numerals used in Bengali script.

Character Classifications of Composite Graphic Symbols

non_spacing

Characters to form composite graphic symbols, such as a character representing a diacritical mark in the ISO/IEC 6429 standard, or tone-marks, upper-vowels and lower-vowels in Thai.
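To show what forming composite graphic symbols means in practice, the sketch below attaches each non-spacing character to the character that precedes it. It assumes that the Unicode general category Mn (nonspacing mark) is an acceptable stand-in for the non_spacing class; a real locale would list its own members, and the function names are hypothetical.

```python
import unicodedata


def is_non_spacing(ch):
    # Assumption: Unicode category Mn approximates the non_spacing class.
    return unicodedata.category(ch) == "Mn"


def composite_symbols(text):
    """Split text into composite graphic symbols.

    Each symbol is one spacing character followed by any non-spacing
    characters that attach to it.
    """
    symbols = []
    for ch in text:
        if symbols and is_non_spacing(ch):
            symbols[-1] += ch
        else:
            symbols.append(ch)
    return symbols
```

For example, the Thai consonant KO KAI followed by the upper vowel SARA I and the tone mark MAI EK yields a single composite symbol occupying one display cell.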

Mapping Keywords

tosymmetric

This operand consists of character pairs, separated by semicolons. The characters in each pair are separated by a comma, and the pair is surrounded by parentheses. The first character of each pair is to be swapped with the second one. Symmetric characters are listed in the ISO/IEC 10646 standard. Examples are RIGHT and LEFT PARENTHESIS, GREATER THAN and LESS THAN signs, and so on.

tonational

This maps to national digits. The operand consists of character pairs, separated by semicolons. The characters in each pair are separated by a comma, and the pair is surrounded by parentheses. The first character in the pair represents a nominal digit, while the second represents a national digit. A nominal digit is one that belongs to the set of digits in the portable character set.

todigit

This operand consists of character pairs, separated by semicolons. The characters in each pair are separated by a comma and the pair is surrounded by parentheses. The first character in the pair represents a national digit, while the second represents a nominal digit.
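The operand syntax shared by the three mapping keywords can be sketched with a small parser. The function names parse_pairs and apply_mapping are hypothetical, the sample operands are invented for illustration, and the sketch assumes that no element is itself a literal parenthesis, comma, or semicolon (a real locale source would need symbolic names or escaping for those). The Arabic-Indic digits U+0660 to U+0669 stand in for national digits.

```python
import re

# One pair: "(first,second)", where neither element contains
# a comma, a parenthesis, or white space.
PAIR = re.compile(r"\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)")


def parse_pairs(operand):
    """Parse '(a,b);(c,d)' into a list of (first, second) tuples."""
    pairs = []
    for item in operand.split(";"):
        item = item.strip()
        if not item:
            continue
        match = PAIR.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed pair: {item!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def apply_mapping(text, pairs):
    """Replace each first element found in text with its second element."""
    table = dict(pairs)
    return "".join(table.get(ch, ch) for ch in text)


# Illustrative operands (assumptions, not taken from a real locale).
TOSYMMETRIC = parse_pairs("(<,>);(>,<);([,]);(],[)")
TONATIONAL = parse_pairs(
    ";".join(f"({d},{chr(0x0660 + d)})" for d in range(10))
)
# todigit lists the same pairs with the elements reversed.
TODIGIT = [(national, nominal) for nominal, national in TONATIONAL]
```

Listing both (<,>) and (>,<) makes the swap explicit in each direction; whether a single pair implies both directions is left open here.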

Character Classification Related to Character Connectivity

Normal_connect

Characters that connect both to the left and to the right. This applies, for example, to many Arabic characters.

R_connect

Characters that connect only to characters to their right and not to the left. In Arabic, for example, this includes characters like the reh, dal, waw, all the lamalefs, and the alef maksoura.

No_connect

Characters that connect neither to the character on their left nor to the character on their right, and whose behaviour cannot be overridden. Examples are all the Latin characters, the box characters and the punctuation marks.

No_connect-space

These are neither left nor right connectors, but they may be overridden if a neighbouring character needs to be expanded. For example, in Arabic these are the space, RSP and tail.

Vowel_connect

All the connectable vowels. Vowels do not influence connectivity, but they need special consideration in scripts such as Arabic.

Special1

Characters that need special handling. In Arabic, the Lam character when followed by Alef will form the ligature LamAlef, provided that no character of a class other than the Vowel class falls in between.

Special2

Characters that need special handling. In Arabic, the Alef character when preceded by Lam will form the ligature LamAlef.

Special3

Characters that need special handling. In Arabic, these are Seen, Sheen, Sad and Dad when displayed on two cells.

Special3 may differ for scripts of different languages.
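The interaction between Special1, Special2 and Vowel_connect can be sketched as a scan for LamAlef ligature candidates. Everything below is an illustrative assumption: the class table covers only a handful of Arabic characters, the vowel marks U+064B to U+0652 stand in for the Vowel_connect class, unlisted characters default to No_connect for simplicity, and the function names are hypothetical.

```python
LAM, ALEF = "\u0644", "\u0627"

# Tiny, incomplete class table for illustration only.
CONNECT_CLASS = {
    LAM: "Special1",
    ALEF: "Special2",
    "\u0628": "Normal_connect",  # beh
    "\u0631": "R_connect",       # reh
    "\u062f": "R_connect",       # dal
    "\u0648": "R_connect",       # waw
    " ": "No_connect-space",
}
# Fathatan .. sukun, assumed here to be the connectable vowels.
CONNECT_CLASS.update(
    {chr(cp): "Vowel_connect" for cp in range(0x064B, 0x0653)}
)


def connect_class(ch):
    # Simplification: anything not listed is treated as No_connect.
    return CONNECT_CLASS.get(ch, "No_connect")


def find_lam_alef(text):
    """Return (lam_index, alef_index) for each LamAlef candidate.

    A Lam pairs with a later Alef only if every character between
    them belongs to the Vowel_connect class.
    """
    matches = []
    lam_at = None
    for i, ch in enumerate(text):
        cls = connect_class(ch)
        if cls == "Special1":
            lam_at = i
        elif cls == "Special2" and lam_at is not None:
            matches.append((lam_at, i))
            lam_at = None
        elif cls != "Vowel_connect":
            # Any other class between Lam and Alef cancels the ligature.
            lam_at = None
    return matches
```

A shaping routine could use the returned index pairs to substitute the ligature glyph while leaving any intervening vowel attached to it.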

