
Portable Layout Services: Context-dependent and Directional Text

Copyright © 1997 The Open Group

Glossary

Some of the terms defined below are from The American National Dictionary for Information Processing, by the Computer and Business Equipment Manufacturers Association, 1977. These terms are identified by the acronym ANSI following their definitions.

algorithm

A finite set of well-defined rules for the solution of a problem in a finite number of steps. For example, a full statement of an arithmetic procedure for evaluating sin x to a stated precision. (ANSI)

Arabic country

Any of the countries in which Arabic script is the predominant writing system. The countries include Algeria, Bahrain, Egypt, Iraq, Jordan, Kuwait, Lebanon, Libya, Morocco, Oman, Qatar, Saudi Arabia, Sudan, Syria, Tunisia, United Arab Emirates, and Yemen.

Arabic numerals

The characters 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0. Contrast with Hindi numerals. See also Common Naming for Layout Values .

Arabic script

A cursive script used in Arabic countries. Other writing systems such as Latin, Japanese, and Hebrew have a cursive handwritten form, but usually are typeset or printed in discrete letter form. Arabic script has only the cursive form.
Note:
Arabic script is also used for Urdu (which is spoken in Pakistan, Bangladesh, and India), Farsi (or Persian, which is spoken in Iran, Iraq, and Afghanistan) and other languages that are not Arabic.

ascender

The parts of certain letters, such as b, d or f, which rise above the top edge of other letters such as a, c and e. Contrast with descender.

ASMO

Arab Standards and Metrology Organization.

AttrObject

AttrObject is a generic object which can be a container of many opaque objects. A locale is an example of the type of object that can be attached to the AttrObject. AttrObject is an object type other than an array type that can hold values that represent the locale-specific information necessary for all locale categories.

base shape

The form of an Arabic character that identifies it without specifying its presentation shape. See also Common Naming for Layout Values and Common Naming for Layout Values .

bidirectional languages

Languages such as Arabic, Hebrew and Yiddish whose general flow of text proceeds horizontally from right to left, but numbers, English and other left-to-right language text such as addresses, acronyms and quotations are written from left to right.

cell

A group of character elements that belong to the same composed character. Also called display cell.

character elements

The components of a composed character such as a "Thai Character", namely base line consonants, upper vowels, lower vowels, base line vowels, tone marks, diacritics, and so on.

charset

An encoding with a uniform, state-independent mapping from characters to code points. Usually (but not necessarily), the code points correspond to suitable presentation glyphs, which are rendered using associated fonts.

complex-text languages

A collective name used to designate those languages that have different layouts for processing the text and for presenting it. The complex-text languages include the bidirectional languages (such as Arabic, Farsi, Urdu, Hebrew, Yiddish), and Asian languages such as Thai, Lao, Korean and the Indian ones. Because they are dealt with separately, the languages that use mainly an ideographic script, such as Chinese and Japanese, are excluded from this definition.

composed character

A collection of character elements in some scripts, such as Thai or Lao, whose presentation forms compose a glyph that occupies a definite space called a presentation cell. Also called combined character.

composing character element

A character element, such as a Thai tone mark, a Thai upper or lower vowel, or a diacritic, that together with the non-composing character element and possibly other composing character elements forms a composed character. Sometimes called a composing character or a combining character. In a string of character elements, it follows the non-composing character element. When presented, the composing character element normally does not occupy a separate presentation cell; it shares a cell with a non-composing character element and possibly with other composing character elements.

composite sequence

A sequence of graphic characters consisting of a non-composing character followed by one or more composing characters.
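
As an illustration only, the grouping of character elements into cells can be sketched in Python. The sketch assumes that the text is encoded in Unicode and that every character in a Unicode "mark" category is a composing character element; neither assumption comes from this document, and the function name split_into_cells is hypothetical.

```python
import unicodedata

def split_into_cells(text):
    """Group a string of character elements into display cells."""
    cells = []
    for ch in text:
        # Assumption: any Unicode mark (category Mn, Mc or Me) is a
        # composing character element and joins the preceding cell.
        if cells and unicodedata.category(ch).startswith("M"):
            cells[-1] += ch
        else:
            # A non-composing character element starts a new cell.
            cells.append(ch)
    return cells

# Thai consonant KO KAI + upper vowel SARA I + tone mark MAI EK,
# followed by the consonant NO NU.
thai = "\u0E01\u0E34\u0E48\u0E19"
print([len(cell) for cell in split_into_cells(thai)])  # prints [3, 1]
```

The first cell holds three character elements (one non-composing, two composing); the second holds a single non-composing element.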

control character

A character that denotes the start, modification, or end of a control function. A control character can be recorded for use in a subsequent action, and it can have a graphic representation.

cursive script

Script whose adjacent characters might touch or be connected to each other. Arabic, Farsi and Urdu scripts are always cursive, while Latin script is cursive only in handwriting.

data entry

The method of entering data into a computer system for processing, usually in a field-oriented environment where the entry is governed by a program.

descender

The part of the character that extends from the baseline to the bottom of the character cell. Examples of letters with descenders are g, j, p, q, y and Q. Contrast with ascender.

deshaping

The opposite of shaping; the transformation of an Arabic language text to a layout used for processing. The different shapes of the same character are folded into a single, basic shape.
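
A minimal sketch of deshaping, assuming the shaped text uses the Unicode Arabic presentation forms. The compatibility decompositions in the Unicode character database tag each presentation form as initial, medial, final or isolated and name the basic character it folds to. This is one possible realisation, not the mechanism defined by this document; the function name deshape is hypothetical.

```python
import unicodedata

# Decomposition tags that mark a contextual shape of a basic character.
SHAPE_TAGS = {"<initial>", "<medial>", "<final>", "<isolated>"}

def deshape(text):
    """Fold shaped Arabic presentation forms into their basic shape."""
    out = []
    for ch in text:
        parts = unicodedata.decomposition(ch).split()
        if parts and parts[0] in SHAPE_TAGS:
            # The remaining fields are the code points of the basic
            # character (or characters, for a ligature).
            out.append("".join(chr(int(cp, 16)) for cp in parts[1:]))
        else:
            out.append(ch)
    return "".join(out)

# Initial, medial and final forms of BEH all fold to U+0628.
shaped = "\uFE91\uFE92\uFE90"
print(deshape(shaped) == "\u0628" * 3)  # prints True
```

Characters without a shape tag, such as Latin letters, pass through unchanged.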

diacritic

Modifying mark of a character. For example, the accent marks in Latin scripts (acute, tilde and ogonek), the vowel marks in Hebrew, and the consonant pronunciations in Thai and Lao.

display cell

The group of character elements that form a composed character.

enable (national languages)

To design a product for economical and easy adaptation to any culture, convention or language of the user.

encoding scheme

A set of specific definitions that describe the philosophy used to represent character data. The number of bits, the number of bytes, the allowable ranges of bytes, the maximum number of characters, and the meanings assigned to some generic and specific bit patterns, are some examples of specifications found in such a definition.

ECMA

(European Computer Manufacturers Association) A not-for-profit organisation formed by European computer vendors to promulgate standards applicable to the functional design and use of data processing equipment.

explicit algorithm

In a bidirectional text, an algorithm that identifies segments of different directionality, or other peculiarities of characters (such as shaping). The algorithm uses explicit control sequences (directional and other) embedded in the text. See also Common Naming for Layout Values .

field attributes

The data description governing the presentation and handling of data in the associated data field. For example, direction (left-to-right, right-to-left) is a field attribute important in bidirectional applications.

folding

The substitution of one graphic character for another. Folding generally maps a larger character set into a subset, and may result in some loss of information. For example, folding allows printing of upper-case graphic characters when lower-case characters are not available.

global orientation

The predominant orientation of a bidirectional text. For example, an Arabic text with a right-to-left global orientation may have some left-to-right English names embedded in it.

glyph

A member of a set of symbols that represent data. Glyphs can be letters, digits, punctuation marks, or other symbols.

graphic character

A member of a set of symbols that represent data. Graphic characters can be letters, digits, punctuation marks, or other symbols. Synonymous with glyph.

graphic symbol

The visual representation of a graphic character or of a composite sequence.

A graphic symbol for a composite sequence generally consists of the combination of the graphic symbols of each character in the sequence.

Hangul

The Korean alphabet, which consists of fourteen basic consonants and ten basic vowels. Hangul was created by a team of scholars in the 15th century at the behest of King Sejong. See also Common Naming for Layout Values .

Hanja

The Korean term for characters derived from Chinese.

Hindi numerals

A set of numerals used in many Arabic countries instead of or in addition to the "Arabic" ones. Hindi numeral shapes are: ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩
which correspond to the Arabic numeral shapes of 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. Contrast with Arabic numerals.
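
The correspondence between the two sets of numerals can be sketched as a simple character mapping. The sketch assumes a Unicode encoding in which the Hindi numerals are the Arabic-Indic digits U+0660 to U+0669; the function names are hypothetical.

```python
# Assumption: Hindi numerals are encoded as U+0660 (zero) to U+0669 (nine).
HINDI_DIGITS = "".join(chr(0x0660 + d) for d in range(10))
ARABIC_DIGITS = "0123456789"

TO_HINDI = str.maketrans(ARABIC_DIGITS, HINDI_DIGITS)
TO_ARABIC = str.maketrans(HINDI_DIGITS, ARABIC_DIGITS)

def to_hindi_numerals(text):
    """Replace Arabic numerals with Hindi numerals; other text is kept."""
    return text.translate(TO_HINDI)

def to_arabic_numerals(text):
    """Replace Hindi numerals with Arabic numerals; other text is kept."""
    return text.translate(TO_ARABIC)

# Converting there and back returns the original text.
print(to_arabic_numerals(to_hindi_numerals("Room 1997")))  # prints Room 1997
```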

ideographic language

A written language in which each character represents a thing or an idea. An example of such a language is Chinese. Contrast with phonetic language.

implicit algorithm

An algorithm that recognises directional segments based on the implicit characteristics of the characters. Segments are inverted accordingly. Bidirectional text transformed using an implicit algorithm is stored in logical order. See also Common Naming for Layout Values and Common Naming for Layout Values .

JAMO

A set of consonants and vowels used in Korean Hangul. The word JAMO (or jamo) is derived from ja, which means consonant, and mo, which means vowel. See also Common Naming for Layout Values .

language layer

A keyboard may have several language layers. For example, the Hebrew keyboard may have three layers: Hebrew, English and APL, with each layer supporting up to three shifts (lower-case, upper-case and alternate shifts).

Latin alphabet

An alphabet comprising the letters a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, and z in upper-case and lower-case, with or without accents and ligatures. Contrast with non-Latin-based alphabet.

layout

In this document, layout stands for the layout of a text: the direction of the segments and the shape of the characters.

LayoutObject

An opaque object containing all the data and methods necessary to perform the layout operations on context-dependent or directional characters. In particular it contains a set of layout values.

layout transformation

A transformation between the layout of a text as processed and the layout of text when presented. A layout transformation may involve determination of embedded directional segments, segment inversion, character shaping, or character deshaping.

layout value

A set of text attributes and processing indicators needed by the layout transformation functions. See also Common Naming for Layout Values .

ligature

A graphic character consisting of two or more characters joined together. For example, joining A and E forms the ligature Æ. Ligatures are very common and important in Arabic.

logical order (or logical sequence)

A bidirectional text is said to be stored in logical order if the data elements in each segment are sequenced physically in keystroke order (order of entry), that is, the order in which they would be read from a screen or spoken aloud. Segments presented with a directionality opposite to the global orientation need to be inverted to be stored in logical order.

lower case

The small alphabetic characters, whether accented or not, as distinguished from the capital alphabetic characters. The concept of case also applies to alphabets such as Cyrillic and Greek, but not to Arabic, Hebrew, Thai and some other languages. Examples of lower-case letters are a, b and c. Contrast with upper case.

monocasing

The translation of alphabetic characters from one case (usually lower case) to their equivalents in another case (usually upper case).

national numbers

Numbers as written in a text of a language that has its own glyphs for digits. As an example, a Thai text may have numbers represented by national numbers called Thai numerals. Note that the national numbers in the Arabic languages are the Hindi numerals and not the Arabic numerals. See also Common Naming for Layout Values and Common Naming for Layout Values .

nesting

The situation in which a directional segment is embedded within another directional segment. It is possible to have more than one level of nesting. A left-to-right number can be nested, for example, within a right-to-left Hebrew text, which itself is nested within a left-to-right English text.
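
Nesting levels can be illustrated with explicit directional controls. The sketch below assumes the Unicode embedding characters LRE (U+202A), RLE (U+202B) and PDF (U+202C) as the control sequences; this document does not prescribe them, and the function name nesting_levels is hypothetical.

```python
# Assumed explicit directional controls (Unicode embedding characters).
LRE, RLE, PDF = "\u202A", "\u202B", "\u202C"

def nesting_levels(text):
    """Return (character, nesting level) pairs, dropping the controls."""
    depth = 0
    levels = []
    for ch in text:
        if ch in (LRE, RLE):
            depth += 1                 # a nested segment opens
        elif ch == PDF:
            depth = max(depth - 1, 0)  # the innermost segment closes
        else:
            levels.append((ch, depth))
    return levels

# English text (level 0) containing a Hebrew stand-in "B" (level 1),
# which in turn contains the number "1" (level 2).
sample = "a" + RLE + "B" + LRE + "1" + PDF + PDF + "c"
print(nesting_levels(sample))
# prints [('a', 0), ('B', 1), ('1', 2), ('c', 0)]
```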

non-composing character element

A character element, such as a Thai consonant, around which all the other character elements are composed. Sometimes called a non-composing character or a non-combining character. In a string of character elements, it is the first element of a composed character. When presented, the non-composing character element occupies, alone or with composing elements, a presentation cell.

non-Latin-based alphabet

An alphabet that is not a Latin alphabet. Examples are Greek and Arabic alphabets. Contrast with Latin alphabet.

numbers

Numbers express either quantity (cardinal) or order (ordinal). Many cultures have different forms for cardinal and ordinal numbers. For example in French the cardinal number five is cinq, but the ordinal fifth is cinquième or 5eme or 5è. Numbers are written with symbols usually referred to as numerals. See Common Naming for Layout Values and Common Naming for Layout Values .

phonetic language

A written language in which one or more characters represent a sound. Examples of phonetic languages are English, Greek and Russian. Contrast with ideographic language.

physical order (or physical sequence)

A bidirectional text is said to be stored in physical order if each data element of each segment of the text is stored in the same physical sequence that is presented.
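
The relationship between logical order and physical order can be sketched with a deliberately simplified transformation. The sketch assumes a single level of segments (no nesting), no symmetrical swapping, and a common documentation convention in which upper-case Latin letters stand in for right-to-left characters. The function name logical_to_physical is hypothetical and this is not the algorithm defined by this document.

```python
import itertools

def logical_to_physical(text, global_rtl=False):
    """Reorder logically ordered text into left-to-right screen order."""
    def is_rtl(ch):
        if ch.isalnum():
            # Assumption: upper-case letters stand in for right-to-left
            # characters; lower-case letters and digits are left-to-right.
            return ch.isupper()
        # Neutral characters (spaces, punctuation) follow the global
        # orientation in this simplified sketch.
        return global_rtl

    # Split the text into segments of uniform directionality.
    runs = [(rtl, "".join(group))
            for rtl, group in itertools.groupby(text, key=is_rtl)]
    # Right-to-left segments are inverted for presentation.
    shown = [seg[::-1] if rtl else seg for rtl, seg in runs]
    # With a right-to-left global orientation, segments are laid out
    # from the right edge, so their order on screen is reversed.
    if global_rtl:
        shown.reverse()
    return "".join(shown)

print(logical_to_physical("abc DEF ghi"))                   # prints abc FED ghi
print(logical_to_physical("ABC 123 DEF", global_rtl=True))  # prints FED 123 CBA
```

In the second call the number 123 is a left-to-right segment embedded in right-to-left text: it keeps its internal order while the surrounding segments are inverted.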

presentation

Printing or displaying.

presentation form

In the presentation of some scripts, a form of a graphic symbol representing a character that depends on the position of the character relative to other characters.

presentation layout

The layout of text when presented on a screen or on a printer. See also Common Naming for Layout Values .

presentation shape

The shape of a character such as an Arabic character when presented to the user. See also Common Naming for Layout Values and Common Naming for Layout Values .

processing layout

The layout of text when processed.

push mode

An operating mode for entering text in reversed orientation where the cursor remains stationary while new characters are typed, and the beginning of the text is pushed as in a pocket calculator. It is also called calculator mode.

right-to-left mode

An input mode of a bidirectional text in which the cursor moves to the left after the entry of each character.

segment

A contiguous portion of text with one directionality that may or may not be embedded in another portion of text which has a different directionality. For instance, in a bidirectional text, a left-to-right segment such as a number can be embedded in a text that has a right-to-left directionality.

ShapeCharset

The charset used to shape text (ShapeCharset is not necessarily identical to the encoding of the text before the shaping).

shape determination

A process that decides which of the several (up to four) shapes of an Arabic character is to be used in the current context. The shapes are initial, middle, final, and isolated. For each character, the decision is based on the linking capabilities of the current and surrounding characters. See also Common Naming for Layout Values and Common Naming for Layout Values .
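
A minimal sketch of shape determination, assuming text in logical order and a small, illustrative table of linking capabilities covering only nine Arabic letters. A real implementation would need a complete table; the names below are hypothetical.

```python
# Illustrative linking table (an assumption, far from complete).
# Letters that link to both neighbours: beh, teh, kaf, lam, meem.
DUAL = set("\u0628\u062A\u0643\u0644\u0645")
# Letters that link only to the preceding letter: alef, dal, reh, waw.
PREVIOUS_ONLY = set("\u0627\u062F\u0631\u0648")

def links_to_previous(ch):
    return ch in DUAL or ch in PREVIOUS_ONLY

def links_to_next(ch):
    return ch in DUAL

def determine_shapes(text):
    """Return the shape name chosen for each character of the text."""
    shapes = []
    for i, ch in enumerate(text):
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        # A link exists only if both characters are capable of it.
        joins_prev = links_to_previous(ch) and links_to_next(prev_ch)
        joins_next = links_to_next(ch) and links_to_previous(next_ch)
        if joins_prev and joins_next:
            shapes.append("middle")
        elif joins_prev:
            shapes.append("final")
        elif joins_next:
            shapes.append("initial")
        else:
            shapes.append("isolated")
    return shapes

# kaf, teh, alef, beh: alef cannot link forward, so beh stands isolated.
print(determine_shapes("\u0643\u062A\u0627\u0628"))
# prints ['initial', 'middle', 'final', 'isolated']
```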

shaping

The process of presenting a cursive script text with characters properly shaped as initial, middle, final or isolated shape, according to their context. See Common Naming for Layout Values and Common Naming for Layout Values .

symmetrical swapping

The process of exchanging some characters (such as { or < ) with their symmetric twin character (such as } or > respectively). The symmetrical swapping may be performed during the inversion of segments of a bidirectional text.
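
The effect of symmetrical swapping during segment inversion can be sketched as follows. The set of swapped pairs is an illustrative assumption, and the function name invert_segment is hypothetical.

```python
# Assumed set of symmetric character pairs.
SYMMETRIC = {"(": ")", ")": "(", "{": "}", "}": "{",
             "<": ">", ">": "<", "[": "]", "]": "["}

def invert_segment(segment, swap=True):
    """Reverse a segment, optionally swapping symmetric characters."""
    inverted = segment[::-1]
    if swap:
        inverted = "".join(SYMMETRIC.get(ch, ch) for ch in inverted)
    return inverted

print(invert_segment("a<b"))              # prints b>a
print(invert_segment("a<b", swap=False))  # prints b<a
```

Without swapping, the inverted segment reads "b<a", which reverses the meaning of the comparison; with swapping, it reads "b>a", which preserves it.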

text-type (or TypeOfText)

Text-type is used to indicate which reordering approach (visual, implicit or explicit) applies to a given bidirectional text. See also Common Naming for Layout Values , Common Naming for Layout Values and Common Naming for Layout Values .

text attribute

Text attributes such as text-type, compliance to symmetrical swapping, numerals shape or character shaping, describe the complex text being transformed. The text attributes are used by the layout transformation function. See also Common Naming for Layout Values and Common Naming for Layout Values .

upper case

The capital alphabetic characters, whether accented or not, as distinguished from the small alphabetic characters. The concept of case also applies to alphabets such as Cyrillic and Greek, but not to Arabic, Hebrew, Thai and some other languages. Examples of capital letters are A, B and C. Contrast with lower case.

visual data

Visual data is composed of data elements that are sequenced in the same order that they are presented on a screen or printer.