Portable Layout Services: Context-dependent and Directional Text
Copyright © 1997 The Open Group
Glossary
Some of the terms defined below are from The American National Dictionary
for Information Processing, by the Computer and Business Equipment
Manufacturers Association, 1977. These terms are identified by the acronym
ANSI following their definitions.
algorithm
A finite set of well-defined rules for the solution of a problem in a finite
number of steps. For example, a full statement of an arithmetic procedure
for evaluating sin x to a stated precision. (ANSI)
Arabic country
Any of the countries in which Arabic script is the predominant writing
system. The countries include Algeria, Bahrain, Egypt, Iraq, Jordan, Kuwait,
Lebanon, Libya, Morocco, Oman, Qatar, Saudi Arabia, Sudan, Syria, Tunisia,
United Arab Emirates, and Yemen.
Arabic numerals
The characters 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0. Contrast with
Hindi numerals. See also
Common Naming for Layout Values
.
Arabic script
A cursive script used in Arabic countries. Other writing systems such
as Latin, Japanese, and Hebrew have a cursive handwritten form, but usually
are typeset or printed in discrete letter form. Arabic script has only the
cursive form.
- Note:
- Arabic script is also used for Urdu (which is spoken in Pakistan,
Bangladesh, and India), Farsi (or Persian, which is spoken in Iran, Iraq,
and Afghanistan) and other languages that are not Arabic.
ascender
The parts of certain letters, such as b, d or f, which rise
above the top edge of other letters such as a, c and e.
Contrast with descender.
ASMO
Arab Standards and Metrology Organization.
AttrObject
AttrObject is a generic object which can be a container of many opaque
objects. A locale is an example of the type of object that can be attached
to the AttrObject.
AttrObject is an object type other than an array type that can hold
values that represent the locale-specific information necessary for all
locale categories.
base shape
The form of an Arabic character that identifies it without specifying its
presentation shape. See also
Common Naming for Layout Values
and
Common Naming for Layout Values
.
bidirectional languages
Languages such as Arabic, Hebrew and Yiddish in which the general flow of text
proceeds horizontally from right to left, while numbers, English and other
left-to-right language text such as addresses, acronyms and quotations are
written from left to right.
cell
A group of character elements that belong to the same
composed character.
Also called display cell.
character elements
The components of a composed character such as a "Thai
Character", namely base line consonants, upper vowels,
lower vowels, base line vowels, tone marks, diacritics, and so on.
charset
An encoding with a uniform, state-independent mapping from characters
to code points.
Usually (but not necessarily), the code points correspond to suitable
presentation glyphs, which are presented using the associated fonts.
complex-text languages
A collective name used to designate those languages that have different
layouts for processing the text and for presenting it. The complex-text
languages include the bidirectional languages (such as Arabic, Farsi, Urdu,
Hebrew, Yiddish), and Asian languages such as Thai, Lao, Korean and the
Indian ones. Because they are dealt with separately, the languages that use
mainly an ideographic script, such as Chinese and Japanese, are excluded
from this definition.
composed character
A collection of character elements in some scripts, such as Thai
or Lao, whose presentation forms compose a glyph that occupies a
definite space called a presentation cell.
Also called combined character.
composing character element
A character element, such as a Thai tone mark, a Thai upper or
lower vowel or diacritic, that together with the non-composing
character element and possibly other composing-character
elements forms a composed character.
Sometimes called a composing character or a combining character.
In a string of character elements it follows the non-composing
character element.
When presented, the composing character element does not normally
occupy a separate presentation cell; it shares a cell with a
non-composing character element and possibly with other composing
character elements.
composite sequence
A sequence of graphic characters consisting of a non-composing
character followed by one or more composing characters.
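The grouping of character elements into cells can be sketched in Python. This is only an illustration, not part of the Portable Layout Services interface: the function name `split_into_cells` is hypothetical, and it assumes that the Unicode general categories Mn and Me are an acceptable stand-in for "composing character elements".

```python
import unicodedata

def split_into_cells(text):
    """Group character elements into cells: one non-composing element
    followed by the composing elements that share its cell."""
    cells = []
    for ch in text:
        # Assumption: nonspacing and enclosing marks (Mn, Me) play the
        # role of composing character elements.
        is_composing = unicodedata.category(ch) in ("Mn", "Me")
        if is_composing and cells:
            cells[-1] += ch      # shares the cell of the preceding element
        else:
            cells.append(ch)     # starts a new cell
    return cells

# Thai KO KAI + SARA I (upper vowel) + MAI EK (tone mark), then NGO NGU
cells = split_into_cells("\u0e01\u0e34\u0e48\u0e07")
print(len(cells))
```

The four character elements in the example occupy two cells, because the upper vowel and the tone mark are composed onto the first consonant.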
control character
A character that denotes the start, modification, or end of a control
function. A control character can be recorded for use in a subsequent
action, and it can have a graphic representation.
cursive script
Script whose adjacent characters might touch or be connected to each other.
Arabic, Farsi and Urdu scripts are always cursive, while
Latin script is cursive only in handwriting.
data entry
The method of entering data into a computer system for processing,
usually in a field-oriented environment where the entry is governed by
a program.
descender
The part of the character that extends from the baseline to the bottom of the
character cell. Examples of letters with descenders are g, j, p,
q, y and Q. Contrast with ascender.
deshaping
The opposite of shaping; the transformation of an Arabic language text to a
layout used for processing. The different shapes of the same character are
folded into a single, basic shape.
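As a rough illustration of deshaping, the following Python sketch folds shaped Arabic characters back to their basic shapes. It assumes the shaped text is encoded with the Unicode Arabic Presentation Forms, which may differ from the ShapeCharset of a given locale; the function name `deshape` is hypothetical.

```python
import unicodedata

# Compatibility tags that Unicode attaches to shaped Arabic forms.
SHAPE_TAGS = ("<initial>", "<medial>", "<final>", "<isolated>")

def deshape(text):
    """Fold presentation shapes into their basic shape(s)."""
    out = []
    for ch in text:
        parts = unicodedata.decomposition(ch).split()
        if parts and parts[0] in SHAPE_TAGS:
            # The remaining fields are the base code points in hexadecimal;
            # a ligature decomposes into more than one base character.
            out.extend(chr(int(cp, 16)) for cp in parts[1:])
        else:
            out.append(ch)       # not a shaped form, keep unchanged
    return "".join(out)

# Initial, medial and final forms of BEH all fold to U+0628.
base = deshape("\ufe91\ufe92\ufe90")
```

Characters that are not shaped forms pass through unchanged, so the function can be applied to mixed text.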
diacritic
Modifying mark of a character. For example, the accent marks in Latin
scripts (acute, tilde and ogonek), the vowel marks in Hebrew, and the
consonant pronunciations in Thai and Lao.
display cell
The group of character elements that form a composed character.
enable (national languages)
To design a product for economical and easy adaptation to any culture,
convention or language of the user.
encoding scheme
A set of specific definitions that describe the philosophy used to represent
character data. The number of bits, the number of bytes, the allowable
ranges of bytes, the maximum number of characters, and the meanings assigned
to some generic and specific bit patterns, are some examples of
specifications found in such a definition.
ECMA
(European Computer Manufacturers Association)
A not-for-profit organisation formed by European computer vendors
to promulgate standards applicable to the functional design
and use of data processing equipment.
explicit algorithm
In a bidirectional text, an algorithm that identifies segments of different
directionality, or other peculiarities of characters (such as shaping). The
algorithm uses explicit control sequences (directional and other)
embedded in the text. See also
Common Naming for Layout Values
.
field attributes
The data description governing the presentation and handling of data in the
associated data field. For example, direction (left-to-right, right-to-left)
is a field attribute important in bidirectional applications.
folding
The substitution of one graphic character for another. Folding generally
maps a larger character set into a subset, and may result in some loss of
information. For example, folding allows printing of upper-case graphic
characters when lower-case characters are not available.
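A minimal Python sketch of the upper-case example above. The function name `fold_to_repertoire` and the use of "?" for characters that cannot be folded are assumptions made for illustration only.

```python
def fold_to_repertoire(text, available):
    """Fold text into the set of graphic characters a device can print."""
    out = []
    for ch in text:
        if ch in available:
            out.append(ch)
        elif ch.upper() in available:
            out.append(ch.upper())   # lower case folded to upper case
        else:
            out.append("?")          # assumption: placeholder for no match
    return "".join(out)

# A hypothetical printer that only has upper-case letters and the space.
UPPER_ONLY = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ ")
printed = fold_to_repertoire("Layout Services", UPPER_ONLY)
```

The case distinction is lost in the result, which is the loss of information that folding may cause.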
global orientation
The predominant orientation of a bidirectional text. For example,
an Arabic text with a right-to-left global orientation may have some
left-to-right English names embedded in it.
glyph
A member of a set of symbols that represent data. Glyphs can be letters,
digits, punctuation marks, or other symbols.
graphic character
A member of a set of symbols that represent data. Graphic characters can be
letters, digits, punctuation marks, or other symbols. Synonymous with
glyph.
graphic symbol
The visual representation of a graphic character or of a
composite sequence.
A graphic symbol for a composite sequence generally consists of the
combination of the graphic symbols of each character in the sequence.
Hangul
The Korean alphabet, which consists of fourteen basic consonants and ten basic
vowels. Hangul was created by a team of scholars in the 15th century at
the behest of King Sejong. See also
Common Naming for Layout Values
.
Hanja
The Korean term for characters derived from Chinese.
Hindi numerals
A set of numerals used in many Arabic countries instead of or in addition to
the "Arabic" ones. Hindi numeral shapes are: ٠١٢٣٤٥٦٧٨٩,
which correspond to the Arabic numeral shapes of 0, 1, 2, 3, 4, 5, 6, 7, 8
and 9. Contrast with Arabic numerals.
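The correspondence between the two sets of numeral shapes can be sketched as a simple character mapping. This Python example assumes the Hindi numerals are encoded as the Unicode Arabic-Indic digits U+0660 to U+0669; the function names are hypothetical.

```python
ARABIC_DIGITS = "0123456789"
# Assumption: Hindi numerals are encoded as U+0660..U+0669.
HINDI_DIGITS = "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669"

TO_HINDI = str.maketrans(ARABIC_DIGITS, HINDI_DIGITS)
TO_ARABIC = str.maketrans(HINDI_DIGITS, ARABIC_DIGITS)

def to_hindi_numerals(text):
    """Replace Arabic numeral shapes with Hindi numeral shapes."""
    return text.translate(TO_HINDI)

def to_arabic_numerals(text):
    """Replace Hindi numeral shapes with Arabic numeral shapes."""
    return text.translate(TO_ARABIC)

year = to_hindi_numerals("1997")
```

Only the digits are affected; all other characters in the text are left as they are.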
ideographic language
A written language in which each character represents a thing or an idea.
An example of such a language is Chinese. Contrast with phonetic language.
implicit algorithm
An algorithm that recognises directional segments based on the
implicit characteristics of the characters. Segments are inverted
accordingly. Bidirectional text transformed using an implicit algorithm is
stored in logical order. See also
Common Naming for Layout Values
and
Common Naming for Layout Values
.
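The idea of an implicit algorithm can be sketched in Python for the simplest case. This is a deliberately reduced illustration and not the algorithm of any particular standard: it assumes a right-to-left global orientation, gives every neutral character (such as a space) the global direction, and ignores nesting and symmetrical swapping. The names `is_ltr` and `logical_to_visual_rtl` are hypothetical.

```python
import unicodedata
from itertools import groupby

# Assumption: letters of class L and digits (EN, AN) form left-to-right
# segments; everything else follows the right-to-left global orientation.
LTR_CLASSES = {"L", "EN", "AN"}

def is_ltr(ch):
    return unicodedata.bidirectional(ch) in LTR_CLASSES

def logical_to_visual_rtl(text):
    """Turn logical-order text into left-to-right physical order."""
    # 1. Recognise directional segments from the characters themselves.
    segments = ["".join(group) for _, group in groupby(text, key=is_ltr)]
    # 2. Lay the segments out from right to left, inverting the characters
    #    of right-to-left segments and keeping left-to-right ones intact.
    visual = []
    for seg in reversed(segments):
        visual.append(seg if is_ltr(seg[0]) else seg[::-1])
    return "".join(visual)

# Hebrew ALEF BET GIMEL, the number 123, then DALET HE VAV (logical order).
visual = logical_to_visual_rtl("\u05d0\u05d1\u05d2 123 \u05d3\u05d4\u05d5")
```

In the result the Hebrew segments are inverted while the number 123 keeps its left-to-right sequence.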
JAMO
A set of consonants and vowels used in Korean Hangul. The word JAMO (or
jamo) is derived from ja, which means consonant, and mo, which means vowel.
See also
Common Naming for Layout Values
.
language layer
A keyboard may have several language layers. For example, the Hebrew
keyboard may have three layers: Hebrew, English and APL, with each
layer supporting up to three shifts (lower-case, upper-case
and alternate shifts).
Latin alphabet
An alphabet comprising the letters a, b, c, d, e, f, g, h, i, j, k, l, m, n,
o, p, q, r, s, t, u, v, w, x, y, and z in upper-case and lower-case, with or
without accents and ligatures. Contrast with non-Latin-based alphabet.
layout
In this document, layout stands for the layout
of a text: the direction of the segments and the shape
of the characters.
LayoutObject
An opaque object containing all the data and methods necessary to
perform the layout operations on context-dependent or directional
characters. In particular it contains a set of layout values.
layout transformation
A transformation between the layout of a text as processed
and the layout of text when presented. A layout transformation
may involve determination of embedded directional segments, segment
inversion, character shaping, or character deshaping.
layout value
A set of text attributes and processing indicators
needed by the layout transformation functions.
See also
Common Naming for Layout Values
.
ligature
A graphic character consisting of two or more characters joined together.
For example, joining A and E forms the ligature Æ. Ligatures are very
common and important in Arabic.
logical order (or logical sequence)
A bidirectional text is said to be stored in logical order if the data
elements in each segment are sequenced physically in keystroke order (order
of entry); that is, the order they would be read from a screen or spoken
aloud. Segments presented with a directionality opposite to the global
orientation need to be inverted to be stored in logical order.
lower case
The small alphabetic characters, whether accented or not, as distinguished
from the capital alphabetic characters. The concept of case also applies to
alphabets such as Cyrillic and Greek, but not to Arabic, Hebrew, Thai and
some other languages. Examples of lower-case letters are a, b and c.
Contrast with upper case.
monocasing
The translation of alphabetic characters
from one case (usually lower case) to their
equivalents in another case (usually upper case).
national numbers
Numbers as written in a text of a language that has its own
glyphs for digits. As an example, a Thai text may have
numbers represented by national numbers
called Thai numerals. Note that the national numbers
in the Arabic languages are the Hindi numerals and not
the Arabic numerals. See also
Common Naming for Layout Values
and
Common Naming for Layout Values
.
nesting
The situation in which a directional segment is embedded within another
directional segment. It is possible to have more than one level of nesting.
A left-to-right number can be nested, for example, within a right-to-left
Hebrew text, which itself is nested within a left-to-right English text.
non-composing character element
A character element such as a Thai consonant around
which all the other character elements are composed.
Sometimes called a non-composing character or a
non-combining character.
In a string of character elements it is the first
element of a composed character.
When presented, the non-composing character element
occupies, alone or with composing elements, a
presentation cell.
non-Latin-based alphabet
An alphabet that is not a Latin alphabet.
Examples are Greek and Arabic alphabets.
Contrast with Latin alphabet.
numbers
Numbers express either quantity (cardinal) or order (ordinal). Many
cultures have different forms for cardinal and ordinal numbers. For example,
in French the cardinal number five is cinq, but the ordinal fifth is
cinquième or 5eme or 5è. Numbers are
written with symbols usually referred to as numerals. See
Common Naming for Layout Values
and
Common Naming for Layout Values
.
phonetic language
A written language in which one or more characters represent a sound.
Examples of phonetic languages are English, Greek and Russian. Contrast
with ideographic language.
physical order (or physical sequence)
A bidirectional text is said to be stored in physical
order if each data element of each segment of the text is stored
in the same physical sequence that is presented.
presentation
Printing or displaying.
presentation form
In the presentation of some scripts, a form of a graphic symbol
representing a character that depends on the position of the
character relative to other characters.
presentation layout
The layout of text when presented on a screen or on a printer. See also
Common Naming for Layout Values
.
presentation shape
The shape of a character such as an Arabic character when presented to the
user. See also
Common Naming for Layout Values
and
Common Naming for Layout Values
.
processing layout
The layout of text when processed.
push mode
An operating mode for entering text in reversed orientation where the cursor
remains stationary while new characters are typed, and the beginning of the
text is pushed as in a pocket calculator. It is also called
calculator mode.
right-to-left mode
An input mode of a bidirectional text in which the cursor moves to the left
after the entry of each character.
segment
A contiguous portion of text with one directionality that may or may not be
embedded in another portion of text which has a different directionality.
For instance, in a bidirectional text, a left-to-right segment such as a
number can be embedded in a text that has a right-to-left directionality.
ShapeCharset
The charset used to shape text (ShapeCharset is not necessarily
identical to the encoding of the text before the shaping).
shape determination
A process that decides which of the several (up to four) shapes of an Arabic
character is to be used in the current context. The shapes are initial, middle,
final, and isolated. For each character, the decision is based on the
linking capabilities of current and surrounding characters. See also
Common Naming for Layout Values
and
Common Naming for Layout Values
.
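The decision described above can be sketched in Python for a handful of letters. The linking tables below are a small hand-written sample, not a complete description of Arabic script, and the function name `determine_shapes` is hypothetical. The text is assumed to be in logical order, so the "previous" character is the one to the right when presented.

```python
# Sample letters that can link on both sides: beh, teh, yeh, lam, meem.
DUAL = set("\u0628\u062a\u064a\u0644\u0645")
# Sample letters that link only to the previous letter: alef, dal, reh, waw.
RIGHT = set("\u0627\u062f\u0631\u0648")

def determine_shapes(word):
    """Return the shape name chosen for each character of the word."""
    shapes = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        linkable = ch in DUAL or ch in RIGHT
        # Links backwards only if the previous letter can link forwards.
        joins_prev = linkable and prev in DUAL
        # Links forwards only if this letter can, and the next is linkable.
        joins_next = ch in DUAL and (nxt in DUAL or nxt in RIGHT)
        if joins_prev and joins_next:
            shapes.append("middle")
        elif joins_prev:
            shapes.append("final")
        elif joins_next:
            shapes.append("initial")
        else:
            shapes.append("isolated")
    return shapes

# beh + alef + beh: the alef cannot link forwards, so the last beh is isolated.
shapes = determine_shapes("\u0628\u0627\u0628")
```

A real implementation would take the linking capabilities from the locale or character database rather than from a hard-coded sample.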
shaping
The process of presenting a cursive script text with characters properly
shaped as initial, middle, final or isolated shape, according to their
context. See
Common Naming for Layout Values
and
Common Naming for Layout Values
.
symmetrical swapping
The process of exchanging some characters (such as { or < ) with their
symmetric twin character (such as } or > respectively). The symmetrical
swapping may be performed during the inversion of segments of a bidirectional
text.
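A short Python sketch of symmetrical swapping applied during segment inversion. The set of character pairs is an assumed sample, and the function names are hypothetical.

```python
# Assumed sample of symmetric pairs; a locale may define a different set.
SWAP_TABLE = str.maketrans("(){}[]<>", ")(}{][><")

def swap_symmetric(segment):
    """Exchange each character with its symmetric twin."""
    return segment.translate(SWAP_TABLE)

def invert_segment(segment):
    """Invert a segment and swap symmetric characters in one step."""
    return swap_symmetric(segment[::-1])

inverted = invert_segment("(ab)")
```

Without the swap, inverting "(ab)" would give ")ba(", with the parentheses facing the wrong way; with the swap, the result keeps the content enclosed.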
text-type (or TypeOfText)
Text-type is used to indicate which reordering approach (visual, implicit or
explicit) applies to a given bidirectional text. See also
visual data, implicit algorithm and explicit algorithm.
text attribute
Text attributes such as text-type, compliance to symmetrical swapping,
numerals shape or character shaping, describe the complex text being
transformed. The text attributes are used by the layout transformation
function. See also
Common Naming for Layout Values
and
Common Naming for Layout Values
.
upper case
The capital alphabetic characters, whether accented or not, as distinguished
from the small alphabetic characters. The concept of case also applies to
alphabets such as Cyrillic and Greek, but not to Arabic, Hebrew, Thai and
some other languages. Examples of capital letters are A, B and C.
Contrast with lower case.
visual data
Visual data is composed of data elements that are sequenced in the same
order that they are presented on a screen or printer.