Systems Management: Common Information Model (CIM)
Copyright © 1998 The Open Group

UNICODE Usage

Basic Character Set

All punctuators used in object path or MOF syntax occur within the Basic Latin range U+0000 to U+007F. These include the usual punctuators, such as slashes, colons, and commas. No syntactically significant punctuation character occurs outside this range.

All characters above U+007F are treated as parts of names, even though several of them, such as U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR), are logically whitespace.

Therefore, all namespace, class and property names are identifiers composed as follows:

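Initial identifier characters must be in set $S_1$, where

$$S_1 = \{\mathrm{U{+}005F},\ \mathrm{U{+}0041}\ldots\mathrm{U{+}005A},\ \mathrm{U{+}0061}\ldots\mathrm{U{+}007A},\ \mathrm{U{+}0080}\ldots\mathrm{U{+}FFEF}\}$$

That is, alphabetic characters plus underscore.

All following characters must be in set $S_2$, where

$$S_2 = S_1 \cup \{\mathrm{U{+}0030}\ldots\mathrm{U{+}0039}\}$$

That is, alphabetic characters, underscore, and the Arabic numerals 0 through 9.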

Note that characters in the Unicode Specials range (U+FFF0...U+FFFF) are not legal in identifiers.

Although the sub-range U+0080...U+FFEF includes many diacritical characters that would not be useful in an identifier, as well as reserved portions of Unicode that have not yet been allocated, it seems advisable, for simplicity of parsers, to treat this entire sub-range as legal for identifiers.
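
The identifier rule can be checked mechanically. The following Python sketch is illustrative only: the helper names `in_s1`, `in_s2`, and `is_valid_identifier` are hypothetical and not part of any CIM API. It treats the whole U+0080...U+FFEF sub-range as legal, as the text advises, and rejects code points above U+FFFF, which this text does not address.

```python
# Sketch of the CIM identifier rule (hypothetical helper names, not a CIM API).


def in_s1(ch: str) -> bool:
    """True if ch is in S1: underscore, A-Z, a-z, or U+0080..U+FFEF."""
    cp = ord(ch)
    return (
        cp == 0x005F                 # underscore
        or 0x0041 <= cp <= 0x005A    # A-Z
        or 0x0061 <= cp <= 0x007A    # a-z
        or 0x0080 <= cp <= 0xFFEF    # entire sub-range treated as legal
    )


def in_s2(ch: str) -> bool:
    """True if ch is in S2: S1 plus the digits 0-9 (U+0030..U+0039)."""
    return in_s1(ch) or 0x0030 <= ord(ch) <= 0x0039


def is_valid_identifier(name: str) -> bool:
    """Check a namespace, class, or property name against S1 and S2."""
    if not name:
        return False
    return in_s1(name[0]) and all(in_s2(ch) for ch in name[1:])
```

For example, `is_valid_identifier("CIM_ManagedElement")` returns `True`, while `is_valid_identifier("1abc")` returns `False` because an identifier cannot start with a digit.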

RFC 2279 (see Referenced Documents), which defines UTF-8, is an example of a Universal Transformation Format with specific characteristics for handling multi-octet characters on an application-specific basis.
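
One property of UTF-8 that matters for MOF parsing is that every octet of a multi-octet character is 0x80 or above, so a byte in the U+0000...U+007F range can only come from a genuine Basic Latin character such as a punctuator. The Python sketch below assumes UTF-8 input and uses hypothetical helpers, `octets_per_char` and `split_utf8_on_ascii`. Python's codec implements the later RFC 3629 form of UTF-8, which produces the same octets as RFC 2279 for the non-surrogate characters up to U+FFEF.

```python
# Sketch assuming UTF-8 (RFC 2279) input; both helpers are hypothetical.


def octets_per_char(text: str) -> list[tuple[str, bytes]]:
    """Pair each character with its UTF-8 encoding."""
    return [(ch, ch.encode("utf-8")) for ch in text]


def split_utf8_on_ascii(data: bytes, punctuator: str) -> list[str]:
    """Split raw UTF-8 bytes on an ASCII punctuator, then decode each piece.

    This is safe because every octet of a multi-octet UTF-8 character is
    0x80 or above, so a byte such as 0x3A (':') can only be a real colon.
    """
    if len(punctuator) != 1 or ord(punctuator) > 0x7F:
        raise ValueError("punctuator must be a single U+0000..U+007F character")
    return [part.decode("utf-8") for part in data.split(punctuator.encode("ascii"))]


for ch, octets in octets_per_char("a\u00e9\u20ac"):
    print(f"U+{ord(ch):04X} -> {octets.hex(' ')}")
# U+0061 -> 61
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
```

Splitting on the raw bytes and decoding afterwards gives the same pieces as decoding first, which is why a byte-oriented scanner can locate ASCII punctuators without understanding the multi-octet characters around them.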

MOF Text

MOF files that use UNICODE should begin with a two-byte signature, either U+FFFE or U+FEFF, depending on the byte ordering of the file (as suggested in section 2.4 of the UNICODE specification ISO/IEC 639:1988; see Referenced Documents).

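A signature that reads as U+FFFE (bytes 0xFF 0xFE) indicates little-endian byte order; a signature that reads as U+FEFF (bytes 0xFE 0xFF) indicates big-endian byte order.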
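
A reader can select the byte order from the signature before decoding the rest of the file. In this Python sketch, `decode_mof_bytes` is a hypothetical helper. Raising an error when no signature is present is a design assumption, since the text only says that UNICODE MOF files should carry one.

```python
# Sketch: choose UTF-16 byte order from the two-byte signature.
# decode_mof_bytes is a hypothetical helper, not part of any MOF compiler API.


def decode_mof_bytes(data: bytes) -> str:
    """Decode UNICODE MOF text after inspecting its signature.

    Bytes FF FE read as U+FFFE when taken big-endian, meaning the file is
    really little-endian; bytes FE FF read as U+FEFF, meaning big-endian.
    """
    signature = data[:2]
    if signature == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    if signature == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    # Assumption: without a signature, refuse rather than guess the encoding.
    raise ValueError("no UNICODE signature: expected FF FE or FE FF")
```

A more lenient reader might instead fall back to another encoding for files without a signature; that choice is outside what the text specifies.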

All MOF keywords and punctuation symbols are as described in the MOF Syntax document and are not locale-specific. All such symbols are composed of characters in the range U+0000...U+007F, regardless of the locale in which the MOF or its identifiers originated.

Quoted Strings

Wherever a string value that is not an identifier is required, it must be enclosed in delimiters.

The supported delimiters are U+0027 (') and U+0022 ("). A quoted string must be terminated by the same delimiter that started it.

In addition, the digraph U+005C (\) followed by U+0027 (') constitutes an embedded quotation mark, not a termination of the quoted string.

Any character in the range U+0001 through U+FFEF may appear between these quotation mark delimiters.
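
The quoting rules can be combined into a small scanner. This Python sketch, built around a hypothetical helper `scan_quoted`, follows the text literally: either delimiter may open a string, only the same delimiter closes it, a backslash followed by U+0027 yields an embedded quotation mark, and characters outside U+0001...U+FFEF are rejected. Treating any other backslash sequence as ordinary characters is an assumption, because the text does not define other escapes.

```python
# Sketch of the quoted-string rules; scan_quoted is a hypothetical helper.


def scan_quoted(text: str, start: int = 0) -> tuple[str, int]:
    """Scan a quoted string beginning at text[start].

    Returns (value, index just past the closing delimiter).
    """
    delim = text[start]
    if delim not in ("'", '"'):  # U+0027 or U+0022
        raise ValueError("quoted string must start with ' or \"")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if not 0x0001 <= ord(ch) <= 0xFFEF:
            raise ValueError(f"character U+{ord(ch):04X} not permitted in a quoted string")
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == "'":
            out.append("'")  # backslash + U+0027 is an embedded quote, not a terminator
            i += 2
            continue
        if ch == delim:
            return "".join(out), i + 1  # only the opening delimiter closes the string
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted string")
```

For example, `scan_quoted('"hello" rest')` returns `('hello', 7)`, where 7 is the index just past the closing delimiter.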
