Protocols for Interworking: XNFS, Version 3W
Copyright © 1998 The Open Group

XDR Protocol Specification

This chapter specifies a protocol that is used by many implementors of XNFS. It is derived from a document designated RFC 1014 by the ARPA Network Information Centre (see References to RFCs).

This chapter includes only the subset of XDR that is required to define the XNFS protocols.

Introduction

XDR is a standard for the description and encoding of data. It is useful for transferring data between different computer architectures, and has been used to communicate data between many diverse machines. XDR fits into the ISO presentation layer, and is roughly analogous in purpose to X.209 (previously X.409), ISO Abstract Syntax Notation. The major difference between these two is that XDR uses implicit typing, while X.209 uses explicit typing.

XDR uses a language to describe data formats. The language can only be used to describe data; it is not a programming language. This language allows description of intricate data formats in a concise manner. The alternative of using graphical representations (itself an informal language) quickly becomes incomprehensible when faced with complexity. The XDR language itself is similar to the C language as defined in The C Programming Language, just as Courier, as defined in Courier: The Remote Procedure Call Protocol, is similar to Mesa. Protocols such as RPC (Remote Procedure Call) and NFS (Network File System) use XDR to describe the format of their data.

The XDR standard assumes that bytes (or octets) are portable (see Byte Encoding). A data-description language is used to define XDR rather than diagrams, as languages are more formal than diagrams and lead to less ambiguous descriptions of data. There is also a close analogy between the types of XDR and a high-level language such as C or Pascal. This makes the implementation of XDR encoding and decoding modules an easier task.

A Canonical Standard

XDR's approach to standardising data representations is canonical. That is, XDR defines a single byte order (big-endian, as described in On Holy Wars and a Plea for Peace), a single floating-point representation (IEEE), and so on. Any program running on any machine can use XDR to create portable data by translating its local representation to the XDR standard representations; similarly, any program running on any machine can read portable data by translating the XDR standard representations to its local equivalents. The single standard completely decouples programs that create or send portable data from those that use or receive portable data. The advent of a new machine or a new language has no effect upon the community of existing portable data creators and users.

No explicit data-typing is provided in the XDR language, as it has a relatively high cost (encoding and interpreting the type fields) and most protocols already know what data types they are expecting. However, one can still get the benefits of data-typing using XDR. One way is to encode two things: first, a string which is the XDR data description of the encoded data, and then the encoded data itself. Another way is to assign a value to all the types in XDR, and then define a universal type which takes this value as its discriminant and, for each value, describes the corresponding data type.

Byte Encoding

The XDR standard makes the following assumption: that bytes (or octets) are portable, where a byte is defined to be 8 bits of data. A given hardware device must encode the bytes onto the various media in such a way that other hardware devices may decode the bytes without loss of meaning. For example, the Ethernet standard suggests that bytes are encoded with the least significant bit first.

Basic Block Size

The representation of all items requires a multiple of four bytes (or 32 bits) of data. Four bytes is big enough to support most machine architectures efficiently, yet is small enough to keep the encoded data to a reasonable size. The bytes are numbered 0 to n-1. The bytes are read or written to a byte stream such that byte m always precedes byte m+1. If the n bytes needed to contain the data are not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of four. Setting these residual bytes to zero enables the same data to be encoded to the same result on all machines, allowing encoded data to be meaningfully compared or checksummed.
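The padding rule can be sketched in Python as follows. This is a minimal illustration only; the helper names pad_len and pad are assumptions made for this example and are not defined by the specification.

```python
def pad_len(n: int) -> int:
    """Number of residual zero bytes r (0 to 3) that follow n data bytes."""
    return (4 - n % 4) % 4


def pad(data: bytes) -> bytes:
    """Append zero bytes so that the total length is a multiple of four."""
    return data + b"\x00" * pad_len(len(data))


print(pad_len(5))      # 3
print(pad(b"abcde"))   # b'abcde\x00\x00\x00'
```

Because the residual bytes are always zero, two encoders given the same data produce identical byte streams.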

XDR Data Types

Each of the sections that follow describes a data type defined in the XDR standard, shows how it is declared in the language, and includes a graphic illustration of its encoding.

For each data type in the language, a general paradigm declaration is shown. Note that angle brackets (< and >) denote variable-length sequences of data, and square brackets ([ and ]) denote fixed-length sequences of data. n, m and r denote integers. For the full language specification and more formal definitions of terms such as "identifier" and "declaration", refer to The XDR Language Specification.

For some data types, more specific examples are included. A more extensive example of a data description is in Example of an XDR Data Description.

Integer

An XDR signed integer is a 32-bit datum that encodes an integer in the range [-2147483648,2147483647]. The integer is represented in two's complement notation. The most and least significant bytes are 0 and 3, respectively. Integers are declared as follows:

int identifier;
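As a minimal sketch of this encoding, Python's standard struct module can pack a signed 32-bit value with the most significant byte first. The function names encode_int and decode_int are illustrative assumptions, not part of XDR.

```python
import struct


def encode_int(value: int) -> bytes:
    """Pack a signed 32-bit integer; byte 0 is the most significant."""
    # struct.pack raises struct.error outside [-2147483648, 2147483647].
    return struct.pack(">i", value)


def decode_int(data: bytes) -> int:
    """Unpack the first four bytes as a two's complement integer."""
    return struct.unpack(">i", data[:4])[0]


print(encode_int(1).hex())    # 00000001
print(encode_int(-2).hex())   # fffffffe
```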

Unsigned Integer

An XDR unsigned integer is a 32-bit datum that encodes a non-negative integer in the range [0,4294967295]. It is represented by an unsigned binary number whose most and least significant bytes are 0 and 3, respectively. An unsigned integer is declared as follows:

unsigned int identifier;
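The unsigned case can be sketched with Python's built-in integer conversions. The names encode_uint and decode_uint are assumptions chosen for this example.

```python
def encode_uint(value: int) -> bytes:
    """Encode an unsigned 32-bit integer, most significant byte first."""
    # int.to_bytes raises OverflowError for negative values and for
    # values that do not fit in four bytes.
    return value.to_bytes(4, "big")


def decode_uint(data: bytes) -> int:
    """Decode the first four bytes as an unsigned binary number."""
    return int.from_bytes(data[:4], "big")


print(encode_uint(4294967295).hex())      # ffffffff
print(decode_uint(bytes([0, 0, 1, 0])))   # 256
```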

Hyper Integer and Unsigned Hyper Integer

Two extensions of the integer and unsigned integer types defined previously are the 64-bit (8-byte) numbers called hyper integer and unsigned hyper integer. The hyper integer is represented in two's complement notation and the unsigned hyper integer as an unsigned binary number. In both, the most and least significant bytes are 0 and 7, respectively. Their declarations are as follows:

hyper identifier;
unsigned hyper identifier;
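A brief sketch of the 8-byte forms, again using Python's struct module. The function names are illustrative assumptions.

```python
import struct


def encode_hyper(value: int) -> bytes:
    """Signed 64-bit integer; byte 0 is most significant, byte 7 least."""
    return struct.pack(">q", value)


def encode_unsigned_hyper(value: int) -> bytes:
    """Unsigned 64-bit integer with the same byte order."""
    return struct.pack(">Q", value)


print(encode_hyper(-1).hex())                 # ffffffffffffffff
print(encode_unsigned_hyper(1 << 32).hex())   # 0000000100000000
```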


Enumeration

Enumerations have the same representation as signed integers. Enumerations are handy for describing subsets of the integers. Enumerated data is declared as follows:

enum { name-identifier = constant, ... } identifier;

For example, the three colours red, yellow and blue could be described by an enumerated type:

enum { RED = 2, YELLOW = 3, BLUE = 5 } colors;

It is an error to encode as an enum any integer other than those that have been given assignments in the enum declaration.
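The rule can be sketched with Python's IntEnum, which rejects integers that have no assignment. The class name Color and the helper names are assumptions made for this example; only the values RED, YELLOW and BLUE come from the declaration above.

```python
import struct
from enum import IntEnum


class Color(IntEnum):
    RED = 2
    YELLOW = 3
    BLUE = 5


def encode_color(value: int) -> bytes:
    """Encode an enumeration value exactly as a signed integer."""
    # Color(value) raises ValueError for unassigned integers such as 4.
    return struct.pack(">i", Color(value))


def decode_color(data: bytes) -> Color:
    """Decode four bytes and check that the value is a declared member."""
    return Color(struct.unpack(">i", data[:4])[0])


print(encode_color(Color.BLUE).hex())   # 00000005
```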

Boolean

Booleans are important enough, and occur frequently enough, to warrant their own explicit type in the standard. Booleans are declared as follows:

bool identifier;

This is equivalent to:

enum { FALSE = 0, TRUE = 1 } identifier;

Fixed-Length Opaque Data

At times, fixed-length uninterpreted data needs to be passed among machines. This data is called "opaque" and is declared as follows:

opaque identifier[n];

where the constant n is the (static) number of bytes necessary to contain the opaque data. If n is not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count of the opaque object a multiple of four.
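A minimal sketch of fixed-length opaque encoding follows. No length is written to the stream because both sides know n from the declaration. The function name encode_fixed_opaque is an assumption for illustration.

```python
def encode_fixed_opaque(data: bytes, n: int) -> bytes:
    """Encode exactly n opaque bytes, zero-padded to a multiple of four."""
    if len(data) != n:
        raise ValueError(f"expected exactly {n} bytes, got {len(data)}")
    return data + b"\x00" * ((4 - n % 4) % 4)


print(encode_fixed_opaque(b"\x01\x02\x03\x04\x05", 5).hex())   # 0102030405000000
```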

Variable-Length Opaque Data

The standard also provides for variable-length (counted) opaque data. It is defined as a sequence of n (numbered 0 to n-1) arbitrary bytes, and is encoded as the number n, as an unsigned integer (as described above), followed by the n bytes of the sequence.

Byte m of the sequence always precedes byte m+1 of the sequence, and byte 0 of the sequence always follows the sequence's length (count). If n is not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of four. Variable-length opaque data is declared in the following way:

opaque identifier<m>;

or:

opaque identifier<>;

The constant m denotes an upper bound of the number of bytes that the sequence may contain. If m is not specified, as in the second declaration, it is assumed to be 2^32 - 1, the maximum length. The constant m would normally be found in a protocol specification. For example, a filing protocol may state that the maximum data transfer size is 8192 bytes, as follows:

opaque filedata<8192>;

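In the encoded byte stream, the 4-byte length n comes first, followed by the n bytes of data and then r (0 to 3) zero bytes, so that n + r is a multiple of four.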

It is an error to encode a length greater than the maximum described in the declaration.
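The counted form can be sketched as follows. The function names and the max_len parameter (standing in for the declared bound m) are assumptions made for this example.

```python
import struct


def encode_opaque(data: bytes, max_len: int = 2**32 - 1) -> bytes:
    """Encode the length n, the n data bytes, then 0 to 3 zero bytes."""
    n = len(data)
    if n > max_len:
        raise ValueError(f"length {n} exceeds declared maximum {max_len}")
    return struct.pack(">I", n) + data + b"\x00" * ((4 - n % 4) % 4)


def decode_opaque(buf: bytes, max_len: int = 2**32 - 1) -> tuple:
    """Return (data, bytes consumed including length and padding)."""
    (n,) = struct.unpack(">I", buf[:4])
    if n > max_len:
        raise ValueError(f"length {n} exceeds declared maximum {max_len}")
    consumed = 4 + n + (4 - n % 4) % 4
    return buf[4:4 + n], consumed


print(encode_opaque(b"(quit)", 8192).hex())   # 000000062871756974290000
```

The decoder reports how many bytes it consumed so that a caller can continue with the next item in the stream.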

String

The standard defines a string of n (numbered 0 to n-1) bytes to be the number n encoded as an unsigned integer (as described above), and followed by the n bytes of the string. Each byte must be regarded by the implementation as being 8-bit transparent data. This allows use of arbitrary character set encodings. Byte m of the string always precedes byte m+1 of the string, and byte 0 of the string always follows the string's length. If n is not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of four. Counted byte strings are declared as follows:

string object<m>;

or:

string object<>;

The constant m denotes an upper bound of the number of bytes that a string may contain. If m is not specified, as in the second declaration, it is assumed to be 2^32 - 1, the maximum length. The constant m would normally be found in a protocol specification. For example, a filing protocol may state that a filename can be no longer than 255 bytes, as follows:

string filename<255>;

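A string is therefore laid out like variable-length opaque data: the 4-byte length n, the n bytes of the string, and r (0 to 3) zero bytes such that n + r is a multiple of four.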

It is an error to encode a length greater than the maximum defined in the declaration.
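The sketch below shows one consequence of the byte-oriented definition: the bound m limits bytes, not characters. The function name, the charset parameter and the choice of UTF-8 as its default are assumptions for this example; the standard itself does not prescribe a character set encoding.

```python
import struct


def encode_string(text: str, max_len: int = 2**32 - 1,
                  charset: str = "utf-8") -> bytes:
    """Encode a counted byte string; the count is in bytes."""
    raw = text.encode(charset)
    n = len(raw)   # number of bytes, which may exceed the character count
    if n > max_len:
        raise ValueError(f"length {n} exceeds declared maximum {max_len}")
    return struct.pack(">I", n) + raw + b"\x00" * ((4 - n % 4) % 4)


print(encode_string("sillyprog", 255).hex())   # 0000000973696c6c7970726f67000000
print(encode_string("\u00e9").hex())           # 00000002c3a90000
```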

Fixed-Length Array

Declarations for fixed-length arrays of homogeneous elements are in the following form:

type-name identifier[n];

Fixed-length arrays of elements, numbered 0 to n-1, are encoded by individually encoding the elements of the array in their natural order, 0 to n-1. Each element's size is a multiple of four bytes. Though all elements are of the same type, the elements may have different sizes. For example, in a fixed-length array of strings, all elements are of type string, yet each element will vary in its length.
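The point about differing element sizes can be sketched with a fixed-length array of two strings. The helper names are assumptions for this example, and ASCII is assumed for the string bytes.

```python
import struct


def encode_string(text: str) -> bytes:
    """Counted byte string: length, bytes, zero padding (ASCII assumed)."""
    raw = text.encode("ascii")
    n = len(raw)
    return struct.pack(">I", n) + raw + b"\x00" * ((4 - n % 4) % 4)


def encode_fixed_array(items, n, encode_item) -> bytes:
    """Encode exactly n elements in order; no element count is written."""
    if len(items) != n:
        raise ValueError(f"expected exactly {n} elements, got {len(items)}")
    return b"".join(encode_item(item) for item in items)


encoded = encode_fixed_array(["a", "hello"], 2, encode_string)
print(len(encoded))   # 20: the elements occupy 8 and 12 bytes
```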

Variable-Length Array

Counted arrays provide the ability to encode variable-length arrays of homogeneous elements. The array is encoded as the element count n (an unsigned integer) followed by the encoding of each of the array's elements, starting with element 0 and progressing through element n-1. The declaration for variable-length arrays follows this form:

type-name identifier<m>;

or:

type-name identifier<>;

The constant m specifies the maximum acceptable element count of an array; if m is not specified, as in the second declaration, it is assumed to be 2^32 - 1.

It is an error to encode a value of n that is greater than the maximum described in the declaration.
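A minimal sketch of the counted array, here with int elements. The function names and the max_count parameter (standing in for m) are assumptions for illustration.

```python
import struct


def encode_int(value: int) -> bytes:
    return struct.pack(">i", value)


def encode_array(items, encode_item, max_count: int = 2**32 - 1) -> bytes:
    """Encode the element count n, then elements 0 through n-1 in order."""
    n = len(items)
    if n > max_count:
        raise ValueError(f"count {n} exceeds declared maximum {max_count}")
    return struct.pack(">I", n) + b"".join(encode_item(x) for x in items)


print(encode_array([1, 2, 3], encode_int, 10).hex())
# 00000003000000010000000200000003
```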

Structure

Structures are declared as follows:

struct {
    component-declaration-A;
    component-declaration-B;
    ...
} identifier;

The components of the structure are encoded in the order of their declaration in the structure. Each component's size is a multiple of four bytes, though the components may be different sizes.
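As a sketch, consider a hypothetical structure with two components, "int id;" followed by "string name<16>;". This structure, and the names below, are invented for this example and do not appear in the XNFS protocols. Encoding simply concatenates the component encodings in declaration order.

```python
import struct


def encode_int(value: int) -> bytes:
    return struct.pack(">i", value)


def encode_string(text: str, max_len: int) -> bytes:
    raw = text.encode("ascii")   # ASCII is an assumption of this sketch
    n = len(raw)
    if n > max_len:
        raise ValueError(f"length {n} exceeds declared maximum {max_len}")
    return struct.pack(">I", n) + raw + b"\x00" * ((4 - n % 4) % 4)


def encode_record(ident: int, name: str) -> bytes:
    """Hypothetical struct { int id; string name<16>; } in declared order."""
    return encode_int(ident) + encode_string(name, 16)


print(encode_record(7, "ann").hex())   # 0000000700000003616e6e00
```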

Discriminated Union

A discriminated union is a type composed of a discriminant followed by a type selected from a set of pre-arranged types according to the value of the discriminant. The type of discriminant is either int, unsigned int or an enumerated type, such as bool. The component types are called "arms" of the union, and are preceded by the value of the discriminant which implies their encoding. Discriminated unions are declared as follows:

union switch (discriminant-declaration) {
case discriminant-value-A:
    arm-declaration-A;
case discriminant-value-B:
    arm-declaration-B;
...
default:
    default-declaration;
} identifier;

Each case keyword is followed by a legal value of the discriminant. The default arm is optional. If it is not specified, then a valid encoding of the union cannot take on unspecified discriminant values. The size of the implied arm is always a multiple of four bytes.

The discriminated union is encoded as its discriminant followed by the encoding of the implied arm.
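The following sketch encodes a hypothetical union with an int discriminant, an int arm for the value 0, a void arm for the value 1, and no default arm. The union and all names are assumptions for this example. The arms are held in a dictionary keyed by discriminant value.

```python
import struct


def encode_int(value: int) -> bytes:
    return struct.pack(">i", value)


def encode_void(_value) -> bytes:
    return b""   # a void arm contributes no bytes


def encode_union(discriminant: int, value, arms, default=None) -> bytes:
    """Encode the discriminant, then the arm that it implies."""
    encode_arm = arms.get(discriminant, default)
    if encode_arm is None:
        # No matching case and no default arm: not a valid encoding.
        raise ValueError(f"no arm for discriminant {discriminant}")
    return encode_int(discriminant) + encode_arm(value)


ARMS = {0: encode_int, 1: encode_void}
print(encode_union(0, 42, ARMS).hex())     # 000000000000002a
print(encode_union(1, None, ARMS).hex())   # 00000001
```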

Void

An XDR void is a 0-byte quantity. voids are useful for describing operations that take no data as input or no data as output. They are also useful in unions, where some arms may contain data and others do not. The declaration is simply as follows:

void;

In the encoded byte stream, a void occupies no bytes.

Constant

The data declaration for a constant follows this form:

const name-identifier = n;

const is used to define a symbolic name for a constant; it does not declare any data. The symbolic constant may be used anywhere a regular constant may be used. For example, the following defines a symbolic constant DOZEN, equal to 12.

const DOZEN = 12;

Typedef

typedef does not declare any data either, but serves to define new identifiers for declaring data. The syntax is:

typedef declaration;

The new type name is actually the variable name in the declaration part of the typedef. For example, the following defines a new type called eggbox using an existing type called egg:

typedef egg eggbox[DOZEN];

Variables declared using the new type name have the same type as the new type name would have in the typedef, if it was considered a variable. For example, the following two declarations are equivalent in declaring the variable fresheggs:

eggbox fresheggs;
egg fresheggs[DOZEN];

When a typedef involves a struct, enum or union definition, there is another (preferred) syntax that may be used to define the same type. In general, a typedef of the following form:

typedef <<struct, union, or enum definition>> identifier;

may be converted to the alternative form by removing the "typedef" part and placing the identifier after the struct, union or enum keyword, instead of at the end. For example, here are the two ways to define the type bool:

typedef enum {    /* using typedef */
    FALSE = 0,
    TRUE = 1
} bool;

enum bool {       /* preferred alternative */
    FALSE = 0,
    TRUE = 1
};

The second syntax is preferred because it is not necessary to wait until the end of a declaration to find the name of the new type.

Optional-data

Optional-data is one kind of union that occurs so frequently that it is given a special syntax of its own. It is declared as follows:

type-name *identifier;

This is equivalent to the following union:

union switch (bool opted) {
case TRUE:
    type-name element;
case FALSE:
    void;
} identifier;

It is also equivalent to the following variable-length array declaration, since the boolean opted can be interpreted as the length of the array:

type-name identifier<1>;

Optional-data is not so interesting in itself, but it is very useful for describing recursive data structures such as linked-lists and trees. For example, the following defines a type stringlist that encodes lists of arbitrary length strings:

struct *stringlist {
    string item<>;
    stringlist next;
};

It could have been equivalently declared as the following union:

union stringlist switch (bool opted) {
case TRUE:
    struct {
        string item<>;
        stringlist next;
    } element;
case FALSE:
    void;
};

or as a variable-length array:

struct stringlist<1> {
    string item<>;
    stringlist next;
};

Both of these declarations obscure the intention of the stringlist type, so the optional-data declaration is preferred over both of them. The optional-data type also has a close correlation to the way in which recursive data structures are represented in high-level languages such as Pascal or C by use of pointers. In fact, the syntax is the same as that of the C language for pointers.
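The stringlist type can be sketched as follows. Each list node is preceded by the boolean TRUE (1), and the end of the list is marked by FALSE (0). The function names are assumptions for this example, and ASCII is assumed for the string bytes.

```python
import struct


def encode_string(text: str) -> bytes:
    raw = text.encode("ascii")
    n = len(raw)
    return struct.pack(">I", n) + raw + b"\x00" * ((4 - n % 4) % 4)


def encode_stringlist(items) -> bytes:
    out = b""
    for item in items:
        # TRUE: an element follows (its item, then the rest of the list).
        out += struct.pack(">i", 1) + encode_string(item)
    return out + struct.pack(">i", 0)   # FALSE: no further element


def decode_stringlist(buf: bytes) -> list:
    items, pos = [], 0
    while True:
        (present,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        if not present:
            return items
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items.append(buf[pos:pos + n].decode("ascii"))
        pos += n + (4 - n % 4) % 4   # skip the data and its padding


print(decode_stringlist(encode_stringlist(["ab", "c"])))   # ['ab', 'c']
```

An empty list is encoded as the single value FALSE, that is, four zero bytes.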

The XDR Language Specification

Notational Conventions

This specification uses an extended Backus-Naur Form notation for describing the XDR language. Here is a brief description of the notation:

  1. The characters |, (, ), [, ], ", and * are special.

  2. Terminal symbols are strings of any characters surrounded by double quotes (").

  3. Non-terminal symbols are strings of non-special characters.

  4. Alternative items are separated by a vertical bar (|).

  5. Optional items are enclosed in brackets.

  6. Items are grouped together by enclosing them in parentheses.

  7. A * following an item means zero or more occurrences of that item.

For example, consider the following pattern:

"a " "very" (", " "very")* [" cold " "and"] " rainy " ("day" | "night")

An infinite number of strings match this pattern. A few of them are:

"a very rainy day"
"a very, very rainy day"
"a very cold and rainy day"
"a very, very, very cold and rainy night"

Lexical Notes

  1. Comments begin with "/*" and terminate with "*/".

  2. White space serves to separate items and is otherwise ignored.

  3. An identifier is a letter followed by an optional sequence of letters, digits or underbar "_". The case (lower or upper) of identifiers is not ignored.

  4. A constant is a sequence of one or more decimal digits, optionally preceded by a minus sign "-".

The character set is consistent with ISO 8859-1:1987.
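The identifier and constant rules can be approximated with regular expressions, as in the sketch below. Treating "letter" as the ASCII letters A to Z and a to z is an assumption of this example; the names IDENTIFIER and CONSTANT are likewise illustrative.

```python
import re

# A letter followed by an optional sequence of letters, digits or "_".
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# One or more decimal digits, optionally preceded by a minus sign.
CONSTANT = re.compile(r"-?[0-9]+")

print(bool(IDENTIFIER.fullmatch("MAXNAMELEN")))   # True
print(bool(IDENTIFIER.fullmatch("_hidden")))      # False
print(bool(CONSTANT.fullmatch("-12")))            # True
```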


Syntax Information

declaration:
    type-specifier identifier
    | type-specifier identifier "[" value "]"
    | type-specifier identifier "<" [ value ] ">"
    | "opaque" identifier "[" value "]"
    | "opaque" identifier "<" [ value ] ">"
    | "string" identifier "<" [ value ] ">"
    | type-specifier "*" identifier
    | "void"

value:
    constant
    | identifier

type-specifier:
    [ "unsigned" ] "int"
    | [ "unsigned" ] "hyper"
    | "bool"
    | enum-type-spec
    | struct-type-spec
    | union-type-spec
    | identifier

enum-type-spec:
    "enum" enum-body

enum-body:
    "{"
    ( identifier "=" value )
    ( "," identifier "=" value )*
    "}"

struct-type-spec:
    "struct" struct-body

struct-body:
    "{"
    ( declaration ";" )
    ( declaration ";" )*
    "}"

union-type-spec:
    "union" union-body

union-body:
    "switch" "(" declaration ")" "{"
    ( "case" value ":" declaration ";" )
    ( "case" value ":" declaration ";" )*
    [ "default" ":" declaration ";" ]
    "}"

constant-def:
    "const" identifier "=" constant ";"

type-def:
    "typedef" declaration ";"
    | "enum" identifier enum-body ";"
    | "struct" identifier struct-body ";"
    | "union" identifier union-body ";"

definition:
    type-def
    | constant-def

specification:
    definition *

Syntax Notes

Use of XDR

Although XDR is used by many implementations of XNFS, it has been defined in this document as a tool for use in later chapters. No implementation of the XDR language is required by a server. Furthermore, an implementation of the XDR language is not constrained to use the lexical and syntactical conventions defined in this specification; in particular, other codesets and reserved words may be used in implementations that are not based on the English language.

Example of an XDR Data Description

Here is a short XDR data description of an object called a file, which might be used to transfer files from one machine to another.

const MAXUSERNAME = 32;     /* max length of a user name */
const MAXFILELEN = 65535;   /* max length of a file      */
const MAXNAMELEN = 255;     /* max length of a file name */

/*
 * Types of files:
 */
enum filekind {
    TEXT = 0,       /* ascii data */
    DATA = 1,       /* raw data   */
    EXEC = 2        /* executable */
};

/*
 * File information, per kind of file:
 */
union filetype switch (filekind kind) {
case TEXT:
    void;                           /* no extra information */
case DATA:
    string creator<MAXNAMELEN>;     /* data creator         */
case EXEC:
    string interpretor<MAXNAMELEN>; /* program interpretor  */
};

/*
 * A complete file:
 */
struct file {
    string filename<MAXNAMELEN>; /* name of file    */
    filetype type;               /* info about file */
    string owner<MAXUSERNAME>;   /* owner of file   */
    opaque data<MAXFILELEN>;     /* file data       */
};

Suppose now that there is a user named "john" who wants to store his lisp program "sillyprog" that contains just the data "(quit)". His file would be encoded as follows:


Offset  Hex Bytes    ASCII  Description
0       00 00 00 09  ....   Length of filename = 9
4       73 69 6c 6c  sill   Filename characters
8       79 70 72 6f  ypro   ... and more characters ...
12      67 00 00 00  g...   ... and 3 zero-bytes of fill
16      00 00 00 02  ....   Filekind is EXEC = 2
20      00 00 00 04  ....   Length of interpreter = 4
24      6c 69 73 70  lisp   Interpreter characters
28      00 00 00 04  ....   Length of owner = 4
32      6a 6f 68 6e  john   Owner characters
36      00 00 00 06  ....   Length of file data = 6
40      28 71 75 69  (qui   File data bytes ...
44      74 29 00 00  t)..   ... and 2 zero-bytes of fill
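The table above can be reproduced with a short Python sketch. The function names xdr_int, xdr_bytes and encode_file are assumptions made for this example, and ASCII is assumed for the names. Only the constants and the layout come from the data description.

```python
import struct

MAXUSERNAME, MAXFILELEN, MAXNAMELEN = 32, 65535, 255
TEXT, DATA, EXEC = 0, 1, 2


def xdr_int(value: int) -> bytes:
    return struct.pack(">i", value)


def xdr_bytes(raw: bytes, max_len: int) -> bytes:
    """Counted bytes, used here for both string and opaque<> items."""
    n = len(raw)
    if n > max_len:
        raise ValueError(f"length {n} exceeds declared maximum {max_len}")
    return struct.pack(">I", n) + raw + b"\x00" * ((4 - n % 4) % 4)


def encode_file(filename, kind, extra, owner, data) -> bytes:
    if kind not in (TEXT, DATA, EXEC):
        raise ValueError(f"invalid filekind {kind}")
    out = xdr_bytes(filename.encode("ascii"), MAXNAMELEN)
    out += xdr_int(kind)                  # union discriminant
    if kind != TEXT:                      # the TEXT arm is void
        out += xdr_bytes(extra.encode("ascii"), MAXNAMELEN)
    out += xdr_bytes(owner.encode("ascii"), MAXUSERNAME)
    out += xdr_bytes(data, MAXFILELEN)
    return out


encoded = encode_file("sillyprog", EXEC, "lisp", "john", b"(quit)")
for offset in range(0, len(encoded), 4):
    # Each printed row corresponds to one row of the table.
    print(f"{offset:<6} {encoded[offset:offset + 4].hex(' ')}")
```

The sketch assumes Python 3.8 or later, since it passes a separator to bytes.hex.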


If, instead, "john" stored the same file in the text file "sillytext", it would be encoded as follows:


Offset  Hex Bytes    ASCII  Description
0       00 00 00 09  ....   Length of filename = 9
4       73 69 6c 6c  sill   Filename characters
8       79 74 65 78  ytex   ... and more characters ...
12      74 00 00 00  t...   ... and 3 zero-bytes of fill
16      00 00 00 00  ....   Filekind is TEXT = 0
                            Note: no data encoded for void
20      00 00 00 04  ....   Length of owner = 4
24      6a 6f 68 6e  john   Owner characters
28      00 00 00 06  ....   Length of file data = 6
32      28 71 75 69  (qui   File data bytes ...
36      74 29 00 00  t)..   ... and 2 zero-bytes of fill






