This chapter specifies XDR (External Data Representation), a standard used by many implementors of XNFS.
This chapter includes only the subset of XDR that is required to define the XNFS protocols.
XDR is a standard for the description and encoding of data. It is useful for transferring data between different computer architectures, and has been used to communicate data between many diverse machines. XDR fits into the ISO presentation layer, and is roughly analogous in purpose to X.209 (previously X.409), ISO Abstract Syntax Notation. The major difference between these two is that XDR uses implicit typing, while X.209 uses explicit typing.
XDR uses a language to describe data formats. The language can only be used to describe data; it is not a programming language. It allows intricate data formats to be described concisely. The alternative of using graphical representations (itself an informal language) quickly becomes incomprehensible when faced with complexity. The XDR language itself is similar to the C language as defined in The C Programming Language, just as Courier, as defined in Courier: The Remote Procedure Call Protocol, is similar to Mesa. Protocols such as RPC (Remote Procedure Call) and NFS (Network File System) use XDR to describe the format of their data.
XDR's approach to standardising data representations is canonical. That is, XDR defines a single byte order (big-endian, as described in On Holy Wars and a Plea for Peace), a single floating-point representation (IEEE), and so on. Any program running on any machine can use XDR to create portable data by translating its local representation to the XDR standard representations; similarly, any program running on any machine can read portable data by translating the XDR standard representations to its local equivalents. The single standard completely decouples programs that create or send portable data from those that use or receive portable data. The advent of a new machine or a new language has no effect upon the community of existing portable data creators and users.
The XDR language provides no explicit data typing, because typing has a relatively high cost (encoding and interpreting the type fields) and most protocols already know which data types to expect. However, the benefits of data typing can still be obtained with XDR. One way is to encode two things: first, a string containing the XDR data description of the encoded data, and then the encoded data itself. Another way is to assign a value to every type in XDR, then define a universal type that takes this value as its discriminant and, for each value, describes the corresponding data type.
The XDR standard makes the following assumption: that bytes (or octets) are portable, where a byte is defined to be 8 bits of data. A given hardware device must encode the bytes onto the various media in such a way that other hardware devices may decode the bytes without loss of meaning. For example, the Ethernet standard suggests that bytes are encoded with the least significant bit first.
The representation of all items requires a multiple of four bytes (or 32 bits) of data. Four bytes is big enough to support most machine architectures efficiently, yet is small enough to keep the encoded data to a reasonable size. The bytes are numbered 0 to n-1. The bytes are read or written to a byte stream such that byte m always precedes byte m+1. If the n bytes needed to contain the data are not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of 4. Setting these residual bytes to zero enables the same data to be encoded to the same result on all machines, allowing encoded data to be meaningfully compared or checksummed.
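The padding rule reduces to one formula: n data bytes need r = (4 - n mod 4) mod 4 residual zero bytes. The following Python sketch is illustrative only, and the function names are not part of XDR.

```python
def pad_length(n: int) -> int:
    """Return r, the number of residual zero bytes (0 to 3) after n data bytes."""
    return (4 - n % 4) % 4


def pad(data: bytes) -> bytes:
    """Append r zero bytes so that the total length is a multiple of four."""
    return data + b"\x00" * pad_length(len(data))


print([pad_length(n) for n in range(6)])  # [0, 3, 2, 1, 0, 3]
print(pad(b"g"))                          # b'g\x00\x00\x00'
```

The fill bytes are always zero. Any two encoders therefore produce identical bytes for the same value.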
Each of the sections that follow describes a data type defined in the XDR standard, shows how it is declared in the language, and includes a graphic illustration of its encoding.
For each data type in the language, a general paradigm declaration is shown. Note that angle brackets (< and >) denote variable-length sequences of data, and square brackets ([ and ]) denote fixed-length sequences of data. n, m and r denote integers.
For the full language specification and more formal definitions of terms such as "identifier" and "declaration", refer to
For some data types, more specific examples are included.
A more extensive example of a data description is in
An XDR signed integer is a 32-bit datum that encodes an integer in the range [-2147483648,2147483647]. The integer is represented in two's complement notation. The most and least significant bytes are 0 and 3, respectively. Integers are declared as follows:
An XDR unsigned integer is a 32-bit datum that encodes a non-negative integer in the range [0,4294967295]. It is represented by an unsigned binary number whose most and least significant bytes are 0 and 3, respectively. An unsigned integer is declared as follows:
Enumerations have the same representation as signed integers. Enumerations are handy for describing subsets of the integers. Enumerated data is declared as follows:
For example, the three colours red, yellow and blue could be described by an enumerated type:
It is an error to encode as an enum any integer other than those that have been given assignments in the enum declaration.
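Signed integers, unsigned integers and enumerations all occupy one 4-byte, big-endian unit. This Python sketch shows the three encodings. The colour values are hypothetical, since the real ones are whatever the enum declaration assigns. The sketch rejects enum values that have no assignment, as required above.

```python
import struct

# Hypothetical assignments for the colours example.
COLOURS = {"RED": 2, "YELLOW": 3, "BLUE": 5}


def encode_int(value: int) -> bytes:
    """Signed int: 32-bit two's complement, byte 0 most significant."""
    if not -2**31 <= value <= 2**31 - 1:
        raise ValueError("value out of range for an XDR signed integer")
    return struct.pack(">i", value)


def encode_uint(value: int) -> bytes:
    """Unsigned int: 32-bit unsigned binary, byte 0 most significant."""
    if not 0 <= value <= 2**32 - 1:
        raise ValueError("value out of range for an XDR unsigned integer")
    return struct.pack(">I", value)


def encode_enum(value: int, members: dict) -> bytes:
    """Enum: same bytes as a signed int, but only assigned values are legal."""
    if value not in members.values():
        raise ValueError(f"{value} is not assigned in the enum declaration")
    return encode_int(value)


def decode_int(data: bytes) -> int:
    return struct.unpack(">i", data[:4])[0]


def decode_uint(data: bytes) -> int:
    return struct.unpack(">I", data[:4])[0]


print(encode_int(-1).hex())                           # ffffffff
print(decode_uint(encode_int(-1)))                    # 4294967295
print(encode_enum(COLOURS["YELLOW"], COLOURS).hex())  # 00000003
```

The same four bytes decode as -1 or as 4294967295, depending only on the declared type. Nothing in the encoding itself says which. This is the implicit typing that distinguishes XDR from X.209.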
Booleans are important enough, and occur frequently enough, to warrant their own explicit type in the standard. Booleans are declared as follows:
This is equivalent to:
At times, fixed-length uninterpreted data needs to be passed among machines. This data is called "opaque" and is declared as follows:
The standard also provides for variable-length (counted) opaque data. It is defined as a sequence of n arbitrary bytes (numbered 0 to n-1), encoded as the count n (an unsigned integer, as described above) followed by the n bytes of the sequence.
Byte m of the sequence always precedes byte m+1 of the sequence, and byte 0 of the sequence always follows the sequence's length (count). If n is not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of four. Variable-length opaque data is declared in the following way:
or:
The constant m denotes an upper bound on the number of bytes that the sequence may contain. If m is not specified, as in the second declaration, it is assumed to be 2^32 - 1, the maximum length. The constant m would normally be found in a protocol specification. For example, a filing protocol may state that the maximum data transfer size is 8192 bytes, as follows:
This can be illustrated as follows:
It is an error to encode a length greater than the maximum described in the declaration.
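This Python sketch contrasts the two opaque forms. The fixed-length case is an assumption drawn from the general four-byte rule: n bytes plus zero fill, with no count sent, because both ends already know n from the declaration. The variable-length case follows the text above: a count, the bytes, zero fill, and a check against the maximum m.

```python
import struct

MAX_LENGTH = 2**32 - 1  # bound assumed when m is not specified


def zero_fill(n: int) -> bytes:
    """Residual zero bytes (0 to 3) needed after n data bytes."""
    return b"\x00" * ((4 - n % 4) % 4)


def encode_fixed_opaque(data: bytes, n: int) -> bytes:
    """Fixed-length opaque: exactly n bytes and zero fill; no length is sent."""
    if len(data) != n:
        raise ValueError(f"expected exactly {n} bytes, got {len(data)}")
    return data + zero_fill(n)


def encode_var_opaque(data: bytes, m: int = MAX_LENGTH) -> bytes:
    """Variable-length opaque: count n as an unsigned int, the n bytes, zero fill."""
    n = len(data)
    if n > m:
        raise ValueError(f"length {n} exceeds the declared maximum {m}")
    return struct.pack(">I", n) + data + zero_fill(n)


print(encode_fixed_opaque(b"\x01\x02\x03\x04\x05", 5).hex())  # 0102030405000000
# With the 8192-byte transfer limit from the filing-protocol example:
print(encode_var_opaque(b"(quit)", 8192).hex())  # 000000062871756974290000
```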
The standard defines a string of n (numbered 0 to n-1) bytes to be the number n encoded as an unsigned integer (as described above), and followed by the n bytes of the string. Each byte must be regarded by the implementation as being 8-bit transparent data. This allows use of arbitrary character set encodings. Byte m of the string always precedes byte m+1 of the string, and byte 0 of the string always follows the string's length. If n is not a multiple of four, then the n bytes are followed by enough (0 to 3) residual zero bytes, r, to make the total byte count a multiple of four. Counted byte strings are declared as follows:
or:
The constant m denotes an upper bound on the number of bytes that a string may contain. If m is not specified, as in the second declaration, it is assumed to be 2^32 - 1, the maximum length. The constant m would normally be found in a protocol specification. For example, a filing protocol may state that a filename can be no longer than 255 bytes, as follows:
This can be illustrated as:
It is an error to encode a length greater than the maximum defined in the declaration.
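A string is encoded exactly like variable-length opaque data. The sketch below therefore takes `bytes` rather than text: XDR treats every byte as transparent, so the caller chooses the character set (ASCII in this example). The output matches the first four rows of the "sillyprog" file example at the end of this chapter.

```python
import struct


def encode_string(s: bytes, m: int = 2**32 - 1) -> bytes:
    """Counted string: length n as unsigned int, the n bytes unchanged, zero fill."""
    n = len(s)
    if n > m:
        raise ValueError(f"string length {n} exceeds the declared maximum {m}")
    return struct.pack(">I", n) + s + b"\x00" * ((4 - n % 4) % 4)


# A filename limited to 255 bytes, as in the filing-protocol example.
print(encode_string("sillyprog".encode("ascii"), 255).hex())
# 0000000973696c6c7970726f67000000
```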
Declarations for fixed-length arrays of homogeneous elements are in the following form:
Fixed-length arrays of elements, numbered 0 to n-1, are encoded by individually encoding the elements of the array in their natural order, 0 to n-1. Each element's size is a multiple of four bytes. Though all elements are of the same type, the elements may have different sizes. For example, in a fixed-length array of strings, all elements are of type string, yet each element will vary in its length.
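This Python sketch encodes a fixed-length array of strings. Because n is fixed by the declaration, no element count is written, and the elements are simply concatenated in order. The helper names are illustrative. The two strings occupy different amounts of space, as the text notes.

```python
import struct
from typing import Callable, Sequence


def encode_string(s: bytes) -> bytes:
    n = len(s)
    return struct.pack(">I", n) + s + b"\x00" * ((4 - n % 4) % 4)


def encode_fixed_array(elements: Sequence, n: int,
                       encode_element: Callable[..., bytes]) -> bytes:
    """Fixed-length array: exactly n elements encoded in order 0 to n-1, no count."""
    if len(elements) != n:
        raise ValueError(f"expected {n} elements, got {len(elements)}")
    return b"".join(encode_element(e) for e in elements)


encoded = encode_fixed_array([b"a", b"hello"], 2, encode_string)
print(len(encoded))  # 20: 8 bytes for "a", 12 bytes for "hello"
```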
Counted arrays provide the ability to encode variable-length arrays of homogeneous elements. The array is encoded as the element count n (an unsigned integer) followed by the encoding of each of the array's elements, starting with element 0 and progressing through element n-1. The declaration for variable-length arrays follows this form:
or:
The constant m specifies the maximum acceptable element count of an array; if m is not specified, as in the second declaration, it is assumed to be 2^32 - 1.
It is an error to encode a value of n that is greater than the maximum described in the declaration.
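A counted array differs from a fixed-length array only in the leading element count and the check against m. This Python sketch passes the element encoder in as a function so that any element type can be used. The names are illustrative.

```python
import struct
from typing import Callable, Sequence


def encode_var_array(elements: Sequence, encode_element: Callable[..., bytes],
                     m: int = 2**32 - 1) -> bytes:
    """Counted array: element count n as unsigned int, then elements 0 to n-1."""
    n = len(elements)
    if n > m:
        raise ValueError(f"element count {n} exceeds the declared maximum {m}")
    return struct.pack(">I", n) + b"".join(encode_element(e) for e in elements)


def encode_int(v: int) -> bytes:
    return struct.pack(">i", v)


print(encode_var_array([1, 2, 3], encode_int).hex())
# 00000003000000010000000200000003
```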
Structures are declared as follows:
The components of the structure are encoded in the order of their declaration in the structure. Each component's size is a multiple of four bytes, though the components may be different sizes.
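A structure adds no bytes of its own. Its encoding is the concatenation of its components in declaration order. The Python sketch below assumes a hypothetical structure with two ints followed by an unbounded string. Each component remains a multiple of four bytes, although the components differ in size.

```python
import struct


def encode_int(v: int) -> bytes:
    return struct.pack(">i", v)


def encode_string(s: bytes) -> bytes:
    n = len(s)
    return struct.pack(">I", n) + s + b"\x00" * ((4 - n % 4) % 4)


def encode_point(x: int, y: int, label: bytes) -> bytes:
    """Hypothetical struct { int x; int y; string label<>; }, in declaration order."""
    return encode_int(x) + encode_int(y) + encode_string(label)


print(encode_point(1, -1, b"origin").hex())
# 00000001ffffffff000000066f726967696e0000
```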
A discriminated union is a type composed of a discriminant followed by a type selected from a set of pre-arranged types according to the value of the discriminant. The type of discriminant is either int, unsigned int or an enumerated type, such as bool. The component types are called "arms" of the union, and are preceded by the value of the discriminant which implies their encoding. Discriminated unions are declared as follows:
The discriminated union is encoded as its discriminant followed by the encoding of the implied arm.
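This Python sketch encodes a discriminated union as the discriminant followed by the selected arm. It assumes a signed int or enum discriminant; an unsigned discriminant would use `">I"`. The arms table is hypothetical and includes a void arm that contributes no bytes. As a design choice, the sketch rejects a discriminant that has no declared arm.

```python
import struct
from typing import Any, Callable, Dict


def encode_union(discriminant: int, value: Any,
                 arms: Dict[int, Callable[[Any], bytes]]) -> bytes:
    """Discriminated union: the discriminant as an int, then the implied arm."""
    if discriminant not in arms:
        raise ValueError(f"no arm declared for discriminant {discriminant}")
    return struct.pack(">i", discriminant) + arms[discriminant](value)


# Hypothetical union with a void arm (0) and an int arm (1).
arms = {
    0: lambda _: b"",                   # void: contributes no bytes
    1: lambda v: struct.pack(">i", v),  # int arm
}

print(encode_union(1, 42, arms).hex())    # 000000010000002a
print(encode_union(0, None, arms).hex())  # 00000000
```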
An XDR void is a 0-byte quantity. voids are useful for describing operations that take no data as input or no data as output. They are also useful in unions, where some arms may contain data and others do not. The declaration is simply as follows:
voids are illustrated as follows:
The data declaration for a constant follows this form:
const is used to define a symbolic name for a constant; it does not declare any data. The symbolic constant may be used anywhere a regular constant may be used. For example, the following defines a symbolic constant DOZEN, equal to 12.
typedef does not declare any data either, but serves to define new identifiers for declaring data. The syntax is:
The new type name is actually the variable name in the declaration part of the typedef. For example, the following defines a new type called eggbox using an existing type called egg:
Variables declared using the new type name have the same type as the new type name would have in the typedef, if it were considered a variable. For example, the following two declarations are equivalent in declaring the variable fresheggs:
When a typedef involves a struct, enum or union definition, there is another (preferred) syntax that may be used to define the same type. In general, a typedef of the following form:
may be converted to the alternative form by removing the "typedef" part and placing the identifier after the struct, union or enum keyword, instead of at the end. For example, here are the two ways to define the type bool:
typedef enum { /* using typedef */ FALSE = 0, TRUE = 1 } bool;
enum bool { /* preferred alternative */ FALSE = 0, TRUE = 1 };
The second syntax is preferred because it is not necessary to wait until the end of a declaration to find the name of the new type.
This is equivalent to the following union:
It is also equivalent to the following variable-length array declaration, since the boolean opted can be interpreted as the length of the array:
Optional-data is not so interesting in itself, but it is very useful for describing recursive data structures such as linked lists and trees. For example, the following defines a type stringlist that encodes lists of arbitrary-length strings:
It could have been equivalently declared as the following union:
or as a variable-length array:
Both of these declarations obscure the intention of the stringlist type, so the optional-data declaration is preferred over both of them. The optional-data type also has a close correlation to the way in which recursive data structures are represented in high-level languages such as Pascal or C by use of pointers. In fact, the syntax is the same as that of the C language for pointers.
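The optional-data encoding of a list can be sketched as a loop. Each present entry is announced by the boolean TRUE (encoded as the int 1), and the list is terminated by FALSE (0). The Python class `StringEntry` and its fields are hypothetical stand-ins for the declared list entry. The sketch iterates rather than recursing, which yields the same bytes as the recursive definition while avoiding Python's recursion limit on long lists.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class StringEntry:
    """Hypothetical list entry: one string and an optional next entry."""
    item: bytes
    next: Optional["StringEntry"] = None


def encode_string(s: bytes) -> bytes:
    n = len(s)
    return struct.pack(">I", n) + s + b"\x00" * ((4 - n % 4) % 4)


def encode_stringlist(entry: Optional[StringEntry]) -> bytes:
    """Each present entry is preceded by bool TRUE; the list ends with FALSE."""
    out = []
    while entry is not None:
        out.append(struct.pack(">i", 1))  # TRUE: an entry follows
        out.append(encode_string(entry.item))
        entry = entry.next
    out.append(struct.pack(">i", 0))      # FALSE: no further entry
    return b"".join(out)


print(encode_stringlist(StringEntry(b"a", StringEntry(b"bc"))).hex())
```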
This specification uses an extended Backus-Naur Form notation for describing the XDR language. Here is a brief description of the notation:
For example, consider the following pattern:
An infinite number of strings match this pattern. A few of them are:
The character set is consistent with ISO 8859-1:1987.
Although XDR is used by many implementations of XNFS, it has been defined in this document as a tool for use in later chapters. No implementation of the XDR language is required by a server. Furthermore, an implementation of the XDR language is not constrained to use the lexical and syntactical conventions defined in this specification; in particular, other codesets and reserved words may be used in implementations that are not based on the English language.
Here is a short XDR data description of an object called a file, which might be used to transfer files from one machine to another.
Suppose now that there is a user named "john" who wants to store his lisp program "sillyprog" that contains just the data "(quit)". His file would be encoded as follows:
Offset | Hex Bytes | ASCII | Description
---|---|---|---
0 | 00 00 00 09 | .... | Length of filename = 9
4 | 73 69 6c 6c | sill | Filename characters
8 | 79 70 72 6f | ypro | ... and more characters ...
12 | 67 00 00 00 | g... | ... and 3 zero bytes of fill
16 | 00 00 00 02 | .... | Filekind is EXEC = 2
20 | 00 00 00 04 | .... | Length of interpreter = 4
24 | 6c 69 73 70 | lisp | Interpreter characters
28 | 00 00 00 04 | .... | Length of owner = 4
32 | 6a 6f 68 6e | john | Owner characters
36 | 00 00 00 06 | .... | Length of file data = 6
40 | 28 71 75 69 | (qui | File data bytes ...
44 | 74 29 00 00 | t).. | ... and 2 zero bytes of fill
If, instead, "john" stored the same file in the text file "sillytext", it would be encoded as follows:
Offset | Hex Bytes | ASCII | Description
---|---|---|---
0 | 00 00 00 09 | .... | Length of filename = 9
4 | 73 69 6c 6c | sill | Filename characters
8 | 79 74 65 78 | ytex | ... and more characters ...
12 | 74 00 00 00 | t... | ... and 3 zero bytes of fill
16 | 00 00 00 00 | .... | Filekind is TEXT = 0
| | | | Note: no data encoded for void
20 | 00 00 00 04 | .... | Length of owner = 4
24 | 6a 6f 68 6e | john | Owner characters
28 | 00 00 00 06 | .... | Length of file data = 6
32 | 28 71 75 69 | (qui | File data bytes ...
36 | 74 29 00 00 | t).. | ... and 2 zero bytes of fill
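The two encodings above can be reproduced with a short Python sketch. It infers the layout of file from the tables: a filename string, then a filekind discriminant. The discriminant selects either a void arm (TEXT = 0) or an interpreter string (EXEC = 2), and is followed by an owner string and variable-length opaque file data. Other filekind values and the maximum lengths are assumptions left out of this sketch, and the function names are illustrative.

```python
import struct
from typing import Optional

TEXT = 0  # filekind values as shown in the encodings above
EXEC = 2


def encode_int(v: int) -> bytes:
    return struct.pack(">i", v)


def encode_counted(data: bytes) -> bytes:
    """Shared encoding of string<> and opaque<>: length, bytes, zero fill."""
    n = len(data)
    return struct.pack(">I", n) + data + b"\x00" * ((4 - n % 4) % 4)


def encode_file(filename: bytes, kind: int, interpreter: Optional[bytes],
                owner: bytes, data: bytes) -> bytes:
    """Encode the fields in the order shown in the tables (layout inferred)."""
    out = encode_counted(filename)
    out += encode_int(kind)              # union discriminant
    if kind == EXEC:
        out += encode_counted(interpreter)
    elif kind != TEXT:                   # TEXT selects a void arm: no bytes
        raise ValueError("this sketch models only TEXT and EXEC")
    out += encode_counted(owner)
    out += encode_counted(data)
    return out


def dump(encoded: bytes) -> None:
    """Print one row per 4-byte unit: offset, hex bytes, printable ASCII."""
    for off in range(0, len(encoded), 4):
        unit = encoded[off:off + 4]
        ascii_col = "".join(chr(b) if 32 <= b < 127 else "." for b in unit)
        print(f"{off:2d} | {unit.hex(' ')} | {ascii_col}")


exec_file = encode_file(b"sillyprog", EXEC, b"lisp", b"john", b"(quit)")
text_file = encode_file(b"sillytext", TEXT, None, b"john", b"(quit)")
print(len(exec_file), len(text_file))  # 48 40
dump(exec_file)
```

The `dump` output follows the Offset, Hex Bytes and ASCII columns of the first table row for row.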