The Open Group Base Specifications Issue 7
IEEE Std 1003.1, 2013 Edition
Copyright © 2001-2013 The IEEE and The Open Group

A.12 Utility Conventions

A.12.1 Utility Argument Syntax

The standard developers considered that recent trends toward diluting the SYNOPSIS sections of historical reference pages to the equivalent of:

command [options][operands]

were a disservice to the reader. Therefore, considerable effort was placed into rigorous definitions of all the command line arguments and their interrelationships. The relationships depicted in the synopses are normative parts of POSIX.1-2008; this information is sometimes repeated in textual form, but that is only for clarity within context.

The use of "undefined" for conflicting argument usage and for repeated usage of the same option is meant to prevent conforming applications from using conflicting arguments or repeated options unless specifically allowed (as is the case with ls, which allows simultaneous, repeated use of the -C, -l, and -1 options). Many historical implementations will tolerate this usage, choosing either the first or the last applicable argument. This tolerance can continue, but conforming applications cannot rely upon it. (Other implementations may choose to print usage messages instead.)

The use of "undefined" for conflicting argument usage also allows an implementation to make reasonable extensions to utilities where the implementor considers mutually-exclusive options according to POSIX.1-2008 to have a sensible meaning and result.

POSIX.1-2008 does not define the result of a command when an option-argument or operand is not followed by ellipses and the application specifies more than one of that option-argument or operand. This allows an implementation to define valid (although non-standard) behavior for the utility when more than one such option-argument or operand is specified.

The requirements for option-arguments are summarized as follows:

SYNOPSIS Shows:                        -a arg              -c[arg]
Conforming application uses:           -a arg              -carg or -c
System supports:                       -a arg and -aarg    -carg and -c
Non-conforming applications may use:   -aarg               N/A

Earlier versions of this standard included obsolescent syntax which showed some options with (mandatory) adjacent option-arguments in the SYNOPSIS for some utilities. These have since been removed. For all options with mandatory option-arguments, the SYNOPSIS now shows <blank> characters between the option and the option-argument. Historical usage has not been consistent in this area, however. Therefore, <blank> characters are required to be used by conforming applications and to be handled by all implementations, but implementations are also required to handle an adjacent option-argument in order to preserve backwards-compatibility for old scripts. One of the justifications for selecting the multiple-argument method was that the single-argument case is inherently ambiguous when the option-argument can legitimately be a null string.
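The two spellings of a mandatory option-argument can be tried against any getopt()-style parser. The following is a minimal sketch that uses Python's getopt module as a stand-in for the C getopt() function; the option letter a and the helper name parse are illustrative assumptions, not part of the standard.

```python
import getopt

def parse(argv):
    """Parse argv against one option, -a, that takes a mandatory option-argument."""
    opts, operands = getopt.getopt(argv, "a:")
    return opts, operands

# Separate form: the form conforming applications are required to use.
separate = parse(["-a", "arg"])

# Adjacent form: handled for backwards compatibility with old scripts.
adjacent = parse(["-aarg"])

# A null option-argument can only be expressed in the separate form.
empty = parse(["-a", ""])
```

In this sketch both spellings yield the same parsed option, while the null string is only representable when the option-argument is a separate argument.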

POSIX.1-2008 explicitly states that digits are permitted as operands and option-arguments. The lower and upper bounds for the values of the numbers used for operands and option-arguments were derived from the ISO C standard values for {LONG_MIN} and {LONG_MAX}. The requirement on the standard utilities is that numbers in the specified range do not cause a syntax error, although the specification of a number need not be semantically correct for a particular operand or option-argument of a utility. For example, the specification of:

dd obs=3000000000

would yield undefined behavior for the application and could be a syntax error because the number 3000000000 is outside of the range -2147483647 to +2147483647. On the other hand:

dd obs=2000000000

may cause some error, such as "blocksize too large", rather than a syntax error.
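As a rough illustration of the distinction, the sketch below checks a numeric string against the bounds quoted in the text. The constant and function names are hypothetical, and the check says nothing about whether a value is semantically acceptable to a given utility.

```python
# Bounds quoted in the text above; illustrative constants, not a library API.
PORTABLE_MIN = -2147483647
PORTABLE_MAX = 2147483647

def in_portable_range(text):
    """Return True if the decimal string lies within the quoted range."""
    return PORTABLE_MIN <= int(text) <= PORTABLE_MAX

# The value part of the two dd operands discussed above.
too_large = in_portable_range("obs=3000000000".partition("=")[2])
acceptable = in_portable_range("obs=2000000000".partition("=")[2])
```

A value for which the check returns True must not produce a syntax error, although the utility may still reject it for other reasons.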

A.12.2 Utility Syntax Guidelines

This section is based on the rules listed in the SVID. It was included for two reasons:

  1. The individual utility descriptions in XCU Utilities needed a set of common (although not universal) actions on which they could anchor their descriptions of option and operand syntax. Most of the standard utilities actually do use these guidelines, and many of their historical implementations use the getopt() function for their parsing. Therefore, it was simpler to cite the rules and merely identify exceptions.

  2. Developers of conforming applications need suggested guidelines if the POSIX community is to avoid the chaos of historical UNIX system command syntax.

It is recommended that all future utilities and applications use these guidelines to enhance "user portability". The fact that some historical utilities could not be changed (to avoid breaking historical applications) should not deter this future goal.

The voluntary nature of the guidelines is highlighted by repeated uses of the word should throughout. This usage should not be misinterpreted to imply that utilities that claim conformance in their OPTIONS sections do not always conform.

Guidelines 1 and 2 encourage utility writers to use only characters from the portable character set because use of locale-specific characters may make the utility inaccessible from other locales. Use of uppercase letters is discouraged due to problems associated with porting utilities to systems that do not distinguish between uppercase and lowercase characters in filenames. Use of non-alphanumeric characters is discouraged due to the number of utilities that treat non-alphanumeric characters in "special" ways depending on context (such as the shell using white-space characters to delimit arguments, various quote characters for quoting, the <dollar-sign> to introduce variable expansion, etc.).

In XCU Simple Commands, it is further stated that a command used in the Shell Command Language cannot be named with a trailing <colon>.

Guideline 3 was changed to allow alphanumeric characters (letters and digits) from the character set to allow compatibility with historical usage. Historical practice allows the use of digits wherever practical, and there are no portability issues that would prohibit the use of digits. In fact, from an internationalization viewpoint, digits (being non-language-dependent) are preferable over letters (a -2 is intuitively self-explanatory to any user, while in the -f filename the letter 'f' is a mnemonic aid only to speakers of Latin-based languages where "filename" happens to translate to a word that begins with 'f'). Since Guideline 3 still retains the word "single", multi-digit options are not allowed. Instances of historical utilities that used them have been marked obsolescent, with the numbers being changed from option names to option-arguments.

It was difficult to achieve a satisfactory solution to the problem of name space in option characters. When the standard developers desired to extend the historical cc utility to accept ISO C standard programs, they found that all of the portable alphabet was already in use by various vendors. Thus, they had to devise a new name, c89 (now superseded by c99), rather than something like cc -X. There were suggestions that implementors be restricted to providing extensions through various means (such as using a <plus-sign> as the option delimiter or using option characters outside the alphanumeric set) that would reserve all of the remaining alphanumeric characters for future POSIX standards. These approaches were resisted because they lacked the historical style of UNIX systems. Furthermore, if a vendor-provided option should become commonly used in the industry, it would be a candidate for standardization. It would be desirable to standardize such a feature using historical practice for the syntax (the semantics can be standardized with any syntax). This would not be possible if the syntax was one reserved for the vendor. However, since the standardization process may lead to minor changes in the semantics, it may prove to be better for a vendor to use a syntax that will not be affected by standardization.

Guideline 8 includes the concept of <comma>-separated lists in a single argument. It is up to the utility to parse such a list itself because getopt() just returns the single string. This situation was retained so that certain historical utilities would not violate the guidelines. Applications preparing for international use should be aware of an occasional problem with <comma>-separated lists: in some locales, the <comma> is used as the radix character. Thus, if an application is preparing operands for a utility that expects a <comma>-separated list, it should avoid generating non-integer values through one of the means that is influenced by setting the LC_NUMERIC variable (such as awk, bc, printf, or printf()).
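The radix problem can be sketched as follows. The helper format_with_radix only simulates locale-dependent output by substituting a radix character; it does not consult LC_NUMERIC, and both function names are hypothetical.

```python
def split_list(option_argument):
    """Split a <comma>-separated option-argument into its items."""
    return option_argument.split(",")

def format_with_radix(value, radix):
    """Format a number with one decimal place using the given radix character.

    This simulates locale-dependent output; it does not consult LC_NUMERIC.
    """
    return f"{value:.1f}".replace(".", radix)

# The same two values joined into a list, with '.' and then ',' as the radix.
with_period = ",".join(format_with_radix(v, ".") for v in (1.5, 2.5))
with_comma = ",".join(format_with_radix(v, ",") for v in (1.5, 2.5))
```

With a <comma> radix, the list that was meant to carry two values splits into four items, which is why generating only integer values avoids the ambiguity.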

Unless explicitly stated otherwise in the utility description, Guideline 9 requires applications to put options before operands, and requires utilities to accept any such usage without misinterpreting operands as options. For example, if an implementation of the printf utility supports a -e option as an extension, the command:

printf %s -e

must output the string "-e" without interpreting the -e as an option. Similarly, the command:

ls myfile -l

must interpret the -l argument as a second file operand, not as a -l option.
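A parser that stops at the first operand behaves this way. The sketch below again uses Python's getopt module as a stand-in for the C getopt() function; parse_ls_like is a hypothetical helper for a utility with a single -l option.

```python
import getopt

def parse_ls_like(argv):
    """Parse argv for a hypothetical utility with a single flag option, -l."""
    return getopt.getopt(argv, "l")

# Option before the operand: -l is parsed as an option.
before = parse_ls_like(["-l", "myfile"])

# Option-like string after the operand: -l is left as a second operand.
after = parse_ls_like(["myfile", "-l"])
```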

Applications calling any utility with a first operand starting with '-' should usually specify --, as indicated by Guideline 10, to mark the end of the options. This is true even if the SYNOPSIS in the Shell and Utilities volume of POSIX.1-2008 does not specify any options; implementations may provide options as extensions to the Shell and Utilities volume of POSIX.1-2008. The standard utilities that do not support Guideline 10 indicate that fact in the OPTIONS section of the utility description.
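An application that assembles an argument list might place the delimiter unconditionally, as in the sketch below. The helper build_argv is hypothetical, and Python's getopt module is used only to show how a getopt()-style parser would then read the list.

```python
import getopt

def build_argv(options, operands):
    """Return an argument list with -- placed between options and operands."""
    return list(options) + ["--"] + list(operands)

# An operand that starts with '-' would otherwise look like an option.
argv = build_argv(["-l"], ["-d"])
opts, operands = getopt.getopt(argv, "ld")
```

Here -d arrives as an operand even though the option string declares a -d option, because everything after -- is treated as an operand.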

Guideline 7 allows any string to be an option-argument; an option-argument can begin with any character, can be - or --, and can be an empty string. For example, the commands pr -h -, pr -h --, pr -h -d, pr -h +2, and pr -h "" contain the option-arguments -, --, -d, +2, and an empty string, respectively. Conversely, the command pr -h -- -d treats -d as an option, not as an argument, because the -- is an option-argument here, not a delimiter.
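These cases can be reproduced with a getopt()-style parser. In the sketch below, Python's getopt module stands in for the C getopt() function, and header_args is a hypothetical helper for a pr-like utility in which -h takes an option-argument and -d does not.

```python
import getopt

def header_args(argv):
    """Return the parsed options for a pr-like utility (-h takes an argument)."""
    opts, _operands = getopt.getopt(argv, "h:d")
    return opts

dash = header_args(["-h", "-"])
double_dash = header_args(["-h", "--"])
looks_like_option = header_args(["-h", "-d"])
empty = header_args(["-h", ""])

# Here -- is consumed as the option-argument of -h, so -d is still an option.
both = header_args(["-h", "--", "-d"])
```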

Guideline 11 was modified to clarify that the order of different options should not matter relative to one another. However, the order of repeated options that also have option-arguments may be significant; therefore, such options are required to be interpreted in the order that they are specified. The make utility is an instance of a historical utility that uses repeated options in which the order is significant. Multiple files are specified by giving multiple instances of the -f option; for example:

make -f common_header -f specific_rules target
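A utility that honors this requirement has to keep repeated option-arguments in the order given. The sketch below shows one way to do so with Python's getopt module, used here as a stand-in for the C getopt() function; makefiles_in_order is a hypothetical helper, not part of make.

```python
import getopt

def makefiles_in_order(argv):
    """Return the -f option-arguments in the order given, plus the operands."""
    opts, operands = getopt.getopt(argv, "f:")
    files = [value for name, value in opts if name == "-f"]
    return files, operands

files, targets = makefiles_in_order(
    ["-f", "common_header", "-f", "specific_rules", "target"]
)
```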

Guideline 13 does not imply that all of the standard utilities automatically accept the operand '-' to mean standard input or output, nor does it specify the actions of the utility upon encountering multiple '-' operands. It simply says that, by default, '-' operands are not used for other purposes in the utilities that read or write files (but not in those that use stat(), unlink(), touch, and so on). In earlier versions of this standard, all information concerning actual treatment of the '-' operand was found in the individual utility sections. Many implementations, however, treated '-' as standard input or output, and many applications depended on this behavior even though it was not standard. This behavior is now implementation-defined. Portable applications should not use '-' to mean standard input or output unless the utility description explicitly states that it does so, and they should always use './-' if they intend to refer to a file named - in the current working directory.
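An application that passes file names to utilities could apply the './-' spelling mechanically. The following is a minimal sketch under that assumption; file_operand is a hypothetical helper and handles only the single name discussed above.

```python
def file_operand(name):
    """Return a spelling of name that cannot be read as the '-' operand.

    Only the name '-' is rewritten; every other name is returned unchanged.
    """
    if name == "-":
        return "./-"
    return name

dash_file = file_operand("-")
ordinary_file = file_operand("data.txt")
```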

Guideline 14 is intended to prohibit implementations that would treat the command ls -l -d as if it were ls -- -l -d or ls -l -- -d.

The standard permits implementations to have extensions that violate the Utility Syntax Guidelines so long as the utility follows the guidelines when it is used in line with the forms defined by the standard. Thus, head -42 file and ls --help are permitted extensions. The intent is to allow extensions so long as the standard form is accepted and follows the guidelines.

An area of concern was that as implementations mature, implementation-defined utilities and implementation-defined utility options will result. The idea was expressed that there needed to be a standard way, say an environment variable or some such mechanism, to identify implementation-defined utilities separately from standard utilities that may have the same name. It was decided that there already exist several ways of dealing with this situation and that it is outside of the scope to attempt to standardize in the area of non-standard items. A method that exists on some historical implementations is the use of the so-called /local/bin or /usr/local/bin directory to separate local or additional copies or versions of utilities. Another method that is also used is to isolate utilities into completely separate domains. Still another method to ensure that the desired utility is being used is to request the utility by its full pathname. There are many approaches to this situation; the examples given above serve to illustrate that there is more than one.

 


UNIX ® is a registered Trademark of The Open Group.
POSIX ® is a registered Trademark of The IEEE.