Refer to Scope.
Refer to Conformance.
There is no additional rationale provided for this section.
The change history is provided as an informative section, to track changes from previous issues of IEEE Std 1003.1-2001.
The following sections describe changes made to the Shell and Utilities volume of IEEE Std 1003.1-2001 since Issue 5 of the base document. The CHANGE HISTORY section for each utility describes technical changes made to that utility from Issue 5. Changes between earlier issues of the base document and Issue 5 are not included.
The change history between Issue 5 and Issue 6 also lists the changes since the ISO POSIX-2:1993 standard.
The following list summarizes the major changes that were made in the Shell and Utilities volume of IEEE Std 1003.1-2001 from Issue 5 to Issue 6:
This volume of IEEE Std 1003.1-2001 is extensively revised so that it can be both an IEEE POSIX Standard and an Open Group Technical Standard.
The terminology has been reworked to meet the style requirements.
Shading notation and margin codes are introduced for identification of options within the volume.
This volume of IEEE Std 1003.1-2001 is updated to mandate support of FIPS 151-2. The following changes were made:
Support is mandated for the capabilities associated with the following symbolic constants:
_POSIX_CHOWN_RESTRICTED _POSIX_JOB_CONTROL _POSIX_SAVED_IDS
In the environment for the login shell, the environment variables LOGNAME and HOME shall be defined and have the properties described in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 7, Locale.
This volume of IEEE Std 1003.1-2001 is updated to align with some features of the Single UNIX Specification.
A new section on Utility Limits is added.
A section on the Relationships to Other Documents is added.
Concepts and definitions have been moved to a separate volume.
A RATIONALE section is added to each reference page.
The c99 utility is added as a replacement for c89, which is withdrawn in this issue.
IEEE Std 1003.2d-1994 is incorporated, adding the qalter, qdel, qhold, qmove, qmsg, qrerun, qrls, qselect, qsig, qstat, and qsub utilities.
IEEE P1003.2b draft standard is incorporated, making extensive updates and adding the iconv utility.
IEEE PASC Interpretations are applied.
The Open Group's corrigenda and resolutions are applied.
The following table lists the new utilities introduced since the ISO POSIX-2:1993 standard (as modified by IEEE Std 1003.2d-1994). Apart from the c99 and iconv utilities, these are all part of the XSI extension.
New Utilities in Issue 6 |
|||||||
---|---|---|---|---|---|---|---|
Refer to Terminology.
Refer to Definitions.
It has been pointed out that the Shell and Utilities volume of IEEE Std 1003.1-2001 assumes that a great deal of functionality from the System Interfaces volume of IEEE Std 1003.1-2001 is present, but never states exactly how much (and strictly does not need to since both are mandated on a conforming system). This section is an attempt to clarify the assumptions.
IEEE Std 1003.1-2001/Cor 2-2004, item XCU/TC2/D6/2 is applied, updating Table 1-1.
This is intended to be a summary of the unlink() and rmdir() requirements. Note that it is possible using the unlink() function for item 4. to occur.
This section was introduced to address the issue that there was insufficient detail presented by such utilities as awk or sh about their procedural control statements and their methods of performing arithmetic functions.
The ISO C standard was selected as a model because most historical implementations of the standard utilities were written in C. Thus, it was more likely that they would act in the desired manner without modification.
Using the ISO C standard is primarily a notational convenience so that the many procedural languages in the Shell and Utilities volume of IEEE Std 1003.1-2001 would not have to be rigorously described in every aspect. Its selection does not require that the standard utilities be written in Standard C; they could be written in Common Usage C, Ada, Pascal, assembler language, or anything else.
The sizes of the various numeric values refer to C-language data types that are allowed to be different sizes by the ISO C standard. Thus, like a C-language application, a shell application cannot rely on their exact size. However, it can rely on their minimum sizes expressed in the ISO C standard, such as {LONG_MAX} for a long type.
The behavior on overflow is undefined for ISO C standard arithmetic. Therefore, the standard utilities can use "bignum'' representation for integers so that there is no fixed maximum unless otherwise stated in the utility description. Similarly, standard utilities can use infinite-precision representations for floating-point arithmetic, as long as these representations exceed the ISO C standard requirements.
This section addresses only the issue of semantics; it is not intended to specify syntax. For example, the ISO C standard requires that 0L be recognized as an integer constant equal to zero, but utilities such as awk and sh are not required to recognize 0L (though they are allowed to, as an extension).
The ISO C standard requires that a C compiler must issue a diagnostic for constants that are too large to represent. Most standard utilities are not required to issue these diagnostics; for example, the command:
diff -C 2147483648 file1 file2
has undefined behavior, and the diff utility is not required to issue a diagnostic even if the number 2147483648 cannot be represented.
Refer to Portability.
Refer to Codes.
This section grew out of an idea that originated with the original POSIX.1, in the tables of system limits for the sysconf() and pathconf() functions. The idea being that a conforming application can be written to use the most restrictive values that a minimal system can provide, but it should not have to. The values provided represent compromises so that some vendors can use historically limited versions of UNIX system utilities. They are the highest values that a strictly conforming application can assume, given no other information.
However, by using the getconf utility or the sysconf() function, the elegant application can be tailored to more liberal values on some of the specific instances of specific implementations.
There is no explicitly stated requirement that an implementation provide finite limits for any of these numeric values; the implementation is free to provide essentially unbounded capabilities (where it makes sense), stopping only at reasonable points such as {ULONG_MAX} (from the ISO C standard). Therefore, applications desiring to tailor themselves to the values on a particular implementation need to be ready for possibly huge values; it may not be a good idea to allocate blindly a buffer for an input line based on the value of {LINE_MAX}, for instance. However, unlike the System Interfaces volume of IEEE Std 1003.1-2001, there is no set of limits that return a special indication meaning "unbounded". The implementation should always return an actual number, even if the number is very large.
The statement:
"It is not guaranteed that the application ...''
is an indication that many of these limits are designed to ensure that implementors design their utilities without arbitrary constraints related to unimaginative programming. There are certainly conditions under which combinations of options can cause failures that would not render an implementation non-conforming. For example, {EXPR_NEST_MAX} and {ARG_MAX} could collide when expressions are large; combinations of {BC_SCALE_MAX} and {BC_DIM_MAX} could exceed virtual memory.
In the Shell and Utilities volume of IEEE Std 1003.1-2001, the notion of a limit being guaranteed for the process lifetime, as it is in the System Interfaces volume of IEEE Std 1003.1-2001, is not as useful to a shell script. The getconf utility is probably a process itself, so the guarantee would be without value. Therefore, the Shell and Utilities volume of IEEE Std 1003.1-2001 requires the guarantee to be for the session lifetime. This will mean that many vendors will either return very conservative values or possibly implement getconf as a built-in.
It may seem confusing to have limits that apply only to a single utility grouped into one global section. However, the alternative, which would be to disperse them out into their utility description sections, would cause great difficulty when sysconf() and getconf were described. Therefore, the standard developers chose the global approach.
Each language binding could provide symbol names that are slightly different from those shown here. For example, the C-Language Binding option adds a leading underscore to the symbols as a prefix.
The following comments describe selection criteria for the symbols and their values:
It should be noted that {LINE_MAX} applies only to input line length; there is no requirement in IEEE Std 1003.1-2001 that limits the length of output lines. Utilities such as awk, sed, and paste could theoretically construct lines longer than any of the input lines they received, depending on the options used or the instructions from the application. They are not required to truncate their output to {LINE_MAX}. It is the responsibility of the application to deal with this. If the output of one of those utilities is to be piped into another of the standard utilities, line length restrictions will have to be considered; the fold utility, among others, could be used to ensure that only reasonable line lengths reach utilities or applications.
There are different limits associated with command lines and input to utilities, depending on the method of invocation. In the case of a C program exec-ing a utility, {ARG_MAX} is the underlying limit. In the case of the shell reading a script and exec-ing a utility, {LINE_MAX} limits the length of lines the shell is required to process, and {ARG_MAX} will still be a limit. If a user is entering a command on a terminal to the shell, requesting that it invoke the utility, {MAX_INPUT} may restrict the length of the line that can be given to the shell to a value below {LINE_MAX}.
When an option is supported, getconf returns a value of 1. For example, when C development is supported:
if [ "$(getconf POSIX2_C_DEV)" -eq 1 ]; then echo C supported fi
The sysconf() function in the C-Language Binding option would return 1.
The following comments describe selection criteria for the symbols and their values:
rm /usr/bin/c99 getconf POSIX2_C_DEV
IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/2 is applied, deleting the entry for {POSIX2_VERSION} since it is not a utility limit minimum value.
IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/3 is applied, changing the text in Utility Limits from: "utility (see getconf) through the sysconf() function defined in the System Interfaces volume of IEEE Std 1003.1-2001. The literal names shown in Table 1-3 apply only to the getconf utility; the high-level language binding describes the exact form of each name to be used by the interfaces in that binding." to: "utility (see getconf).".
There is no additional rationale provided for this section.
This section is arranged with headings in the same order as all the utility descriptions. It is a collection of related and unrelated information concerning:
The default actions of utilities
The meanings of notations used in IEEE Std 1003.1-2001 that are specific to individual utility sections
Although this material may seem out of place here, it is important that this information appear before any of the utilities to be described later.
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
Although it has not always been possible, the standard developers tried to avoid repeating information to reduce the risk that duplicate explanations could each be modified differently.
The need to recognize -- is required because conforming applications need to shield their operands from any arbitrary options that the implementation may provide as an extension. For example, if the standard utility foo is listed as taking no options, and the application needed to give it a pathname with a leading hyphen, it could safely do it as:
foo -- -myfile
and avoid any problems with -m used as an extension.
The usage of - is never shown in the SYNOPSIS. Similarly, the usage of -- is never shown.
The requirement for processing operands in command-line order is to avoid a "WeirdNIX" utility that might choose to sort the input files alphabetically, by size, or by directory order. Although this might be acceptable for some utilities, in general the programmer has a right to know exactly what order will be chosen.
Some of the standard utilities take multiple file operands and act as if they were processing the concatenation of those files. For example:
asa file1 file2
and:
cat file1 file2 | asa
have similar results when questions of file access, errors, and performance are ignored. Other utilities such as grep or wc have completely different results in these two cases. This latter type of utility is always identified in its DESCRIPTION or OPERANDS sections, whereas the former is not. Although it might be possible to create a general assertion about the former case, the following points must be addressed:
Access times for the files might be different in the operand case versus the cat case.
The utility may have error messages that are cognizant of the input filename, and this added value should not be suppressed. (As an example, awk sets a variable with the filename at each file boundary.)
There is no additional rationale provided for this section.
A conforming application cannot assume the following three commands are equivalent:
tail -n +2 file (sed -n 1q; cat) < file cat file | (sed -n 1q; cat)
The second command is equivalent to the first only when the file is seekable. In the third command, if the file offset in the open file description were not unspecified, sed would have to be implemented so that it read from the pipe 1 byte at a time or it would have to employ some method to seek backwards on the pipe. Such functionality is not defined currently in POSIX.1 and does not exist on all historical systems. Other utilities, such as head, read, and sh, have similar properties, so the restriction is described globally in this section.
The definition of "text file" is strictly enforced for input to the standard utilities; very few of them list exceptions to the undefined results called for here. (Of course, "undefined" here does not mean that historical implementations necessarily have to change to start indicating error conditions. Conforming applications cannot rely on implementations succeeding or failing when non-text files are used.)
The utilities that allow line continuation are generally those that accept input languages, rather than pure data. It would be unusual for an input line of this type to exceed {LINE_MAX} bytes and unreasonable to require that the implementation allow unlimited accumulation of multiple lines, each of which could reach {LINE_MAX}. Thus, for a conforming application the total of all the continued lines in a set cannot exceed {LINE_MAX}.
The format description is intended to be sufficiently rigorous to allow other applications to generate these input files. However, since <blank>s can legitimately be included in some of the fields described by the standard utilities, particularly in locales other than the POSIX locale, this intent is not always realized.
There is no additional rationale provided for this section.
Because there is no language prohibiting it, a utility is permitted to catch a signal, perform some additional processing (such as deleting temporary files), restore the default signal action (or action inherited from the parent process), and resignal itself.
The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.
This section does not describe error messages that refer to incorrect operation of the utility. Consider a utility that processes program source code as its input. This section is used to describe messages produced by a correctly operating utility that encounters an error in the program source code on which it is processing. However, a message indicating that the utility had insufficient memory in which to operate would not be described.
Some utilities have traditionally produced warning messages without returning a non-zero exit status; these are specifically noted in their sections. Other utilities shall not write to standard error if they complete successfully, unless the implementation provides some sort of extension to increase the verbosity or debugging level.
The format descriptions are intended to be sufficiently rigorous to allow post-processing of output by other programs.
The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.
Receipt of the SIGQUIT signal should generally cause termination (unless in some debugging mode) that would bypass any attempted recovery actions.
There is no additional rationale provided for this section.
Note the additional discussion of exit values in Exit Status for Commands in the sh utility. It describes requirements for returning exit values greater than 125.
A utility may list zero as a successful return, 1 as a failure for a specific reason, and greater than 1 as "an error occurred". In this case, unspecified conditions may cause a 2 or 3, or other value, to be returned. A strictly conforming application should be written so that it tests for successful exit status values (zero in this case), rather than relying upon the single specific error value listed in IEEE Std 1003.1-2001. In that way, it will have maximum portability, even on implementations with extensions.
The standard developers are aware that the general non-enumeration of errors makes it difficult to write test suites that test the incorrect operation of utilities. There are some historical implementations that have expended effort to provide detailed status messages and a helpful environment to bypass or explain errors, such as prompting, retrying, or ignoring unimportant syntax errors; other implementations have not. Since there is no realistic way to mandate system behavior in cases of undefined application actions or system problems-in a manner acceptable to all cultures and environments-attention has been limited to the correct operation of utilities by the conforming application. Furthermore, the conforming application does not need detailed information concerning errors that it caused through incorrect usage or that it cannot correct.
There is no description of defaults for this section because all of the standard utilities specify something (or explicitly state "Unspecified") for exit status.
Several actions are possible when a utility encounters an error condition, depending on the severity of the error and the state of the utility. Included in the possible actions of various utilities are: deletion of temporary or intermediate work files; deletion of incomplete files; and validity checking of the file system or directory.
The text about recursive traversing is meant to ensure that utilities such as find process as many files in the hierarchy as they can. They should not abandon all of the hierarchy at the first error and resume with the next command-line operand, but should attempt to keep going.
This section provides additional caveats, issues, and recommendations to the developer.
This section provides sample usage.
There is no additional rationale provided for this section.
FUTURE DIRECTIONS sections act as pointers to related work that may impact the interface in the future, and often cautions the developer to architect the code to account for a change in this area. Note that a future directions statement should not be taken as a commitment to adopt a feature or interface in the future.
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
This section is intended to clarify the requirements for utilities in support of large files.
The utilities listed in this section are utilities which are used to perform administrative tasks such as to create, move, copy, remove, change the permissions, or measure the resources of a file. They are useful both as end-user tools and as utilities invoked by applications during software installation and operation.
The chgrp, chmod, chown, ln, and rm utilities probably require use of large file-capable versions of stat(), lstat(), ftw(), and the stat structure.
The cat, cksum, cmp, cp, dd, mv, sum, and touch utilities probably require use of large file-capable versions of creat(), open(), and fopen().
The cat, cksum, cmp, dd, df, du, ls, and sum utilities may require writing large integer values. For example:
The cat utility might have a -n option which counts <newline>s.
The cmp utility reports the line number at which the first difference occurs, and also has a -l option which reports file offsets.
The dd, find, and test utilities may need to interpret command arguments that contain 64-bit values. For dd, the arguments include skip= n, seek= n, and count= n. For find, the arguments include -size n. For test, the arguments are those associated with algebraic comparisons.
The df utility might need to access large file systems with statvfs().
The ulimit utility will need to use large file-capable versions of getrlimit() and setrlimit() and be able to read and write large integer values.
All of these utilities can be exec-ed. There is no requirement that these utilities are actually built into the shell itself, but many shells need the capability to do so because the Shell and Utilities volume of IEEE Std 1003.1-2001, Section 2.9.1.1, Command Search and Execution requires that they be found prior to the PATH search. The shell could satisfy its requirements by keeping a list of the names and directly accessing the file-system versions regardless of PATH . Providing all of the required functionality for those such as cd or read would be more difficult.
There were originally three justifications for allowing the omission of exec-able versions:
It would require wasting space in the file system, at the expense of very small systems. However, it has been pointed out that all 16 utilities in the table can be provided with 16 links to a single-line shell script:
$0 "$@"
It is not logical to require invocation of utilities such as cd because they have no value outside the shell environment or cannot be useful in a child process. However, counter-examples always seemed to be available for even the most unusual cases:
find . -type d -exec cd {} \; -exec foo {} \; (which invokes "foo" on accessible directories)
ps ... | sed ... | xargs kill
find . -exec true \; -a ... (where "true" is used for temporary debugging)
It is confusing to have a utility such as kill that can easily be in the file system in the base standard, but that requires built-in status for the User Portability Utilities option (for the % job control job ID notation). It was decided that it was more appropriate to describe the required functionality (rather than the implementation) to the system implementors and let them decide how to satisfy it.
On the other hand, it was realized that any distinction like this between utilities was not useful to applications, and that the cost to correct it was small. These arguments were ultimately the most effective.
There were varying reasons for including utilities in the table of built-ins:
All utilities, including those in the table, are accessible via the system() and popen() functions in the System Interfaces volume of IEEE Std 1003.1-2001. There are situations where the return functionality of system() and popen() is not desirable. Applications that require the exit status of the invoked utility will not be able to use system() or popen(), since the exit status returned is that of the command language interpreter rather than that of the invoked utility. The alternative for such applications is the use of the exec family.