The Open Group Base Specifications Issue 8
IEEE Std 1003.1-2024
Copyright © 2001-2024 The IEEE and The Open Group

C. Rationale for Shell and Utilities

C.1 Introduction

C.1.1 Change History

The change history is provided as an informative section, to track changes from earlier versions of this standard.

The following sections describe changes made to the Shell and Utilities volume of POSIX.1-2024 since Issue 7 of the base document. The CHANGE HISTORY section for each utility describes technical changes made to that utility in Issue 5 and later. Changes made before Issue 5 are not included.

Changes from Issue 7 to Issue 8 (POSIX.1-2024)

The following list summarizes the major changes that were made in the Shell and Utilities volume of POSIX.1-2024 from Issue 7 to Issue 8:

New Features in Issue 8

The utilities first introduced in Issue 8 (over the Issue 7 base document) are as follows:

New Utilities in Issue 8

gettext
msgfmt
ngettext
readlink
realpath
timeout
xgettext

Removed Utilities in Issue 8

The utilities removed in Issue 8 (from the Issue 7 base document) are as follows:

Removed Utilities in Issue 8

fort77
qalter
qdel
qhold
qmove
qmsg
qrerun
qrls
qselect
qsig
qstat
qsub

C.1.2 Relationship to Other Documents

C.1.2.1 System Interfaces

It has been pointed out that the Shell and Utilities volume of POSIX.1-2024 assumes that a great deal of functionality from the System Interfaces volume of POSIX.1-2024 is present, but never states exactly how much (and strictly does not need to since both are mandated on a conforming system). This section is an attempt to clarify the assumptions.

File Read, Write, and Creation

IEEE Std 1003.1-2001/Cor 2-2004, item XCU/TC2/D6/2 is applied, updating Table 1-1.

File Removal

This is intended to be a summary of the unlink() and rmdir() requirements. Note that item 4 can occur when the unlink() function is used.

C.1.2.2 Concepts Derived from the ISO C Standard

This section was introduced to address the issue that there was insufficient detail presented by such utilities as awk or sh about their procedural control statements and their methods of performing arithmetic functions.

The ISO C standard was selected as a model because most historical implementations of the standard utilities were written in C. Thus, it was more likely that they would act in the desired manner without modification.

Using the ISO C standard is primarily a notational convenience so that the many procedural languages in the Shell and Utilities volume of POSIX.1-2024 would not have to be rigorously described in every aspect. Its selection does not require that the standard utilities be written in Standard C; they could be written in Common Usage C, Ada, Pascal, assembler language, or anything else.

The sizes of the various numeric values refer to C-language data types that are allowed to be different sizes by the ISO C standard. Thus, like a C-language application, a shell application cannot rely on their exact size. However, it can rely on their minimum sizes expressed in the ISO C standard, such as {LONG_MAX} for a long type.

The behavior on overflow is undefined for ISO C standard arithmetic. Therefore, the standard utilities can use "bignum" representation for integers so that there is no fixed maximum unless otherwise stated in the utility description. Similarly, standard utilities can use infinite-precision representations for floating-point arithmetic, as long as these representations exceed the ISO C standard requirements.

This section addresses only the issue of semantics; it is not intended to specify syntax. For example, the ISO C standard requires that 0L be recognized as an integer constant equal to zero, but utilities such as awk and sh are not required to recognize 0L (though they are allowed to, as an extension).

The ISO C standard requires that a C compiler must issue a diagnostic for constants that are too large to represent. Most standard utilities are not required to issue these diagnostics; for example, the command:

diff -C 2147483648 file1 file2

has undefined behavior, and the diff utility is not required to issue a diagnostic even if the number 2147483648 cannot be represented.

Austin Group Defect 1128 is applied, adding a note about the comma operator.

C.1.3 Utility Limits

This section grew out of an idea that originated with the original POSIX.1, in the tables of system limits for the sysconf() and pathconf() functions. The idea is that a conforming application can be written to use the most restrictive values that a minimal system can provide, but it should not have to. The values provided represent compromises so that some vendors can use historically limited versions of UNIX system utilities. They are the highest values that a strictly conforming application can assume, given no other information.

However, by using the getconf utility or the sysconf() function, the elegant application can be tailored to more liberal values on some of the specific instances of specific implementations.

There is no explicitly stated requirement that an implementation provide finite limits for any of these numeric values; the implementation is free to provide essentially unbounded capabilities (where it makes sense), stopping only at reasonable points such as {ULONG_MAX} (from the ISO C standard). Therefore, applications desiring to tailor themselves to the values on a particular implementation need to be ready for possibly huge values; it may not be a good idea to allocate blindly a buffer for an input line based on the value of {LINE_MAX}, for instance. However, unlike the System Interfaces volume of POSIX.1-2024, there is no set of limits that return a special indication meaning "unbounded". The implementation should always return an actual number, even if the number is very large.
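As a minimal sketch of that advice, a shell script might query {LINE_MAX} with getconf, fall back to the minimum value of 2048 when the query fails, and cap the working size it is prepared to use. The variable names and the 65536 cap are illustrative assumptions, not values taken from this standard.

```shell
# Ask the implementation for its LINE_MAX; fall back to the POSIX
# minimum (2048) if getconf fails or prints something non-numeric.
line_max=$(getconf LINE_MAX 2>/dev/null) || line_max=2048
case $line_max in
    '' | *[!0-9]*) line_max=2048 ;;
esac

# Hypothetical cap chosen by the application (an assumption of this sketch).
cap=65536

# A value with more than 9 digits is treated as "huge" without comparing it
# numerically, so the test below never sees an out-of-range integer.
if [ "${#line_max}" -gt 9 ] || [ "$line_max" -gt "$cap" ]; then
    work_size=$cap
else
    work_size=$line_max
fi
echo "LINE_MAX=$line_max, working size=$work_size"
```

The point of the cap is that the reported value is always an actual number but may be very large, so the application decides how much it is willing to commit.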

The statement:

"It is not guaranteed that the application ..."

is an indication that many of these limits are designed to ensure that implementors design their utilities without arbitrary constraints related to unimaginative programming. There are certainly conditions under which combinations of options can cause failures that would not render an implementation non-conforming. For example, {EXPR_NEST_MAX} and {ARG_MAX} could collide when expressions are large; combinations of {BC_SCALE_MAX} and {BC_DIM_MAX} could exceed virtual memory.

In the Shell and Utilities volume of POSIX.1-2024, the notion of a limit being guaranteed for the process lifetime, as it is in the System Interfaces volume of POSIX.1-2024, is not as useful to a shell script. The getconf utility is probably a process itself, so the guarantee would be without value. Therefore, the Shell and Utilities volume of POSIX.1-2024 requires the guarantee to be for the session lifetime. This will mean that many vendors will either return very conservative values or possibly implement getconf as a built-in.

It may seem confusing to have limits that apply only to a single utility grouped into one global section. However, the alternative, which would be to disperse them out into their utility description sections, would cause great difficulty when sysconf() and getconf were described. Therefore, the standard developers chose the global approach.

Each language binding could provide symbol names that are slightly different from those shown here. For example, the C-Language Binding option adds a leading <underscore> to the symbols as a prefix.

The following comments describe selection criteria for the symbols and their values:

{ARG_MAX}
This is defined by the System Interfaces volume of POSIX.1-2024. Unfortunately, it is very difficult for a conforming application to deal with this value, as it does not know how much of its argument space is being consumed by the environment variables of the user.
{BC_BASE_MAX}
{BC_DIM_MAX}
{BC_SCALE_MAX}
These were originally one value, {BC_SCALE_MAX}, but it was unreasonable to link all three concepts into one limit.
{CHILD_MAX}
This is defined by the System Interfaces volume of POSIX.1-2024.
{COLL_WEIGHTS_MAX}
The weights assigned to order can be considered as "passes" through the collation algorithm.
{EXPR_NEST_MAX}
The value for expression nesting was borrowed from the ISO C standard.
{LINE_MAX}
This is a global limit that affects all utilities, unless otherwise noted. The {MAX_CANON} value from the System Interfaces volume of POSIX.1-2024 may further limit input lines from terminals. The {LINE_MAX} value was the subject of much debate and is a compromise between those who wished to have unlimited lines and those who understood that many historical utilities were written with fixed buffers. Frequently, utility writers selected the UNIX system constant BUFSIZ to allocate these buffers; therefore, some utilities were limited to 512 bytes for I/O lines, while others achieved 4096 bytes or greater.

It should be noted that {LINE_MAX} applies only to input line length; there is no requirement in POSIX.1-2024 that limits the length of output lines. Utilities such as awk, sed, and paste could theoretically construct lines longer than any of the input lines they received, depending on the options used or the instructions from the application. They are not required to truncate their output to {LINE_MAX}. It is the responsibility of the application to deal with this. If the output of one of those utilities is to be piped into another of the standard utilities, line length restrictions will have to be considered; the fold utility, among others, could be used to ensure that only reasonable line lengths reach utilities or applications.

{LINK_MAX}
This is defined by the System Interfaces volume of POSIX.1-2024.
{MAX_CANON}
{MAX_INPUT}
{NAME_MAX}
{NGROUPS_MAX}
{OPEN_MAX}
{PATH_MAX}
{PIPE_BUF}
These limits are defined by the System Interfaces volume of POSIX.1-2024. Note that the byte lengths described by some of these values continue to represent bytes, even if the applicable character set uses a multi-byte encoding.
{RE_DUP_MAX}
The value selected is consistent with historical practice. Although the name implies that it applies to all REs, only BREs use the interval notation \{m,n\} addressed by this limit.
{POSIX2_SYMLINKS}
The {POSIX2_SYMLINKS} variable indicates that the underlying operating system supports the creation of symbolic links in specific directories. Many of the utilities defined in POSIX.1-2024 that deal with symbolic links do not depend on this value. For example, a utility that follows symbolic links (or does not, as the case may be) will only be affected by a symbolic link if it encounters one. Presumably, a file system that does not support symbolic links will not contain any. This variable does affect such utilities as ln -s and pax that attempt to create symbolic links.
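
The {LINE_MAX} entry above suggests fold as one way to keep generated lines within bounds. A minimal sketch follows; the 300-character line and the width of 100 are arbitrary illustrative values. Folding inserts <newline> characters and so changes the data, which is acceptable only when the consumer tolerates re-wrapped input.

```shell
# Build one 300-character line with awk, standing in for output that is
# longer than any input line the utility received.
long_line=$(awk 'BEGIN { for (i = 0; i < 300; i++) printf "x"; print "" }')

# Fold it so that no line passed downstream exceeds 100 columns.
folded=$(printf '%s\n' "$long_line" | fold -w 100)

# Measure the longest resulting line and count the lines produced.
max_len=$(printf '%s\n' "$folded" |
    awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')
line_count=$(printf '%s\n' "$folded" | awk 'END { print NR }')
echo "longest line: $max_len, lines: $line_count"
```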

There are different limits associated with command lines and input to utilities, depending on the method of invocation. In the case of a C program exec-ing a utility, {ARG_MAX} is the underlying limit. In the case of the shell reading a script and exec-ing a utility, {LINE_MAX} limits the length of lines the shell is required to process, and {ARG_MAX} will still be a limit. If a user is entering a command on a terminal to the shell, requesting that it invoke the utility, {MAX_INPUT} may restrict the length of the line that can be given to the shell to a value below {LINE_MAX}.
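
A rough sketch of estimating how much of {ARG_MAX} remains for arguments once the environment is taken into account. This is an approximation under stated assumptions: it counts the bytes printed by env and ignores any per-string overhead the implementation may charge, so the result is better read as an upper bound than as an exact figure. The variable names are illustrative.

```shell
# Total space the implementation allows for arguments plus environment.
arg_max=$(getconf ARG_MAX)

# Approximate size of the current environment, in bytes.
env_bytes=$(env | wc -c)

# Arithmetic expansion tolerates the leading blanks some wc versions print.
remaining=$((arg_max - env_bytes))
echo "ARG_MAX=$arg_max, approximately $remaining bytes left for arguments"
```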

When an option is supported, getconf returns a value of 1. For example, when C development is supported:

if [ "$(getconf POSIX2_C_DEV)" -eq 1 ]; then
    echo C supported
fi

The sysconf() function in the C-Language Binding option would return 1.

The following comments describe selection criteria for the symbols and their values:

POSIX2_C_BIND
POSIX2_C_DEV
POSIX2_FORT_RUN
POSIX2_SW_DEV
POSIX2_UPE
It is possible for some (usually privileged) operations to remove utilities that support these options or otherwise to render these options unsupported. The header files, the sysconf() function, or the getconf utility will not necessarily detect such actions, in which case they should not be considered as rendering the implementation non-conforming. A test suite should not attempt tests such as:

rm /usr/bin/c17
getconf POSIX2_C_DEV

POSIX2_LOCALEDEF
This symbol was introduced to allow implementations to restrict supported locales to only those supplied by the implementation.

IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/2 is applied, deleting the entry for {POSIX2_VERSION} since it is not a utility limit minimum value.

IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/3 is applied, changing the text in Utility Limits from: "utility (see getconf) through the sysconf() function defined in the System Interfaces volume of POSIX.1-2024. The literal names shown in Table 1-3 apply only to the getconf utility; the high-level language binding describes the exact form of each name to be used by the interfaces in that binding." to: "utility (see getconf).".

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0001 [666] is applied.

C.1.4 Grammar Conventions

There is no additional rationale provided for this section.

C.1.5 Utility Description Defaults

This section is arranged with headings in the same order as all the utility descriptions. It is a collection of related and unrelated information concerning:

  1. The default actions of utilities
  2. The meanings of notations used in POSIX.1-2024 that are specific to individual utility sections

Although this material may seem out of place here, it is important that this information appear before any of the utilities to be described later.

NAME

There is no additional rationale provided for this section.

SYNOPSIS

There is no additional rationale provided for this section.

DESCRIPTION

Austin Group Defect 351 is applied, adding a requirement relating to declaration utilities.

OPTIONS

Although it has not always been possible, the standard developers tried to avoid repeating information to reduce the risk that duplicate explanations could each be modified differently.

Recognition of -- is required because conforming applications need to shield their operands from any arbitrary options that the implementation may provide as an extension. For example, if the standard utility foo is listed as taking no options, and the application needed to give it a pathname with a leading <hyphen-minus>, it could safely do it as:

foo -- -myfile

and avoid any problems with -m used as an extension.
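
The same technique can be tried with real utilities. The following sketch creates a file whose name begins with a <hyphen-minus> in a scratch directory and then reads and removes it using --. The scratch directory name is an illustrative assumption.

```shell
# Work in a scratch directory (the name is an assumption of this sketch).
scratch=${TMPDIR:-/tmp}/dashdash-demo.$$
mkdir "$scratch" || exit 1

# Create a file whose name begins with a <hyphen-minus>.
printf 'hello\n' > "$scratch/-myfile"

# The "--" marks the end of options, so -myfile is treated as an operand.
content=$(cd "$scratch" && cat -- -myfile)
(cd "$scratch" && rm -- -myfile)

rmdir "$scratch"
echo "$content"
```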

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0002 [584] is applied.

OPERANDS

The usage of - is never shown in the SYNOPSIS. Similarly, the usage of -- is never shown.

The requirement for processing operands in command-line order is to avoid a "WeirdNIX" utility that might choose to sort the input files alphabetically, by size, or by directory order. Although this might be acceptable for some utilities, in general the programmer has a right to know exactly what order will be chosen.

Some of the standard utilities take multiple file operands and act as if they were processing the concatenation of those files. For example:

asa file1 file2

and:

cat file1 file2 | asa

have similar results when questions of file access, errors, and performance are ignored. Other utilities such as grep or wc have completely different results in these two cases. This latter type of utility is always identified in its DESCRIPTION or OPERANDS sections, whereas the former is not. Although it might be possible to create a general assertion about the former case, the following points must be addressed:

STDIN

There is no additional rationale provided for this section.

INPUT FILES

A conforming application cannot assume the following three commands are equivalent:

tail -n +2 file
(sed -n 1q; cat) < file
cat file | (sed -n 1q; cat)

The second command is equivalent to the first only when the file is seekable. In the third command, if the file offset in the open file description were not unspecified, sed would have to be implemented so that it read from the pipe 1 byte at a time or it would have to employ some method to seek backwards on the pipe. Such functionality is not defined currently in POSIX.1 and does not exist on all historical systems. Other utilities, such as head, read, and sh, have similar properties, so the restriction is described globally in this section.
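
When the goal is simply to drop the first line of input that may not be seekable, one approach that sidesteps the file offset question is to let a single utility consume the whole stream. A minimal sketch, using made-up sample data:

```shell
# One utility reads the entire pipe, so nothing depends on where an
# earlier reader left the file offset.
rest=$(printf 'header\nrow1\nrow2\n' | sed 1d)
printf '%s\n' "$rest"
```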

The definition of "text file" is strictly enforced for input to the standard utilities; very few of them list exceptions to the undefined results called for here. (Of course, "undefined" here does not mean that historical implementations necessarily have to change to start indicating error conditions. Conforming applications cannot rely on implementations succeeding or failing when non-text files are used.)

The utilities that allow line continuation are generally those that accept input languages, rather than pure data. It would be unusual for an input line of this type to exceed {LINE_MAX} bytes and unreasonable to require that the implementation allow unlimited accumulation of multiple lines, each of which could reach {LINE_MAX}. Thus, for a conforming application the total of all the continued lines in a set cannot exceed {LINE_MAX}.

The format description is intended to be sufficiently rigorous to allow other applications to generate these input files. However, since <blank> characters can legitimately be included in some of the fields described by the standard utilities, particularly in locales other than the POSIX locale, this intent is not always realized.

ENVIRONMENT VARIABLES

There is no additional rationale provided for this section.

ASYNCHRONOUS EVENTS

Because there is no language prohibiting it, a utility is permitted to catch a signal, perform some additional processing (such as deleting temporary files), restore the default signal action, and resignal itself.
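
The catch, clean up, restore, and resignal sequence can be sketched in the shell language as follows. This is an illustration only, not a description of how any standard utility is implemented; the temporary file name and the function name on_term are assumptions. The script is run through sh -c so that $$ refers to the child shell.

```shell
# The child script is kept in a variable so that "$$" inside it refers
# to the shell started by sh -c, not to the calling shell.
demo_script='
tmpfile=${TMPDIR:-/tmp}/resignal-demo.$$
: > "$tmpfile"

on_term() {
    rm -f "$tmpfile"     # additional processing: remove the temporary file
    trap - TERM          # restore the default action for SIGTERM
    kill -s TERM "$$"    # resignal; the default action now terminates us
}
trap on_term TERM

kill -s TERM "$$"        # stand-in for a signal sent by another process
echo "not reached"
'

output=$(sh -c "$demo_script")
status=$?
echo "child exit status: $status"
```

Because the child ends by the default action of the signal rather than by calling exit, the parent sees an exit status greater than 128 and can tell that termination was caused by a signal.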

Austin Group Defects 1648 and 1772 are applied, clarifying the default behavior for signal handling.

STDOUT

The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.

STDERR

This section does not describe error messages that refer to incorrect operation of the utility. Consider a utility that processes program source code as its input. This section is used to describe messages produced by a correctly operating utility that encounters an error in the program source code that it is processing. However, a message indicating that the utility had insufficient memory in which to operate would not be described.

Some utilities have traditionally produced warning messages without returning a non-zero exit status; these are specifically noted in their sections. Other utilities shall not write to standard error if they complete successfully, unless the implementation provides some sort of extension to increase the verbosity or debugging level.

The format descriptions are intended to be sufficiently rigorous to allow post-processing of output by other programs.

OUTPUT FILES

The format description is intended to be sufficiently rigorous to allow post-processing of output by other programs, particularly by an awk or lex parser.

Receipt of the SIGQUIT signal should generally cause termination (unless in some debugging mode) that would bypass any attempted recovery actions.

EXTENDED DESCRIPTION

There is no additional rationale provided for this section.

EXIT STATUS

Note the additional discussion of exit values in Exit Status for Commands in the sh utility. It describes requirements for returning exit values greater than 125.

A utility may list zero as a successful return, 1 as a failure for a specific reason, and greater than 1 as "an error occurred". In this case, unspecified conditions may cause a 2 or 3, or other value, to be returned. A strictly conforming application should be written so that it tests for successful exit status values (zero in this case), rather than relying upon the single specific error value listed in POSIX.1-2024. In that way, it will have maximum portability, even on implementations with extensions.
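
A minimal sketch of that recommendation: the helper below looks only at whether the exit status is zero, so it behaves the same whether a failing utility returns 1, 2, 3, or some other value. The function name succeeded is an illustrative assumption.

```shell
# Report whether a command succeeded, looking only at zero versus non-zero.
succeeded() {
    if "$@"; then
        echo yes
    else
        echo no
    fi
}

# Identical inputs: cmp exits with status zero.
same=$(succeeded cmp -s /dev/null /dev/null)

# Stand-in for a utility that fails with an unspecified non-zero value.
other=$(succeeded sh -c 'exit 3')

echo "same=$same other=$other"
```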

The standard developers are aware that the general non-enumeration of errors makes it difficult to write test suites that test the incorrect operation of utilities. There are some historical implementations that have expended effort to provide detailed status messages and a helpful environment to bypass or explain errors, such as prompting, retrying, or ignoring unimportant syntax errors; other implementations have not. Since there is no realistic way to mandate system behavior in cases of undefined application actions or system problems—in a manner acceptable to all cultures and environments—attention has been limited to the correct operation of utilities by the conforming application. Furthermore, the conforming application does not need detailed information concerning errors that it caused through incorrect usage or that it cannot correct.

Austin Group Defect 1492 is applied, adding the Default Behavior paragraph.

CONSEQUENCES OF ERRORS

Several actions are possible when a utility encounters an error condition, depending on the severity of the error and the state of the utility. Included in the possible actions of various utilities are: deletion of temporary or intermediate work files; deletion of incomplete files; and validity checking of the file system or directory.

The text about recursive traversing is meant to ensure that utilities such as find process as many files in the hierarchy as they can. They should not abandon all of the hierarchy at the first error and resume with the next command-line operand, but should attempt to keep going.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0001 [150] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0003 [913] is applied.

Austin Group Defect 251 is applied, adding a note about the treatment of pathnames containing any bytes that have the encoded value of a <newline> character.

Austin Group Defect 1499 is applied, requiring utilities to exit with an exit status that indicates an error occurred, instead of any non-zero exit status.

APPLICATION USAGE

This section provides additional caveats, issues, and recommendations to the developer.

EXAMPLES

This section provides sample usage.

RATIONALE

There is no additional rationale provided for this section.

FUTURE DIRECTIONS

FUTURE DIRECTIONS sections act as pointers to related work that may impact the interface in the future, and often caution the developer to architect the code to account for a change in this area. Note that a future directions statement should not be taken as a commitment to adopt a feature or interface in the future.

SEE ALSO

There is no additional rationale provided for this section.

CHANGE HISTORY

There is no additional rationale provided for this section.

C.1.6 Considerations for Utilities in Support of Files of Arbitrary Size

This section is intended to clarify the requirements for utilities in support of large files.

The utilities listed in this section are used to perform administrative tasks such as creating, moving, copying, or removing a file, changing its permissions, or measuring its resources. They are useful both as end-user tools and as utilities invoked by applications during software installation and operation.

The chgrp, chmod, chown, ln, and rm utilities probably require use of large file-capable versions of stat(), lstat(), nftw(), and the stat structure.

The cat, cksum, cmp, cp, dd, mv, and touch utilities probably require use of large file-capable versions of creat(), open(), and fopen().

The cat, cksum, cmp, dd, df, du, and ls utilities may require writing large integer values. For example:

The dd, find, and test utilities may need to interpret command arguments that contain 64-bit values. For dd, the arguments include skip=n, seek=n, and count=n. For find, the arguments include -size n. For test, the arguments are those associated with algebraic comparisons.
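
A small sketch of the kind of argument involved, using the test utility in its [ form. It assumes an implementation whose integer comparisons handle values wider than 32 bits; whether a given implementation does so is the concern this section addresses. The variable names are illustrative.

```shell
# 5 GiB expressed in bytes; this does not fit in a 32-bit integer.
five_gib=5368709120

# Compare against the largest unsigned 32-bit value.
if [ "$five_gib" -gt 4294967295 ]; then
    wide=yes
else
    wide=no
fi
echo "comparison beyond 32 bits handled: $wide"
```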

The df utility might need to access large file systems with statvfs().

The ulimit utility will need to use large file-capable versions of getrlimit() and setrlimit() and be able to read and write large integer values.

Austin Group Defect 1568 is applied, removing references to the sum utility.

C.1.7 Built-In Utilities

Other than the special built-in utilities, there is no requirement to build utilities into the shell itself. However, many shells implement certain utilities as regular built-ins for the following reasons:

With the exception of the intrinsic utilities, all regular built-in utilities are subject to the PATH search and can be overridden by a specially crafted PATH environment variable.

Earlier versions of this standard required that all of the regular built-in utilities, including intrinsic utilities, could be exec-ed. This was always a contentious requirement, and with the introduction of intrinsic utilities the standard developers decided to exempt the utilities that this standard requires to be intrinsic, with the exception of kill. The kill utility is still genuinely useful when exec-ed, only lacking support for the % job ID notation, whereas examples given of uses for the other utilities that are now exempted were considered contrived (such as using cd to test accessibility of a directory, which can be done using test -x). If an application needs exec-able versions of some of the exempted intrinsic utilities, it can easily provide them itself, on systems that support the (non-standard but ubiquitous) "#!" mechanism to make scripts executable by the exec family of functions, as links to a two-line shell script:

#! /path/to/sh
${0##*/} "$@"
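
The ${0##*/} expansion in the script above removes the longest leading portion of $0 that ends in a <slash>, leaving the name under which the link was invoked. The following sketch shows the same expansion applied to an ordinary variable; the pathname is hypothetical.

```shell
# Hypothetical pathname under which a link to the script might be invoked.
invoked_as=/usr/local/libexec/posix/umask

# Remove the longest prefix matching "*/", leaving only the last component.
utility=${invoked_as##*/}
echo "$utility"
```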

Austin Group Defect 854 is applied, replacing the table of Regular Built-In Utilities with a reference to the new Intrinsic Utilities section.

Austin Group Defect 1600 is applied, exempting the intrinsic utilities other than kill from the requirement that they can be exec-ed.

C.1.8 Intrinsic Utilities

There were varying reasons for including utilities in the table of intrinsic utilities:
alias, fc, unalias
The functionality of these utilities is performed more simply within the shell itself and that is the model most historical implementations have used.
bg, fg, jobs
All of the job control-related utilities are eligible for built-in status because that is the model most historical implementations have used.
cd, getopts, hash, read, type, ulimit, umask, wait
The functionality of these utilities is performed more simply within the context of the current process. An example can be taken from the usage of the cd utility. The purpose of the cd utility is to change the working directory for subsequent operations. The actions of cd affect the process in which cd is executed and all subsequent child processes of that process. Based on the POSIX standard process model, changes in the process environment of a child process have no effect on the parent process. If the cd utility were executed from a child process, the working directory change would be effective only in the child process. Child processes initiated subsequent to the child process that executed the cd utility would not have a changed working directory relative to the parent process.
command
This utility was placed in the table primarily to protect scripts that are concerned about their PATH being manipulated. The "secure" shell script example in the command utility in the Shell and Utilities volume of POSIX.1-2024 would not be possible if a PATH change retrieved an alien version of command. (An alternative would have been to implement getconf as a built-in, but the standard developers considered that it carried too many changing configuration strings to require in the shell.)
kill
Since kill provides optional job control functionality using shell notation (%1, %2, and so on), some implementations would find it extremely difficult to provide this outside the shell.
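
The process model described in the cd entry above can be observed directly. In this sketch, a cd performed by a separate child process leaves the working directory of the calling shell unchanged, whereas a cd performed by the shell itself affects subsequent commands. The variable names are illustrative.

```shell
start=$(pwd)

# cd run in a separate process (here a child sh) changes only that process.
sh -c 'cd /'
after_child=$(pwd)

# cd run by the current shell changes the directory for later commands.
cd / || exit 1
after_builtin=$(pwd)

# Return to where we began.
cd "$start" || exit 1
echo "after child: $after_child, after built-in: $after_builtin"
```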

The following utilities are frequently implemented as intrinsic (and built-in) utilities. Future versions of this standard might not allow these utilities, or any other standard utility not in Intrinsic Utilities, to be intrinsic; implementations are encouraged to implement these as non-intrinsic utilities instead (but still built-in if they were previously built-in).

[, echo, false, newgrp, printf, pwd, test, true

All utilities, including those in the table, are accessible via the system() and popen() functions in the System Interfaces volume of POSIX.1-2024. There are situations where the return functionality of system() and popen() is not desirable. Applications that require the exit status of the invoked utility will not be able to use system() or popen(), since the exit status returned is that of the command language interpreter rather than that of the invoked utility. The alternative for such applications is the use of the exec family.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0004 [705] is applied.

Austin Group Defect 854 is applied, adding intrinsic utilities.

C.2 Shell Command Language

C.2.1 Shell Introduction

The System V shell was selected as the starting point for the Shell and Utilities volume of POSIX.1-2024. The BSD C shell was excluded from consideration for the following reasons:

The construct "#!" is reserved for implementations wishing to provide that extension. If it were not reserved, the Shell and Utilities volume of POSIX.1-2024 would disallow it by forcing it to be a comment. As it stands, a strictly conforming application must not use "#!" as the first two characters of the file.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

Austin Group Defect 1514 is applied, correcting a misuse of the term "positional parameter".

C.2.2 Quoting

Although this section contains a note indicating that a future version of this standard may extend the conditions under which some characters are special, there are no plans to do so. The note is there to encourage application writers to future-proof their shell code. In some cases existing widespread use of the characters unquoted would preclude them being given a special meaning in those use cases. For example, commas are in widespread use in filenames (notably by RCS and CVS) and it is common to pass the token "{}" as an argument to find and xargs unquoted.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

Austin Group Defects 1191 and 1193 are applied, adding:

]  ^  -  !  {  ,  }

to the list of characters that might need to be quoted under certain circumstances.
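
As an illustration of the future-proofing the note encourages, the following sketch quotes both a filename containing a comma and the {} argument passed to find. The scratch directory and filename are assumptions made for the example.

```shell
# Scratch directory containing one file with a comma in its name.
scratch=${TMPDIR:-/tmp}/quote-demo.$$
mkdir "$scratch" || exit 1
: > "$scratch/a,b"

# Quoting {} has no cost today and keeps the command unambiguous should
# the braces ever be given a special meaning in this position.
found=$(find "$scratch" -type f -exec basename '{}' \;)

rm -f "$scratch/a,b"
rmdir "$scratch"
echo "$found"
```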

C.2.2.1 Escape Character (Backslash)

Austin Group Defect 500 is applied, changing "follows" to "immediately follows".

C.2.2.2 Single-Quotes

A <backslash> cannot be used to escape a single-quote in a single-quoted string. An embedded quote can be created by writing, for example: "'a'\''b'", which yields "a'b". (See XCU 2.6.5 Field Splitting for a better understanding of how portions of words are either split into fields or remain concatenated.) A single token can be made up of concatenated partial strings containing all three kinds of quoting or escaping, thus permitting any combination of characters.
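
A minimal runnable form of that example: the word is built from a single-quoted a, a <backslash>-escaped single-quote, and a single-quoted b, all concatenated into one token. The variable name s is illustrative.

```shell
# 'a' + \' + 'b' are three adjacent pieces of one word.
s='a'\''b'
printf '%s\n' "$s"
```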

C.2.2.3 Double-Quotes

The escaped <newline> used for line continuation is removed entirely from the input and is not replaced by any white space. Therefore, it cannot serve as a token separator.
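
The effect can be seen in the following sketch. In the first command the <backslash> <newline> pair sits between foo and bar with no intervening <blank>, so the two parts form a single word; in the second, the <space> before the <backslash> is what separates the tokens. The variable names are illustrative.

```shell
# No blank before the backslash: foo and bar are joined into one word.
joined=$(echo foo\
bar)

# A blank before the backslash: the blank, not the continuation, separates them.
separate=$(echo foo \
bar)

printf '%s\n' "$joined" "$separate"
```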

In double-quoting, if a <backslash> is immediately followed by a character that would be interpreted as having a special meaning, the <backslash> is deleted and the subsequent character is taken literally. If a <backslash> does not precede a character that would have a special meaning, it is left in place unmodified and the character immediately following it is also left unmodified. Thus, for example:

"\$"  ->  $

"\a" -> \a

It would be desirable to include the statement "The characters from an enclosed "${" to the matching '}' shall not be affected by the double-quotes", similar to the one for "$()". However, historical practice in the System V shell prevents this.

Shell implementations differ widely in their handling of unescaped double-quote characters inside "${...}" (except for the four substring-processing variants). Hence this standard leaves the behavior unspecified. Single-quotes are ordinary characters in this context, and so cannot be used to quote a '}' within "${...}". However, <backslash> can be used to escape a '}'. For example, the value of foo assigned by the following commands is '}':

unset bar
foo="${bar-\}}"

When <backslash> is used in this way it is a special character and is therefore removed during quote removal, even though it would not be removed in:

foo="\}"

Differences in processing the "${...}" form led to inconsistencies between the historical System V shell, BSD, and KornShells, and the text in the Shell and Utilities volume of POSIX.1-2024 is an attempt to converge them without breaking too many applications. The only alternative to this compromise between shells would be to make the behavior unspecified not just for unescaped double-quote but also for unescaped single-quote, '{', or '}'. The chosen requirements provide the maximum consistency between normal double-quote behavior and parameter expansion within double-quotes; the only real difference being the ability to escape a '}' with <backslash>.

Some implementations have allowed the end of the word to terminate the backquoted command substitution, such as in:

"`echo hello"

This usage is undefined; the matching backquote is required by the Shell and Utilities volume of POSIX.1-2024. The other undefined usage can be illustrated by the example:

sh -c '` echo "foo`'

The description of the recursive actions involving command substitution can be illustrated with an example. Upon recognizing the introduction of command substitution, the shell parses input (in a new context), gathering the source for the command substitution until an unbalanced ')' or '`' is located. For example, in the following:

echo "$(date; echo "
    one" )"

the double-quote following the echo does not terminate the first double-quote; it is part of the command substitution script. Similarly, in:

echo "$(echo *)"

the <asterisk> is not quoted since it is inside command substitution; however:

echo "$(echo "*")"

is quoted (and represents the <asterisk> character itself).

The $'...' construct does not retain its special meaning inside double quotes. This was discussed by the standard developers and rejected. Note that $'...' is a quoting mechanism and not an expansion. Losing the special meaning inside double-quotes is consistent with other quoting mechanisms losing their special meaning when quoted.
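
The following sketch illustrates the point; the variable name literal is illustrative only. Because the construct loses its special meaning inside double-quotes, the result does not depend on whether the shell supports $'...' at all:

```shell
# Inside double-quotes, $'\n' is five ordinary characters, not a newline.
literal="$'\n'"

printf '%s\n' "$literal"
printf '%s\n' "${#literal}"
```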

Austin Group Defect 221 is applied, clarifying the behavior of double-quotes within the string of characters from "${" to the matching '}' in parameter expansions using that form.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

Austin Group Defect 500 is applied, clarifying the behavior of <backslash> within double-quotes.

Austin Group Defect 1268 is applied, clarifying the effect of double-quotes on the results of parameter expansion, command substitution, or arithmetic expansion.

Austin Group Defect 1342 is applied, clarifying the requirements for alias substitutions inside command substitutions.

C.2.2.4 Dollar-Single-Quotes

The $'...' quoting construct has been implemented in several recent shells. It is similar to character string literals ("...") in the ISO C standard with the following exceptions:

This standard makes the results implementation-defined if \e or \cX specifies a character that is not present in the current locale. Application authors should note that implementations are permitted to have a wide range of behaviors when encountering an unsupported character. For example:

However, implementations must document their behavior, and they are prohibited from replacing an unsupported character with bytes that do not form valid characters in the current locale's character set (e.g., encoding in UTF-8 when the locale has a 7-bit character set). This standard does not specify a way for script authors to determine beforehand whether a particular \cX sequence specifies a character that exists in the current locale. At the time this feature was standardized, no known implementations provided such a capability.

Note that the escape sequences recognized by $'...', file format notation (see Escape Sequences and Associated Actions), XSI-conforming implementations of the echo utility (see the utility's OPERANDS section in echo), and the printf utility's format operand (see the utility's EXTENDED DESCRIPTION in printf) are not the same. Some escape sequences are not recognized by all of the above; the \c escape sequence in echo is not at all like the \c escape sequence in $'...'; and octal escape sequences in some of the above accept one to four octal digits and require a leading zero, while others accept one to three octal digits and do not require a leading zero.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

C.2.3 Token Recognition

The "((" and "))" symbols are control operators in the KornShell, used for an alternative syntax of an arithmetic expression command. A conforming application cannot use "((" as a single token (with the exception of the "$((" form for shell arithmetic).

On some implementations, the symbol "((" is a control operator; its use produces unspecified results. Applications that wish to have nested subshells, such as:

((echo Hello);(echo World))

must separate the "((" characters into two tokens by including white space between them. Some systems may treat these as invalid arithmetic expressions instead of subshells.
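
A portable form of the nested subshells above might be written as in the following sketch, where the function name greet is illustrative only:

```shell
greet() {
    # The space after the first '(' keeps the two parentheses as separate
    # tokens, so no implementation can read them as the "((" operator.
    ( (echo Hello); (echo World) )
}

greet
```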

Certain combinations of characters are invalid in portable scripts, as shown in the grammar. Implementations may use these combinations (such as "|&") as valid control operators. Portable scripts cannot rely on receiving errors in all cases where this volume of POSIX.1-2024 indicates that a syntax is invalid.

The (3) rule about combining characters to form operators is not meant to preclude systems from extending the shell language when characters are combined in otherwise invalid ways. Conforming applications cannot use invalid combinations, and test suites should not penalize systems that take advantage of this fact. For example, the unquoted combination "|&" is not valid in a POSIX script, but has a specific KornShell meaning.

The (10) rule about '#' applies only when '#' is the current character and is the first in the sequence from which a new token is being assembled. The '#' starts a comment only when it is at the beginning of a token. This rule is also written to indicate that the search for the end-of-comment does not consider escaped <newline> specially, so that a comment cannot be continued to the next line.
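
The consequences of this rule can be sketched as follows. The variable names word, arg_count, and marker are illustrative only:

```shell
# A '#' in the middle of a token is an ordinary character.
word=foo#bar

# A '#' at the beginning of a token starts a comment that runs to end of line.
set -- one # two three
arg_count=$#

marker=before
# A backslash at the end of a comment does not continue the comment \
marker=after

printf '%s\n' "$word" "$arg_count" "$marker"
```

The final assignment is executed because the comment ends at the <newline>, regardless of the preceding <backslash>.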

Because a complete_command encountered during a program is executed before the next complete_command is tokenized and parsed, syntax errors are not discovered by the shell until just before the code would be executed. While in some cases it might be desirable to detect and react to syntax errors before anything is executed (possible with sh -n), deferring the discovery of syntax errors has several benefits:

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0005 [718], XCU/TC2-2008/0006 [647], XCU/TC2-2008/0007 [568], and XCU/TC2-2008/0008 [648] are applied.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

Austin Group Defect 1036 is applied, clarifying how here-documents are parsed.

Austin Group Defect 1055 is applied, clarifying how much of a program is parsed before the parsed commands are executed.

Austin Group Defect 1083 is applied, changing "the next character" to "each character in turn".

Austin Group Defect 1085 is applied, clarifying requirements for the start and end of tokens.

C.2.3.1 Alias Substitution

The alias capability was added because it is widely used in historical implementations by interactive users.

The definition of "alias name" precludes an alias name containing a <slash> character. Since the text applies to the command words of simple commands, reserved words (in their proper places) cannot be confused with aliases.

The placement of alias substitution in token recognition makes it clear that it precedes all of the word expansion steps.

An example concerning trailing <blank> characters and reserved words follows. If the user types:

$ alias foo="/bin/ls "
$ alias while="/"

The effect of executing:

$ while true
> do
> echo "Hello, World"
> done

is a never-ending sequence of "Hello, World" strings to the screen. However, if the user types:

$ foo while

the result is an ls listing of /. Since the alias substitution for foo ends in a <space>, the next word is checked for alias substitution. The next word, while, has also been aliased, so it is substituted as well. Since it is not in the proper position as a command word, it is not recognized as a reserved word.

If the user types:

$ foo; while

while retains its normal reserved-word properties.

Some implementations add a <space> after the alias value when performing alias substitution in order to prevent the last character of the alias value and the first character after the alias name in the input from combining to form an operator. However, the extra <space> can have side-effects in other situations, such as if the alias value ends with an unquoted <backslash>. Implementations which do this are encouraged to change to an alternative method of delimiting a partial operator token at the end of an alias value.

Some, but not all, shell implementations do not process changes to alias definitions until the current compound_list (see XCU 2.10 Shell Grammar) has completed. In these shells, alias changes do not take effect until the end of the dot script, eval command, function invocation, if statement, case statement, for statement, while statement, or until statement containing the alias change.

Many shell implementations execute the contents of a file, typically ~/.profile, when invoked as a login shell. The standard developers are unaware of any such implementations that process the contents of ~/.profile (and similar startup files) as a single compound_list, so alias changes in ~/.profile typically do take effect before the end of ~/.profile.

Austin Group Defects 953 and 1630 are applied, providing additional detail on how alias substitution is performed.

C.2.4 Reserved Words

All reserved words are recognized syntactically as such in the contexts described. However, note that in is the only meaningful reserved word after a case or for; similarly, in is not meaningful as the first word of a simple command.

Reserved words are recognized only when they are delimited (that is, meet the definition of XBD 3.420 Word), whereas operators are themselves delimiters. For instance, '(' and ')' are control operators, so that no <space> is needed in (list). However, '{' and '}' are reserved words in { list;}, so that in this case the leading <space> and <semicolon> are required.
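
The difference in delimiting can be sketched as follows; the variable names paren_result and brace_result are illustrative only:

```shell
# '(' and ')' are operators and delimit themselves: no <space> is needed.
paren_result=$( (echo grouped) )

# '{' and '}' are reserved words: '{' needs a following <space>, and '}'
# is recognized only where a command could start, hence the <semicolon>.
brace_result=$( { echo grouped;} )

printf '%s\n' "$paren_result" "$brace_result"
```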

The list of unspecified reserved words is from the KornShell, so conforming applications cannot use them in places a reserved word would be recognized. Earlier versions of this standard omitted time from this list, so that the time utility could be included without requiring applications to quote all or part of its name (or use other measures) in order to avoid it being treated as a reserved word. However, although the intent was to allow the reserved word implementation (as evidenced by use of time in pipelines being unspecified, and explicit mention in the rationale of the time utility), the conditions under which the behavior was unspecified were insufficient to allow this. In particular, redirection in KornShell does not work in the normal way when time is a reserved word:

time utility 2> time.out

only writes the standard error from utility to time.out; the timing information is written to the shell's standard error, but these versions of the standard required the timing information to be written to time.out. Another issue was that if time is a reserved word, an application cannot define a function with that name, but these versions of the standard required that applications could do so. Hence time has now been added to the list of unspecified reserved words, but with its use as a reserved word limited in order to be compatible with its use as a utility in the cases where the two have traditionally had the same effect (other than possible output format differences).

There was a strong argument for promoting braces to operators (instead of reserved words), so they would be syntactically equivalent to subshell operators. Concerns about compatibility outweighed the advantages of this approach. Nevertheless, conforming applications should consider quoting '{' and '}' when they represent themselves.

When used in circumstances where reserved words are recognized, all words whose final character is a <colon> (':') are reserved. The case of a name suffixed with a colon is reserved to allow implementations to support named labels for flow control; see the RATIONALE for the break special built-in utility. Other words ending in <colon> are reserved to provide implementations with a way to add new reserved words while still conforming to this standard.

It is possible that a future version of the Shell and Utilities volume of POSIX.1-2024 may require that '{' and '}' be treated individually as control operators, although the token "{}" will probably be a special-case exemption from this because of the often-used find {} construct.

Austin Group Defect 267 is applied, adding time to the list of words that may be recognized as reserved words while specifying its behavior if it is recognized as a reserved word, and extending the reservation of words whose final character is <colon> from those that are a name followed by a <colon> to all such words.

Austin Group Defect 465 is applied, adding namespace to the list of words that may be recognized as reserved words.

C.2.5 Parameters and Variables

Austin Group Defect 1561 is applied, clarifying that parameters can contain byte sequences that do not form valid characters and that the shell processes their values as characters only when performing operations that are described in this standard in terms of characters.

C.2.5.1 Positional Parameters

Austin Group Defect 1491 is applied, clarifying the handling of leading zeros in positional parameter identifiers.

C.2.5.2 Special Parameters

Most historical implementations implement subshells by forking; thus, the special parameter '$' does not necessarily represent the process ID of the shell process executing the commands since the subshell execution environment preserves the value of '$'.
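
A minimal sketch of this behavior follows; the variable names shell_pid and subshell_view are illustrative only:

```shell
shell_pid=$$

# The subshell may well be a separate (forked) process, but '$' still
# expands to the value inherited from the invoking shell.
subshell_view=$( (echo "$$") )

printf '%s\n' "$shell_pid" "$subshell_view"
```

The two values printed are the same, whatever mechanism the implementation uses to create the subshell.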

If a subshell were to execute a background command, the value of "$!" for the parent would not change. For example:

(
date &
echo $!
)
echo $!

would echo two different values for "$!".

The "$-" special parameter can be used to save and restore set options:

Save=$(echo $- | sed 's/[ics]//g')
...
set +aCefnuvx
if [ -n "$Save" ]; then
    set -$Save
fi

The three options are removed using sed in the example because they may appear in the value of "$-" (from the sh command line), but are not valid options to set.

The descriptions of parameters '*' and '@' assume the reader is familiar with the field splitting discussion in XCU 2.6.5 Field Splitting and understands that portions of the word remain concatenated unless there is some reason to split them into separate fields.

The following examples illustrate some of the ways in which '*' and '@' can be expanded:

set "abc" "def ghi" "jkl"
unset novar
IFS=' ' # a space
printf '%s\n' $*
abc
def
ghi
jkl
printf '%s\n' "$*"
abc def ghi jkl
printf '%s\n' xx$*yy
xxabc
def
ghi
jklyy
printf '%s\n' "xx$*yy"
xxabc def ghi jklyy
printf '%s\n' $@
abc
def
ghi
jkl
printf '%s\n' "$@"
abc
def ghi
jkl
printf '%s\n' ${1+"$@"}
abc
def ghi
jkl
printf '%s\n' ${novar-"$@"}
abc
def ghi
jkl
printf '%s\n' xx$@yy
xxabc
def
ghi
jklyy
printf '%s\n' "xx$@yy"
xxabc
def ghi
jklyy
printf '%s\n' $@$@
abc
def
ghi
jklabc
def
ghi
jkl
printf '%s\n' "$@$@"
abc
def ghi
jklabc
def ghi
jkl
IFS=':'
printf '%s\n' "$*"
abc:def ghi:jkl
var=$*; printf '%s\n' "$var"
abc:def ghi:jkl
var="$*"; printf '%s\n' "$var"
abc:def ghi:jkl
unset var
printf '%s\n' ${var-$*}
abc
def ghi
jkl
printf '%s\n' "${var-$*}"
abc:def ghi:jkl
printf '%s\n' ${var-"$*"}
abc:def ghi:jkl
printf '%s\n' ${var=$*}
abc
def ghi
jkl
printf 'var=%s\n' "$var"
var=abc:def ghi:jkl
unset var
printf '%s\n' "${var=$*}"
abc:def ghi:jkl
printf 'var=%s\n' "$var"
var=abc:def ghi:jkl

IFS='' # null
printf '%s\n' "$*"
abcdef ghijkl
var=$*; printf '%s\n' "$var"
abcdef ghijkl
var="$*"; printf '%s\n' "$var"
abcdef ghijkl
unset var
printf '%s\n' ${var-$*}
abc
def ghi
jkl
printf '%s\n' "${var-$*}"
abcdef ghijkl
printf '%s\n' ${var-"$*"}
abcdef ghijkl
printf '%s\n' ${var=$*}
abcdef ghijkl
printf 'var=%s\n' "$var"
var=abcdef ghijkl
unset var
printf '%s\n' "${var=$*}"
abcdef ghijkl
printf 'var=%s\n' "$var"
var=abcdef ghijkl
printf '%s\n' "$@"
abc
def ghi
jkl

unset IFS
printf '%s\n' "$*"
abc def ghi jkl
var=$*; printf '%s\n' "$var"
abc def ghi jkl
var="$*"; printf '%s\n' "$var"
abc def ghi jkl
unset var
printf '%s\n' ${var-$*}
abc
def
ghi
jkl
printf '%s\n' "${var-$*}"
abc def ghi jkl
printf '%s\n' ${var-"$*"}
abc def ghi jkl
printf '%s\n' ${var=$*}
abc
def
ghi
jkl
printf 'var=%s\n' "$var"
var=abc def ghi jkl
unset var
printf '%s\n' "${var=$*}"
abc def ghi jkl
printf 'var=%s\n' "$var"
var=abc def ghi jkl
printf '%s\n' "$@"
abc
def ghi
jkl

set one "" three
printf '[%s]\n' $*
[one]
[] (this line of output is optional)
[three]
printf '[%s]\n' $@
[one]
[] (this line of output is optional)
[three]

set --
printf '[%s]\n' foo "$*"
[foo]
[]
printf '[%s]\n' foo "$novar$*$(echo)"
[foo]
[]
printf '[%s]\n' foo $@
[foo]
printf '[%s]\n' foo "$@"
[foo]
printf '[%s]\n' foo ''$@
[foo]
[]
printf '[%s]\n' foo ''"$@"
[foo]
[]
printf '[%s]\n' foo "$novar$@$(echo)"
[foo]
[] (this line of output is optional)
printf '[%s]\n' foo ''"$novar$@$(echo)"
[foo]
[]

In all of the following commands the results of the expansion of '@' (if performed) are unspecified:

var=$@
var="$@"
printf '%s\n' ${var=$@}
printf '%s\n' "${var=$@}"
printf '%s\n' ${var="$@"}
printf '%s\n' ${var?$@}
printf '%s\n' "${var?$@}"
printf '%s\n' ${var?"$@"}
printf '%s\n' ${#@}
printf '%s\n' "${#@}"
printf '%s\n' ${@%foo}
printf '%s\n' "${@%foo}"
printf '%s\n' ${@#foo}
printf '%s\n' "${@#foo}"
printf '%s\n' ${var%$@}
printf '%s\n' "${var%$@}"
printf '%s\n' ${var%"$@"}
printf '%s\n' ${var%%$@}
printf '%s\n' "${var%%$@}"
printf '%s\n' ${var%%"$@"}
printf '%s\n' ${var#$@}
printf '%s\n' "${var#$@}"
printf '%s\n' ${var#"$@"}
printf '%s\n' ${var##$@}
printf '%s\n' "${var##$@}"
printf '%s\n' ${var##"$@"}

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0009 [888] is applied.

Austin Group Defect 1039 is applied, clarifying the description of the '-' special parameter.

Austin Group Defect 1052 is applied, clarifying that decimal valued special parameters expand to the shortest representation.

Austin Group Defects 1150 and 1309 are applied, clarifying the description of the '?' special parameter.

Austin Group Defect 1254 is applied, clarifying how the '?' and '!' special parameters are affected by job control.

C.2.5.3 Shell Variables

Since shell variables are parameters denoted by a name, the shell cannot initialize shell variables from environment variables that do not have a valid name. However, the shell may initialize parameters that do not have valid names from such environment variables.

See the discussion of IFS in C.2.6.5 Field Splitting and the RATIONALE for the sh utility.

The prohibition on LC_CTYPE changes affecting lexical processing protects the shell implementor (and the shell programmer) from the ill effects of changing the definition of <blank> or the set of alphabetic characters in the current environment. It would probably not be feasible to write a compiled version of a shell script without this rule. The rule applies only to the current invocation of the shell and its subshells; invoking a shell script or performing exec sh would subject the new shell to the changes in LC_CTYPE.

Other common environment variables used by historical shells are not specified by the Shell and Utilities volume of POSIX.1-2024, but they should be reserved for the historical uses.

Tilde expansion for components of PATH in an assignment such as:

PATH=~hlj/bin:~dwc/bin:$PATH

is a feature of some historical shells and is allowed by the wording of XCU 2.6.1 Tilde Expansion. Note that the <tilde> characters are expanded during the assignment to PATH, not when PATH is accessed during command search.
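
The timing of the expansion can be sketched as follows. The directory names and the variable names saved_home and search_path are hypothetical, and HOME is used in place of a login name so that the example does not depend on the user database:

```shell
saved_home=$HOME

HOME=/home/example            # hypothetical home directory
search_path=~/bin:~/lib       # each <tilde> is expanded here, at assignment

HOME=/somewhere/else          # a later change does not affect the stored value
printf '%s\n' "$search_path"

HOME=$saved_home
```

The value stored in search_path contains no <tilde> characters, so nothing remains to be expanded when the value is later used.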

The following entries represent additional information about variables included in the Shell and Utilities volume of POSIX.1-2024, or rationale for common variables in use by shells that have been excluded:

_
(Underscore.) While <underscore> is historical practice, its overloaded usage in the KornShell is confusing, and it has been omitted from the Shell and Utilities volume of POSIX.1-2024.
ENV
This variable can be used to set aliases and other items local to the invocation of a shell. The file referred to by ENV differs from $HOME/.profile in that .profile is typically executed at session start-up, whereas the ENV file is executed at the beginning of each shell invocation. The ENV value is interpreted in a manner similar to a dot script, in that the commands are executed in the current environment and the file needs to be readable, but not executable. However, unlike dot scripts, no PATH searching is performed. This is used as a guard against Trojan Horse security breaches.
ERRNO
This variable was omitted from the Shell and Utilities volume of POSIX.1-2024 because the values of error numbers are not defined in POSIX.1-2024 in a portable manner.
FCEDIT
Since this variable affects only the fc utility, it has been omitted from this more global place. The value of FCEDIT does not affect the command-line editing mode in the shell; see the description of set -o vi in the set built-in utility.
PS1
This variable is used for interactive prompts. Historically, the "superuser" has had a prompt of '#'. Since privileges are not required to be monolithic, it is difficult to define which privileges should cause the alternate prompt. However, a sufficiently powerful user should be reminded of that power by having an alternate prompt.
PS3
This variable is used by the KornShell for the select command. Since the POSIX shell does not include select, PS3 was omitted.
PS4
This variable is used for shell debugging. For example, the following script:
PS4='[${LINENO}]+ '
set -x
echo Hello

writes the following to standard error:

[3]+ echo Hello
RANDOM
This pseudo-random number generator was not seen as being useful to interactive users.
SECONDS
Although this variable is sometimes used with PS1 to allow the display of the current time in the prompt of the user, it is not one that would be manipulated frequently enough by an interactive user to include in the Shell and Utilities volume of POSIX.1-2024.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0002 [152] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0010 [888], XCU/TC2-2008/0011 [884], and XCU/TC2-2008/0012 [494] are applied.

Austin Group Defect 953 is applied, clarifying how the ENV file is parsed.

Austin Group Defect 1006 is applied, clarifying how the values of the PS1, PS2, and PS4 variables are expanded.

Austin Group Defect 1441 is applied, requiring PS4 to be used in non-interactive shells.

Austin Group Defect 1511 is applied, making the description of LINENO consistent with other variables as regards how they relate to the User Portability Utilities option.

Austin Group Defect 1561 is applied, clarifying that shell variables are initialized only from environment variables that have valid names.

C.2.6 Word Expansions

Some shells implement brace expansion which expands, for example, file{A,B,C}.c into the fields fileA.c, fileB.c, and fileC.c or file{1..3}.c into the fields file1.c, file2.c, and file3.c. This form of expansion is allowed but not required by this standard; if supported, it must be performed before all of the standard word expansions. A variant which some shells implement whereby brace expansion is performed following field splitting was considered by the standard developers and rejected because it causes surprising behavior if the results of parameter expansion and command substitution happen to produce a valid brace expansion. For example, if the shell variable patt contains an arbitrary pathname glob pattern, applications cannot rely on some_command -- $patt passing a list of pathnames that match the pattern to some_command. Note that quoting the braces or commas prevents this form of expansion, but quoting the periods need not prevent it.

Step (2) refers to the "portions of fields generated by step (1)". For example, if the word being expanded were "$x+$y" and IFS=+, the word would be split only if "$x" or "$y" contained '+'; the '+' in the original word was not generated by step (1).
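
This case can be sketched as follows, assuming that x contains a '+' and y does not. The variable names field_count, first_field, and second_field are illustrative only:

```shell
x='a+b'        # contains the IFS character
y=c

IFS=+
set -- $x+$y   # only the '+' that came from expanding x is a split point
unset IFS      # return to default field splitting

field_count=$#
first_field=$1
second_field=$2
printf '%s\n' "$field_count" "$first_field" "$second_field"
```

The literal '+' written in the word stays attached to the text around it, so the second field is "b+c".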

IFS is used for performing field splitting on the results of parameter and command substitution; it is not used for splitting all fields. Earlier versions of the shell used it for splitting all fields during field splitting, but this has severe problems because the shell can no longer parse its own script. There are also important security implications caused by this behavior. All useful applications of IFS use it for parsing input of the read utility and for splitting the results of parameter and command substitution.
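
One such application, parsing a colon-separated record with read, might look like the following sketch. The record contents and the variable names are hypothetical:

```shell
# IFS is set only for the duration of the read command, so the rest of
# the script continues to use the default field separators.
IFS=: read -r name password uid remainder <<EOF
alice:x:1001:1001:Alice:/home/alice:/bin/sh
EOF

printf '%s\n' "$name" "$uid"
```

The last variable named, remainder, receives everything left on the line, including the remaining <colon> characters.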

The rule concerning expansion to a single field requires that if foo=abc and bar=def, that:

"$foo""$bar"

expands to the single field:

abcdef

The rule concerning empty fields can be illustrated by:

$    unset foo
$    set $foo bar '' xyz "$foo" abc
$    for i
>    do
>        echo "-$i-"
>    done
-bar-
--
-xyz-
--
-abc-

Step (1) indicates that parameter expansion, command substitution, and arithmetic expansion are all processed simultaneously as they are scanned. For example, the following is valid arithmetic:

x=1
echo $(( $(echo 3)+$x ))

An early proposal stated that tilde expansion preceded the other steps, but this is not the case in known historical implementations; if it were, and if a referenced home directory contained a '$' character, expansions would result within the directory name.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0003 [49,430] is applied.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

Austin Group Defect 985 is applied, clarifying that quote removal is not always performed.

Austin Group Defect 1038 is applied, clarifying that a '$' that is followed by a <space>, <tab>, or a <newline>, or is not followed by any character, is treated as a literal character.

Austin Group Defect 1123 is applied, clarifying the environment in which expansions are performed and requirements relating to empty fields.

Austin Group Defect 1193 is applied, adding optional brace expansion.

C.2.6.1 Tilde Expansion

Tilde expansion generally occurs only at the beginning of words, but an exception based on historical practice has been included:

PATH=/posix/bin:~dgk/bin

This is eligible for tilde expansion because <tilde> follows a <colon> and none of the relevant characters is quoted. Consideration was given to prohibiting this behavior because any of the following are reasonable substitutes:

PATH=$(printf %s ~karels/bin : ~bostic/bin)

for Dir in ~maart/bin ~srb/bin ...
do
    PATH=${PATH:+$PATH:}$Dir
done

In the first command, explicit <colon> characters are used for each directory. In all cases, the shell performs tilde expansion on each directory because all are separate words to the shell.

Note that expressions in operands such as:

make -k mumble LIBDIR=~chet/lib

do not qualify as shell variable assignments, and tilde expansion is not performed (unless the command does so itself, which make does not).

Because of the requirement that the word is not quoted, the following are not equivalent; only the last causes tilde expansion:

\~hlj/   ~h\lj/   ~"hlj"/   ~hlj\/   ~hlj/
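
The same distinction can be sketched without relying on a particular login name by using a bare <tilde>, which expands to the value of HOME. The directory name and the variable names below are hypothetical:

```shell
saved_home=$HOME
HOME=/home/example       # hypothetical home directory

# Only the last word has an entirely unquoted tilde-prefix.
set -- \~/bin "~/bin" ~/bin
escaped=$1
double_quoted=$2
unquoted=$3

HOME=$saved_home
printf '%s\n' "$escaped" "$double_quoted" "$unquoted"
```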

In an early proposal, tilde expansion occurred following any unquoted <equals-sign> or <colon>, but this was removed because of its complexity and to avoid breaking commands such as:

rcp hostname:~marc/.profile .

System administrators on systems where // has an implementation-defined meaning that differs from / should not create users with a home directory of / or //, since this may lead to unexpected filename resolution on those systems.

A suggestion was made that the special sequence "$~" should be allowed to force tilde expansion anywhere. Since this is not historical practice, it has been left for future implementations to evaluate. (The description in XCU 2.2 Quoting requires that a <dollar-sign> be quoted to represent itself, so the "$~" combination is already unspecified.)

The results of giving <tilde> with an unknown login name are undefined because the KornShell "~+" and "~-" constructs make use of this condition, but in general it is an error to give an incorrect login name with <tilde>. The results of having HOME unset are unspecified because some historical shells treat this as an error.

Historically, the Korn shell performed field splitting and pathname expansion on the results of tilde expansion, and earlier versions of this standard reflected this. However, tilde expansion results in a pathname, and performing field splitting and pathname expansion on something that is already a pathname is at best redundant and at worst will change the value from the correct pathname to one or more incorrect ones. Later versions of the Korn shell do not perform these expansions and POSIX.1-2024 has been updated to match. Note that although pathname expansion is not performed on the results of tilde expansion, this does not prevent other parts of the same word from being expanded. For example, ~/a* expands to all files in $HOME beginning with 'a'.

Austin Group Defect 1172 is applied, clarifying how quoting affects tilde expansion.

Austin Group Defect 1632 is applied, clarifying the treatment of <slash> characters in tilde expansion.

C.2.6.2 Parameter Expansion

The rule for finding the closing '}' in "${...}" is the one used in the KornShell and is upwardly-compatible with the Bourne shell, which does not determine the closing '}' until the word is expanded. The advantage of this is that incomplete expansions, such as:

${foo

can be determined during tokenization, rather than during expansion.

Quote removal is performed when assigning the value in the ${parameter:=[word]} form of expansion in order that a subsequent expansion of the same parameter produces the same value as the original expansion. That is, the commands:

unset parameter
foo=${parameter:=word}
bar=${parameter}

assign the same value to foo and bar. A consequence of this is that the expansions ${parameter:=[word]} and ${parameter:-[word]} can produce different results for the same word. For example, with parameter unset or empty:

${parameter:-a\ b}

expands to a single field "a b", whereas:

${parameter:=a\ b}

expands to two fields 'a' and 'b' (because parameter is assigned the value "a b" before its value is substituted).

For rationale regarding expansion of "${...}" within double-quotes, see C.2.2.3 Double-Quotes.

The string length and substring capabilities were included because of the demonstrated need for them, based on their usage in other shells, such as C shell and KornShell.
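
Typical uses of these capabilities can be sketched as follows. The pathname and the variable names are hypothetical:

```shell
path=/usr/local/lib/libexample.so

length=${#path}      # string length in characters
base=${path##*/}     # remove the largest prefix matching */
dir=${path%/*}       # remove the smallest suffix matching /*
stem=${base%.so}     # remove a fixed suffix

printf '%s\n' "$length" "$base" "$dir" "$stem"
```

Each of these is performed by the shell itself, without invoking utilities such as basename, dirname, or expr.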

Historical versions of the KornShell have not performed tilde expansion on the word part of parameter expansion; however, it is more consistent to do so.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0004 [458], XCU/TC1-2008/0005 [458], XCU/TC1-2008/0006 [457], XCU/TC1-2008/0007 [457], XCU/TC1-2008/0008 [417], XCU/TC1-2008/0009 [457], XCU/TC1-2008/0010 [457], XCU/TC1-2008/0011 [457], XCU/TC1-2008/0012 [457], XCU/TC1-2008/0013 [457], XCU/TC1-2008/0014 [457], XCU/TC1-2008/0015 [457], XCU/TC1-2008/0016 [457], XCU/TC1-2008/0017 [457], and XCU/TC1-2008/0018 [458] are applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0013 [888] and XCU/TC2-2008/0014 [867] are applied.

Austin Group Defect 221 is applied, removing a statement about counting brace levels and clarifying that quote removal is performed when expanding word in ${parameter:=[word]}.

Austin Group Defect 985 is applied, clarifying when quote removal is performed.

Austin Group Defect 1052 is applied, clarifying the description of string length expansion.

Austin Group Defect 1268 is applied, removing text relating to parameter expansion inside double-quotes.

Austin Group Defect 1478 is applied, making explicitly unspecified the results of parameter expansions that test whether the parameter '*' or '@' is unset or null.

Austin Group Defect 1491 is applied, restructuring a paragraph that used "Otherwise" after two conditions.

Austin Group Defect 1561 is applied, clarifying that the varieties of parameter expansion that provide for substring processing process parameter values as characters.

C.2.6.3 Command Substitution

The "$()" form of command substitution solves a problem of inconsistent behavior when using backquotes. For example:

Command                     Output
echo '\$x'                  \$x
echo `echo '\$x'`           $x
echo $(echo '\$x')          \$x

Additionally, the backquoted syntax has historical restrictions on the contents of the embedded command. While the newer "$()" form can process any kind of valid embedded script (with a few caveats; see below), the backquoted form cannot handle some valid scripts that include backquotes. For example, these otherwise valid embedded scripts do not work in the left column, but do work on the right:

echo `                         echo $(
cat <<\eof                     cat <<\eof
a here-doc with `              a here-doc with )
eof                            eof
`                              )

echo `                         echo $(
echo abc # a comment with `    echo abc # a comment with )
`                              )

echo `                         echo $(
echo '`'                       echo ')'
`                              )

Because of these inconsistent behaviors, the backquoted variety of command substitution is not recommended for new applications that nest command substitutions or attempt to embed complex scripts.
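The difference in <backslash> handling between the two forms can be reproduced with the sketch below. It uses printf rather than echo only because the treatment of <backslash> by echo varies between implementations; the variable names are illustrative.

```shell
# Inside "$()" the embedded command is taken as written.
direct=$(printf '%s\n' '\$x')

# Inside backquotes, the <backslash> before '$' is removed before the
# embedded command is parsed, so printf sees '$x'.
backquoted=`printf '%s\n' '\$x'`

printf '%s\n' "$direct" "$backquoted"
```

The first line printed keeps the <backslash>; the second does not.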

The KornShell feature:

If the commands string is of the form <word, word is expanded to generate a pathname, and the value of the command substitution is the contents of this file with any trailing <newline> characters deleted.

was omitted from the Shell and Utilities volume of POSIX.1-2024 because $(cat word) is an appropriate substitute. However, to prevent breaking numerous scripts relying on this feature, it is unspecified to have a script within "$()" that has only redirections.

In IEEE Std 1003.2-1992 the $(commands) form of command substitution only had unspecified behavior for a commands string consisting solely of redirections. However, two additional unspecified cases have since been added with relation to aliases:

  1. Implementations are permitted to parse the entire commands string before executing any of it, and in this case alias and unalias commands in commands have no effect during parsing. For example, the following commands:
    alias foo='echo "hello globe"'
    echo $(alias foo='echo "Hello World"';foo)
    

    produce the output "hello globe" if the commands string is executed as an entire command and produce the output "Hello World" if the commands string is executed incrementally.

  2. Although existing aliases are required to be expanded when the shell parses the input that follows the "$(" in order to find the terminating ')' (see 2.3 Token Recognition), it is unspecified whether the terminating ')' can result from alias substitution. For example, with this script:
    alias foo="echo foo )"
    echo $(foo ; echo bar
    

    some shells output lines containing "foo" and "bar" whereas other shells report a syntax error because they do not find a terminating ')' for the command substitution.

Arithmetic expansions have precedence over command substitutions. That is, if the shell can parse an expansion beginning with "$((" as an arithmetic expansion then it will do so. It will only parse the expansion as a command substitution (that starts with a subshell) if it determines that it cannot parse the expansion as an arithmetic expansion. If the syntax is valid for neither type of expansion, then it is unspecified what kind of syntax error the shell reports.

How well the shell performs this determination is a quality of implementation issue. Current shell implementations use heuristics. In particular, the shell need not evaluate nested expansions when determining whether it can parse an expansion beginning with "$((" as an arithmetic expansion. For example:

$((a $op b))

is always an arithmetic expansion if "$op" expands to, say, '+', but if "$op" expands to '(' then the shell might still parse the expansion as an arithmetic expansion (resulting in a syntax error due to unbalanced parentheses) or it might perform a command substitution.

This standard requires that conforming applications always separate the "$(" and '(' with white space when a command substitution starts with a subshell. This is because implementations may support extensions in arithmetic expressions which could result in the shell parsing the input as an arithmetic expansion even though a minimally conforming shell would not. For example, many shells support arrays with the array index (which can be an expression) in square brackets. Therefore, the presence of "myfile[0-9]" within an expansion beginning "$((" is no guarantee that it will be parsed as a command substitution.
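A minimal sketch of the portable spelling follows; the variable name greeting is illustrative only.

```shell
# The white space between "$(" and "(" marks this as a command
# substitution whose first command is a subshell, so it cannot be
# mistaken for the start of an arithmetic expansion.
greeting=$( (echo hi); (echo there) )

printf '%s\n' "$greeting"
```

Written without the space, the same text would begin with "$((" and would be subject to the parsing rules described above.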

The ambiguity is not restricted to the simple case of a single subshell. More complicated ambiguous cases are possible (even with just the standard shell syntax), such as:

$(( cat <<EOH
+ ( (
EOH
) && ( cat <<EOH
) ) + 1 +
EOH
))

This can be parsed as an arithmetic expansion, with cat and EOH as the names of shell variables. Ambiguous cases also exist where the end of the expansion is at a different location for the arithmetic expansion and the command substitution:

$((cat <<EOF
+((((
EOF
) && (
cat <<EOF
+
EOF
))

This is an incomplete arithmetic expansion, but would have been a (complete) command substitution if it could not have been parsed as an arithmetic expansion. If this expansion occurs at the end of input then the shell reports a syntax error; it does not parse it as a command substitution.

IEEE Std 1003.1-2001/Cor 1-2002, item XCU/TC1/D6/4 is applied, changing the text from: "If a command substitution occurs inside double-quotes, it shall not be performed on the results of the substitution." to: "If a command substitution occurs inside double-quotes, field splitting and pathname expansion shall not be performed on the results of the substitution.". The replacement text taken from the ISO POSIX-2:1993 standard is clearer about the items that are not performed.

SD5-XCU-ERN-84 is applied, clarifying how the search for the matching backquote is satisfied.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0019 [217] is applied.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

Austin Group Defect 953 is applied, clarifying how the commands in command substitutions are parsed.

Austin Group Defect 1015 is applied, clarifying the handling of <backslash> when a backquoted command substitution is within double-quotes.

Austin Group Defect 1268 is applied, removing text relating to command substitution inside double-quotes.

Austin Group Defect 1342 is applied, clarifying the requirements for alias substitutions inside command substitutions.

Austin Group Defect 1560 is applied, clarifying that the standard output of the command(s) in a command substitution is treated as a sequence of bytes.

C.2.6.4 Arithmetic Expansion

The standard developers agreed that there was a strong desire for some kind of arithmetic evaluator to provide functionality similar to expr, and that relating it to '$' makes it work well with the standard shell language and provides access to arithmetic evaluation in places where accessing a utility would be inconvenient.

The syntax and semantics for arithmetic were revised for the ISO/IEC 9945-2:1993 standard. The language represents a simple subset of the previous arithmetic language (which was derived from the KornShell "(())" construct). The syntax was changed from that of a command denoted by ((expression)) to an expansion denoted by $((expression)). The new form is a dollar expansion ('$') that evaluates the expression and substitutes the resulting value. Objections to the previous style of arithmetic included that it was too complicated, did not fit in well with the use of variables in the shell, and its syntax conflicted with subshells. The justification for the new syntax is that the shell is traditionally a macro language, and if a new feature is to be added, it should be accomplished by extending the capabilities presented by the current model of the shell, rather than by inventing a new one outside the model; adding a new dollar expansion was perceived to be the most intuitive and least destructive way to add such a new capability.

The standard requires assignment operators to be supported (as listed in XCU 1.1.2 Concepts Derived from the ISO C Standard), and since arithmetic expansions are not specified to be evaluated in a subshell environment, changes to variables there have to be in effect after the arithmetic expansion, just as in the parameter expansion "${x=value}".

Note, however, that "$(( x=5 ))" need not be equivalent to "$(( $x=5 ))". If the value of the environment variable x is the string "y=", the expansion of "$(( x=5 ))" would set x to 5 and output 5, but "$(( $x=5 ))" would output 0 if the value of the environment variable y is not 5 and would output 1 if the environment variable y is 5. Similarly, if the value of the environment variable x is 4, the expansion of "$(( x=5 ))" would still set x to 5 and output 5, but "$(( $x=5 ))" (which would be equivalent to "$(( 4=5 ))") would yield a syntax error.
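The first case can be sketched as follows, using ordinary shell variables; the names equal, unequal, and assigned are illustrative.

```shell
y=5
x='y='
equal=$(( $x=5 ))      # "$x" is expanded first, so this evaluates y==5
y=4
unequal=$(( $x=5 ))    # same text, but y is no longer 5
assigned=$(( x=5 ))    # a plain assignment: x becomes 5

printf '%s\n' "$equal" "$unequal" "$assigned" "$x"
```

The assignment form changes x in the current shell environment, which is why the final value printed is the new value of x rather than the string "y=".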

In early proposals, a form $[expression] was used. It was functionally equivalent to the "$(())" of the current text, but objections were lodged that the 1988 KornShell had already implemented "$(())" and there was no compelling reason to invent yet another syntax. Furthermore, the "$[]" syntax had a minor incompatibility involving the patterns in case statements.

The portion of the ISO C standard arithmetic operations selected corresponds to the operations historically supported in the KornShell. In addition to the exceptions listed in XCU 2.6.4 Arithmetic Expansion, the use of the following are explicitly outside the scope of the rules defined in XCU 1.1.2.1 Arithmetic Precision and Operations:

It was concluded that the test command ([) was sufficient for the majority of relational arithmetic tests, and that tests involving complicated relational expressions within the shell are rare, yet could still be accommodated by testing the value of "$(())" itself. For example:

# a complicated relational expression
while [ $(( (($x + $y)/($a * $b)) < ($foo*$bar) )) -ne 0 ]

or better yet, the rare script that has many complex relational expressions could define a function like this:

val() {
    return $((!$1))
}

and complicated tests would be less intimidating:

while val $(( (($x + $y)/($a * $b)) < ($foo*$bar) ))
do
    # some calculations
done

A suggestion that was not adopted was to modify true and false to take an optional argument, and true would exit true only if the argument was non-zero, and false would exit false only if the argument was non-zero:

while true $(($x > 5 && $y <= 25))

There is a minor portability concern with the new syntax. The example "$((2+2))" could have been intended to mean a command substitution of a utility named "2+2" in a subshell. The standard developers considered this to be obscure and isolated to some KornShell scripts (because "$()" command substitution existed previously only in the KornShell). The text on command substitution requires that the "$(" and '(' be separate tokens if this usage is needed.

An example such as:

echo $((echo hi);(echo there))

should not be misinterpreted by the shell as arithmetic because attempts to balance the parentheses pairs would indicate that they are subshells. However, as indicated by XBD 3.85 Control Operator, a conforming application must separate two adjacent parentheses with white space to indicate nested subshells.

The standard is intentionally silent about how a variable's numeric value in an expression is determined from its normal "sequence of bytes" value. It could be done as a text substitution, as a conversion like that performed by strtol(), or even recursive evaluation. Therefore, the only cases for which the standard is clear are those for which all of these methods produce the same result. The cases where they give the same result are those where the sequence of bytes forms a valid integer constant. Therefore, if a variable does not contain a valid integer constant, the behavior is unspecified.

For the commands:

x=010; echo $((x += 1))

the output must be 9.

For the commands:

x=' 1'; echo $((x += 1))

the results are unspecified.

For the commands:

x=1+1; echo $((x += 1))

the results are unspecified.

Although the ISO C standard requires support for long long and allows extended integer types with higher ranks, POSIX.1-2024 only requires arithmetic expansions to support signed long integer arithmetic. Implementations are encouraged to support signed integer values at least as large as the size of the largest file allowed on the implementation.

Implementations are also allowed to perform floating-point evaluations as long as an application won't see different results for expressions that would not overflow signed long integer expression evaluation. (This includes appropriate truncation of results to integer values.)

Changes made in response to IEEE PASC Interpretation 1003.2 #208 removed the requirement that the integer constant suffixes l and L had to be recognized. The ISO POSIX-2:1993 standard did not require the u, ul, uL, U, Ul, UL, lu, lU, Lu, and LU suffixes since only signed integer arithmetic was required. Since all arithmetic expressions were treated as handling signed long integer types anyway, the l and L suffixes were redundant. No known scripts used them and some historic shells did not support them. When the ISO/IEC 9899:1999 standard was used as the basis for the description of arithmetic processing, the ll and LL suffixes and combinations were also not required. Implementations are still free to accept any or all of these suffixes, but are not required to do so.

There was also some confusion as to whether the shell was required to recognize character constants. Syntactically, character constants were required to be recognized, but the requirements for the handling of <backslash> and single-quote characters (needed to specify character constants) within an arithmetic expansion were ambiguous. Furthermore, no known shells supported them. Changes made in response to IEEE PASC Interpretation 1003.2 #208 removed the requirement to support them (if they were indeed required before). POSIX.1-2024 clearly does not require support for character constants.

IEEE Std 1003.1-2001/Cor 2-2004, item XCU/TC2/D6/3 is applied, clarifying arithmetic expressions.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0020 [50] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0015 [584] is applied.

C.2.6.5 Field Splitting

The operation of field splitting using IFS, as described in early proposals, was based on the way the KornShell splits words, but it is incompatible with other common versions of the shell. However, each has merit, and so a decision was made to allow both. If the IFS variable is unset or is <space><tab><newline>, the operation is equivalent to the way the System V shell splits words. Using characters outside the <space><tab><newline> set yields the KornShell behavior, where each of the non-<space><tab><newline>s is significant. This behavior, which affords the most flexibility, was taken from the way the original awk handled field splitting.

The different handling of white space and non-white-space characters in IFS can be summarized as a pseudo-ERE:

(s*ns*|s+)

where s is an IFS white-space character and n is a character in the IFS that is not white space. Any string matching that ERE delimits a field, except that the s+ form does not delimit fields at the beginning or the end of a line. For example, if IFS is <space>/<comma>/<tab>, the string:

<space><space>red<space><space>,<space>white<space>blue

yields the three colors as the delimited fields.
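The example can be tried with the sketch below. The helper split_fields is hypothetical; it sets IFS to <space>, <comma>, and <tab> inside a subshell so that the caller's IFS is not disturbed.

```shell
# split_fields STRING: split STRING with IFS set to <space>, <comma>,
# <tab> and print the field count followed by one field per line
# (hypothetical helper; the subshell keeps IFS and set -f local).
split_fields() (
    IFS=$(printf ' ,\t')
    set -f                  # no pathname expansion on the unquoted $1
    set -- $1
    printf '%s\n' "$#" "$@"
)

split_fields '  red  , white blue'
```

The leading white space delimits nothing, the run "<space><space>,<space>" acts as a single delimiter, and the lone <space> between the last two words acts as another, giving three fields.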

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0016 [832] is applied.

Austin Group Defect 1123 is applied, clarifying the requirements if no fields are delimited.

Austin Group Defect 1560 is applied, clarifying that the results of word expansions are treated as a sequence of bytes when searching for (bytes that form) IFS characters.

Austin Group Defect 1649 is applied, clarifying how field splitting is performed.

C.2.6.6 Pathname Expansion

There is no additional rationale provided for this section.

C.2.6.7 Quote Removal

The golden rule in quote removal is that if a quote character was treated as special in the original word, it is removed; if it was treated as a literal character, it is not removed.

Austin Group Defect 221 is applied, clarifying the conditions under which quote characters are, or are not, removed.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

C.2.7 Redirection

In the System Interfaces volume of POSIX.1-2024, file descriptors are integers in the range 0-({OPEN_MAX}-1). The file descriptors discussed in XCU 2.7 Redirection are that same set of small integers.

Having multi-digit file descriptor numbers for I/O redirection can cause some obscure compatibility problems. Specifically, scripts that depend on an example command:

echo 22>/dev/null

echoing "2" to standard error or "22" to standard output are no longer portable. However, the file descriptor number must still be delimited from the preceding text. For example:

cat file2>foo

writes the contents of file2, not the contents of file.

The limitation to 9 file descriptors is overcome in some shells via a form of redirection whereby a shell variable stores the file descriptor number. For example:

exec {fdvar}> foo

opens the file foo on a file descriptor greater than 9 and stores the file descriptor number in shell variable fdvar. (This can later be closed using exec {fdvar}>&-.) This form of redirection is allowed but not required by this standard.

The ">|" format of output redirection was adopted from the KornShell. Along with the noclobber option, set -C, it provides a safety feature to prevent inadvertent overwriting of existing files. (See the RATIONALE for the pathchk utility for why this step was taken.) The restriction on regular files is historical practice.

The System V shell and the KornShell have differed historically on pathname expansion of word; the former never performed it, the latter only when the result was a single field (file). As a compromise, it was decided that the KornShell functionality was useful, but only as a shorthand device for interactive users. No reasonable shell script would be written with a command such as:

cat foo > a*

Thus, shell scripts are prohibited from doing it, while interactive users can select the shell with which they are most comfortable.

The construct "2>&1" is often used to redirect standard error to the same file as standard output. Since the redirections take place beginning to end, the order of redirections is significant. For example:

ls > foo 2>&1

directs both standard output and standard error to file foo. However:

ls 2>&1 > foo

only directs standard output to file foo because standard error was duplicated as standard output before standard output was directed to file foo.
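The ordering rule can be observed without creating any files by capturing output with a command substitution. In this sketch the function emit is a hypothetical stand-in for ls that writes one line to each stream.

```shell
# emit: write one line to standard output and one to standard error.
emit() { echo out; echo err >&2; }

# Standard output goes to /dev/null first, then standard error follows it:
# nothing reaches the command substitution.
first=$(emit > /dev/null 2>&1)

# Standard error is duplicated from the (captured) standard output first,
# then standard output alone is sent to /dev/null: only "err" is captured.
second=$(emit 2>&1 > /dev/null)

printf '[%s] [%s]\n' "$first" "$second"
```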

Applications should not use the [n]<&- or [n]>&- operators to execute a utility or application with file descriptor 0 not open for reading or with file descriptor 1 or 2 not open for writing, as this might cause the executed program (or shell built-in) to misbehave. In order not to pass on these file descriptors to an executed utility or application, applications should not just close them but should reopen them on, for example, /dev/null. Some implementations may reopen them automatically, but applications should not rely on this being done.

The "<>" operator could be useful in writing an application that worked with several terminals, and occasionally wanted to start up a shell. That shell would in turn be unable to run applications that run from an ordinary controlling terminal unless it could make use of "<>" redirection. The specific example is a historical version of the pager more, which reads from standard error to get its commands, so standard input and standard output are both available for their usual usage. There is no way of saying the following in the shell without "<>":

cat food | more - >/dev/tty03 2<>/dev/tty03

Another example of "<>" is one that opens /dev/tty on file descriptor 3 for reading and writing:

exec 3<> /dev/tty

An example of creating a lock file for a critical code region:

set -C
until    2> /dev/null > lockfile
do       sleep 30
done
set +C
perform critical function
rm lockfile

Since /dev/null is not a regular file, no error is generated by redirecting to it in noclobber mode.
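The interaction between set -C, '>' and ">|" on a regular file can be sketched as follows. The function name and the file name under ${TMPDIR:-/tmp} are hypothetical; the function body is a subshell so that the noclobber setting does not leak into the caller.

```shell
# noclobber_demo: show '>' refusing to overwrite a regular file under
# set -C while '>|' overrides the check (hypothetical helper).
noclobber_demo() (
    file=${TMPDIR:-/tmp}/noclobber_demo.$$   # hypothetical scratch file
    rm -f "$file"
    set -C
    echo first > "$file"                     # file does not exist: created
    if (echo second > "$file") 2>/dev/null; then
        echo overwritten
    else
        echo refused                         # existing regular file
    fi
    echo third >| "$file"                    # explicit override
    cat "$file"
    rm -f "$file"
)

noclobber_demo
```

The failing redirection is run in its own subshell only so that its diagnostic can be discarded.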

Tilde expansion is not performed on a here-document because the data is treated as if it were enclosed in double-quotes.
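A short sketch of this behavior, with an illustrative variable name:

```shell
# The tilde in the here-document body is data, not a tilde-prefix.
literal=$(cat <<EOF
~/notes
EOF
)

printf '%s\n' "$literal"
```

The captured text still begins with '~', whatever the value of HOME.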

Austin Group Defect 1193 is applied, adding the optional redirection form {location}redir-op word.

Austin Group Defect 1232 is applied, clarifying the allowed behaviors in an interactive shell when pathname expansion on the word following a redirection operator would result in more than one word.

Austin Group Defect 1493 is applied, moving some information from this section to the definition of "file descriptor" in XBD 3.141 File Descriptor.

C.2.7.1 Redirecting Input

There is no additional rationale provided for this section.

C.2.7.2 Redirecting Output

Earlier versions of this standard did not require redirection using '>' when noclobber is set to perform the file creation step as an atomic operation. Historical shells just called stat() to check if a regular file existed and then called creat(). The operation thus involved a race condition which meant that it could not be used for reliable creation of lock files. Many shell implementations improved on this by using open() with the O_CREAT and O_EXCL flags set as one step in a multi-step process which still meant that an existing non-regular file (for example /dev/null, /dev/tty, or a FIFO) was opened successfully. However, the methods employed still involved a race condition and could produce misleading diagnostics if there is concurrent creation or removal of files.

An ideal solution would be an O_NOCLOBBER flag for open() which the shell could use in order to perform the entire operation atomically, and implementations are encouraged to adopt this solution, adding the flag as described in the FUTURE DIRECTIONS section of open, and using it in the implementation's POSIX shell and in other shells. Authors of portable shells should make use of #ifdef O_NOCLOBBER so that it is used on implementations that provide it.

If O_NOCLOBBER is not used, shells can use one of the following methods:

  1. The "stat first" method.
    a. Call stat() and if the file exists and is a regular file, the redirection fails. Otherwise:
    b. Call open() without O_CREAT or O_TRUNC to open an existing file. If the open succeeds, use fstat() to check whether the opened file is a regular file. If it is, close it and fail the redirection. If it is a non-regular file, the redirection succeeds. Otherwise:
    c. Call open() with O_CREAT|O_EXCL. The redirection succeeds or fails depending on whether the open succeeds or fails.
  2. The "exclusive create first" method.
    a. Call open() with O_CREAT|O_EXCL. If the open succeeds, the redirection succeeds. If the open fails with [EMFILE] or [ENFILE], use stat() to check whether a regular file exists; if it does, fail the redirection. Otherwise:
    b. Call open() without O_CREAT or O_TRUNC to open an existing file. If the open succeeds, use fstat() to check whether the opened file is a regular file. If it is, close it and fail the redirection. If it is a non-regular file, the redirection succeeds. If the second open fails, the redirection fails with a diagnostic based on the errno value set by the first open.

    (A minor variation of this method could also be used whereby step 2.b is only done if the open() in step 2.a fails with [EEXIST].)

Method 1 is in widespread use. Method 2 has not been observed exactly as described, although an implementation which omits the stat() in step 2.a has been observed. Without the stat(), this method has a problem in that if a regular file exists but the open() fails with [EMFILE] or [ENFILE] instead of [EEXIST] (which is to be expected if those conditions exist, because detecting [EEXIST] is more expensive), then the shell will give an incorrect diagnostic. (Reporting that no file descriptors are available implies that a non-regular file exists, because the shell tried to open the file and it is not supposed to open an existing regular file.)

A variant of method 1 which omits the initial stat() call has also been observed; this has the same problem with [EMFILE] and [ENFILE]. With the stat(), this misleading diagnostic can also happen, but only if a regular file is created in the timing window between steps 1.a and 1.b, which makes it an allowed case. (The standard allows a misleading diagnostic when there is concurrent creation or removal of files.)

Both methods have cases where a misleading diagnostic is given when a non-regular file is concurrently created or removed. With method 1 it occurs if no file exists at steps 1.a and 1.b, and a non-regular file is created before step 1.c. With method 2 it occurs if a non-regular file exists at step 2.a and is removed before step 2.b. (In both cases, the diagnostic misleadingly implies that a regular file exists.)

Both methods differ from historical shell behavior in that the redirection fails if there is an existing symbolic link whose target does not exist, instead of the link's target being created as a regular file. The standard developers consider reliable lock file creation to be more important than the creation of symbolic link targets.

Creation of lock files and unique (often temporary) files with noclobber set is only reliable provided neither non-regular files nor symbolic links to non-regular files exist or are created in the same directory with the same names, and no other processes delete the files while still in use. If a directory such as /tmp is used for lock files, then another process could accidentally or maliciously create a FIFO (or a special file, given sufficient privilege) with the same name, causing multiple processes to simultaneously open the same lock file instead of one succeeding and the others failing.

Austin Group Defects 1016 and 1364 are applied, changing the requirements when the noclobber option is set.

C.2.7.3 Appending Redirected Output

Note that when a file is opened (even with the O_APPEND flag set), the initial file offset for that file is set to the beginning of the file. Some historic shells set the file offset to the current end-of-file when append mode shell redirection was used, but this is not allowed by POSIX.1-2024.

Austin Group Defect 1016 is applied, changing "with the O_APPEND flag" to "with the O_APPEND flag set".

C.2.7.4 Here-Document

Historical shell behavior was to treat the end of input as being equivalent to the delimiter of a here-document, terminating the here-document, usually without any indication, and continuing as if the delimiter had been recognized. This can cause problems where the delimiter had been intended to occur much earlier in the script, but was incorrectly entered. A comparable mistake elsewhere in a script would usually have resulted in a syntax error and an aborted script; here it simply generates incorrect results. Because of this, some shell implementations have changed to reporting an undelimited here-document as a syntax error. Other implementations are encouraged to do the same.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0017 [890], XCU/TC2-2008/0018 [583], and XCU/TC2-2008/0019 [580] are applied.

Austin Group Defect 1036 is applied, clarifying how here-documents are parsed.

Austin Group Defect 1411 is applied, adding a paragraph break.

C.2.7.5 Duplicating an Input File Descriptor

The file descriptor duplication redirection operators, [n]<&word and [n]>&word, make a copy of one file descriptor as another. If the operation is successful, the new file descriptor has the same access mode as the source (old) file descriptor, because the access mode is determined by the open file description to which both file descriptors point. To avoid a redirection error, applications need to ensure that they use the appropriate redirection operator for the access mode of the file descriptor being duplicated.

Austin Group Defect 1536 is applied, making it optional whether attempting to duplicate an open file descriptor that is not open for input results in a redirection error.

C.2.7.6 Duplicating an Output File Descriptor

See C.2.7.5 Duplicating an Input File Descriptor.

Austin Group Defect 1536 is applied, making it optional whether attempting to duplicate an open file descriptor that is not open for output results in a redirection error.

C.2.7.7 Open File Descriptors for Reading and Writing

There is no additional rationale provided for this section.

C.2.8 Exit Status and Errors

There is no additional rationale provided for this section.

C.2.8.1 Consequences of Shell Errors

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0020 [882] and XCU/TC2-2008/0021 [717,882] are applied.

Austin Group Defect 914 is applied, requiring that the shell does not exit when a redirection error occurs with compound commands or with function execution.

Austin Group Defect 1427 is applied, changing this section to account for the effect of the command utility when it is used to execute a special built-in utility.

Austin Group Defect 1629 is applied, requiring that the shell exits if an unrecoverable read error occurs when reading commands.

C.2.8.2 Exit Status for Commands

There is a historical difference in sh and ksh non-interactive error behavior. When a command named in a script is not found, some implementations of sh exit immediately, but ksh continues with the next command. Thus, the Shell and Utilities volume of POSIX.1-2024 says that the shell "may" exit in this case. This puts a small burden on the programmer, who has to test for successful completion following a command if it is important that the next command not be executed if the previous command was not found. If it is important for the command to have been found, it was probably also important for it to complete successfully. The test for successful completion would not need to change.

Historically, shells have returned an exit status of 128+n, where n represents the signal number. Since signal numbers are not standardized, there is no portable way to determine which signal caused the termination. Also, it is possible for a command to exit with a status in the same range of numbers that the shell would use to report that the command was terminated by a signal. Implementations are encouraged to choose exit values greater than 256 to indicate programs that terminate by a signal so that the exit status cannot be confused with an exit status generated by a normal termination. However, the use of exit values greater than 256 poses a problem for the shell's own exit status. Historically this was the exit status of the last command invoked by the shell, but if the last command was terminated by a signal and was assigned an exit status greater than 256 by the shell, this value would be truncated to eight bits in the shell's exit status. Likewise truncation would occur with use of

exit $?

or

ret=$?
....
exit $ret

in shell scripts. To avoid this truncation, shells which assign exit statuses greater than 256 are required to propagate the wait status of the last command to the shell's own wait status (by sending itself the same signal), and to handle exit values greater than 256 passed to the exit builtin by mimicking the wait status that would give rise to assignment of that exit status in the shell. Note that this requirement does not apply to signals that do not cause termination, such as SIGCHLD, since the shell can never actually assign a corresponding exit status greater than 256, and the requirement is worded in terms of this assignment.
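As a rough illustration of the convention described above, the following sketch runs a child shell that terminates itself with SIGTERM and then inspects the exit status seen by the parent. It assumes only that a status greater than 128 indicates termination by a signal, and it uses kill -l to translate the status into a signal name rather than relying on a particular signal number. The variable names are illustrative.

```shell
# The child shell sends SIGTERM to itself; the parent records how it ended.
status=0
sh -c 'kill -TERM $$' || status=$?

if [ "$status" -gt 128 ]; then
    # kill -l accepts an exit status and writes the matching signal name.
    signame=$(kill -l "$status")
    echo "child was terminated by SIG$signame"
else
    signame=none
    echo "child exited normally with status $status"
fi
```

Because the signal name is obtained from kill -l, the sketch does not depend on whether the shell reports 128+n or a value greater than 256.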

Historical shells make the distinction between "utility not found" and "utility found but cannot execute" in their error messages. By specifying two seldom-used exit status values for these cases, 127 and 126 respectively, the standard gives an application the opportunity to make use of this distinction without having to parse an error message that would probably change from locale to locale. The command, env, nohup, and xargs utilities in the Shell and Utilities volume of POSIX.1-2024 have also been specified to use this convention.
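A minimal sketch of how an application might use the two values without parsing error messages follows. The helper name probe, the temporary file path, and the name of the missing utility are all hypothetical. Each command is run in a child shell so that the calling script continues even on implementations that exit when a command is not found.

```shell
# A file with no execute permission stands in for "found but cannot execute".
noexec="${TMPDIR:-/tmp}/noexec_demo_$$"
: > "$noexec"
chmod a-x "$noexec"

probe() {
    # Run the given command in a child shell and classify its exit status.
    rc=0
    sh -c '"$@"' sh "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        127) echo "not found" ;;
        126) echo "found but not executable" ;;
        *)   echo "ran, exit status $rc" ;;
    esac
}

missing=$(probe no_such_utility_demo_xyz)
blocked=$(probe "$noexec")
ran=$(probe true)
rm -f "$noexec"

echo "$missing / $blocked / $ran"
```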

When a command fails during word expansion or redirection, most historical implementations exit with a status of 1. However, there was some sentiment that this value should probably be much higher so that an application could distinguish this case from the more normal exit status values. Thus, the language "greater than zero" was selected to allow either method to be implemented.

If a C application calls exit(256), the command's exit status in the shell becomes zero due to the modulo 256 operation. Since zero is interpreted as "true" or "success" for if statements, AND and OR lists, set -e, and so on, applications should be careful to avoid exiting with a value that is a multiple of 256 unless the value is intended to be interpreted as true or success.

To avoid ambiguity caused by the modulo 256 operation, applications are encouraged to avoid using a count or the result of a computation as the exit value unless the value is guaranteed to be non-negative and less than 256.
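The arithmetic behind this advice can be sketched as follows. The helper safe_status is hypothetical, and its ceiling of 125 is an assumption chosen only to stay clear of 126, 127, and the range used to report signals; an application could equally map any non-zero count to 1.

```shell
# What a parent would see for two raw values after the modulo 256 operation.
seen_256=$((256 % 256))    # 0: indistinguishable from success
seen_300=$((300 % 256))    # 44: no longer the original count

# Hypothetical helper: map a failure count to an unambiguous exit value.
safe_status() {
    if [ "$1" -le 0 ]; then
        echo 0
    elif [ "$1" -gt 125 ]; then
        echo 125
    else
        echo "$1"
    fi
}

failures=300
code=$(safe_status "$failures")
rc=0
(exit "$code") || rc=$?
echo "count=$failures exit=$rc"
```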

The ambiguity caused by the modulo 256 operation is unfortunate, but required due to historical implementation behavior. A future version of this standard may change the definition of exit status to remove the modulo 256 requirement and use all bits of the value passed to exit() (or equivalent), and may introduce a way to select whether the special parameter '?' contains the exit status modulo 256 or the full exit status.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0022 [717] is applied.

Austin Group Defect 51 is applied, clarifying the exit status when a command is terminated due to the receipt of a signal.

Austin Group Defect 947 is applied, clarifying the exit status of commands.

C.2.9 Shell Commands

A description of an "empty command" was removed from an early proposal because it is only relevant in the cases of sh -c "", system(""), or an empty shell-script file (such as the implementation of true on some historical systems). Since it is no longer mentioned in the Shell and Utilities volume of POSIX.1-2024, it falls into the silently unspecified category of behavior where implementations can continue to operate as they have historically, but conforming applications do not construct empty commands. (However, note that sh does explicitly state an exit status for an empty string or file.) In an interactive session or a script with other commands, extra <newline> or <semicolon> characters, such as:

$ false
$
$ echo $?
1

would not qualify as the empty command described here because they would be consumed by other parts of the grammar.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0023 [473] is applied.

C.2.9.1 Simple Commands

Austin Group Defect 1224 is applied, correcting a mismatch between the description of simple commands and the formal simple_command grammar.

Austin Group Defect 1227 is applied, inserting additional subsection headings.

Order of Processing

The enumerated list is used only when the command is actually going to be executed. For example, in:

true || $foo *

no expansions are performed.
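One way to observe this is to use an expansion with a side effect. In the following sketch (the variable name demo is illustrative), the assignment performed by ${demo:=assigned} happens only when the command containing it is actually executed.

```shell
unset demo
# The right-hand side is never executed, so ${demo:=assigned} is never
# expanded and its assignment side effect does not happen.
true || : "${demo:=assigned}"
skipped=${demo-unset}

# When the command is executed, the expansion and its assignment take place.
false || : "${demo:=assigned}"
performed=${demo-unset}

echo "$skipped $performed"
```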

Expansion of words in an assignment context following the command name can only occur for declaration utilities, and only when the word can be used as a variable assignment in isolation.

For example, this code sequence exports the single variable a with the value "1 b=2", but invokes make with the macro a set to '1' and b set to '2', since make is not a declaration utility:

set '1 b=2'
export a=$1
make a=$1

Conversely, this code sequence exports two variables, a set to '1' and b set to '2', because the use of quoting means that the word could not be recognized as a variable assignment, and regular expansion rules require that field splitting occurs on the unquoted expansion of $1:

set '1 b=2'
export \a=$1

Likewise, this code sequence will not be parsed in assignment context, but is still required to export the variable named foo with the value '1':

var=foo
export $var=1

Implementations are permitted to provide extensions that serve as declaration utilities, such as typeset or local, or even a way to define a function that can behave as a declaration utility.

Declaration utilities are only required to be recognized via lexical analysis; if any expansions are required before the command name is known, or before the first argument to the command utility is known, then it is unspecified whether subsequent arguments will be treated with an assignment context during expansion. For example, it is unspecified whether

var=export; $var a=~

sets the variable a to a literal <tilde> or to the value of $HOME, since lexical analysis sees "$var" rather than "export" as the command name.

Austin Group Defects 351 and 1535 are applied, adding requirements relating to declaration utilities.

Variable Assignments

The following example illustrates both how a variable assignment without a command name affects the current execution environment, and how an assignment with a command name only affects the execution environment of the command:

$ x=red
$ echo $x
red
$ export x
$ sh -c 'echo $x'
red
$ x=blue sh -c 'echo $x'
blue
$ echo $x
red

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0021 [255] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0024 [654] is applied.

Austin Group Defect 1009 is applied, clarifying the behavior when a special built-in utility is executed with a variable assignment.

Commands with no Command Name

This next example illustrates that redirections without a command name are still performed:

$ ls foo
ls: foo: no such file or directory
$ > foo
$ ls foo
foo

A command without a command name, but one that includes a command substitution, has an exit status of the last command substitution that the shell performed. For example:

if      x=$(command)
then    ...
fi
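A small runnable variant of the same idea, with illustrative variable names, shows that the assignment still takes effect while the exit status comes from the command substitution:

```shell
# The assignment itself succeeds, but the exit status is that of the
# command substitution, so the else branch is taken.
if value=$(false); then branch=then; else branch=else; fi

# The substitution's output is still assigned even though it exits with 3.
status=0
value=$(echo data; exit 3) || status=$?

echo "branch=$branch value=$value status=$status"
```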

An example of redirections without a command name being performed in a subshell shows that the here-document does not disrupt the standard input of the while loop:

IFS=:
while    read a b
do       echo $a
         <<-eof
         Hello
         eof
done </etc/passwd

Following are examples of commands without command names in AND-OR lists:

> foo || {
    echo "error: foo cannot be created" >&2
    exit 1
}

# set saved if /vmunix.save exists
test -f /vmunix.save && saved=1

Command substitution and redirections without command names both occur in subshells, but they are not necessarily the same ones. For example, in:

exec 3> file
var=$(echo foo >&3) 3>&1

it is unspecified whether foo is echoed to the file or to standard output.

Austin Group Defect 1150 is applied, clarifying the exit status of a command that has no command name and has more than one command substitution.

Command Search and Execution

This description requires that the shell can execute shell scripts directly, even if the underlying system does not support the common "#!" interpreter convention. That is, if file foo contains shell commands and is executable, the following executes foo:

./foo

The command search shown here does not match all historical implementations. A more typical sequence has been:

But there are problems with this sequence. Since the programmer has no idea in advance which utilities might have been built into the shell, a function cannot be used to override portably a utility of the same name. (For example, a function named cd cannot be written for many historical systems.) Furthermore, the PATH variable is partially ineffective in this case, and only a pathname with a <slash> can be used to ensure a specific executable file is invoked.

After the execve() failure described, the shell normally executes the file as a shell script. Some implementations, however, attempt to detect whether the file is actually a script and not an executable from some other architecture. The method used by the KornShell is allowed by the text that indicates non-text files may be bypassed.

The sequence selected for the Shell and Utilities volume of POSIX.1-2024 acknowledges that special built-ins cannot be overridden, but gives the programmer full control over which versions of other utilities are executed (with some exceptions). It provides a means of suppressing function lookup (via the command utility) for the user's own functions and, with the exception of the intrinsic utilities (see XCU 1.7 Intrinsic Utilities), ensures that any regular built-ins or functions provided by the implementation are under the control of the path search. The mechanisms for associating non-intrinsic built-ins or functions with executable files in the path are not specified by the Shell and Utilities volume of POSIX.1-2024, but the wording requires that if either is implemented, the application is not able to distinguish a function or built-in from an executable (other than in terms of performance, presumably). The implementation ensures that all effects specified by the Shell and Utilities volume of POSIX.1-2024 resulting from the invocation of the regular built-in or function (interaction with the environment, variables, traps, and so on) are identical to those resulting from the invocation of an executable file.

Various historical implementations have used the names in item 1.b. as built-ins or reserved words. This standard does not specify their behavior, but their existence means that it is important for portable applications to avoid giving functions (or utilities in PATH) those names because the function (or utility in PATH) might not be executed as expected.

IEEE Std 1003.1-2001/Cor 2-2004, item XCU/TC2/D6/4 is applied, updating the case where execve() fails due to an error equivalent to the [ENOEXEC] error.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0022 [168], XCU/TC1-2008/0023 [168], XCU/TC1-2008/0024 [168], XCU/TC1-2008/0025 [168], XCU/TC1-2008/0026 [168,430], XCU/TC1-2008/0027 [168,430], and XCU/TC1-2008/0028 [173] are applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0025 [935] and XCU/TC2-2008/0026 [705] are applied.

Austin Group Defect 465 is applied, adding compound, enum, float, integer, and nameref to the table of command names for which the results are unspecified.

Austin Group Defect 854 is applied, adding intrinsic utilities.

Austin Group Defect 1391 is applied, clarifying the execution of a standard utility provided by the implementation in the form of a function.

Standard File Descriptors

There is no additional rationale provided for this section.

Non-built-in Utility Execution

Austin Group Defect 1157 is applied, clarifying the execution of non-built-in utilities.

Austin Group Defects 1226 and 1435 are applied, clarifying the circumstances under which the shell may bypass execution of a non-built-in utility as a shell script.

Examples

Consider three versions of the ls utility:

  1. The application includes a shell function named ls.
  2. The user writes a utility named ls and puts it in /fred/bin.
  3. The example implementation provides ls as a regular shell built-in that is invoked (either by the shell or directly by exec) when the path search reaches the directory /posix/bin.

If PATH=/posix/bin, various invocations yield different versions of ls:

Invocation                                        Version of ls
ls (from within application script)               (1) function
command ls (from within application script)       (3) built-in
ls (from within makefile called by application)   (3) built-in
system("ls")                                      (3) built-in
PATH="/fred/bin:$PATH" ls                         (2) user's version
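The first two invocations can be approximated on an ordinary system with the sketch below. It does not reproduce the example implementation's /posix/bin or /fred/bin directories; it only shows that a function named ls is found first and that command bypasses function lookup. The function body and variable names are illustrative.

```shell
# Case (1): a function defined by the application shadows the utility.
ls() {
    echo "ls function"
}

via_name=$(ls /)

# command suppresses function lookup, so the search continues with
# built-ins and the PATH search.
via_command=$(command ls /)

unset -f ls
echo "plain invocation used: $via_name"
```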

C.2.9.2 Pipelines

Because pipeline assignment of standard input or standard output or both takes place before redirection, it can be modified by redirection. For example:

$ command1 2>&1 | command2

sends both the standard output and standard error of command1 to the standard input of command2.
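The effect can be checked by counting the lines that reach the right-hand side of the pipe. In this sketch, emit is a hypothetical function that writes one line to each stream.

```shell
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# The pipe is set up first; 2>&1 then points standard error at it as well.
both=$(emit 2>&1 | grep -c .)

# With standard error discarded, only one line reaches the pipe.
only_out=$(emit 2>/dev/null | grep -c .)

echo "with 2>&1: $both line(s); without: $only_out line(s)"
```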

The reserved word ! allows more flexible testing using AND and OR lists. The behavior of !( is unspecified because in the KornShell this introduces a negated pathname expansion. Portable applications need to separate the ! and ( to ensure the command is treated as a negated subshell.
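A short sketch of the portable spelling, with illustrative variable names, is:

```shell
# "! (" written with a separating <space>: negate the status of a subshell.
if ! (exit 3); then
    negated=yes
else
    negated=no
fi

# ! also combines with AND and OR lists.
chained=no
! false && chained=yes

echo "negated=$negated chained=$chained"
```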

It was suggested that it would be better to return a non-zero value if any command in the pipeline terminates with non-zero status (perhaps the bitwise-inclusive OR of all return values). However, the last-specified command semantics are historical practice, and changing them would cause applications to break. An example of historical behavior:

$ sleep 5 | (exit 4)
$ echo $?
4
$ (exit 4) | sleep 5
$ echo $?
0

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0029 [205] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0027 [521] is applied.

Exit Status

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0030 [52] is applied.

Austin Group Defect 789 is applied, adding the pipefail option.
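The difference the option makes can be sketched as follows. Since pipefail is new in Issue 8, the sketch first probes for it in a subshell; this guard is an assumption about how a script might cope with older shells, not something the standard requires.

```shell
# Default: the status of the pipeline is that of the last command.
default_status=0
(exit 4) | true || default_status=$?

# Probe in a subshell first, since an unsupported option to the set
# special built-in could otherwise terminate the shell.
if (set -o pipefail) 2>/dev/null; then
    pipefail_status=$(set -o pipefail; rc=0; (exit 4) | true || rc=$?; echo "$rc")
else
    pipefail_status=unsupported
fi

echo "default=$default_status pipefail=$pipefail_status"
```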

C.2.9.3 Lists

The equal precedence of "&&" and "||" is historical practice. The standard developers evaluated the model used more frequently in high-level programming languages, such as C, to allow the shell logical operators to be used for complex expressions in an unambiguous way, but they could not allow historical scripts to break in the subtle way unequal precedence might cause. Some arguments were posed concerning the "{}" or "()" groupings that are required historically. There are some disadvantages to these groupings:

IEEE PASC Interpretation 1003.2 #204 is applied, clarifying that the operators "&&" and "||" are evaluated with left associativity.
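The following sketch contrasts the shell's equal-precedence, left-associative evaluation with the grouping that C-like precedence would imply. The variable names are illustrative.

```shell
# Equal precedence, left associativity: parsed as (true || echo a) && echo b,
# so "b" is written.
shell_result=$(true || echo a && echo b)

# Under C-like precedence this would be true || (echo a && echo b), which
# writes nothing; braces make that grouping explicit.
grouped_result=$(true || { echo a && echo b; })

echo "shell: '$shell_result' grouped: '$grouped_result'"
```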

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0031 [45] and XCU/TC1-2008/0032 [45] are applied.

Asynchronous AND-OR Lists

Unless the implementation has an internal limit, such as {CHILD_MAX}, on the retained process IDs, it would require unbounded memory for the following example:

while true
do      foo & echo $!
done

The treatment of the signals SIGINT and SIGQUIT with asynchronous AND-OR lists is described in XCU 2.12 Signals and Error Handling.

Since the connection of the input to the equivalent of /dev/null is considered to occur before redirections, the following script would produce no output:

exec < /etc/passwd
cat <&0 &
wait

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0028 [760] is applied.

Austin Group Defect 1254 is applied, replacing the Asynchronous Lists section with an Asynchronous AND-OR Lists section.

Sequential AND-OR Lists

Austin Group Defect 1254 is applied, replacing the Sequential Lists section with a Sequential AND-OR Lists section.

AND Lists

There is no additional rationale provided for this section.

OR Lists

There is no additional rationale provided for this section.

C.2.9.4 Compound Commands

Austin Group Defect 1309 is applied, clarifying the exit status of the for, case, if, while, and until compound commands.

Grouping Commands

The semicolon shown in {compound-list;} is an example of a control operator delimiting the } reserved word. Other delimiters are possible, as shown in XCU 2.10 Shell Grammar; <newline> is frequently used.

A proposal was made to use the <do-done> construct in all cases where command grouping in the current process environment is performed, identifying it as a construct for the grouping commands, as well as for shell functions. This was not included because the shell already has a grouping construct for this purpose ("{}"), and changing it would have been counter-productive.

The requirement for conforming applications to separate two leading '(' characters with white space if a grouping command would be parsed as an arithmetic expansion if preceded by a '$' is to allow shells which implement the "(( arithmetic expression ))" extension to apply the same disambiguation rules consistently to $((...)) and ((...)). See C.2.6.3 Command Substitution.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0033 [217] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0029 [473] is applied.

For Loop

The format is shown with generous usage of <newline> characters. See the grammar in XCU 2.10 Shell Grammar for a precise description of where <newline> and <semicolon> characters can be interchanged.
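For instance, the two loops in this sketch are equivalent; the first uses <newline> characters and the second uses <semicolon> characters in the same positions. The variable names are illustrative.

```shell
# Form with <newline> characters.
total_newlines=0
for n in 1 2 3
do
    total_newlines=$((total_newlines + n))
done

# Same loop with <semicolon> characters instead.
total_semicolons=0
for n in 1 2 3; do total_semicolons=$((total_semicolons + n)); done

echo "$total_newlines $total_semicolons"
```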

Some historical implementations support '{' and '}' as substitutes for do and done. The standard developers chose to omit them, even as an obsolescent feature. (Note that these substitutes were only for the for command; the while and until commands could not use them historically because they are followed by compound-lists that may contain "{...}" grouping commands themselves.)

The reserved word pair do ... done was selected rather than do ... od (which would have matched the spirit of if ... fi and case ... esac) because od is already the name of a standard utility.

PASC Interpretation 1003.2 #169 is applied, changing the grammar.

Case Conditional Construct

An optional <left-parenthesis> before pattern was added to allow numerous historical KornShell scripts to conform. At one time, using the leading parenthesis was required if the case statement was to be embedded within a "$()" command substitution; this is no longer the case with the POSIX shell. Nevertheless, many historical scripts use the <left-parenthesis>, if only because it makes matching-parenthesis searching easier in vi and other editors. This is a relatively simple implementation change that is upwards-compatible for all scripts.

Consideration was given to requiring break inside the compound-list to prevent falling through to the next pattern action list. This was rejected as being nonexisting practice. Instead, the standard now requires a feature first added in KornShell that using ";&" instead of ";;" as a terminator causes the exact opposite behavior—the flow of control continues with the next compound-list.

Although the standard is explicit that the order of side-effects due to pattern expansion within a single clause is unspecified, it is clear that patterns are expanded in clause order, and that no further pattern expansions are attempted after the first match. That is, the following example is required to output "1.0":

x=0 y=1
case 1 in
  $((y=0)) ) ;;
  $((x=1)) ) ;&
  $((x=2)) ) echo $x.$y ;;
esac

Some implementations of the shell also allow ";;&" as a terminator which falls through to the next matching pattern (regardless of the choice of terminator in any intermediate non-matching clauses), in contrast to ";&" falling through to the next clause (regardless of the pattern guarding that clause). This is an allowed extension, but is not required by the standard at this time.

The pattern '*', given as the last pattern in a case construct, is equivalent to the default case in a C-language switch statement.
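A brief sketch of this usage follows; the function kind_of and its categories are hypothetical.

```shell
kind_of() {
    case $1 in
        -h | --help) echo help ;;
        -*)          echo option ;;
        *)           echo operand ;;   # reached only when nothing above matched
    esac
}

kind_of --help
kind_of -x
kind_of file.txt
```

Because patterns are tried in order, placing '*' anywhere but last would prevent the later patterns from ever being matched.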

The grammar shows that reserved words can be used as patterns, even if one is the first word on a line. Obviously, the reserved word esac cannot be used in this manner.

Some historical shells would fall back to doing a byte to byte comparison with each pattern if the pattern matching rules did not produce a match. That behavior is not allowed by this standard because it allows user input to bypass input validations like:

case $1 in
  [0123456789]) : OK;;
  *) echo >&2 not a decimal digit; exit 1;;
esac

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0029 [473] is applied.

Austin Group Defect 449 is applied, adding ;& as a case clause terminator.

Austin Group Defect 1454 is applied, clarifying that a case statement with no patterns is valid syntax.

If Conditional Construct

The precise format for the command syntax is described in XCU 2.10 Shell Grammar.

While Loop

The precise format for the command syntax is described in XCU 2.10 Shell Grammar.

Until Loop

The precise format for the command syntax is described in XCU 2.10 Shell Grammar.

C.2.9.5 Function Definition Command

The description of functions in an early proposal was based on the notion that functions should behave like miniature shell scripts; that is, except for sharing variables, most elements of an execution environment should behave as if they were a new execution environment, and changes to these should be local to the function. For example, traps and options should be reset on entry to the function, and any changes to them do not affect the traps or options of the caller. There were numerous objections to this basic idea, and the opponents asserted that functions were intended to be a convenient mechanism for grouping common commands that were to be executed in the current execution environment, similar to the execution of the dot special built-in.

It was also pointed out that the functions described in that early proposal did not provide a local scope for everything a new shell script would, such as the current working directory, or umask, but instead provided a local scope for only a few select properties. The basic argument was that if a local scope is needed for the execution environment, the mechanism already existed: the application can put the commands in a new shell script and call that script. All historical shells that implemented functions, other than the KornShell, have implemented functions that operate in the current execution environment. Because of this, traps and options have a global scope within a shell script. Local variables within a function were considered and included in another early proposal (controlled by the special built-in local), but were removed because they do not fit the simple model developed for functions and because there was some opposition to adding yet another new special built-in that was not part of historical practice. Implementations should reserve the identifier local (as well as typeset, as used in the KornShell) in case this local variable mechanism is adopted in a future version of this standard.
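The consequence of running in the current execution environment can be sketched as follows. The function names are illustrative. As an assumption for the purpose of illustration, the second function uses a subshell as its body in place of the separate script suggested above, which gives comparable isolation for variables and the working directory.

```shell
counter=0

# A function body in braces runs in the current execution environment,
# so its assignment is visible to the caller.
bump() {
    counter=$((counter + 1))
}
bump
bump
after_bump=$counter

# A subshell body keeps changes to variables and the working directory local.
isolated() (
    counter=99
    cd /
)
start_dir=$PWD
isolated
after_isolated=$counter
end_dir=$PWD

echo "after bump: $after_bump, after isolated: $after_isolated"
```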

A separate issue from the execution environment of a function is the availability of that function to child shells. A few objectors maintained that just as a variable can be shared with child shells by exporting it, so should a function. In early proposals, the export command therefore had a -f flag for exporting functions. Functions that were exported were to be put into the environment as name()=value pairs, and upon invocation, the shell would scan the environment for these and automatically define these functions. This facility was strongly opposed and was omitted. Some of the arguments against exportable functions were as follows:

As far as can be determined, the functions in the Shell and Utilities volume of POSIX.1-2024 match those in System V. Earlier versions of the KornShell had two methods of defining functions:

function fname { compound-list }

and:

fname() { compound-list }

The latter used the same definition as the Shell and Utilities volume of POSIX.1-2024, but differed in semantics, as described previously. The current edition of the KornShell aligns the latter syntax with the Shell and Utilities volume of POSIX.1-2024 and keeps the former as is.

Some shells accept simple commands (see XCU 2.9.1 Simple Commands) after fname() in addition to compound commands (see XCU 2.9.4 Compound Commands); however this standard only requires support for compound commands.

The name space for functions is limited to that of a name because of historical practice. Complications in defining the syntactic rules for the function definition command and in dealing with known extensions such as the "@()" usage in the KornShell prevented the name space from being widened to a word. Using functions to support synonyms such as the "!!" and '%' usage in the C shell is thus disallowed to conforming applications, but acceptable as an extension. For interactive users, the aliasing facilities in the Shell and Utilities volume of POSIX.1-2024 should be adequate for this purpose. It is recognized that the name space for utilities in the file system is wider than that currently supported for functions, if the portable filename character set guidelines are ignored, but it did not seem useful to mandate extensions in systems for so little benefit to conforming applications.

The "()" in the function definition command consists of two operators. Therefore, intermixing <blank> characters with the fname, '(', and ')' is allowed, but unnecessary.

An example of how a function definition can be used wherever a simple command is allowed:

# If variable i is equal to "yes",
# define function foo to be ls -l
#
[ "$i" = yes ] && foo() {
    ls -l
}

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0034 [383] and XCU/TC1-2008/0035 [214] are applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0029 [473] and XCU/TC2-2008/0030 [654] are applied.

C.2.10 Shell Grammar

There are several subtle aspects of this grammar where conventional usage implies rules about the grammar that in fact are not true.

For compound_list, only the forms that end in a separator allow a reserved word to be recognized, so usually only a separator can be used where a compound list precedes a reserved word (such as Then, Else, Do, and Rbrace). Explicitly requiring a separator would disallow such valid (if rare) statements as:

if (false) then (echo x) else (echo y) fi

See the Note under special grammar rule (1).

Concerning the third sentence of rule (1) ("Also, if the parser ..."):

Note that the bodies of here-documents are handled by token recognition (see XCU 2.3 Token Recognition) and do not appear in the grammar directly. (However, the here-document I/O redirection operator is handled as part of the grammar.)

The optional redirection syntax:

{location}redir-op word

(see XCU 2.7 Redirection) is accommodated in the grammar rules by the optional IO_LOCATION token identifier and two correspondingly optional elements in io_redirect. Without these, the grammar would not permit this form of redirection because it would require that, for example, echo {var}> foo is parsed such that {var} is a WORD to be expanded and passed to echo. The grammar does not restrict the location given between the '{' and '}' in these forms (other than requiring it to be non-empty) since shells may parse an invalid location as part of an io_redirect and later treat the invalid location as an error.

C.2.10.1 Shell Grammar Lexical Conventions

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0031 [648] and XCU/TC2-2008/0032 [574,646] are applied.

Austin Group Defect 1193 is applied, adding the optional IO_LOCATION token identifier.

Austin Group Defect 1454 is applied, clarifying how to convert the token identifier type of the TOKEN when rule 1 applies.

C.2.10.2 Shell Grammar Rules

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0036 [44] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0033 [643,839], XCU/TC2-2008/0034 [643], XCU/TC2-2008/0035 [648], XCU/TC2-2008/0036 [736], XCU/TC2-2008/0037 [737], XCU/TC2-2008/0038 [581], and XCU/TC2-2008/0039 [735] are applied.

Austin Group Defect 249 is applied, adding the dollar-single-quotes quoting mechanism.

Austin Group Defect 449 is applied, adding ;& as a case clause terminator.

Austin Group Defect 1193 is applied, adding the optional IO_LOCATION token identifier.

Austin Group Defects 1276 and 1279 are applied, clarifying rule 7.

Austin Group Defect 1454 is applied, clarifying how rule 4 applies.

C.2.11 Job Control

See also Job Control.

Shell implementations differ regarding how much of a foreground job is retained when it is converted to a suspended job. For example, given this foreground job:

sleep 10; echo foo; echo bar &

if this is suspended during execution of the sleep, ksh93 retains all of the commands in the suspended job and executes them when fg is used:

^Z[1] + Stopped                  sleep 10; echo foo; echo bar &
$ jobs
[1] + Stopped                  sleep 10; echo foo; echo bar &
$ fg
sleep 10; echo foo; echo bar
foo
[1]     30686
bar
$

However, some other shells create a suspended job containing only the sleep 10 command.

Some historical shells did not handle suspending a foreground AND-OR list well. They would treat the wait status of a process that indicated it had stopped as if it were a non-zero exit status and (if the next operator in the AND-OR list was ||) would execute the remainder of the AND-OR list at that point. This behavior is not allowed by the standard for two reasons:

  1. It does not meet the fundamental requirement of an AND-OR list that the decision on whether to execute each part (except the first) is made based on the exit status of the previous part when it completes.
  2. It can lead to data loss. For example, consider a user who often runs this command:
    generate_report > report.out || rm report.out
    

    with the intention that the incomplete results from a failed generate_report run are never retained in order that they cannot be mistaken for a complete set of results. If one day the user decides to check on the progress of the command by stopping it and examining what has been written so far, they will find that the report.out file has already been removed.

Austin Group Defects 1254 and 1675 are applied, adding this section.

C.2.12 Signals and Error Handling

Historically, some shell implementations silently ignored attempts to use trap to set SIGINT or SIGQUIT to the default action or to set a trap for them after they have been set to be ignored by the shell when it executes an asynchronous subshell (and job control is disabled). This behavior is not conforming. For example, if a shell script containing the following line is run in the foreground at a terminal:

(trap - INT; exec sleep 10) & wait

and is then terminated by typing the interrupt character, this standard requires that the sleep command is terminated by the SIGINT signal.

SD5-XCU-ERN-93 is applied, updating the first paragraph of XCU 2.12 Signals and Error Handling.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0040 [750] is applied.

C.2.13 Shell Execution Environment

Some implementations have implemented the last stage of a pipeline in the current environment so that commands such as:

command | read foo

set variable foo in the current environment. This extension is allowed, but not required; therefore, a shell programmer should consider a pipeline to be in a subshell environment, but not depend on it.

In early proposals, the description of execution environment failed to mention that each command in a multiple command pipeline could be in a subshell execution environment. For compatibility with some historical shells, the wording was phrased to allow an implementation to place any or all commands of a pipeline in the current environment. However, this means that a POSIX application must assume each command is in a subshell environment, but not depend on it.
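A sketch of what this means for a script follows. The first read may or may not affect the current environment, depending on the shell; the second form, which uses a here-document, is one possible way (an assumption of this example, not a recommendation from the standard) to keep read in the current environment on any conforming shell.

```shell
foo=initial

# Whether this changes foo in the current environment depends on the shell.
echo piped | read foo || :
maybe_changed=$foo

# A here-document keeps read in the current environment.
read foo <<EOF
$(echo piped)
EOF
portable=$foo

echo "pipeline: $maybe_changed, here-document: $portable"
```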

The wording about shell scripts is meant to convey the fact that describing "trap actions" can only be understood in the context of the shell command language. Outside of this context, such as in a C-language program, signals are the operative condition, not traps.

POSIX.1-2008, Technical Corrigendum 1, XCU/TC1-2008/0037 [238] is applied.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0041 [706] is applied.

Austin Group Defect 1247 is applied, changing "signal traps" to "traps" and changing "All other commands" to "Except where otherwise stated, all other commands".

Austin Group Defect 1254 is applied, changing the list item relating to process IDs "known to this shell environment".

Austin Group Defect 1384 is applied, changing the requirements for subshells of interactive shells.

Austin Group Defect 1580 is applied, adding a list item about environment variables with invalid names.

C.2.14 Pattern Matching Notation

Pattern matching is a simpler concept and has a simpler syntax than REs, as the former is generally used for the manipulation of filenames, which are relatively simple collections of characters, while the latter is generally used to manipulate arbitrary text strings of potentially greater complexity. However, some of the basic concepts are the same, so this section points liberally to the detailed descriptions in XBD 9. Regular Expressions.

Austin Group Defect 1443 is applied, adding non-shell uses to the description of what shell pattern matching notation is used for.

Austin Group Defect 1564 is applied, clarifying that pattern matching notation is used for matching character strings (not arbitrary byte strings), and that if an attempt is made to use pattern matching notation to match a string that contains one or more bytes that do not form part of a valid character, the behavior is unspecified.

C.2.14.1 Patterns Matching a Single Character

Both quoting and escaping are described here because pattern matching must work in three separate circumstances:

  1. Calling directly upon the shell, such as in pathname expansion or in a case statement. All of the following match the string or file abc:
    abc "abc" a"b"c a\bc a[b]c a["b"]c a[\b]c a["\b"]c a?c a*c
    

    The following do not:

    "a?c" a\*c a\[b]c
    
  2. Calling a utility or function without going through a shell, as described for find and the fnmatch() and glob() functions defined in the System Interfaces volume of POSIX.1-2024, or pattern matching in the shell in situations where the pattern is specified indirectly instead of directly to the shell, such as:
    ls -ld -- $pattern
    

    or

    case $var in ($pattern) ...
    
  3. Calling utilities such as find, cpio, tar, or pax through the shell command line. In this case, shell quote removal is performed before the utility sees the argument. For example, in:
    find /bin -name "e\c[\h]o" -print
    

    after quote removal, the <backslash> characters are presented to find and it treats them as escape characters. Both precede ordinary characters, so the c and h represent themselves and echo would be found on many historical systems (that have it in /bin). To find a filename that contained shell special characters or pattern characters, both quoting and escaping are required, such as:

    pax -r ... "*a(\?"
    

    to extract a filename ending with "a(?".
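
The first two circumstances can be sketched with case statements, which need no files to exist. This is an illustration only; the counters hits and misses are hypothetical names introduced for the example.

```shell
s=abc
hits=0
misses=0

# Patterns given directly to the shell: each of these matches "abc".
case $s in a"b"c)   hits=$((hits + 1)) ;; esac
case $s in a\bc)    hits=$((hits + 1)) ;; esac
case $s in a[b]c)   hits=$((hits + 1)) ;; esac
case $s in a["b"]c) hits=$((hits + 1)) ;; esac
case $s in a?c)     hits=$((hits + 1)) ;; esac
case $s in a*c)     hits=$((hits + 1)) ;; esac

# Quoted or escaped special characters are ordinary: none of these match.
case $s in "a?c")   misses=$((misses + 1)) ;; esac
case $s in a\*c)    misses=$((misses + 1)) ;; esac
case $s in a\[b]c)  misses=$((misses + 1)) ;; esac

# A pattern specified indirectly, through an unquoted expansion, is still
# treated as a pattern by case.
pattern='a?c'
case $s in ($pattern) hits=$((hits + 1)) ;; esac

printf 'matched=%d unexpected=%d\n' "$hits" "$misses"
```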

The wording "In a pattern, or part of one, where a shell-quoting <backslash> cannot be used to preserve the literal value of a character that would otherwise be treated as special" has been carefully crafted so that for the shell it only applies to certain contexts. In particular:

In patterns specified indirectly to the shell, it is unspecified whether or not <backslash> is special inside bracket expressions. This is because there are two mutually exclusive consistency aims and neither is considered more important than the other. One is consistency with direct patterns, where <backslash> is special inside bracket expressions (which is, in turn, for consistency with the way single-quotes and double-quotes preserve the literal value of characters inside bracket expressions); the other is consistency with regular expressions, find, pax, fnmatch(), and glob(), where <backslash> is not special inside bracket expressions (not counting the extra C-string escaping in EREs in awk).

Earlier versions of this standard allowed two behaviors when a pattern ends with an unescaped <backslash>: it could match nothing or be treated as an invalid pattern. However, a third behavior has since been observed, where the ending <backslash> is treated as a literal <backslash>, and therefore this standard now simply states that the behavior is unspecified.

Earlier versions of this standard included the statement "The shell special characters always require quoting" in XCU 2.14.1 Patterns Matching a Single Character. It is unclear what was intended by this, since there are pattern matching contexts in which it is not possible to quote those characters, such as:

execlp("find", "find", ".", "-name", "*[()]*", (char *)0);

where the parentheses cannot be escaped with a <backslash> because <backslash> is not special in bracket expressions in that context. The statement is thought to have been a warning to application writers and interactive shell users that shell special characters (sometimes called metacharacters) always need quoting in patterns that appear directly in shell code; for example, this code:

case $char in
[()]) ... ;;
esac

is incorrect because the parentheses are parsed as operators—they need to be quoted in order to be treated as part of the pattern. This standard now simply requires instead that applications quote or escape any character that would otherwise be treated as special, in order for it to be matched as an ordinary character. If shell special characters are used without this protection in contexts where they are treated as special, syntax errors can result or implementation extensions can be triggered. Some shells support a series of extensions based on parentheses in patterns that are valid extensions in these contexts because they would otherwise cause syntax errors. However, this means that they are not allowed by this standard to be recognized in contexts where those syntax errors would not occur anyway, such as in:

pattern='a*(b)'; ls -- $pattern

which this standard requires to list files with names beginning 'a' and ending "(b)". It is recommended that implementations do not extend pattern matching in the shell in ways that are only valid extensions because they would otherwise be syntax errors, in order to avoid inconsistency between different pattern matching contexts. One way to provide an extension that is consistent between different pattern matching contexts in the shell (although still not consistent with find -name, fnmatch(), etc.) is to enable the extension only when a non-standard shell option is set, or when the shell is executed using a command name other than sh. Consistency with non-shell contexts can then be achieved by enabling equivalent extensions in those other contexts by use of non-standard utility options or non-standard FNM_* and GLOB_* flags.
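
As a sketch of the quoting requirement described above, the following shows one way to match a parenthesis in a pattern written directly in shell code, alongside the same bracket expression supplied indirectly. The function and variable names (is_paren, is_paren_indirect, paren_class) are hypothetical.

```shell
# Direct pattern: the parentheses are quoted inside the bracket expression
# so that the shell parses them as part of the pattern, not as operators.
is_paren() {
    case $1 in
        ['()']) return 0 ;;
        *)      return 1 ;;
    esac
}

# Indirect pattern: the text comes from an expansion and is not parsed as
# shell syntax, so the parentheses need no quoting here.
paren_class='[()]'
is_paren_indirect() {
    case $1 in
        ($paren_class) return 0 ;;
        *)             return 1 ;;
    esac
}

for c in '(' ')' x; do
    if is_paren "$c" && is_paren_indirect "$c"; then
        printf '%s is a parenthesis\n' "$c"
    else
        printf '%s is not a parenthesis\n' "$c"
    fi
done
```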

The restriction on a <circumflex> in a bracket expression is to allow implementations that support pattern matching using the <circumflex> as the negation character in addition to the <exclamation-mark>. A conforming application must use something like "[\^!]" to match either character.
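
A minimal sketch of the portable form in a case statement follows. The function name is_caret_or_bang is an assumption made for the example.

```shell
# The escaped <circumflex> comes first, so neither character is in the
# position where it could be taken as a negation character.
is_caret_or_bang() {
    case $1 in
        [\^!]) return 0 ;;
        *)     return 1 ;;
    esac
}

for c in '^' '!' a; do
    if is_caret_or_bang "$c"; then
        printf '%s matches\n' "$c"
    else
        printf '%s does not match\n' "$c"
    fi
done
```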

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0042 [806] is applied.

Austin Group Defect 985 is applied, changing the description of the '[' special character.

Austin Group Defect 1234 is applied, clarifying how <backslash> is handled in patterns.

C.2.14.2 Patterns Matching Multiple Characters

Since each <asterisk> matches zero or more occurrences, the patterns "a*b" and "a**b" have identical functionality.
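
The equivalence, and the examples listed in this section, can be checked with a small helper such as the following sketch. The function name matches and the colon-separated test pairs are assumptions made for the illustration.

```shell
# matches STRING PATTERN: succeed if STRING matches PATTERN.
# The pattern reaches case through an unquoted expansion, so it is
# interpreted as a pattern rather than as a literal string.
matches() {
    case $1 in
        ($2) return 0 ;;
        *)   return 1 ;;
    esac
}

for pair in 'ab:a[bc]' 'abcd:a*d' 'abc:a*d' 'adddd:a*d*' 'efabcd:*a*d' \
            'axb:a*b' 'axb:a**b'; do
    string=${pair%%:*}
    pattern=${pair#*:}
    if matches "$string" "$pattern"; then
        printf '%s matches %s\n' "$string" "$pattern"
    else
        printf '%s does not match %s\n' "$string" "$pattern"
    fi
done
```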

Examples
a[bc]
Matches the strings "ab" and "ac".
a*d
Matches the strings "ad", "abd", and "abcd", but not the string "abc".
a*d*
Matches the strings "ad", "abcd", "abcdef", "aaaad", and "adddd".
*a*d
Matches the strings "ad", "abcd", "efabcd", "aaaad", and "adddd".

C.2.14.3 Patterns Used for Filename Expansion

The caveat about a <slash> within a bracket expression is derived from historical practice. The pattern "a[b/c]d" does not match such pathnames as abd or a/d. On some implementations (including those conforming to the Single UNIX Specification), it matched a pathname of literally "a[b/c]d". On other systems, it produced an undefined condition (an unescaped '[' used outside a bracket expression). In this version, the XSI behavior is now required.

Filenames beginning with a <period> historically have been specially protected from view on UNIX systems. A proposal to allow an explicit <period> in a bracket expression to match a leading <period> was considered; it is allowed as an implementation extension, but a conforming application cannot make use of it. If this extension becomes popular in the future, it will be considered for a future version of the Shell and Utilities volume of POSIX.1-2024.
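
The portable way to match such filenames is to write the leading <period> explicitly in the pattern, as in this sketch. The scratch directory name under ${TMPDIR:-/tmp} is an assumption chosen for the example.

```shell
# Scratch directory for the demonstration (hypothetical location).
scratch=${TMPDIR:-/tmp}/dotdemo.$$
mkdir "$scratch" && cd "$scratch" || exit 1
: > .hidden
: > visible

set -- *            # a leading <period> is not matched by '*'
plain=$*
set -- .h*          # an explicit leading <period> does match
dotted=$*
# A pattern such as [.]h* is not portable: matching the leading <period>
# that way is only an implementation extension.

printf 'plain: %s\ndotted: %s\n' "$plain" "$dotted"

cd / && rm -rf "$scratch"
```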

Patterns are matched against existing filenames and pathnames only when the pattern contains a '*', '?' or '[' character that will be treated as special. This prevents accidental removal of <backslash> characters in variable expansions where generating a list of matching files is not intended and a (usually oddly named) file with a matching name happens to exist. For example, a shell script that tries to be portable to systems that predate the introduction of functions and printf might use this on POSIX systems:

myecho='printf %s\n'

to be used as:

$myecho args...

If %s\n were to be matched against existing files, this would not work if a file called %sn happened to exist.
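
A short sketch of the idiom in use is given below; the variable output is a hypothetical name. Because the word %s\n contains no '*', '?' or '[', it is passed to printf unchanged.

```shell
myecho='printf %s\n'

# The unquoted expansion is split into the fields "printf" and "%s\n";
# the second field is not matched against existing filenames.
output=$($myecho one two)
printf '%s\n' "$output"
```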

Historical systems have varied in their permissions requirements. Matching f*/bar has required read permission on the f* directories in the System V shell, but the Shell and Utilities volume of POSIX.1-2024, the C shell, and KornShell require only search permissions. If read or search permission is denied, shells do not report an error but treat this as a successful "no match" condition. Error conditions that are related to file system contents and occur when attempting to read or search a directory are also required to be treated the same way because they imply that there are no matches (that are accessible to the process). For example, if the pattern is foo/*bar and attempting to open the directory foo fails because it does not exist or is not a directory, then there can be no matching pathnames. The error conditions listed in XSH 2.3 Error Numbers that are related to file system contents and could occur when attempting to open or search a directory are [EACCES], [ELOOP], [ENAMETOOLONG], [ENOENT], and [ENOTDIR]. Error conditions that are not related to file system contents or which occur when reading a directory, notably [EMFILE] and [ENFILE] but also things like [EIO], [ENOMEM], and [EOVERFLOW], can either be treated as errors or be treated the same way as when permission is denied. Treating them as errors is seen as desirable, because to do otherwise would mean the shell could execute a command with an unchanged pattern when pathnames matching the pattern exist, but it is not historical practice. Implementations that handle the two categories of error differently should also handle non-standard error conditions appropriately, if encountered, depending on which category they fit into.
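
The foo/*bar case can be sketched as follows. The scratch directory name under ${TMPDIR:-/tmp} and the variables missing and notdir are assumptions made for the example.

```shell
# Scratch directory for the demonstration (hypothetical location).
scratch=${TMPDIR:-/tmp}/globdemo.$$
mkdir "$scratch" && cd "$scratch" || exit 1

# foo does not exist: opening it fails with [ENOENT].
set -- foo/*bar
missing=$1

# foo exists but is a regular file: opening it as a directory fails
# with [ENOTDIR].
: > foo
set -- foo/*bar
notdir=$1

# In both cases there is no match and no error, so the word is left as
# the unchanged pattern.
printf '%s\n%s\n' "$missing" "$notdir"

cd / && rm -rf "$scratch"
```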

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0043 [963] is applied.

Austin Group Defect 1070 is applied, requiring that when the matching filenames or pathnames are sorted, any that collate equally are further compared byte-by-byte using the collating sequence for the POSIX locale.

Austin Group Defect 1228 is applied, allowing directory entries for dot and dot-dot to be ignored when matching patterns against existing filenames.

Austin Group Defect 1234 is applied, changing the behavior of patterns used for filename expansion such that a pattern is matched against existing filenames and pathnames only when it contains a '*', '?' or '[' character that will be treated as special.

Austin Group Defects 1273 and 1275 are applied, clarifying how errors are treated when attempting to open or search a pathname as a directory or attempting to read an opened directory.

C.2.15 Special Built-In Utilities

See the RATIONALE sections on the individual reference pages.

POSIX.1-2008, Technical Corrigendum 2, XCU/TC2-2008/0044 [882] and XCU/TC2-2008/0045 [654] are applied.

Austin Group Defect 1009 is applied, clarifying the behavior when a special built-in utility is executed with a variable assignment.

Austin Group Defect 1445 is applied, changing text relating to the term "built-in".

C.3 Utilities

For the utilities included in POSIX.1-2024, see the RATIONALE sections on the individual reference pages.

C.3.1 Utilities Removed in this Version

The following utilities were removed in this version of this standard:


qalter
qdel
qhold
qmove
qmsg
qrerun
qrls
qselect
qsig
qstat
qsub

C.3.2 Utilities Removed in the Previous Version

None.

C.3.3 Exclusion of Utilities

The set of utilities contained in POSIX.1-2024 is drawn from the base documents for IEEE Std 1003.2-1992, with one addition: the c17 utility. This section contains rationale for some of the deliberations that led to this set of utilities, and why certain utilities were excluded.

Many utilities were evaluated by the standard developers; more historical utilities were excluded from the base documents for IEEE Std 1003.2-1992 than included. The following list contains many common UNIX system utilities that were not included as mandatory utilities, in the User Portability Utilities option, in the XSI option, or in one of the software development groups. It is logistically difficult for this rationale to distribute correctly the reasons for not including a utility among the various utility options. Therefore, this section covers the reasons for all utilities not included in POSIX.1-2024.

This rationale is limited to a discussion of only those utilities actively or indirectly evaluated by the IEEE Std 1003.2-1992 standard developers, rather than the list of all known UNIX utilities from all its variants.

adb
The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. Furthermore, many useful aspects of adb are very hardware-specific.
as
Assemblers are hardware-specific and are included implicitly as part of the compilers in POSIX.1-2024.
banner
The only known use of this command is as part of the lp printer header pages. It was decided that the format of the header is implementation-defined, so this utility is superfluous to application portability.
calendar
This reminder service program is not useful to conforming applications.
cancel
The lp (line printer spooling) system specified is the most basic possible and did not need this level of application control.
chroot
This is primarily of administrative use, requiring superuser privileges.
col
No utilities defined in POSIX.1-2024 produce output requiring such a filter. The nroff text formatter is present on many historical systems and will remain as an extension; col is expected to be shipped by all the systems that ship nroff.
cpio
This has been replaced by pax, for reasons explained in the rationale for that utility.
cpp
This is subsumed by c17.
cu
This utility is terminal-oriented and is not useful from shell scripts or typical application programs.
dc
The functionality of this utility can be provided by the bc utility; bc was selected because it was easier to use and had superior functionality. Although the historical versions of bc are implemented using dc as a base, POSIX.1-2024 prescribes the interface and not the underlying mechanism used to implement it.
dircmp
Although a useful concept, the historical output of this directory comparison program is not suitable for processing in application programs. Also, the diff -r command gives equivalent functionality.
dis
Disassemblers are hardware-specific.
emacs
The community of emacs editing enthusiasts was adamant that the full emacs editor not be included in IEEE Std 1003.2-1992 because they were concerned that an attempt to standardize this very powerful environment would encourage vendors to ship versions conforming strictly to the standard, but lacking the extensibility required by the community. The author of the original emacs program also expressed his desire to omit the program. Furthermore, there were a number of historical UNIX systems that did not include emacs, or included it without supporting it, but there were very few that did not include and support vi.
ld
This is subsumed by c17.
line
The functionality of line can be provided with read.
lint
This technology is partially subsumed by c17. It is also hard to specify the degree of checking for possible error conditions in programs in any compiler, and specifying what lint would do in these cases is equally difficult.

It is fairly easy to specify what a compiler does. It requires specifying the language, what it does with that language, and stating that the interpretation of any incorrect program is unspecified. Unfortunately, any description of lint is required to specify what to do with erroneous programs. Since the number of possible errors and questionable programming practices is infinite, one cannot require lint to detect all errors of any given class.

Additionally, some vendors complained that since many compilers are distributed in a binary form without a lint facility (because the ISO C standard does not require one), implementing the standard as a stand-alone product will be much harder. Rather than being able to build upon a standard compiler component (simply by providing c17 as an interface), source to that compiler would most likely need to be modified to provide the lint functionality. This was considered a major burden on system providers for a very small gain to developers (users).

login
This utility is terminal-oriented and is not useful from shell scripts or typical application programs.
lorder
This utility is an aid in creating an implementation-defined detail of object libraries that the standard developers did not feel required standardization.
lpstat
The lp system specified is the most basic possible and did not need this level of application control.
mail
This utility was omitted in favor of mailx because there was a considerable functionality overlap between the two.
mknod
This was omitted in favor of mkfifo, as mknod has too many implementation-defined functions.
news
This utility is terminal-oriented and is not useful from shell scripts or typical application programs.
pack
This compression program was considered inferior to compress.
passwd
This utility was proposed in an early draft of the IEEE Std 1003.2-1992 UPE but met with too many objections to be included. There were various reasons:
pcat
This compression program was considered inferior to zcat.
pg
This duplicated many of the features of the more pager, which was preferred by the standard developers.
prof
The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool.
RCS
RCS was originally considered as part of a version control utilities portion of the scope. However, this aspect was abandoned by the standard developers. SCCS is now included as an optional part of the XSI option.
red
Restricted editor. This was not considered by the standard developers because it never provided the level of security restriction required.
rsh
Restricted shell. This was not considered by the standard developers because it does not provide the level of security restriction that is implied by historical documentation.
sdb
The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool. Furthermore, some useful aspects of sdb are very hardware-specific.
sdiff
The "side-by-side diff" utility from System V was omitted because it is used infrequently, and even less so by conforming applications. Despite being in System V, it is not in the SVID or XPG.
shar
All of the numerous "shell archivers" were excluded because they did not meet the requirement of existing practice.
shl
This utility is terminal-oriented and is not useful from shell scripts or typical application programs. The job control aspects of the shell command language are generally more useful.
size
The intent of the various software development utilities was to assist in the installation (rather than the actual development and debugging) of applications. This utility is primarily a debugging tool.
spell
This utility is not useful from shell scripts or typical application programs. The spell utility was considered, but was omitted because there is no known technology that can be used to make it recognize general language for user-specified input without providing a complete dictionary along with the input file.
su
This utility is not useful from shell scripts or typical application programs. (There was also sentiment to avoid security-related utilities.)
sum
This utility was renamed cksum.
tar
This has been replaced by pax, for reasons explained in the rationale for that utility.
unpack
This compression program was considered inferior to uncompress.
wall
This utility is terminal-oriented and is not useful in shell scripts or typical applications. It is generally used only by system administrators.
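
As an illustration of the entry for line above, the following sketch approximates that utility with read. The function name read_line and the variable line_text are hypothetical, and the sketch does not claim to reproduce every detail of the historical utility.

```shell
# read_line: copy one line from standard input to standard output.
# IFS is emptied and -r is used so that leading blanks and <backslash>
# characters in the line are preserved.
read_line() {
    IFS= read -r line_text || return 1
    printf '%s\n' "$line_text"
}

first=$(printf '  first line\nsecond\n' | read_line)
printf '%s\n' "$first"
```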

 

UNIX® is a registered Trademark of The Open Group.
POSIX™ is a Trademark of The IEEE.