POSIX.1-2024 is one of a family of standards known as POSIX. The family of standards extends to many topics; POSIX.1 consists of both operating system interfaces and shell and utilities. POSIX.1-2024 is technically identical to The Open Group Base Specifications, Issue 8.
The (paraphrased) goals of this development were to revise the single document that is ISO/IEC 9945:2009 Parts 1 through 4 as amended by ISO/IEC 9945:2009/Cor.1:2013 and ISO/IEC 9945:2009/Cor.2:2017, IEEE Std 1003.1-2017, and the appropriate parts of The Open Group Single UNIX Specification, Version 5. This work has been undertaken by the Austin Group, a joint working group of IEEE, The Open Group, and ISO/IEC JTC 1/SC 22.
The following are the base documents in this version:
IEEE Std 1003.1-2017
IEEE Std 1003.26-2003
ISO/IEC 9899:2018, Programming Languages — C
This version has addressed the following areas:
Issues raised by Austin Group defect reports, IEEE Interpretations against IEEE Std 1003.1, and ISO/IEC defect reports against ISO/IEC 9945
The repository of interpretations can be accessed at www.opengroup.org/austin/interps.
Issues raised in corrigenda for The Open Group Standards and working group resolutions from The Open Group
Changes to make the text self-consistent with the additional material merged
A list of the new interfaces is included in B.1.1 Change History.
Features, marked obsolescent in the base documents, have been considered for removal in this version
See B.1.1 Change History and C.1.1 Change History.
Alignment with the ISO/IEC 9899:2018 standard
The following were requirements on POSIX.1-2024:
Backward-compatibility
For interfaces carried forward, it was agreed that there should be no breakage of functionality in the existing base documents. All strictly conforming applications will be conforming but not necessarily strictly conforming to the revised standard. The goal is for system implementations to be able to support the existing and revised standards simultaneously.
Architecture and n-bit-neutral
The common standard should not make any implicit assumptions about the system architecture or size of data types; for example, previously some 32-bit implicit assumptions had crept into the standards.
Extensibility
It should be possible to extend the common standard without breaking backwards-compatibility; for example, the name space should be reserved and structured to avoid duplication of names between the standard and extensions to it.
The standard developers believed it essential for a programmer to have a single complete reference place, but recognized that deference to the formal standard has to be addressed for the duplicate interface definitions between the ISO C standard and POSIX.1-2024.
Where an interface has a version in the ISO C standard, the DESCRIPTION section describes the relationship to the ISO C standard and markings are included as appropriate to show where the ISO C standard has been extended in the text.
A block of text is included at the start of each affected reference page stating whether the page is aligned with the ISO C standard or extended. Each page has been parsed for additions beyond the ISO C standard (that is, including both POSIX and UNIX extensions), and these extensions are marked as CX extensions (for C extensions).
The Federal Information Processing Standards (FIPS) are a series of US government procurement standards managed and maintained on behalf of the US Department of Commerce by the National Institute of Standards and Technology (NIST).
The following restrictions were integrated into IEEE Std 1003.1-2001. They originally came from FIPS 151-2, which was withdrawn by NIST on February 25, 2000.
The implementation supports _POSIX_CHOWN_RESTRICTED.
The limit {NGROUPS_MAX} is greater than or equal to 8.
The implementation supports the setting of the group ID of a file (when it is created) to that of the parent directory.
The implementation supports _POSIX_SAVED_IDS.
The implementation supports _POSIX_VDISABLE.
The implementation supports _POSIX_JOB_CONTROL.
The implementation supports _POSIX_NO_TRUNC.
The read() function returns the number of bytes read when interrupted by a signal after some data has been read, and does not return -1.
The write() function returns the number of bytes written when interrupted by a signal after some data has been written, and does not return -1.
In the environment for the login shell, the environment variables LOGNAME and HOME are defined and have the properties described in POSIX.1-2024.
The value of {CHILD_MAX} is greater than or equal to 25.
The value of {OPEN_MAX} is greater than or equal to 20.
The implementation supports the functionality associated with the symbols CS7, CS8, CSTOPB, PARODD, and PARENB defined in <termios.h>.
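An application that depends on these former FIPS requirements can check several of them itself, at build time through <unistd.h> and at run time through sysconf() and pathconf(). The following is a minimal sketch, not a conformance test; the helper names sysconf_at_least(), fips_limits_ok(), and path_options_ok() are hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Fail the build if job control, saved set-user-IDs, or _POSIX_VDISABLE
   are not advertised in <unistd.h>. */
#if !defined(_POSIX_JOB_CONTROL) || _POSIX_JOB_CONTROL <= 0 || \
    !defined(_POSIX_SAVED_IDS) || _POSIX_SAVED_IDS <= 0 || \
    !defined(_POSIX_VDISABLE)
#error "job control, saved set-user-IDs, or _POSIX_VDISABLE not supported"
#endif

/* Hypothetical helper: 1 if the sysconf() value for `name` is at least
   `minimum`. sysconf() returns -1 without changing errno when there is
   no definite limit, which also satisfies a minimum. */
static int sysconf_at_least(int name, long minimum)
{
    errno = 0;
    long value = sysconf(name);
    if (value == -1)
        return errno == 0;
    return value >= minimum;
}

/* {NGROUPS_MAX} >= 8, {CHILD_MAX} >= 25, {OPEN_MAX} >= 20. */
int fips_limits_ok(void)
{
    return sysconf_at_least(_SC_NGROUPS_MAX, 8)
        && sysconf_at_least(_SC_CHILD_MAX, 25)
        && sysconf_at_least(_SC_OPEN_MAX, 20);
}

/* _POSIX_CHOWN_RESTRICTED and _POSIX_NO_TRUNC may vary by file system,
   so query them for a path; pathconf() returns -1 when the option is
   not in effect (or on error). */
int path_options_ok(const char *path)
{
    return pathconf(path, _PC_CHOWN_RESTRICTED) != -1
        && pathconf(path, _PC_NO_TRUNC) != -1;
}
```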
Note that where the footnotes state that "must" is used only to describe unavoidable situations and "will" is only used in statements of fact, they are referring to uses of these words in normative text. In informative text, they are used in other ways with their usual dictionary meanings.
See A.2 Conformance.
There is no additional rationale provided for this section.
For Issue 7 onwards, in references to Technical Corrigenda, the original Austin Group defect report numbers that gave rise to the change are included in square brackets after the change number from the Technical Corrigendum. For more information on Austin Group defect reports see www.opengroup.org/austin/defectform.html.
The meanings specified in POSIX.1-2024 for the words shall, should, and may are mandated by ISO/IEC directives.
In the Rationale (Informative) volume of POSIX.1-2024, the words shall, should, and may are sometimes used to illustrate similar usages in POSIX.1-2024. However, the rationale itself does not specify anything regarding implementations or applications.
As a practical matter, the conformance document is effectively part of the system documentation. Conformance documents are distinguished by POSIX.1-2024 so that they can be referred to distinctly.
This definition is analogous to that of the ISO C standard and, together with "undefined" and "unspecified", provides a range of degrees of freedom allowed to the interface implementor.
The use of may has been limited as much as possible, due both to confusion stemming from its ordinary English meaning and to objections regarding the desirability of having as few options as possible and those as clearly specified as possible.
The words can and may were selected to contrast optional application behavior (can) against optional implementation behavior (may).
Declarative sentences are sometimes used in POSIX.1-2024 as if they included the word shall, and facilities thus specified are no less required. For example, the two statements:
The foo() function shall return zero.
The foo() function returns zero.
are meant to be exactly equivalent.
In POSIX.1-2024, the word should does not usually apply to the implementation, but rather to the application. Thus, the important words regarding implementations are shall, which indicates requirements, and may, which indicates options.
The term "obsolescent" means "do not use this feature in new applications". A feature noted as obsolescent is supported by all implementations, but may be removed in a future version; new applications should not use these features. The obsolescence concept is not an ideal solution, but was used as a method of increasing consensus: many more objections would be heard from the user community if some of these historical features were suddenly removed without the grace period obsolescence implies. The phrase "may be removed in a future version" implies that the result of that consideration might in fact keep those features indefinitely if most applications do not migrate away from them quickly.
The term "legacy" was included in earlier versions of this standard but is no longer used in the current version.
The system documentation should normally describe the whole of the implementation, including any extensions provided by the implementation. Such documents normally contain information at least as detailed as the specifications in POSIX.1-2024. Few requirements are made on the system documentation, but the term is needed to avoid a dangling pointer where the conformance document is permitted to point to the system documentation.
See implementation-defined.
See implementation-defined.
The definitions for "unspecified" and "undefined" appear nearly identical at first examination, but are not. The term "unspecified" means that a conforming application may deal with the unspecified behavior, and it should not care what the outcome is. The term "undefined" says that a conforming application should not do it because no definition is provided for what it does (and implicitly it would care what the outcome was if it tried it). It is important to remember, however, that if the syntax permits the statement at all, it must have some outcome in a real implementation.
Thus, the terms "undefined" and "unspecified" apply to the way the application should think about the feature. In terms of the implementation, it is always "defined"—there is always some result, even if it is an error. The implementation is free to choose the behavior it prefers.
This also implies that an implementation, or another standard, could specify or define the result in a useful fashion. The terms apply to POSIX.1-2024 specifically.
The term "implementation-defined" implies requirements for documentation that are not required for "undefined" (or "unspecified"). Where there is no need for a conforming program to know the definition, the term "undefined" is used, even though "implementation-defined" could also have been used in this context. There could be a fourth term, specifying "this standard does not say what this does; it is acceptable to define it in an implementation, but it does not need to be documented", and undefined would then be used very rarely for the few things for which any definition is not useful. In particular, implementation-defined is used where it is believed that certain classes of application will need to know such details to determine whether the application can be successfully ported to the implementation. Such applications are not always strictly portable, but nevertheless are common and useful; often the requirements met by the application cannot be met without dealing with the issues implied by "implementation-defined". In some places the text refers to facilities supplied by the implementation that are outside the standard as implementation-supplied or implementation-provided. This is not intended to imply a requirement for documentation. If it were, the term "implementation-defined" would have been used.
In many places POSIX.1-2024 is silent about the behavior of some possible construct. For example, a variable may be defined for a specified range of values and behaviors are described for those values; nothing is said about what happens if the variable has any other value. That kind of silence can imply an error in the standard, but it may also imply that the standard was intentionally silent and that any behavior is permitted. There is a natural tendency to infer that if the standard is silent, a behavior is prohibited. That is not the intent. Silence is intended to be equivalent to the term "unspecified".
Three terms used within POSIX.1-2024 overlap in meaning: "macro", "symbolic name", and "symbolic constant".
This usually describes a C preprocessor symbol, the result of the #define operator, with or without an argument. It may also be used to describe similar mechanisms in editors and text processors.
In earlier versions of this standard this was also sometimes used to refer to a C preprocessor symbol (without arguments), but the intention is for all such uses to have been removed. It is now mainly used to refer to the names for characters in character sets, but is sometimes used to refer to host names and even filenames.
This also refers to a C preprocessor symbol, with specific associated requirements. See the definition in 3.363 Symbolic Constant.
There is no additional rationale provided for this section.
To aid the identification of options within POSIX.1-2024, a notation consisting of margin codes and shading is used. This is based on the notation used in earlier versions of The Open Group Base specifications.
The benefit of this approach is a reduction in the number of if statements within the running text, which makes the text easier to read, and also an indication to the programmer that they need to ensure that their target platforms support the underlying options. For example, if functionality is marked with RPP in the margin, it will be available on all systems supporting the Robust Mutex Priority Protection option, but may not be available on some others.
This section includes codes for options defined in XBD 2.1.6 Options, and the following additional codes for other purposes:
Where an interface is added to an ISO C standard header, within the header the interface has an appropriate margin marker and shading (for example, CX, XSI, TSF, and so on) and the same marking appears on the reference page in the SYNOPSIS section. This enables a programmer to easily identify that the interface is extending an ISO C standard header.
Austin Group Defect 1755 is applied, changing the CX code description to include intentional conflicts (deviations).
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0001 [591] is applied.
Since some features may depend on one or more options, or require more than one option, a notation is used. Where a feature requires support of a single option, a single margin code will occur in the margin. If it depends on two options and both are required, then the codes will appear with a <space> separator. If either of two options is required, then a logical OR is denoted using the '|' symbol. If more than two codes are used, a special notation is used.
The terms "profile" and "profiling" are used throughout this section.
A profile of a standard or standards is a codified set of option selections, such that by being conformant to a profile, particular classes of users are specifically supported.
These definitions allow application developers to know what to depend on in an implementation.
There is no definition of a "strictly conforming implementation"; that would be an implementation that provides only those facilities specified by POSIX.1 with no extensions whatsoever. This is because no actual operating system implementation can exist without system administration and initialization facilities that are beyond the scope of POSIX.1.
The word "support" is used in certain instances, rather than "provide", in order to allow an implementation that has no resident software development facilities, but that supports the execution of a Strictly Conforming POSIX.1 Application, to be a conforming implementation.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0002 [810] is applied.
The conformance documentation is required to use the same numbering scheme as POSIX.1 for purposes of cross-referencing. All options that an implementation chooses are reflected in <limits.h> and <unistd.h>.
Note that the use of "may" in describing where conformance documents record how implementations may vary implies that they are not required to describe features identified as undefined or unspecified.
Other aspects of systems must be evaluated by purchasers for suitability. Many systems incorporate buffering facilities, maintaining updated data in volatile storage and transferring such updates to non-volatile storage asynchronously. Various exception conditions, such as a power failure or a system crash, can cause this data to be lost. The data may be associated with a file that is still open, with one that has been closed, with a directory, or with any other internal system data structures associated with permanent storage. This data can be lost, in whole or part, so that only careful inspection of file contents could determine that an update did not occur.
Also, interrelated file activities, where multiple files and/or directories are updated, or where space is allocated or released in the file system structures, can leave inconsistencies in the relationship between data in the various files and directories, or in the file system itself. Such inconsistencies can break applications that expect updates to occur in a specific sequence, so that updates in one place correspond with related updates in another place.
For example, if a user creates a file, places information in the file, and then records this action in another file, a system or power failure at this point followed by restart may result in a state in which the record of the action is permanently recorded, but the file created (or some of its information) has been lost. The consequences of this to the user may be undesirable. For a user on such a system, the only safe action may be to require the system administrator to have a policy that requires, after any system or power failure, that the entire file system must be restored from the most recent backup copy (causing all intervening work to be lost).
The characteristics of each implementation will vary in this respect and may or may not meet the requirements of a given application or user. Enforcement of such requirements is beyond the scope of POSIX.1. It is up to the purchaser to determine what facilities are provided in an implementation that affect the exposure to possible data or sequence loss, and also what underlying implementation techniques and/or facilities are provided that reduce or limit such loss or its consequences.
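An application can narrow, though not eliminate, the exposure described above by forcing data to stable storage before recording that the data exists. The following is a minimal sketch of the create-then-record scenario; create_then_record() and write_all() are hypothetical helpers. It assumes that a successful fsync() has made the file data durable, which is what POSIX.1 describes but which ultimately depends on the implementation and the storage hardware, and it omits syncing the containing directory, which many systems also need before a newly created name survives a crash. write_all() resumes after short writes, since an interrupted write() may return a partial count rather than -1.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes of buf, resuming after short writes and after an
   EINTR that occurred before any data was transferred. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write `data` to data_path and force it to stable storage before
   appending `record` to the log at log_path. Returns 0 or -1. */
int create_then_record(const char *data_path, const char *data,
                       const char *log_path, const char *record)
{
    int fd = open(data_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (write_all(fd, data, strlen(data)) == -1 || fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    if (close(fd) == -1)
        return -1;

    /* Only now append the record. A crash before this fsync() completes
       can lose the record, but should not leave a record describing data
       that was never written. Syncing the parent directory is omitted. */
    fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;
    if (write_all(fd, record, strlen(record)) == -1 || fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```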
This really means conformance to the base standard; however, since this document includes the core material of the Single UNIX Specification, the standard developers decided that it was appropriate to segment the conformance requirements into two, the former for the base standard, and the latter for the Single UNIX Specification (denoted XSI Conformance).
Within POSIX.1 there are some symbolic constants that, if defined to a certain value or range of values, indicate that a certain option is enabled. Other symbolic constants exist in POSIX.1 for other reasons.
In this version, some features that were previously optional have been made mandatory. For backwards compatibility, the symbolic constants associated with these options are still required, now with fixed allowable ranges or values. The following options from previous versions of this standard are now mandatory:
_POSIX_ASYNCHRONOUS_IO, _POSIX_BARRIERS, _POSIX_CLOCK_SELECTION, _POSIX_MAPPED_FILES, _POSIX_MEMORY_PROTECTION, _POSIX_MONOTONIC_CLOCK, _POSIX_READER_WRITER_LOCKS, _POSIX_REALTIME_SIGNALS, _POSIX_SEMAPHORES, _POSIX_SPIN_LOCKS, _POSIX_THREAD_SAFE_FUNCTIONS, _POSIX_THREADS, _POSIX_TIMEOUTS, _POSIX_TIMERS
A POSIX-conformant system may support the XSI option required by the Single UNIX Specification. This was intentional since the standard developers intend them to be upwards-compatible, so that a system conforming to the Single UNIX Specification can also conform to the base standard at the same time.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0003 [637] is applied.
Austin Group Defect 729 is applied, adding _POSIX_DEVICE_CONTROL.
Austin Group Defect 1346 is applied, requiring support for Monotonic Clock.
This section is included to describe the conformance requirements for the base volumes of the Single UNIX Specification.
XSI conformance can be thought of as a profile, selecting certain options from POSIX.1-2024.
The concept of "Option Groups" is included to allow collections of related functions or options to be grouped together. This has been used as follows: the "XSI Option Groups" have been created to allow super-options, collections of underlying options and related functions, to be collectively supported by XSI-conforming systems.
The standard developers considered the matter of subprofiling and decided it was better to include an enabling mechanism rather than detailed normative requirements. A set of subprofiling options was developed and included later in this volume of POSIX.1-2024 as an informative illustration.
The goal of not simultaneously fixing maximums and minimums was to allow implementations of the base standard or standards to support multiple profiles without conflict.
The following summarizes the rules for the limit types:
| Limit Type | Fixed Value | Minimum Acceptable Value | Maximum Acceptable Value |
|---|---|---|---|
| Standard | Xs | Ys | Zs |
| Profile | Xp == Xs | Yp >= Ys | Zp <= Zs |
| | (No change) | (May increase the limit) | (May decrease the limit) |
The intent is that ranges specified by limits in profiles be entirely contained within the corresponding ranges of the base standard or standards being profiled, and that the unlimited end of a range in a base standard must remain unlimited in any profile of that standard.
Thus, the fixed _POSIX_* limits are constants and must not be changed by a profile. The variable counterparts (typically without the leading _POSIX_) can be changed but still remain semantically the same; that is, they still allow implementation values to vary as long as they meet the requirements for that value (be it a minimum or maximum).
Where a profile does not provide a feature upon which a limit is based, the limit is not relevant. Applications written to that profile should be written to operate independently of the value of the limit.
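The three rules in the table can be expressed as a small predicate. This is a minimal, hypothetical sketch (enum limit_type and profile_value_ok() are illustrative names, not standard interfaces) that checks one profile value against the corresponding base-standard value.

```c
#include <stdbool.h>

/* The three limit types from the table. */
enum limit_type { LIMIT_FIXED, LIMIT_MINIMUM, LIMIT_MAXIMUM };

/* True if a profile's value keeps the range of permitted implementation
   values inside the base standard's range: fixed values must not change,
   minimum acceptable values may only increase, and maximum acceptable
   values may only decrease. */
bool profile_value_ok(enum limit_type type, long standard, long profile)
{
    switch (type) {
    case LIMIT_FIXED:
        return profile == standard;   /* Xp == Xs */
    case LIMIT_MINIMUM:
        return profile >= standard;   /* Yp >= Ys */
    case LIMIT_MAXIMUM:
        return profile <= standard;   /* Zp <= Zs */
    }
    return false;
}
```

For instance, a profile raising the minimum acceptable value of CHILD_MAX from 6 to 25 passes the LIMIT_MINIMUM check, whereas a profile that lowers a minimum maximum below the base value fails it; that is the case in which an implementation conforming to the profile may not conform to the base standard.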
An example which has previously allowed implementations to support both the base standard and two other profiles in a compatible manner follows:
Base standard (POSIX.1-1996): _POSIX_CHILD_MAX = 6
Base standard: CHILD_MAX has a minimum maximum of _POSIX_CHILD_MAX
FIPS profile/SUSv2: CHILD_MAX has a minimum maximum of 25
Another example:
Base standard (POSIX.1-1996): _POSIX_NGROUPS_MAX = 0
Base standard: NGROUPS_MAX has a minimum maximum of _POSIX_NGROUPS_MAX
FIPS profile/SUSv2: NGROUPS_MAX has a minimum maximum of 8
A profile may lower a minimum maximum below the equivalent _POSIX value:
Base standard: _POSIX_foo_MAX = Z
Base standard: foo_MAX has a minimum maximum of _POSIX_foo_MAX
Profile standard: foo_MAX = X (X can be less than, equal to, or greater than _POSIX_foo_MAX)
In this case an implementation conforming to the profile may not conform to the base standard, but an implementation conforming to the base standard will conform to the profile.
Austin Group Defect 1192 is applied, marking the encrypt() and setkey() functions as obsolescent.
Austin Group Defect 1346 is applied, removing _POSIX_MONOTONIC_CLOCK from the Advanced Realtime option group.
The final subsections within Implementation Conformance list the core options within POSIX.1-2024. This includes both options for the System Interfaces volume of POSIX.1-2024 and the Shell and Utilities volume of POSIX.1-2024.
Austin Group Defect 190 is applied, adding man to the list of utilities in the User Portability Utilities option.
These definitions guide users or adapters of applications in determining on which implementations an application will run and how much adaptation would be required to make it run on others. These definitions are modeled after related ones in the ISO C standard.
POSIX.1 occasionally uses the expressions "portable application" or "conforming application". As they are used, these are synonyms for any of these terms. The differences between the classes of application conformance relate to the requirements for other standards, the options supported (such as the XSI option) or, in the case of the Conforming POSIX.1 Application Using Extensions, to implementation extensions. When one of the less explicit expressions is used, it should be apparent from the context of the discussion which of the more explicit names is appropriate.
This definition is analogous to that of an ISO C standard "conforming program".
The major difference between a Strictly Conforming POSIX Application and an ISO C standard strictly conforming program is that the latter is not allowed to use features of POSIX that are not in the ISO C standard.
Examples of <National Bodies> include ANSI, BSI, and AFNOR.
Due to possible requirements for configuration or implementation characteristics in excess of the specifications in <limits.h> or related to the hardware (such as array size or file space), not every Conforming POSIX Application Using Extensions will run on every conforming implementation.
This is intended to be upwards-compatible with the definition of a Strictly Conforming POSIX Application, with the addition of the facilities and functionality included in the XSI option.
Such applications may use extensions beyond the facilities defined by POSIX.1-2024 including the XSI option, but need to document the additional requirements.
POSIX.1 is, for historical reasons, both a specification of an operating system interface, shell and utilities, and a C binding for that specification. Efforts had been previously undertaken to generate a language-independent specification; however, they failed, and the fact that C, as specified by the ISO C standard, is the de facto primary language on POSIX and UNIX systems makes this a necessary and workable situation.
There is no additional rationale provided for this section.
The definitions in this section are stated so that they can be used as exact substitutes for the terms in text. They should not contain requirements or cross-references to sections within POSIX.1-2024; that is accomplished by using an informative note. In addition, the term should not be included in its own definition. Where requirements or descriptions need to be addressed but cannot be included in the definitions, due to not meeting the above criteria, these occur in the General Concepts chapter.
In this version, the definitions have been reworked extensively to meet style requirements and to include terms from the base documents (see the Scope).
Many of these definitions are necessarily circular, and some of the terms (such as "process") are variants of basic computing science terms that are inherently hard to define. Where some definitions are more conceptual and contain requirements, these appear in the General Concepts chapter. The terms in this section are listed alphabetically in glossary format.
Some definitions must allow extension to cover terms or facilities that are not explicitly mentioned in POSIX.1-2024. For example, the definition of "Extended Security Controls" permits implementations beyond those defined in POSIX.1-2024.
Some terms in the following list of notes do not appear in POSIX.1-2024; these are marked with an asterisk (*) suffix. Many of them have been specifically excluded from POSIX.1-2024 because they concern system administration, implementation, or other issues that are not specific to the programming interface. Those are marked with a reason, such as "implementation-defined".
Austin Group Defect 1050 is applied, adding '-' to the characters that can be used in an alias name.
Austin Group Defect 850 is applied, adding anonymous memory objects.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0004 [937] is applied.
One of the fundamental security problems with many historical UNIX systems has been that the privilege mechanism is monolithic—a user has either no privileges or all privileges. Thus, a successful "trojan horse" attack on a privileged process defeats all security provisions. Therefore, POSIX.1 allows more granular privilege mechanisms to be defined. For many historical implementations of the UNIX system, the presence of the term "appropriate privileges" in POSIX.1 may be understood as a synonym for "superuser" (UID 0). However, other systems have emerged where this is not the case and each discrete controllable action has appropriate privileges associated with it. Because this mechanism is implementation-defined, it must be described in the conformance document. Although that description affects several parts of POSIX.1 where the term "appropriate privilege" is used, because the term "implementation-defined" only appears here, the description of the entire mechanism and its effects on these other sections belongs in this equivalent section of the conformance document. This is especially convenient for implementations with a single mechanism that applies in all areas, since it only needs to be described once.
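A practical consequence for applications is that they should not test for user ID 0 to decide whether an action is permitted; they should attempt the action and handle [EPERM], which works whether the implementation's privilege mechanism is monolithic or granular. A minimal sketch using chown(); change_group() is a hypothetical helper, and passing (uid_t)-1 as the owner leaves the file's owner unchanged.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Change the group of `path` without assuming any particular privilege
   model. Returns 0 on success, 1 if the process lacks the appropriate
   privileges (EPERM), and -1 on any other error. */
int change_group(const char *path, gid_t group)
{
    if (chown(path, (uid_t)-1, group) == 0)
        return 0;
    if (errno == EPERM) {
        fprintf(stderr, "%s: not permitted to change group\n", path);
        return 1;
    }
    fprintf(stderr, "%s: %s\n", path, strerror(errno));
    return -1;
}
```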
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0005 [516] is applied.
Austin Group Defect 1254 is applied, changing this definition.
The term "Base Character" has been removed, as its use within POSIX.1-2024 was felt to be ordinary English usage.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0006 [653] is applied.
Austin Group Defect 854 is applied, changing text relating to regular built-in utilities.
The restriction that a byte is now exactly eight bits was a conscious decision by the standard developers. It came about due to a combination of factors, primarily the use of the type int8_t within the networking functions and the alignment with the ISO/IEC 9899:1999 standard, where the intN_t types were first defined.
According to the ISO/IEC 9899:1999 standard:
The standard developers also felt that this was not an undue restriction for the current state-of-the-art for this version of the standard, but recognize that if industry trends continue, a wider character type may be required in the future.
The term "character" is used to mean a sequence of one or more bytes representing a member of a character set. The deviation in the exact text of the ISO C standard definition for "byte" meets the intent of the rationale of the ISO C standard and also clears up the ambiguity raised by the term "basic execution character set". The octet-minimum requirement is a reflection of the {CHAR_BIT} value.
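Code that serializes data into octets, as the networking functions do, can state the 8-bit assumption explicitly so that it fails to build rather than misbehaves elsewhere. A minimal sketch, assuming a C11 compiler for _Static_assert; put_be32() and get_be32() are hypothetical helpers that store a 32-bit value in network (big-endian) byte order.

```c
#include <limits.h>
#include <stdint.h>

/* POSIX.1 requires CHAR_BIT == 8; stop the build on any other system. */
_Static_assert(CHAR_BIT == 8, "POSIX.1 requires 8-bit bytes");

/* int8_t therefore exists and occupies exactly one byte. */
_Static_assert(sizeof(int8_t) == 1, "int8_t occupies one byte");

/* Store v as four octets, most significant first. */
void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Inverse of put_be32(). */
uint32_t get_be32(const unsigned char in[4])
{
    return (uint32_t)in[0] << 24 | (uint32_t)in[1] << 16
         | (uint32_t)in[2] << 8 | (uint32_t)in[3];
}
```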
Austin Group Defect 1356 is applied, changing the definition of "character" to match the definition of the term "multi-byte character" in the ISO C standard.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/3 is applied, adding the vfork() function to those listed.
The ISO C standard defines a similar interval for use by the clock() function. There is no requirement that these intervals be the same. In historical implementations these intervals are different.
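The difference in scale shows up directly in code: times() reports values in clock ticks, whose rate is given by sysconf(_SC_CLK_TCK), while the ISO C clock() function reports values scaled by CLOCKS_PER_SEC. A minimal sketch; the two function names are hypothetical.

```c
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* User CPU time of the calling process in seconds, from times(),
   whose fields are in clock ticks. Returns -1.0 on error. */
double user_cpu_seconds_ticks(void)
{
    struct tms t;
    long ticks_per_second = sysconf(_SC_CLK_TCK);
    if (ticks_per_second <= 0 || times(&t) == (clock_t)-1)
        return -1.0;
    return (double)t.tms_utime / (double)ticks_per_second;
}

/* Processor time of the calling process in seconds, from clock(),
   whose result is scaled by CLOCKS_PER_SEC. Returns -1.0 on error. */
double cpu_seconds_clock(void)
{
    clock_t c = clock();
    if (c == (clock_t)-1)
        return -1.0;
    return (double)c / CLOCKS_PER_SEC;
}
```

Mixing the two scale factors is an easy mistake: XSI-conformant systems define CLOCKS_PER_SEC as one million, while the clock tick rate is commonly much lower (100 on many Linux systems).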
Austin Group Defect 613 is applied, adding this definition.
The terms "command" and "utility" are related but have distinct meanings. Command is defined as "a directive to a shell to perform a specific task". The directive can be in the form of a single utility name (for example, ls), or the directive can take the form of a compound command (for example, "ls | grep name | pr"). A utility is a program that can be called by name from a shell. Issuing only the name of the utility to a shell is the equivalent of a one-word command. A utility may be invoked as a separate program that executes in a different process than the command language interpreter, or it may be implemented as a part of the command language interpreter. For example, the echo command (the directive to perform a specific task) may be implemented such that the echo utility (the logic that performs the task of echoing) is in a separate program; therefore, it is executed in a process that is different from the command language interpreter. Conversely, the logic that performs the echo utility could be built into the command language interpreter; therefore, it could execute in the same process as the command language interpreter.
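The distinction maps onto two ways a C program can start work. In this minimal sketch, with hypothetical helper names, run_utility() starts a single utility by name with posix_spawnp(), so no command language interpreter is needed, while run_command() hands a pipeline, which only a shell can interpret, to system().

```c
#include <spawn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run one utility by name (searching PATH) in a separate process.
   Returns its exit status, or -1 if it could not be run or was killed. */
int run_utility(char *const argv[])
{
    pid_t pid;
    int status;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Pass a command, such as a pipeline, to the shell via system().
   Returns the command's exit status, or -1 on failure. */
int run_command(const char *command)
{
    int status = system(command);
    if (status == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```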
The terms "tool" and "application" can be thought of as being synonymous with "utility" from the perspective of the operating system kernel. Tools, applications, and utilities have historically run in processes above the kernel level. Tools and utilities historically have been a part of the operating system non-kernel code and have performed system-related functions, such as listing directory contents, checking file systems, repairing file systems, or extracting system status information. Applications have not generally been a part of the operating system, and they perform non-system-related functions, such as word processing, architectural design, mechanical design, workstation publishing, or financial analysis. Utilities have most frequently been provided by the operating system distributor, and applications by third-party software distributors or by the users themselves. Nevertheless, POSIX.1-2024 does not differentiate between tools, utilities, and applications when it comes to receiving services from the system, a shell, or the standard utilities. (For example, the xargs utility invokes another utility; it would be of fairly limited usefulness if the users could not run their own applications in place of the standard utilities.) Utilities are not applications in the sense that they are not themselves subject to the restrictions of POSIX.1-2024 or any other standard; there is no requirement for grep, stty, or any of the utilities defined here to be any of the classes of conforming applications.
In most 1-byte character sets, such as ASCII, the concept of column positions is identical to character positions and to bytes. Therefore, it has been historically acceptable for some implementations to describe line folding or tab stops or table column alignment in terms of bytes or character positions. Other character sets pose complications, as they can have internal representations longer than one octet and they can have display characters that have different widths on the terminal screen or printer.
In POSIX.1-2024 the term "column positions" has been defined to mean character (not byte) positions in input files. Output files describe the column position in terms of the display width of the narrowest printable character in the character set, adjusted to fit the characteristics of the output device. It is very possible that n column positions will not be able to hold n characters in some character sets, unless all of those characters are of the narrowest width. It is assumed that the implementation is aware of the width of the various characters, deriving this information from the value of LC_CTYPE, and thus can determine how many column positions to allot for each character in those utilities where it is important.
The term "column position" was used instead of the more natural "column" because the latter is frequently used in the different contexts of columns of figures, columns of table values, and so on. Wherever confusion might result, these latter types of columns are referred to as "text columns".
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
Austin Group Defect 449 is applied, adding ;& to the list of control operators.
The question of which of possibly several special files referring to the terminal is meant is not addressed in POSIX.1. The pathname /dev/tty is a synonym for the controlling terminal associated with a process.
Austin Group Defect 1141 is applied, replacing the core file definition with a core image definition.
Austin Group Defect 1116 is applied, removing a reference to the Threads option that existed in earlier versions of this standard.
Austin Group Defect 1449 is applied, adding this definition.
Austin Group Defect 351 is applied, adding this definition.
The concept is handled in stat() as ID of device.
Historically, direct I/O refers to the system bypassing intermediate buffering, but may be extended to cover implementation-defined optimizations.
The format of the directory file is implementation-defined and differs radically between System V and 4.3 BSD. However, routines (derived from 4.3 BSD) for accessing directories and certain constraints on the format of the information returned by those routines are described in the <dirent.h> header.
Austin Group Defect 1380 is applied, changing "link" to "hard link".
The Shell and Utilities volume of POSIX.1-2024 assigns precise requirements for the terms "display" and "write". Some historical systems have chosen to implement certain utilities without using the traditional file descriptor model. For example, the vi editor might employ direct screen memory updates on a personal computer, rather than a write() system call. An instance of user prompting might appear in a dialog box, rather than with standard error. When the Shell and Utilities volume of POSIX.1-2024 uses the term "display", the method of outputting to the terminal is unspecified; many historical implementations use termcap or terminfo, but this is not a requirement. The term "write" is used when the Shell and Utilities volume of POSIX.1-2024 mandates that a file descriptor be used and that the output can be redirected. However, it is assumed that when the writing is directly to the terminal (it has not been redirected elsewhere), there is no practical way for a user or test suite to determine whether a file descriptor is being used. Therefore, the use of a file descriptor is mandated only for the redirection case and the implementation is free to use any method when the output is not redirected. The verb write is used almost exclusively, with the very few exceptions of those utilities where output redirection need not be supported: tabs, talk, tput, and vi.
The symbolic name dot is carefully used in POSIX.1 to distinguish the working directory filename from a period or a decimal point.
Historical implementations permit the use of these filenames without their special meanings. Such use precludes any meaningful use of these filenames by a Conforming POSIX.1 Application. Therefore, such use is considered an extension, the use of which makes an implementation non-conforming; see also A.4.16 Pathname Resolution.
Austin Group Defect 1122 is applied, adding this definition.
Austin Group Defect 1380 is applied, changing "link" to "hard link".
Historically, the origin of UNIX system time was referred to as "00:00:00 GMT, January 1, 1970". Greenwich Mean Time is actually not a term acknowledged by the international standards community; therefore, this term, "Epoch", is used to abbreviate the reference to the actual standard, Coordinated Universal Time.
See Pipe.
It is permissible for an implementation-defined file type to be non-readable or non-writable.
These classes correspond to the historical sets of permission bits. The classes are general to allow implementations flexibility in expanding the access mechanism for more stringent security environments. Note that a process is in one and only one class, so there is no ambiguity.
Austin Group Defect 1493 is applied, moving some information from XCU 2.7 Redirection to this definition.
Austin Group Defect 768 is applied, changing this definition.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0007 [834] is applied.
Filenames are sequences of bytes, not sequences of characters. The only bytes that this standard says cannot appear in any filename are the slash byte and the null byte. This is a side-effect of the fact that no conforming implementations of the standard currently provide a way to pass information specifying the locale associated with strings passed between user-level applications and the kernel. This decision could be revisited if implementations develop a way to associate a locale with the strings passed between kernel space and user space.
Implementations may add other restrictions to the byte sequences allowed in filenames except that any filename consisting of no more than {NAME_MAX} bytes from the set of characters in the portable filename character set must be allowed.
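The two rules above (in general only <slash> and the null byte are excluded, while names drawn from the portable filename character set and no longer than {NAME_MAX} bytes must always be accepted) can be turned into a portability check. The following C sketch is illustrative only: the helper names is_portable_filename() and name_max_for() are hypothetical, and rejecting dot and dot-dot, as well as falling back to {_POSIX_NAME_MAX} when pathconf() fails, are assumptions of this sketch rather than requirements of the standard. The character set is spelled out explicitly instead of using ranges such as 'A' to 'Z', because the encoding of the portable character set is not required to be ASCII.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

/* The portable filename character set, listed explicitly so that the
   check does not depend on the character encoding. */
static const char portable_set[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789._-";

/*
 * Hypothetical helper: returns 1 if name is a filename that every
 * conforming implementation must accept, given name_max (the {NAME_MAX}
 * value for the target directory); returns 0 otherwise.
 */
int is_portable_filename(const char *name, long name_max)
{
    size_t len = strlen(name);

    if (len == 0 || (long)len > name_max)
        return 0;
    /* Dot and dot-dot use portable characters but have special meanings. */
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return 0;
    /* strspn() counts the leading bytes that belong to portable_set. */
    return strspn(name, portable_set) == len;
}

/* Hypothetical helper: query {NAME_MAX} for a directory, since it can
   differ from one file system to another. */
long name_max_for(const char *dir)
{
    long v;

    errno = 0;
    v = pathconf(dir, _PC_NAME_MAX);
    if (v != -1)
        return v;
    if (errno == 0)
        return LONG_MAX;        /* no limit is reported */
    return _POSIX_NAME_MAX;     /* query failed: use the POSIX minimum */
}
```

Because {NAME_MAX} can differ between file systems, a caller would typically pass the value returned by name_max_for() for the directory in which the file is to be created.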
See Pathname.
Historically, the meaning of this term has been overloaded with two meanings: that of the complete file hierarchy, and that of a mountable subset of that hierarchy; that is, a mounted file system. POSIX.1 uses the term "file system" in the second sense, except that it is limited to the scope of a process (and root directory of a process). This usage also clarifies the domain in which a file serial number is unique.
Austin Group Defect 1254 is applied, changing this definition.
This definition is made available for those definitions (in particular, TZ) which must exclude control characters.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/4 is applied, removing the words "of implementation-defined format". See User Database.
Implementation-defined; see User Database.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0008 [511] is applied.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0009 [584] is applied.
Austin Group Defect 1380 is applied, changing this definition.
This refers to previously existing implementations of programming interfaces and operating systems that are related to the interface specified by POSIX.1.
Austin Group Defect 415 is applied, adding this definition.
This refers to a POSIX.1 implementation that is accomplished through interfaces from the POSIX.1 services to some alternate form of operating system kernel services. Note that the line between a hosted implementation and a native implementation is blurred, since most implementations will provide some services directly from the kernel and others through some indirect path. (For example, fopen() might use open(); or mkfifo() might use mknod().) There is no necessary relationship between the type of implementation and its correctness, performance, and/or reliability.
This term is generally used instead of its synonym, "system", to emphasize the consequences of decisions to be made by system implementors. Perhaps if no options or extensions to POSIX.1 were allowed, this usage would not have occurred.
The term "specific implementation" is sometimes used as a synonym for "implementation". This should not be interpreted too narrowly; both terms can represent a relatively broad group of systems. For example, a hardware vendor could market a very wide selection of systems that all used the same instruction set, with some systems desktop models and others large multi-user minicomputers. This wide range would probably share a common POSIX.1 operating system, allowing an application compiled for one to be used on any of the others; this is a [specific] implementation. However, such a wide range of machines probably has some differences between the models. Some may have different clock rates, different file systems, different resource limits, different network connections, and so on, depending on their sizes or intended usages. Even on two identical machines, the system administrators may configure them differently. Each of these different systems is known by the term "a specific instance of a specific implementation". This term is only used in the portions of POSIX.1 dealing with runtime queries: sysconf() and pathconf().
Absolute pathname has been adequately defined.
Austin Group Defect 1347 is applied, adding a definition of interactive device.
Austin Group Defect 854 is applied, adding intrinsic utilities.
Austin Group Defect 1254 is applied, changing this definition.
In order to understand the job control facilities in POSIX.1 it is useful to understand how they are used by a job control-cognizant shell to create the user interface effect of job control.
While the job control facilities supplied by POSIX.1 can, in theory, support different types of interactive job control interfaces supplied by different types of shells, there was historically one particular interface that was most common when the standard was originally developed (provided by BSD C Shell).
This discussion describes that interface as a means of illustrating how the POSIX.1 job control facilities can be used.
Job control allows users to selectively stop (suspend) the execution of processes and continue (resume) their execution at a later point. The user typically employs this facility via the interactive interface jointly supplied by the terminal I/O driver and a command interpreter (shell).
The user can launch jobs (command pipelines) in either the foreground or background. When launched in the foreground, the shell waits for the job to complete before prompting for additional commands. When launched in the background, the shell does not wait, but immediately prompts for new commands.
If the user launches a job in the foreground and subsequently regrets this, the user can type the suspend character (typically set to <control>-Z), which causes the foreground process group to stop, and the shell to convert the corresponding foreground job to a suspended job and begin prompting for new commands. The suspended job can be continued by the user (via special shell commands) either as a foreground job or as a background job. Background jobs can also be moved into the foreground via shell commands.
If a background process group attempts to access the login terminal (controlling terminal), it is stopped by the terminal driver and the shell detects this and, in turn, suspends the corresponding background job and notifies the user. (Terminal access includes read() and certain terminal control functions, and conditionally includes write().) The user can continue the suspended job in the foreground, thus allowing the terminal access to succeed in an orderly fashion. After the terminal access succeeds, the user can optionally move the job into the background via the suspend character and shell commands.
Implementing Job Control Shells
The job control features of the POSIX shell (described in 2.11 Job Control) and of other shells can be implemented using the job control facilities of the System Interfaces volume of POSIX.1-2024 in the following way.
The key feature necessary to provide job control is a way to group processes into jobs. This grouping is necessary in order to direct signals to a single job and also to identify which job is in the foreground. (There is at most one job that is in the foreground on any controlling terminal at a time.)
The concept of process groups is used to provide this grouping. The shell places the process(es) it creates for each job in a separate process group via the setpgid() function. To do this, the setpgid() function is invoked by the shell for each process in the job. It is actually useful to invoke setpgid() twice for each process: once in the child process, after calling fork() to create the process, but before calling one of the exec family of functions to begin execution of the program, and once in the parent shell process, after calling fork() to create the child. The redundant invocation avoids a race condition by ensuring that the child process is placed into the new process group before either the parent or the child relies on this being the case. The process group ID for the job is selected by the shell to be equal to the process ID of one of the processes in the job. Some shells choose to make one process in the job be the parent of the other processes in the job (if any). Other shells (for example, the C Shell) choose to make themselves the parent of all processes in the job. In order to support this latter case, the setpgid() function accepts a process group ID parameter since the correct process group ID cannot be inherited from the shell.
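A minimal C sketch of the double setpgid() call follows. The function name spawn_in_job() is hypothetical, and error reporting is reduced to the bare minimum; a real shell would also reset signal dispositions in the child and set up pipelines between the processes of a job.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), used by the caller to reap the job */
#include <unistd.h>

/*
 * Hypothetical helper: start one process of a job.
 * pgid == 0 means "this process becomes the leader of a new process
 * group"; otherwise the process joins the existing group pgid.
 * Returns the child's process ID, or -1 if fork() fails.
 */
pid_t spawn_in_job(pid_t pgid, char *const argv[])
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: join the job's process group before exec. */
        (void)setpgid(0, pgid);
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    /*
     * Parent: make the same request. Whichever call runs first puts the
     * child in the group; the other is then harmless. If the child has
     * already exec'd, this call fails with EACCES, but the child's own
     * call has taken effect by then.
     */
    (void)setpgid(pid, pgid == 0 ? pid : pgid);
    return pid;
}
```

The first process of a job is started with pgid set to 0, and its process ID is then passed as pgid for the remaining processes, so the group ID equals the process ID of one member of the job.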
The shell also controls which job is currently in the foreground. A foreground and background job differ in two ways: the shell waits for a foreground command to complete (or stop) before continuing to read new commands, and the terminal I/O driver inhibits terminal access by background jobs (causing the processes to stop). Thus, the shell must work cooperatively with the terminal I/O driver and have a common understanding of which job is currently in the foreground. It is the user who decides which command should be currently in the foreground, and the user informs the shell via shell commands. The shell, in turn, informs the terminal I/O driver via the tcsetpgrp() function. This indicates to the terminal I/O driver the process group ID of the foreground process group. When the current foreground job is either suspended or terminated, the shell places its own process group in the foreground via tcsetpgrp() before prompting for additional commands. Note that when a job is created the new process group begins as a background process group. It requires an explicit act of the shell via tcsetpgrp() to move a process group into the foreground.
When a process in a job stops or terminates, its parent (for example, the shell) receives synchronous notification by calling the waitpid() function with the WUNTRACED flag set. Asynchronous notification is also provided when the parent establishes a signal handler for SIGCHLD and does not specify the SA_NOCLDSTOP flag. Usually all processes in a job stop as a unit since the terminal I/O driver always sends job control stop signals to all processes in the process group.
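The synchronous notification path can be sketched as follows, assuming the caller already knows the process ID it is waiting for; wait_for_stop_or_exit() is a hypothetical helper name. The loop retries waitpid() if it is interrupted by a signal, which matters when a SIGCHLD handler is also installed.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until process pid either stops or terminates.
 * Returns 1 if it stopped, 0 if it exited or was killed, -1 on error.
 * The raw status is stored in *status for further inspection.
 */
int wait_for_stop_or_exit(pid_t pid, int *status)
{
    pid_t r;

    do {
        /* WUNTRACED also reports children that have been stopped. */
        r = waitpid(pid, status, WUNTRACED);
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1;
    return WIFSTOPPED(*status) ? 1 : 0;
}
```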
To continue a suspended job, the shell sends a SIGCONT signal to the corresponding process group. In addition, if the job is being continued in the foreground, the shell invokes tcsetpgrp() to place the process group in the foreground before sending SIGCONT. Otherwise, the shell leaves itself in the foreground and reads additional commands.
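The continuation step, together with the shell reclaiming the terminal afterwards, might look like the C sketch below. The helper names continue_job() and reclaim_terminal() are hypothetical; the sketch assumes tty_fd is a descriptor for the shell's controlling terminal and that, as is usual for a job control shell, SIGTTOU is ignored so that calling tcsetpgrp() from a background process group does not stop the shell.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: resume the suspended job whose process group ID
 * is pgid. For a foreground continuation the terminal is handed over
 * before SIGCONT is sent; tty_fd is ignored for background jobs.
 * Returns 0 on success, -1 on error.
 */
int continue_job(pid_t pgid, int foreground, int tty_fd)
{
    if (foreground && tcsetpgrp(tty_fd, pgid) == -1)
        return -1;
    /* A negative process ID addresses every process in the group. */
    return kill(-pgid, SIGCONT);
}

/*
 * Hypothetical helper: after the foreground job stops or terminates,
 * put the shell's own process group back in the foreground before
 * prompting. Assumes SIGTTOU is ignored by the shell.
 */
int reclaim_terminal(int tty_fd)
{
    return tcsetpgrp(tty_fd, getpgrp());
}
```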
There is additional flexibility in the POSIX.1 job control facilities that allows deviations from the typical interface. Clearing the TOSTOP terminal flag allows background jobs to perform write() functions without stopping. The same effect can be achieved on a per-process basis by having a process set the signal action for SIGTTOU to SIG_IGN.
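Both forms of this flexibility can be shown in a short C sketch; clear_tostop() and ignore_sigttou() are hypothetical names. Clearing TOSTOP affects every process using the terminal and is assumed here to be done by a process in the foreground, since changing terminal attributes from a background process group would itself generate SIGTTOU. Ignoring SIGTTOU affects only the calling process and any children that inherit its signal dispositions.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper, terminal-wide: clear TOSTOP so that background
 * jobs may write to the terminal without being stopped. Intended to be
 * called from the foreground process group. Returns 0 or -1.
 */
int clear_tostop(int tty_fd)
{
    struct termios t;

    if (tcgetattr(tty_fd, &t) == -1)
        return -1;
    t.c_lflag &= ~(tcflag_t)TOSTOP;
    return tcsetattr(tty_fd, TCSANOW, &t);
}

/*
 * Hypothetical helper, per-process: with SIGTTOU ignored, this process
 * can write from the background even when TOSTOP is set.
 */
int ignore_sigttou(void)
{
    struct sigaction sa;

    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTTOU, &sa, NULL);
}
```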
A login session that is not using the job control facilities can be thought of as a large collection of processes that are all in the same job. Such a login session may have a partial distinction between foreground and background processes; that is, the shell waits for some processes before continuing to read new commands and does not wait for other processes. However, the terminal I/O driver considers all these processes to be in the foreground since they are all members of the same process group.
In addition to the basic job control operations already mentioned, a job control-cognizant shell needs to perform the following actions.
When a foreground (not background) job is suspended, the shell needs to sample and remember the current terminal settings so that it can restore them later when it continues the suspended job in the foreground (via the tcgetattr() and tcsetattr() functions).
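A sketch of the bookkeeping involved, assuming a hypothetical per-job record (struct job_tty_state) that a real shell would embed in its own job table; the function names are likewise hypothetical. TCSADRAIN is chosen on restore so that output already queued is transmitted before the job's modes take effect; TCSANOW would also be a reasonable choice.

```c
#define _XOPEN_SOURCE 700
#include <termios.h>
#include <unistd.h>

/* Hypothetical per-job record; a real shell keeps more state per job. */
struct job_tty_state {
    struct termios modes;
    int saved;              /* nonzero once modes holds valid data */
};

/* Called when a foreground job is suspended, before the shell
   takes the terminal back. Returns 0 or -1. */
int save_job_modes(int tty_fd, struct job_tty_state *js)
{
    if (tcgetattr(tty_fd, &js->modes) == -1)
        return -1;
    js->saved = 1;
    return 0;
}

/* Called before continuing the job in the foreground. */
int restore_job_modes(int tty_fd, const struct job_tty_state *js)
{
    if (!js->saved)
        return 0;           /* nothing recorded: leave the terminal alone */
    return tcsetattr(tty_fd, TCSADRAIN, &js->modes);
}
```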
Because a shell itself can be spawned from a shell, it must take special action to ensure that child shells interact well with their parent shells. A child shell can be spawned to perform an interactive function (prompting the terminal for commands) or a non-interactive function (reading commands from a file). When operating non-interactively, the job control shell will by default refrain from performing the job control-specific actions described above. It will behave as a shell that does not support job control. For example, all jobs will be left in the same process group as the shell, which itself remains in the process group established for it by its parent. This allows the shell and its children to be treated as a single job by a parent shell, and they can be affected as a unit by terminal keyboard signals.
An interactive child shell can be spawned from another job control-cognizant shell in either the foreground or background. (For example, the user can execute an interactive shell in the background by means of the command "sh &".) Before the child shell activates job control by calling setpgid() to place itself in its own process group and tcsetpgrp() to place its new process group in the foreground, it needs to ensure that it has already been placed in the foreground by its parent. (Otherwise, there could be multiple job control shells that simultaneously attempt to control mediation of the terminal.) To determine this, the shell retrieves its own process group via getpgrp() and the process group of the current foreground job via tcgetpgrp(). If these are not equal, the shell sends SIGTTIN to its own process group, causing itself to stop. When continued later by its parent, the shell repeats the process group check. When the process groups finally match, the shell is in the foreground and it can proceed to take control. After this point, the shell ignores all the job control stop signals so that it does not inadvertently stop itself.
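The start-up sequence just described can be sketched in C as below. The function name init_job_control() is hypothetical, and tty_fd is assumed to be an open descriptor for the controlling terminal. Because POSIX discards a SIGTTIN that would stop a member of an orphaned process group, this loop would keep retrying instead of stopping in that situation; the sketch does not handle that case.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: activate job control in an interactive shell.
 * Returns 0 on success, -1 if tty_fd is not usable as the controlling
 * terminal or a call fails.
 */
int init_job_control(int tty_fd)
{
    pid_t fg;

    /* Wait until the parent has placed us in the foreground. */
    for (;;) {
        fg = tcgetpgrp(tty_fd);
        if (fg == -1)
            return -1;                  /* not our controlling terminal */
        if (fg == getpgrp())
            break;
        kill(-getpgrp(), SIGTTIN);      /* stop; the parent continues us */
    }

    /* From here on, never stop ourselves by accident. */
    signal(SIGTSTP, SIG_IGN);
    signal(SIGTTIN, SIG_IGN);
    signal(SIGTTOU, SIG_IGN);

    /* Become a process group leader unless we already are one. */
    if (getpgrp() != getpid() && setpgid(0, 0) == -1)
        return -1;

    /* Take the terminal for the new group; SIGTTOU is ignored, so this
       does not stop us even though the new group starts in background. */
    return tcsetpgrp(tty_fd, getpgrp());
}
```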
Implementing Job Control Applications
Most applications do not need to be aware of job control signals and operations; the intuitively correct behavior happens by default. However, sometimes an application can inadvertently interfere with normal job control processing, or an application may choose to overtly effect job control in cooperation with normal shell procedures.
An application can inadvertently subvert job control processing by "blindly" altering the handling of signals. A common application error is to learn how many signals the system supports and to ignore or catch them all. Such an application makes the assumption that it does not know what this signal is, but knows the right handling action for it. The system may initialize the handling of job control stop signals so that they are being ignored. This allows shells that do not support job control to inherit and propagate these settings and hence to be immune to stop signals. A job control shell will set the handling to the default action and propagate this, allowing processes to stop. In doing so, the job control shell is taking responsibility for restarting the stopped applications. If an application wishes to catch the stop signals itself, it should first determine their inherited handling states. If a stop signal is being ignored, the application should continue to ignore it. This is directly analogous to the recommended handling of SIGINT described in the referenced UNIX Programmer’s Manual.
If an application is reading the terminal and has disabled the interpretation of special characters (by clearing the ISIG flag), the terminal I/O driver will not send SIGTSTP when the suspend character is typed. Such an application can simulate the effect of the suspend character by recognizing it and sending SIGTSTP to its process group as the terminal driver would have done. Note that the signal is sent to the process group, not just to the application itself; this ensures that other processes in the job also stop. (Note also that other processes in the job could be children, siblings, or even ancestors.) Applications should not assume that the suspend character is <control>-Z (or any particular value); they should retrieve the current setting at startup.
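A sketch of that simulation in C, under the assumption that the application has cleared ISIG and examines its input one byte at a time; get_suspend_char() and maybe_suspend() are hypothetical names. Passing 0 as the process ID to kill() addresses the caller's own process group, which gives the group-wide stop described above, and _POSIX_VDISABLE marks a disabled special character.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: read the suspend character once at startup
   instead of assuming <control>-Z. Returns 0 or -1. */
int get_suspend_char(int tty_fd, cc_t *susp)
{
    struct termios t;

    if (tcgetattr(tty_fd, &t) == -1)
        return -1;
    *susp = t.c_cc[VSUSP];
    return 0;
}

/*
 * Hypothetical helper, called for each input byte while ISIG is clear.
 * If c is the suspend character, stop the whole process group as the
 * terminal driver would have done.
 * Returns 1 if SIGTSTP was sent, 0 if c was ordinary input, -1 on error.
 */
int maybe_suspend(cc_t susp, unsigned char c)
{
    if (susp == _POSIX_VDISABLE || c != susp)
        return 0;
    /* A process ID of 0 targets the caller's process group. */
    return kill(0, SIGTSTP) == 0 ? 1 : -1;
}
```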
Implementing Job Control Systems
The intent in adding 4.2 BSD-style job control functionality was to adopt the necessary 4.2 BSD programmatic interface with only minimal changes to resolve syntactic or semantic conflicts with System V or to close recognized security holes. The goal was to maximize the ease of providing both conforming implementations and Conforming POSIX.1 Applications.
It is only useful for a process to be affected by job control signals if it is the descendant of a job control shell. Otherwise, there will be nothing that continues the stopped process.
POSIX.1 does not specify how controlling terminal access is affected by a user logging out (that is, by a controlling process terminating). 4.2 BSD uses the vhangup() function to prevent any access to the controlling terminal through file descriptors opened prior to logout. System V does not prevent controlling terminal access through file descriptors opened prior to logout (except for the case of the special file, /dev/tty). Some implementations choose to make processes immune from job control after logout (that is, such processes are always treated as if in the foreground); other implementations continue to enforce foreground/background checks after logout. Therefore, a Conforming POSIX.1 Application should not attempt to access the controlling terminal after logout since such access is unreliable. If an implementation chooses to deny access to a controlling terminal after its controlling process exits, POSIX.1 requires a certain type of behavior (see Controlling Terminal).
Austin Group Defect 1254 is applied, changing this definition.
Austin Group Defect 1254 is applied, changing "job control job ID" to "job ID".
Austin Group Defect 792 is applied, adding this definition.
See System Call*.
See System Call*.
Austin Group Defect 1380 is applied, changing this definition.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0010 [690] is applied.
Austin Group Defect 792 is applied, adding this definition.
Implementation-defined.
The definition of map is included to clarify the usage of mapped pages in the description of the behavior of process memory locking.
The term "memory-resident" is historically understood to mean that the so-called resident pages are actually present in the physical memory of the computer system and are immune from swapping, paging, copy-on-write faults, and so on. This is the actual intent of POSIX.1-2024 in the process memory locking section for implementations where this is logical. But for some implementations—primarily mainframes—actually locking pages into primary storage is not advantageous to other system objectives, such as maximizing throughput. For such implementations, memory locking is a "hint" to the implementation that the application wishes to avoid situations that would cause long latencies in accessing memory. Furthermore, there are other implementation-defined issues with minimizing memory access latencies that "memory residency" does not address—such as MMU reload faults. The definition attempts to accommodate various implementations while allowing conforming applications to specify to the implementation that they want or need the best memory access times that the implementation can provide.
The term "memory object" usually implies shared memory. If the object is the same as a filename in the file system name space of the implementation, it is expected that the data written into the memory object be preserved on disk. A memory object may also apply to a physical device on an implementation. In this case, writes to the memory object are sent to the controller for the device and reads result in control registers being returned.
Austin Group Defect 1122 is applied, adding this definition.
See File System.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0011 [625] is applied.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0011 [625] is applied.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0011 [625] is applied.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
There are no explicit limits in POSIX.1-2024 on the sizes of names, words (see the definition of word in the Base Definitions volume of POSIX.1-2024), lines, or other objects. However, other implicit limits do apply: shell script lines produced by many of the standard utilities cannot exceed {LINE_MAX} and the sum of exported variables comes under the {ARG_MAX} limit. Historical shells dynamically allocate memory for names and words and parse incoming lines a character at a time. Lines cannot have an arbitrary {LINE_MAX} limit because of historical practice, such as makefiles, where make removes the <newline> characters associated with the commands for a target and presents the shell with one very long line. The text on INPUT FILES in XCU 1.4 Utility Description Defaults does allow a shell to run out of memory, but it cannot have arbitrary programming limits.
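Where an application needs the implicit limits mentioned above, it can query them at run time. In this sketch the helper names are hypothetical, and falling back to the POSIX minimum values {_POSIX2_LINE_MAX} and {_POSIX_ARG_MAX} when sysconf() reports no value is a conservative choice of the sketch, not a requirement.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: runtime {LINE_MAX}, or the POSIX minimum
   when sysconf() returns -1. */
long runtime_line_max(void)
{
    long v = sysconf(_SC_LINE_MAX);
    return v == -1 ? _POSIX2_LINE_MAX : v;
}

/* Hypothetical helper: runtime {ARG_MAX}, which bounds the combined
   size of arguments and environment passed to an exec function. */
long runtime_arg_max(void)
{
    long v = sysconf(_SC_ARG_MAX);
    return v == -1 ? _POSIX_ARG_MAX : v;
}
```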
This refers to an implementation of POSIX.1 that interfaces directly to an operating system kernel; see also hosted implementation. A similar concept is a native UNIX system, which would be a kernel derived from one of the original UNIX system products.
Austin Group Defect 1428 is applied, adding this definition.
This definition is not intended to suggest that all processes in a system have priorities that are comparable. Scheduling policy extensions, such as adding realtime priorities, make the notion of a single underlying priority for all scheduling policies problematic. Some implementations may implement the features related to nice to affect all processes on the system, others to affect just the general time-sharing activities implied by POSIX.1-2024, and others may have no effect at all. Because of the use of "implementation-defined" in nice and renice, a wide range of implementation strategies is possible.
Austin Group Defect 940 is applied, adding a statement that any pointer object whose representation has all bits set to zero will be interpreted as a null pointer.
Austin Group Defect 1621 is applied, adding this definition.
Austin Group Defect 768 is applied, adding this definition.
An "open file description", as it is currently named, describes how a file is being accessed. What is currently called a "file descriptor" is actually just an identifier or "handle"; it does not actually describe anything.
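The distinction is observable through the shared file offset. In the C sketch below (all function names are hypothetical, and creating the temporary file under /tmp is an assumption), a descriptor created by dup() refers to the same open file description as the original, so a write through one moves the offset seen through the other; two separate open() calls create two open file descriptions with independent offsets.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write through a duplicate of fd, then report the offset seen via fd. */
off_t offset_after_dup_write(int fd, const void *buf, size_t n)
{
    int copy = dup(fd);         /* same open file description */
    ssize_t w;

    if (copy == -1)
        return -1;
    w = write(copy, buf, n);
    close(copy);
    if (w < 0)
        return -1;
    return lseek(fd, 0, SEEK_CUR);
}

/* Open path twice, write through one, report the other's offset. */
off_t offset_after_separate_open_write(const char *path,
                                       const void *buf, size_t n)
{
    int a = open(path, O_WRONLY);   /* each open() creates its own */
    int b = open(path, O_RDONLY);   /* open file description       */
    off_t result = -1;

    if (a != -1 && b != -1 && write(a, buf, n) == (ssize_t)n)
        result = lseek(b, 0, SEEK_CUR);
    if (a != -1)
        close(a);
    if (b != -1)
        close(b);
    return result;
}

/* Returns 1 if both behaviours are as described, 0 if not, -1 on error. */
int offsets_behave_as_described(void)
{
    char path[] = "/tmp/ofd_demo_XXXXXX";
    int fd = mkstemp(path);         /* opened for reading and writing */
    int ok;

    if (fd == -1)
        return -1;
    ok = offset_after_dup_write(fd, "hello", 5) == 5 &&
         offset_after_separate_open_write(path, "abc", 3) == 0;
    close(fd);
    unlink(path);
    return ok;
}
```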
The following alternate names were discussed:
Austin Group Defect 1784 is applied, changing this definition.
Historical implementations have a concept of an orphaned process, which is a process whose parent process has exited. When job control is in use, it is necessary to prevent processes from being stopped in response to interactions with the terminal after they no longer are controlled by a job control-cognizant program. Because signals generated by the terminal are sent to a process group and not to individual processes, and because a signal may be provoked by a process that is not orphaned, but sent to another process that is orphaned, it is necessary to define an orphaned process group. The definition assumes that a process group will be manipulated as a group and that the job control-cognizant process controlling the group is outside of the group and is the parent of at least one process in the group (so that state changes may be reported via waitpid()). Therefore, a group is considered to be controlled as long as at least one process in the group has a parent that is outside of the process group, but within the session.
This definition of orphaned process groups ensures that a session leader's process group is always considered to be orphaned, and thus it is prevented from stopping in response to terminal signals.
The term "page" is defined to support the description of the behavior of memory mapping for shared memory and memory mapped files, and the description of the behavior of process memory locking. It is not intended to imply that shared memory/file mapping and memory locking are applicable only to "paged" architectures. For the purposes of POSIX.1-2024, whatever the granularity on which an architecture supports mapping or locking, this is considered to be a "page". If an architecture cannot support the memory mapping or locking functions specified by POSIX.1-2024 on any granularity, then these options will not be implemented on the architecture.
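The page granularity is available at run time through sysconf(_SC_PAGESIZE). As a small illustration (round_up_to_pages() is a hypothetical helper), a length can be rounded up to whole pages before it is used with the mapping or locking functions; division is used instead of a bit mask so that the sketch does not assume the page size is a power of two.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: round len up to a whole number of pages.
 * Returns 0 if the page size cannot be determined. Overflow for
 * values of len close to SIZE_MAX is not handled in this sketch.
 */
size_t round_up_to_pages(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page;

    if (ps <= 0)
        return 0;
    page = (size_t)ps;
    return (len + page - 1) / page * page;
}
```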
Pathnames historically allowed all bytes except for the <slash> and <NUL> characters. For compatibility with existing file systems, this usage is maintained throughout the standard by noting that a pathname need not be a valid character string in all locales. However, the properties of the portable filename character set are such that a pathname using only those characters and the <slash> is portable in all locales as a character string.
Austin Group Defect 1073 is applied, making it implementation-defined whether the case of exactly two leading <slash> characters is treated specially.
Implementation-defined; see User Database.
There may be more than one directory entry pointing to a given directory in some implementations. The wording here identifies that exactly one of those is the parent directory. In pathname resolution, dot-dot is the means by which this unique directory is identified. (That is, the parent directory is the one to which dot-dot points.) In the case of a remote file system, if the same file system is mounted several times, it would appear as if they were distinct file systems (with interesting synchronization properties).
Austin Group Defect 1443 is applied, changing this definition to be inclusive of all uses of shell pattern matching notation.
It proved convenient to define a pipe as a special case of a FIFO, even though historically the latter was not introduced until System III and does not exist at all in 4.3 BSD.
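POSIX describes the S_ISFIFO() macro as a test for a pipe or FIFO special file, so the relationship is visible through fstat(): both an unnamed pipe and a named FIFO created with mkfifo() report the same file type. The C sketch below is illustrative, and the helper names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd refers to a FIFO (pipes included),
   0 if not, -1 on error. */
int is_fifo_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;
    return S_ISFIFO(st.st_mode) ? 1 : 0;
}

/* Hypothetical helper: create a named FIFO at path, check its type
   through an open descriptor, then remove it. */
int named_fifo_is_fifo(const char *path)
{
    int fd, r;

    if (mkfifo(path, 0600) == -1)
        return -1;
    /* O_NONBLOCK lets a read-only open succeed with no writer present. */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    r = (fd == -1) ? -1 : is_fifo_fd(fd);
    if (fd != -1)
        close(fd);
    unlink(path);
    return r;
}
```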
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0012 [584] is applied.
The encoding of this character set is not specified—specifically, ASCII is not required. But the implementation must provide a unique character code for each of the printable graphics specified by POSIX.1; see also A.4.9 Filenames.
Situations where characters beyond the portable filename character set (or historically ASCII or the ISO/IEC 646:1991 standard) would be used (in a context where the portable filename character set or the ISO/IEC 646:1991 standard is required by POSIX.1) are expected to be common. Although such a situation renders the use technically non-compliant, mutual agreement among the users of an extended character set will make such use portable between those users. Such a mutual agreement could be formalized as an optional extension to POSIX.1. (Making it required would eliminate too many possible systems, as even those systems using the ISO/IEC 646:1991 standard as a base character set extend their character sets for Western Europe and the rest of the world in different ways.)
Nothing in POSIX.1 is intended to preclude the use of extended characters where interchange is not required or where mutual agreement is obtained. It has been suggested that in several places "should" be used instead of "shall". Because (in the worst case) use of any character beyond the portable filename character set would render the program or data not portable to all possible systems, no extensions are permitted in this context.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0013 [584] is applied.
Austin Group Defect 1122 is applied, adding this definition.
Austin Group Defect 1514 is applied, changing this definition in line with earlier changes to the cross-reference to which it refers.
Austin Group Defect 1428 is applied, adding this definition.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0014 [690] is applied.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/5 is applied, adding fork(), posix_spawn(), posix_spawnp(), and vfork() to the list of functions.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0014 [690] is applied.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/6 is applied, rewording the definition to address the "passive exit" on termination of the last thread or the _Exit() function.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0014 [690] is applied.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
Austin Group Defect 768 is applied, adding this definition.
Austin Group Defect 1466 is applied, changing the terminology used for pseudo-terminal devices.
Austin Group Defect 1449 is applied, adding "(or Decimal-Point Character)".
Austin Group Defect 768 is applied, adding this definition.
Austin Group Defect 850 is applied, adding this entry as a pointer to the Built-In Utility definition.
POSIX.1 does not intend to preclude the addition of structuring data (for example, record lengths) in the file, as long as such data is not visible to an application that uses the features described in POSIX.1.
This definition permits the operation of chroot(), even though that function is not in POSIX.1; see also A.4.8 File Hierarchy.
Implementation-defined.
Commonly used to refer to a mount point; this standard uses the latter.
The definition implies a double meaning for the term. Although a signal is an event, common usage implies that a signal is an identifier of the class of event.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0011 [625] is applied.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0011 [625] is applied.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0015 [896] is applied.
Austin Group Defect 415 is applied, adding this definition.
Austin Group Defect 1583 is applied, clarifying that "special built-in utility" and "special built-in" are equivalent terms.
Austin Group Defect 1493 is applied, expanding this definition to cover uses of the term outside the XSH volume.
Austin Group Defect 1493 is applied, expanding this definition to cover uses of the term outside the XSH volume.
Austin Group Defect 1493 is applied, expanding this definition to cover uses of the term outside the XSH volume.
Austin Group Defect 1371 is applied, updating the stream definition so that it applies to the shell command language as well as the C language.
This concept, with great historical significance to UNIX system users, has been replaced with the notion of appropriate privileges.
The POSIX.1-1990 standard is inconsistent in its treatment of supplementary groups. The definition of supplementary group ID explicitly permits the effective group ID to be included in the set, but wording in the description of the setuid() and setgid() functions states: "Any supplementary group IDs of the calling process remain unchanged by these function calls". In the case of setgid() this contradicts that definition. In addition, some felt that the unspecified behavior in the definition of supplementary group IDs adds unnecessary portability problems. The standard developers considered several solutions to this problem:
The standard developers decided to permit either 2 or 3. The effective group ID is orthogonal to the set of supplementary group IDs, and it is implementation-defined whether getgroups() returns this. If the effective group ID is returned with the set of supplementary group IDs, then all changes to the effective group ID affect the supplementary group set returned by getgroups(). It is permissible to eliminate duplicates from the list returned by getgroups(). However, if a group ID is contained in the set of supplementary group IDs, setting the group ID to that value and then to a different value should not remove that value from the supplementary group IDs.
The definition of supplementary group IDs has been changed to not include the effective group ID. This simplifies permanent rationale and makes the relevant functions easier to understand. The getgroups() function has been modified so that it can, on an implementation-defined basis, return the effective group ID. By making this change, functions that modify the effective group ID do not need to discuss adding to the supplementary group list; the only view into the supplementary group list that the application developer has is through the getgroups() function.
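Because getgroups() is the only view of the supplementary group list, an application can observe whether the effective group ID appears there, but it should not depend on the answer. The minimal C sketch below assumes a single-threaded caller and uses a hypothetical helper name, egid_in_getgroups(). It first asks getgroups() for the count by passing a gidsetsize of 0. It then checks whether getegid() appears in the returned list.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if getegid() appears in the list reported by getgroups(),
   0 if it does not, or -1 on error. Whether it appears is
   implementation-defined, so portable code must accept either answer. */
int egid_in_getgroups(void)
{
    int n = getgroups(0, NULL);          /* gidsetsize 0: only report the count */
    if (n < 0)
        return -1;
    gid_t *list = malloc((n > 0 ? (size_t)n : 1) * sizeof *list);
    if (list == NULL)
        return -1;
    n = getgroups(n, list);              /* a robust program might retry on EINVAL */
    if (n < 0) {
        free(list);
        return -1;
    }
    gid_t egid = getegid();
    int found = 0;
    for (int i = 0; i < n; i++) {
        if (list[i] == egid) {
            found = 1;
            break;
        }
    }
    free(list);
    return found;
}
```

Either 0 or 1 is a conforming result. The point of the sketch is that portable code treats both as possible.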
Austin Group Defect 1254 is applied, changing this definition.
Earlier versions of this standard used a variety of terms other than "macro" for many of the constants defined in headers, and it was not clear in which of these cases they were required to be macros or not, or to be pre-processor constants (i.e., usable in #if) or not. In cases where the symbols had a reserved prefix or suffix, there was often inconsistency between whether the prefix/suffix was reserved only for macros or for any use, and whether the term "macro" or a different term was used in the descriptions of the symbols. There were also some unintentional differences from the ISO C standard.
One of the most commonly used terms was "symbolic constant". This has now been designated as the default term to be used wherever appropriate, and a formal definition of the term has been added giving the exact requirements for symbols that are described as symbolic constants.
The standard developers have performed a major rationalization of the header descriptions of symbols with constant values according to the following policy:
The description of the symbol can override individual requirements for symbolic constants; e.g., to specify a non-integer type, or to add a requirement that the symbol is usable in #if preprocessor directives.
Where a constant is required to be a macro but is also allowed to be another type of constant such as an enumeration constant, on implementations which do define it as another type of constant the macro is typically defined as follows:
```c
#define macro_name macro_name
```
This allows applications to use #ifdef, etc. to determine whether the macro is defined, but the macro is not usable in #if preprocessor directives because the preprocessor will treat the unexpanded word macro_name as having the value zero.
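A small C translation unit shows the effect of the self-referential #define. In this sketch, EXAMPLE_CONST is a hypothetical symbol. It stands in for a header constant that is both an enumeration constant and a macro. The three helper functions are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical header content: EXAMPLE_CONST is an enumeration constant,
   and a self-referential macro of the same name advertises that it exists. */
enum { EXAMPLE_CONST = 42 };
#define EXAMPLE_CONST EXAMPLE_CONST

/* #ifdef only asks whether the macro is defined, so this returns true. */
bool example_const_is_defined(void)
{
#ifdef EXAMPLE_CONST
    return true;
#else
    return false;
#endif
}

/* In #if, EXAMPLE_CONST is replaced by itself once and is not replaced
   again. The leftover identifier is then evaluated as 0, so this returns
   true even though the enumeration constant's value is 42. */
bool example_const_is_zero_in_if(void)
{
#if EXAMPLE_CONST == 0
    return true;
#else
    return false;
#endif
}

/* In ordinary C code the name refers to the enumeration constant. */
int example_const_value(void)
{
    return EXAMPLE_CONST;
}
```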
Earlier versions of this standard did not require symbolic links to have attributes such as ownership and a file serial number. This was because the 4.4 BSD implementation did not have them, and it was expected that other implementations may wish to do the same. However, experience with 4.4 BSD has shown that symbolic links implemented in this way cause problems for users and application developers, and later BSD systems have reverted to using inodes to implement symbolic links. Allowing no-inode symbolic links also caused problems in the standard. For example, leaving the st_ino value for symbolic links unspecified meant that the common technique of comparing the st_dev and st_ino values for two pathnames to see if they refer to the same file could only be used with stat() in conforming applications and not with lstat(). The standard now requires symbolic links to have meaningful values for the same struct stat fields as regular files, except for the file mode bits in st_mode. Historically, the file mode bits were unused (the contents of a symbolic link could always be read), but implementations differed as to whether the file mode bits (as returned in st_mode or reported by ls -l) were set according to the umask or just to a fixed value such as 0777. Accordingly, the standard requires the file mode bits to be ignored by readlink() and when a symbolic link is followed during pathname resolution, but leaves the corresponding part of the value returned in st_mode unspecified.
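The st_dev/st_ino comparison mentioned above can be sketched as follows. The helper name same_file() and its follow_links parameter are hypothetical. With follow_links false, the helper uses lstat(). That comparison is only meaningful because symbolic links are now required to have their own file serial numbers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if pathnames a and b refer to the same file, 0 if they do not,
   or -1 if either cannot be examined. With follow_links false, lstat() is
   used, so a symbolic link is compared as a file in its own right; this
   relies on symbolic links having meaningful st_dev and st_ino values. */
int same_file(const char *a, const char *b, bool follow_links)
{
    struct stat sa, sb;
    int ra = follow_links ? stat(a, &sa) : lstat(a, &sa);
    int rb = follow_links ? stat(b, &sb) : lstat(b, &sb);
    if (ra == -1 || rb == -1)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, suppose /tmp/l is a symbolic link to /. Then same_file("/tmp/l", "/", true) should return 1, and same_file("/tmp/l", "/", false) should return 0.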
Historical implementations were followed when determining which interfaces should apply to symbolic links. Interfaces that historically followed symbolic links include chmod(), stat(), and utime(). Interfaces that historically did not follow symbolic links include lstat(), rename(), remove(), rmdir(), and unlink(). For chown() and link(), historical implementations differed. POSIX.1-2024 inherited the lchown() function from the Single UNIX Specification, Version 2, and therefore requires chown() to follow symbolic links. Earlier versions of this standard required link() to follow symbolic links, but with the addition of the linkat() function (which has a flag to indicate whether to follow symbolic links), both behaviors are now allowed for link().
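With linkat(), an application can choose explicitly which of the two historical link() behaviors it gets. The sketch below wraps linkat() in a hypothetical helper, make_hard_link(). Both paths are interpreted relative to the current working directory through AT_FDCWD.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_FOLLOW, open() */
#include <stdbool.h>
#include <unistd.h>     /* linkat() */

/* Creates newpath as a hard link to existingpath. When existingpath names a
   symbolic link, follow_symlink chooses between two behaviors:
   AT_SYMLINK_FOLLOW links to the file the symbolic link resolves to, and
   no flag links to the symbolic link itself. Plain link() may do either. */
int make_hard_link(const char *existingpath, const char *newpath, bool follow_symlink)
{
    return linkat(AT_FDCWD, existingpath, AT_FDCWD, newpath,
                  follow_symlink ? AT_SYMLINK_FOLLOW : 0);
}
```

An application that needs one specific behavior should call linkat() rather than link(), because link() may either follow the symbolic link or not.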
When the final component of a pathname is a symbolic link, the standard requires that a trailing <slash> causes the link to be followed. This is the behavior of historical implementations. For example, for /a/b and /a/b/, if /a/b is a symbolic link to a directory, then /a/b refers to the symbolic link, and /a/b/ refers to the directory to which the symbolic link points.
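The trailing-<slash> rule can be observed with lstat(). Normally lstat() does not follow a symbolic link in the final component, but a trailing <slash> forces it to. The helper lstat_kind() below is a hypothetical name used only for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <unistd.h>

/* Classifies path with lstat(): 'l' symbolic link, 'd' directory,
   'r' regular file, '?' anything else, or -1 on error. Because a trailing
   <slash> causes a final symbolic link to be followed, "x" and "x/" can
   classify differently. */
int lstat_kind(const char *path)
{
    struct stat st;
    if (lstat(path, &st) == -1)
        return -1;
    if (S_ISLNK(st.st_mode))
        return 'l';
    if (S_ISDIR(st.st_mode))
        return 'd';
    if (S_ISREG(st.st_mode))
        return 'r';
    return '?';
}
```

If /a/b is a symbolic link to a directory, lstat_kind("/a/b") is expected to return 'l' and lstat_kind("/a/b/") is expected to return 'd'.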
Because a symbolic link and its referenced object coexist in the file system name space, confusion can arise in distinguishing between the link itself and the referenced object. Historically, utilities and system calls have adopted their own link following conventions in a somewhat ad hoc fashion. Rules for a uniform approach are outlined here, although historical practice has been adhered to as much as was possible. To promote consistent system use, user-written utilities are encouraged to follow these same rules.
Symbolic links are handled either by operating on the link itself, or by operating on the object referenced by the link. In the latter case, an application or system call is said to "follow" the link. Symbolic links may reference other symbolic links, in which case links are dereferenced until an object that is not a symbolic link is found, a symbolic link that references a file that does not exist is found, or a loop is detected. (Current implementations do not detect loops, but have a limit on the number of symbolic links that they will dereference before declaring it an error.)
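The dereferencing rule can be sketched in C by following a chain one readlink() at a time and declaring an error after a fixed number of links. This illustrates the idea under stated assumptions; it is not how any implementation performs pathname resolution. The limit MAX_LINK_HOPS (8) and the function name count_link_hops() are hypothetical. Only the final component is followed manually; the system still resolves intermediate components.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical limit for this sketch; real code might consult
   sysconf(_SC_SYMLOOP_MAX) or simply rely on the system's ELOOP. */
#define MAX_LINK_HOPS 8

/* Returns how many symbolic links were followed from path before a file
   that is not a symbolic link was reached. Returns -1 with errno set to
   ELOOP if the chain is longer than MAX_LINK_HOPS (for example, a loop),
   or with the lstat() error (such as ENOENT) if a target does not exist. */
int count_link_hops(const char *path)
{
    char cur[PATH_MAX], target[PATH_MAX];
    struct stat st;
    int hops = 0;

    if (strlen(path) >= sizeof cur) {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(cur, path);

    for (;;) {
        if (lstat(cur, &st) == -1)
            return -1;
        if (!S_ISLNK(st.st_mode))
            return hops;
        if (hops == MAX_LINK_HOPS) {
            errno = ELOOP;
            return -1;
        }
        ssize_t n = readlink(cur, target, sizeof target - 1);
        if (n == -1)
            return -1;
        target[n] = '\0';                 /* readlink() does not terminate */
        if (target[0] == '/') {
            strcpy(cur, target);          /* absolute target replaces the path */
        } else {
            /* relative target: resolved from the link's own directory */
            char *slash = strrchr(cur, '/');
            size_t dirlen = slash != NULL ? (size_t)(slash - cur) + 1 : 0;
            if (dirlen + (size_t)n >= sizeof cur) {
                errno = ENAMETOOLONG;
                return -1;
            }
            memcpy(cur + dirlen, target, (size_t)n + 1);
        }
        hops++;
    }
}
```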
There are four domains for which default symbolic link policy is established in a system. In almost all cases, there are utility options that override this default behavior. The four domains are as follows:
First Domain
The first domain is considered in earlier rationale.
Second Domain
The reason this category is restricted to utilities that are not traversing the file hierarchy is that some standard utilities take an option that specifies a hierarchical traversal, but by default operate on the arguments themselves. Generally, users specifying the option for a file hierarchy traversal wish to operate on a single, physical hierarchy, and therefore symbolic links, which may reference files outside of the hierarchy, are ignored. For example, chown owner file is a different operation from the same command with the -R option specified. In this example, the behavior of the command chown owner file is described here, while the behavior of the command chown -R owner file is described in the third and fourth domains.
The general rule is that the utilities in this category follow symbolic links named as arguments.
Exceptions in the second domain are:
All other standard utilities, when not traversing a file hierarchy, always follow symbolic links named as arguments.
Historical practice is that the -h option is specified if standard utilities are to act upon symbolic links instead of upon their targets. Examples of commands that have historically had a -h option for this purpose are the chgrp, chown, file, and test utilities.
Third Domain
The third domain is symbolic links referencing files not of type directory, specified to utilities that are performing a traversal of a file hierarchy. (This includes symbolic links specified as command line pathname arguments or encountered during the traversal.)
The intention of the Shell and Utilities volume of POSIX.1-2024 is that the operation that the utility is performing is applied to the symbolic link itself, if that operation is applicable to symbolic links. If the operation is not applicable to symbolic links, the symbolic link should be ignored. Specifically, by default, no change should be made to the file referenced by the symbolic link.
Fourth Domain
The fourth domain is symbolic links referencing files of type directory, specified to utilities that are performing a traversal of a file hierarchy. (This includes symbolic links specified as command line pathname arguments or encountered during the traversal.)
Most standard utilities do not, by default, indirect into the file hierarchy referenced by the symbolic link. (The Shell and Utilities volume of POSIX.1-2024 uses the informal term "physical walk" to describe this case. The case where the utility does indirect through the symbolic link is termed a "logical walk".)
There are three reasons for the default to be a physical walk:
However, the standard developers agreed to leave it unspecified to achieve consensus.
As consistently as possible, users may cause standard utilities performing a file hierarchy traversal to follow any symbolic links named on the command line, regardless of the type of file they reference, by specifying the -H (for half logical) option. This option is intended to make the command line name space look like the logical name space.
As consistently as possible, users may cause standard utilities performing a file hierarchy traversal to follow any symbolic links named on the command line as well as any symbolic links encountered during the traversal, regardless of the type of file they reference, by specifying the -L (for logical) option. This option is intended to make the entire name space look like the logical name space.
For consistency, implementors are encouraged to use the -P (for "physical") flag to specify the physical walk in utilities that do logical walks by default for whatever reason.
When one or more of the -H, -L, and -P flags can be specified, the last one specified determines the behavior of the utility. This permits users to alias commands so that the default behavior is a logical walk and then override that behavior on the command line.
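The "last one wins" rule for -H, -L, and -P amounts to keeping a single mode variable that each of those options overwrites. The following sketch uses hypothetical names (enum walk_mode, parse_walk_mode()). For brevity, it assumes that none of the utility's options take option-arguments; a real utility would use getopt().

```c
#include <string.h>

/* Hypothetical names for the three traversal behaviors. */
enum walk_mode { WALK_PHYSICAL, WALK_HALF_LOGICAL, WALK_LOGICAL };

/* Scans leading option arguments, stopping at "--" or the first operand,
   and returns the mode selected by the last -H, -L, or -P seen. Other
   option letters are ignored. If none of the three appears, dflt is
   returned. */
enum walk_mode parse_walk_mode(int argc, char *const argv[], enum walk_mode dflt)
{
    enum walk_mode mode = dflt;
    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (strcmp(arg, "--") == 0 || arg[0] != '-' || arg[1] == '\0')
            break;                        /* end of options */
        for (const char *p = arg + 1; *p != '\0'; p++) {
            if (*p == 'H')
                mode = WALK_HALF_LOGICAL;
            else if (*p == 'L')
                mode = WALK_LOGICAL;
            else if (*p == 'P')
                mode = WALK_PHYSICAL;
        }
    }
    return mode;
}
```

This means an alias that supplies -L can still be overridden on the command line. For an argument list of cmd -L -P dir, the result is WALK_PHYSICAL.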
Exceptions in the Third and Fourth Domains
The ls and rm utilities are exceptions to these rules. The rm utility never follows symbolic links and does not support the -H, -L, or -P options. Some historical versions of ls always followed symbolic links given on the command line whether the -L option was specified or not. Historical versions of ls did not support the -H option. In POSIX.1-2024, unless one of the -H or -L options is specified, the ls utility only follows symbolic links to directories that are given as operands. The ls utility does not support the -P option.
The Shell and Utilities volume of POSIX.1-2024 requires that the standard utilities ls, find, and pax detect infinite loops when doing logical walks; that is, a directory, or more commonly a symbolic link, that refers to an ancestor in the current file hierarchy. If the file system itself is corrupted, causing the infinite loop, it may be impossible to recover. Because find and ls are often used in system administration and security applications, they should attempt to recover and continue as best as they can. The pax utility should terminate because the archive it was creating is by definition corrupted. Other, less vital, utilities should probably simply terminate as well. Implementations are strongly encouraged to detect infinite loops in all utilities.
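A logical walk can detect such loops by remembering the (st_dev, st_ino) pair of every directory on the path from the starting point. It then refuses to descend into any directory that matches one of those pairs. The sketch below shows this together with the physical/logical distinction. It rests on several assumptions:

- walk_count() and struct ancestor are hypothetical names.
- Names too long for a fixed PATH_MAX buffer are simply skipped.
- Errors on individual entries are treated as leaf entries rather than reported.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* One node per directory on the path from the walk's root to here. */
struct ancestor {
    dev_t dev;
    ino_t ino;
    const struct ancestor *parent;
};

/* Counts the entries below path. A physical walk (logical == 0) uses
   lstat(), so symbolic links are counted but never entered. A logical walk
   uses stat(), so links to directories are entered; a directory matching
   an ancestor is counted in *loops and not entered. Returns -1 if path
   itself cannot be examined or opened. */
long walk_count(const char *path, int logical, const struct ancestor *up, long *loops)
{
    struct stat st;
    if ((logical ? stat(path, &st) : lstat(path, &st)) == -1)
        return -1;
    if (!S_ISDIR(st.st_mode))
        return 0;
    for (const struct ancestor *a = up; a != NULL; a = a->parent) {
        if (a->dev == st.st_dev && a->ino == st.st_ino) {
            ++*loops;                     /* entering would revisit an ancestor */
            return 0;
        }
    }
    struct ancestor self = { st.st_dev, st.st_ino, up };
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    long count = 0;
    char child[PATH_MAX];
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        int n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;                     /* too long for this sketch */
        long below = walk_count(child, logical, &self, loops);
        count += 1 + (below > 0 ? below : 0);
    }
    closedir(dir);
    return count;
}
```

In a physical walk, a symbolic link is counted but never entered, so no loop can arise through a symbolic link. In a logical walk, a link back to an ancestor is counted in *loops instead of being entered.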
Historical practice is shown in the table Historical Practice for Symbolic Links. In that table, the heading SVID3 stands for the Third Edition of the System V Interface Definition.
Historically, several shells have had built-in versions of the pwd utility. In some of these shells, pwd reported the physical path, and in others, the logical path. Implementations of the shell corresponding to POSIX.1-2024 must report the logical path by default.
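For pwd, the logical path is the value of PWD when that value is usable, and the physical path from getcwd() is the fallback. The C sketch below follows the rule in the pwd -L description in the Shell and Utilities volume: PWD must be an absolute pathname with no dot or dot-dot components that names the current directory. The names logical_cwd() and has_dot_component() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns nonzero if path has a component that is exactly "." or "..". */
static int has_dot_component(const char *path)
{
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')
            p++;
        const char *start = p;
        while (*p != '\0' && *p != '/')
            p++;
        size_t len = (size_t)(p - start);
        if ((len == 1 && start[0] == '.') ||
            (len == 2 && start[0] == '.' && start[1] == '.'))
            return 1;
    }
    return 0;
}

/* Uses $PWD when it is absolute, has no dot or dot-dot components, and
   names the current directory (same st_dev and st_ino as "."). Otherwise
   falls back to the physical path from getcwd(), as pwd -P would.
   Writes into buf (size bytes); returns buf, or NULL on error. */
char *logical_cwd(char *buf, size_t size)
{
    const char *pwd = getenv("PWD");
    struct stat a, b;
    if (pwd != NULL && pwd[0] == '/' && !has_dot_component(pwd) &&
        strlen(pwd) < size && stat(pwd, &a) == 0 && stat(".", &b) == 0 &&
        a.st_dev == b.st_dev && a.st_ino == b.st_ino) {
        strcpy(buf, pwd);
        return buf;
    }
    return getcwd(buf, size);
}
```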
The cd command is required, by default, to treat the filename dot-dot logically. Implementors are required to support the -P flag in cd so that users can have their current environment handled physically. In 4.3 BSD, chgrp during tree traversal changed the group of the symbolic link, not the target. Symbolic links in 4.4 BSD did not have owner, group, mode, or other standard UNIX system file attributes.
Historical Practice for Symbolic Links

| Utility | SVID3 | 4.3 BSD | 4.4 BSD | POSIX | Comments |
|---|---|---|---|---|---|
| cd | | | | -L | Treat ".." logically. |
| cd | | | | -P | Treat ".." physically. |
| chgrp | | | -H | -H | Follow command line symlinks. |
| chgrp | | | -h | -L | Follow symlinks. |
| chgrp | -h | | | -h | Affect the symlink. |
| chmod | | | | | Affect the symlink. |
| chmod | | | -H | | Follow command line symlinks. |
| chmod | | | -h | | Follow symlinks. |
| chown | | | -H | -H | Follow command line symlinks. |
| chown | | | -h | -L | Follow symlinks. |
| chown | -h | | | -h | Affect the symlink. |
| cp | | | -H | -H | Follow command line symlinks. |
| cp | | | -h | -L | Follow symlinks. |
| cpio | -L | | -L | | Follow symlinks. |
| du | | | -H | -H | Follow command line symlinks. |
| du | | | -h | -L | Follow symlinks. |
| file | -h | | | -h | Affect the symlink. |
| find | | | -H | -H | Follow command line symlinks. |
| find | | | -h | -L | Follow symlinks. |
| find | -follow | | -follow | | Follow symlinks. |
| ln | -s | -s | -s | -s | Create a symbolic link. |
| ls | -L | -L | -L | -L | Follow symlinks. |
| ls | | | | -H | Follow command line symlinks. |
| mv | | | | | Operates on the symlink. |
| pax | | | -H | -H | Follow command line symlinks. |
| pax | | | -h | -L | Follow symlinks. |
| pwd | | | | -L | Printed path may contain symlinks. |
| pwd | | | | -P | Printed path will not contain symlinks. |
| rm | | | | | Operates on the symlink. |
| tar | | | -H | | Follow command line symlinks. |
| tar | | -h | -h | | Follow symlinks. |
| test | -h | | -h | -h | Affect the symlink. |
Austin Group Defect 672 is applied, clarifying how this definition applies to directories, and that it does not apply to symbolic links.
Those signals that may be generated synchronously include SIGABRT, SIGBUS, SIGILL, SIGFPE, SIGPIPE, and SIGSEGV.
Any signal sent via the raise() function or a kill() function targeting the current process is also considered synchronous.
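The synchronous nature of a signal sent with raise() is visible to the caller: the handler has already run by the time raise() returns. The single-threaded C sketch below demonstrates this with SIGUSR1 and a hypothetical function name, handler_ran_before_raise_returned().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled;

static void on_sigusr1(int sig)
{
    (void)sig;
    handled = 1;               /* only async-signal-safe work in a handler */
}

/* Returns 1 if the SIGUSR1 handler had already run when raise() returned,
   0 if not, or -1 on error. Single-threaded sketch: sigprocmask() is used
   to make sure SIGUSR1 is not blocked. */
int handler_ran_before_raise_returned(void)
{
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) == -1)
        return -1;

    handled = 0;
    if (raise(SIGUSR1) != 0)
        return -1;
    return handled;            /* synchronous delivery: already 1 here */
}
```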
The distinction between a "system call" and a "library routine" is an implementation detail that may differ between implementations and has thus been excluded from POSIX.1.
See "Interface, Not Implementation" in the Preface.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/7 is applied, changing from "An implementation-defined device" to "A device".
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/9 is applied, rewording the definition to reference the existing definitions for "group database" and "user database".
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/8 is applied, rewording the definition to remove the requirement for an implementation to define the object.
A "system reboot" is an event initiated by an unspecified circumstance that causes all processes (other than special system processes) to be terminated in an implementation-defined manner, after which any changes to the state and contents of files created or written to by a Conforming POSIX.1 Application prior to the event are implementation-defined.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/10 is applied, changing "An implementation-defined sequence of events" to "An unspecified sequence of events".
These terms specify that for synchronized read operations, pending writes must be successfully completed before the read operation can complete. This is motivated by two circumstances. Firstly, when synchronizing processes can access the same file, but not share common buffers (such as for a remote file system), this requirement permits the reading process to guarantee that it can read data written remotely. Secondly, having data written synchronously is insufficient to guarantee the order with respect to a subsequent write by a reading process, and thus this extra read semantic is necessary.
Austin Group Defect 1122 is applied, adding this definition.
The term "text file" does not prevent the inclusion of control or other non-printable characters (other than NUL). Therefore, standard utilities that list text files as inputs or outputs are either able to process the special characters or they explicitly describe their limitations within their individual descriptions. The definition of "text file" has caused controversy. The only difference between text and binary files is that text files have lines of less than {LINE_MAX} bytes, with no NUL characters, each terminated by a <newline>. The definition allows a file with a single <newline>, or a totally empty file, to be called a text file. If a file ends with an incomplete line it is not strictly a text file by this definition. The <newline> referred to in POSIX.1-2024 is not some generic line separator, but a single character; files created on systems where they use multiple characters for ends of lines are not portable to all conforming systems without some translation process unspecified by POSIX.1-2024.
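The text-file rules can be checked mechanically. The sketch below uses a hypothetical name, is_text_buffer(), and applies three rules to an in-memory buffer:

- The buffer contains no NUL bytes.
- Every line is terminated by a <newline>.
- No line is longer than line_max bytes including its <newline>, which is the form used in the XBD definition.

The caller supplies line_max, for example from sysconf(_SC_LINE_MAX). The sketch does not check whether the bytes form valid characters in the current locale.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) satisfies the sketch's text-file rules:
   zero or more complete lines, no NUL bytes, and no line longer than
   line_max bytes including its terminating <newline>. */
bool is_text_buffer(const char *buf, size_t len, size_t line_max)
{
    size_t line_len = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\0')
            return false;              /* NUL is never allowed */
        line_len++;
        if (line_len > line_max)
            return false;              /* line too long */
        if (buf[i] == '\n')
            line_len = 0;              /* line complete */
    }
    return line_len == 0;              /* an incomplete final line disqualifies */
}
```

Under these rules, an empty buffer and a buffer holding a single <newline> both qualify, while "abc" without a final <newline> does not.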
POSIX.1-2024 defines a live thread to be a flow of control within a process. Each thread has a minimal amount of private state; most of the state associated with a process is shared among all of the threads in the process. While most multi-thread extensions to POSIX have taken this approach, others have made different decisions.
Threads need to share resources in order to cooperate. Memory has to be widely shared between threads in order for the threads to cooperate at a fine level of granularity. Threads keep data structures and the locks protecting those data structures in shared memory. For a data structure to be usefully shared between threads, such structures should not refer to any data that can only be interpreted meaningfully by a single thread. Thus, any system resources that might be referred to in data structures need to be shared between all threads. File descriptors, pathnames, and pointers to stack variables are all things that programmers want to share between their threads. Thus, the file descriptor table, the root directory, the current working directory, and the address space have to be shared.
Library implementations are possible as long as the effective behavior is as if system services invoked by one thread do not suspend other threads. This may be difficult for some library implementations on systems that do not provide asynchronous facilities.
See B.2.9 Threads for additional rationale.
Austin Group Defect 792 is applied, changing this definition.
See B.2.9.2 Thread IDs for additional rationale.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
Austin Group Defect 792 is applied, adding this definition.
Austin Group Defect 792 is applied, adding this definition.
All functions required by POSIX.1-2024 need to be thread-safe; see A.4.22 Thread-Safety and B.2.9.1 Thread-Safety for additional rationale.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
There are no references in POSIX.1-2024 to a "passwd file" or a "group file", and there is no requirement that the group or passwd databases be kept in files containing editable text. Many large timesharing systems use passwd databases that are hashed for speed. Certain security classifications prohibit certain information in the passwd database from being publicly readable.
The term "encoded" is used instead of "encrypted" in order to avoid the implementation connotations (such as reversibility or use of a particular algorithm) of the latter term.
The getgrent(), setgrent(), endgrent(), getpwent(), setpwent(), and endpwent() functions are not included as part of the base standard because they provide a linear database search capability that is not generally useful (the getpwuid(), getpwnam(), getgrgid(), and getgrnam() functions are provided for keyed lookup) and because in certain distributed systems, especially those with different authentication domains, it may not be possible or desirable to provide an application with the ability to browse the system databases indiscriminately. They are provided on XSI-conformant systems due to their historical usage by many existing applications.
A change from historical implementations is that the structures used by these functions have fields of the types gid_t and uid_t, which are required to be defined in the <sys/types.h> header. POSIX.1-2024 requires implementations to ensure that these types are defined by inclusion of <grp.h> and <pwd.h>, respectively, without imposing any name space pollution or errors from redefinition of types.
POSIX.1-2024 is silent about the content of the strings containing user or group names. These could be digit strings. POSIX.1-2024 is also silent as to whether such digit strings bear any relationship to the corresponding (numeric) user or group ID.
Database Access
The thread-safe versions of the user and group database access functions return values in user-supplied buffers instead of possibly using static data areas that may be overwritten by each call.
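The following sketches the buffer-based, thread-safe lookup with getpwuid_r(). The caller's buffer is sized from sysconf(_SC_GETPW_R_SIZE_MAX) when that returns a positive value. Otherwise 1024 bytes is an assumed starting size, and the buffer is doubled on ERANGE. The name user_name_for_uid() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copies the login name for uid into name (namesize bytes). The passwd
   strings live in a caller-owned buffer, not in static storage.
   Returns 0 on success, ENOENT if there is no entry, or an error number. */
int user_name_for_uid(uid_t uid, char *name, size_t namesize)
{
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t bufsize = hint > 0 ? (size_t)hint : 1024;  /* -1: no fixed size */
    struct passwd pwd, *result = NULL;
    char *buf = NULL;
    int err;

    for (;;) {
        char *grown = realloc(buf, bufsize);
        if (grown == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = grown;
        err = getpwuid_r(uid, &pwd, buf, bufsize, &result);
        if (err != ERANGE)
            break;
        bufsize *= 2;                  /* buffer too small: retry larger */
    }
    if (err == 0 && result == NULL)
        err = ENOENT;                  /* search succeeded, no such entry */
    if (err == 0) {
        if (strlen(pwd.pw_name) >= namesize)
            err = ERANGE;
        else
            strcpy(name, pwd.pw_name);
    }
    free(buf);
    return err;
}
```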
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/11 is applied, removing the words "of implementation-defined format".
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0016 [511] is applied.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0017 [584] is applied.
The term "virtual processor" was chosen as a neutral term describing all kernel-level schedulable entities, such as processes, Mach tasks, or lightweight processes. Implementing threads using multiple processes as virtual processors, or implementing multiplexed threads above a virtual processor layer, should be possible, provided some mechanism has also been implemented for sharing state between processes or virtual processors. Many systems may also wish to provide implementations of threads on systems providing "shared processes" or "variable-weight processes". It was felt that exposing such implementation details would severely limit the type of systems upon which the threads interface could be supported and prevent certain types of valid implementations. It was also determined that a virtual processor interface was out of the scope of the Rationale (Informative) volume of POSIX.1-2024.
Austin Group Defect 1163 is applied, clarifying the definition of white space and adding definitions of white-space byte, white-space character, and white-space wide character.
This is included to allow POSIX.1-2024 to be adopted as an IEEE standard and a standard of The Open Group, serving both POSIX and the Single UNIX Specification in a core set of volumes.
When POSIX.1 and the Single UNIX Specification were merged, the term "XSI" had been used for over 10 years in connection with the XPG series and the first and second versions of the base volumes of the Single UNIX Specification. The XSI margin code was introduced to denote the extended or more restrictive semantics beyond POSIX that are applicable to UNIX systems.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0018 [690] is applied.
Austin Group Defect 792 is applied, adding this definition.
The general concepts are similar in nature to the definitions section, with the exception that a term defined in general concepts can contain normative requirements.
Case-insensitive matching is defined in this standard in terms of a simple algorithm whereby, for each character in the string to be matched, if the character is uppercase then the lowercase equivalent (if any) is also checked for a match, and if the character is lowercase then the uppercase equivalent (if any) is also checked for a match. It is described this way to make the expected behavior easier to understand; however, implementations may internally use more sophisticated algorithms to improve efficiency, provided that the result is the same as the simple algorithm would produce.
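The simple algorithm can be written directly. This sketch uses hypothetical helper names and applies the algorithm byte by byte, using the <ctype.h> classification of the current locale. That approach is adequate only for single-byte character sets; a multibyte implementation would work on wide characters with towlower() and towupper().

```c
#include <ctype.h>
#include <stdbool.h>

/* Characters match if equal, or if p is uppercase and its lowercase
   equivalent equals s, or if p is lowercase and its uppercase
   equivalent equals s. */
static bool chars_match_ci(unsigned char p, unsigned char s)
{
    if (p == s)
        return true;
    if (isupper(p) && tolower(p) == s)
        return true;
    if (islower(p) && toupper(p) == s)
        return true;
    return false;
}

/* Applies the per-character rule across two strings of equal length. */
bool strings_match_ci(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (!chars_match_ci((unsigned char)*a, (unsigned char)*b))
            return false;
    return *a == *b;                   /* true only if both ended together */
}
```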
Austin Group Defect 1031 is applied, adding case insensitive comparisons.
There is no additional rationale provided for this section.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0019 [934] is applied.
Austin Group Defect 940 is applied, removing text that was conditional on the all-zero bit pattern of a pointer object being a null pointer, as this is now mandated.
Earlier versions of this standard did not make clear that directory modifications are performed atomically and serially, although that is the historical behavior and was always intended.
Austin Group Defect 672 is applied, adding this subsection.
There is no additional rationale provided for this section.
Allowing an implementation to define extended security controls enables the use of POSIX.1-2024 in environments that require different or more rigorous security than that provided in POSIX.1. Extensions are allowed in two areas: privilege and file access permissions. The semantics of these areas have been defined to permit extensions with reasonable, but not exact, compatibility with all existing practices. For example, the elimination of the superuser definition precludes identifying a process as privileged or not by virtue of its effective user ID.
A process should not try to anticipate the result of an attempt to access data by a priori use of these rules. Rather, it should make the attempt to access data and examine the return value (and possibly errno as well), or use access(). An implementation may include other security mechanisms in addition to those specified in POSIX.1, and an access attempt may fail because of those additional mechanisms, even though it would succeed according to the rules given in this section. (For example, the user's security level might be lower than that of the object of the access attempt.) The supplementary group IDs provide another reason for a process to not attempt to anticipate the result of an access attempt.
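In code, "make the attempt and examine the result" looks like the following minimal sketch; the name check_readable_by_trying() is hypothetical. The sketch never inspects st_mode or compares user and group IDs. As a result, supplementary groups and any additional security mechanisms are taken into account at the time of the attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if path could be opened for reading, otherwise the errno
   value reported by open() (for example EACCES or ENOENT). */
int check_readable_by_trying(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```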
Since the current standard does not specify a method for opening a directory for searching, it is unspecified whether search permission on the fd argument to openat() and related functions is based on whether the directory was opened with search mode or on the current permissions allowed by the directory at the time a search is performed. When there is existing practice that supports opening directories for searching, it is expected that these functions will be modified to specify that the search permissions will be granted based on the file access modes of the directory's file descriptor identified by fd, and not on the mode of the directory at the time the directory is searched.
Though the file hierarchy is commonly regarded to be a tree, POSIX.1 does not define it as such for three reasons:
Historically, certain filenames and pathnames have been reserved. This list includes core, /etc/passwd, and so on. Conforming applications should avoid these.
Most historical implementations prohibit case folding in filenames; that is, treating uppercase and lowercase alphabetic characters as identical. However, some consider case folding desirable:
Variants, such as maintaining case distinctions in filenames, but ignoring them in comparisons, have been suggested. Methods of allowing escaped characters of the case opposite the default have been proposed.
Many reasons have been expressed for not allowing case folding, including:
Two proposals were entertained regarding case folding in filenames:
The consensus selected the first proposal. Otherwise, a conforming application would have to assume that case folding would occur when it was not wanted, but that it would not occur when it was wanted.
Filenames should be constructed from the portable filename character set because the use of other characters can be confusing or ambiguous in certain contexts. (For example, the use of a <colon> (':') in a pathname could cause ambiguity if that pathname were included in a PATH definition.)
The constraint on use of the <hyphen-minus> character as the first character of a portable filename is a constraint on application behavior and not on implementations, since applications might not work as expected when such a filename is passed as a command line argument.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0020 [584] is applied.
Earlier versions of this standard did not specify the behavior of aio_fsync(), fdatasync(), or fsync() on directories, nor did they specify constraints on the underlying storage in the absence of calls to aio_fsync(), fdatasync(), or fsync().
Although directory operations are atomic and serializable, they are not necessarily durable. An application that requires a directory modification to be durable should use fdatasync() or fsync() (or aio_fsync()) on the directory. However, the intention of the requirements for directory modifications is that most applications should not need to do this. For example, a common method of updating a file is to create a new temporary file, call fdatasync() or fsync() to synchronize the new file, and then use rename() to replace the old file with the new file. If a crash occurs after the rename(), then the file being updated will have either its old contents or its new contents on the storage device when the system is rebooted. An application needs to synchronize the directory only if it wants to be sure the updated file will have its new contents on the storage device.
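The update method described above can be sketched as follows. The function names `write_all()` and `replace_file()` are hypothetical, and the sketch assumes the temporary file lives in the same directory as the target, since rename() does not move files between file systems. Synchronizing the directory is optional, matching the point that it is needed only when the new contents must be durable.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>      /* rename() */
#include <string.h>     /* strlen() */
#include <unistd.h>     /* write(), fsync(), close(), unlink() */

/* Write all len bytes of buf to fd, retrying after short writes and
   after interruption by a signal. Returns 0 or -1 with errno set. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace the file at path with text. tmppath must be
   in the same directory as path, and dirpath must name that directory.
   Returns 0, or -1 with errno set. */
int replace_file(const char *path, const char *tmppath,
                 const char *dirpath, const char *text, int sync_dir)
{
    int fd, dfd, err;

    fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return -1;
    /* Synchronize the new file before it replaces the old one. */
    if (write_all(fd, text, strlen(text)) == -1 || fsync(fd) == -1) {
        err = errno;
        close(fd);
        unlink(tmppath);
        errno = err;
        return -1;
    }
    if (close(fd) == -1 || rename(tmppath, path) == -1) {
        err = errno;
        unlink(tmppath);
        errno = err;
        return -1;
    }
    /* Optional: make the directory entry durable as well. The intent of the
       standard is that, without this, a crash leaves either the old or the
       new contents at path. */
    if (sync_dir) {
        dfd = open(dirpath, O_RDONLY);
        if (dfd == -1)
            return -1;
        if (fsync(dfd) == -1) {
            err = errno;
            close(dfd);
            errno = err;
            return -1;
        }
        close(dfd);
    }
    return 0;
}
```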
Some operations, such as rename(), can affect more than one directory, whereas synchronization calls such as fsync() can affect at most one directory at a time. Two calls to fsync() may therefore be needed after a rename() to ensure its durability: one on the directory that contained the old name and one on the directory that contains the new name.
If the file system is inconsistent after a crash, it is usually checked and repaired automatically when the system is rebooted, or it can be repaired manually using a utility such as fsck.
If an unrecoverable I/O error occurs when cached data is transferred to storage, this standard provides no way for applications to discover the error reliably. Implementations are encouraged to report such errors on subsequent reads of the storage.
Austin Group Defect 672 is applied, adding this subsection.
This section reflects the actions of historical implementations. The times are not updated immediately, but are only marked for update by the functions. An implementation may update these times immediately.
The accuracy of the time update values is intentionally left unspecified so that systems can control the bandwidth of a possible covert channel.
The wording was carefully chosen to make it clear that there is no requirement that the conformance document contain information that might incidentally affect file timestamps. Any function that performs pathname resolution might update several last data access timestamps. Functions such as getpwnam() and getgrnam() might update the last data access timestamp of some specific file or files. It is intended that these are not required to be documented in the conformance document, but they should appear in the system documentation.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0021 [626] is applied.
There is no additional rationale provided for this section.
The methods used to measure the execution time of processes and threads, and the precision of these measurements, may vary considerably depending on the software architecture of the implementation, and on the underlying hardware. Implementations can also make tradeoffs between the scheduling overhead and the precision of the execution time measurements. POSIX.1-2024 does not impose any requirement on the accuracy of the execution time; it instead specifies that the measurement mechanism and its precision are implementation-defined.
Austin Group Defect 1302 is applied, adding the Memory Ordering subsection, adapted from the ISO/IEC 9899:2018 standard.
There is no additional rationale provided for this section.
In older multi-processors, access to memory by the processors was strictly multiplexed. This meant that a processor executing program code interrogated or modified memory in the order specified by the code, and that all the memory operations of all the processors in the system appeared to happen in some global order, though the operation histories of different processors were interleaved arbitrarily. The memory operations of such machines are said to be sequentially consistent. In this environment, threads can synchronize using ordinary memory operations. For example, a producer thread and a consumer thread can synchronize access to a circular data buffer as follows:
```c
int rdptr = 0;
int wrptr = 0;
data_t buf[BUFSIZE];

/* Thread 1 (producer): */
while (work_to_do) {
    int next;

    buf[wrptr] = produce();
    next = (wrptr + 1) % BUFSIZE;
    while (rdptr == next)
        ;
    wrptr = next;
}

/* Thread 2 (consumer): */
while (work_to_do) {
    while (rdptr == wrptr)
        ;
    consume(buf[rdptr]);
    rdptr = (rdptr + 1) % BUFSIZE;
}
```
In modern multi-processors, these conditions are relaxed to achieve greater performance. If one processor stores values in location A and then location B, then other processors loading data from location B and then location A may see the new value of B but the old value of A. The memory operations of such machines are said to be weakly ordered. On these machines, the circular buffer technique shown in the example will fail because the consumer may see the new value of wrptr but the old value of the data in the buffer. On such machines, synchronization can only be achieved through the use of special instructions that enforce an order on memory operations. Most high-level language compilers only generate ordinary memory operations to take advantage of the increased performance. They usually cannot determine when memory operation order is important, so they do not generate the special ordering instructions on their own. Instead, they rely on the programmer to use synchronization primitives correctly to ensure that modifications to a location in memory are ordered with respect to modifications and/or access to the same location in other threads. Access to read-only data need not be synchronized. The resulting program is said to be data race-free.
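One way to get the ordering instructions without writing them by hand is the ISO C atomic operations, in line with the memory-ordering alignment with ISO/IEC 9899:2018 noted in this volume. The sketch below is a non-blocking, single-producer single-consumer variant of the circular buffer; it assumes a compiler that provides the optional `<stdatomic.h>` (it is absent when `__STDC_NO_ATOMICS__` is defined), and `struct ring`, `RING_SIZE`, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8   /* hypothetical capacity; one slot always stays empty */

struct ring {
    atomic_int rdptr;          /* stored only by the consumer */
    atomic_int wrptr;          /* stored only by the producer */
    int buf[RING_SIZE];
};

void ring_init(struct ring *r)
{
    atomic_init(&r->rdptr, 0);
    atomic_init(&r->wrptr, 0);
}

/* Producer: returns false if the ring is full. */
bool ring_try_push(struct ring *r, int value)
{
    int w = atomic_load_explicit(&r->wrptr, memory_order_relaxed);
    int next = (w + 1) % RING_SIZE;

    /* Acquire pairs with the consumer's release store to rdptr, so the
       consumer has finished reading a slot before it is reused. */
    if (next == atomic_load_explicit(&r->rdptr, memory_order_acquire))
        return false;
    r->buf[w] = value;
    /* Release: the store to buf[w] becomes visible no later than the
       new value of wrptr. */
    atomic_store_explicit(&r->wrptr, next, memory_order_release);
    return true;
}

/* Consumer: returns false if the ring is empty. */
bool ring_try_pop(struct ring *r, int *value)
{
    int rd = atomic_load_explicit(&r->rdptr, memory_order_relaxed);

    /* Acquire pairs with the producer's release store to wrptr, so the
       data in buf[rd] is seen along with the new wrptr. */
    if (rd == atomic_load_explicit(&r->wrptr, memory_order_acquire))
        return false;
    *value = r->buf[rd];
    atomic_store_explicit(&r->rdptr, (rd + 1) % RING_SIZE,
                          memory_order_release);
    return true;
}
```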
Synchronization is still important even when accessing a single primitive variable (for example, an integer). On machines where the integer may not be aligned to the bus data width or be larger than the data width, a single memory load may require multiple memory cycles. This means that it may be possible for some parts of the integer to have an old value while other parts have a newer value. On some processor architectures this cannot happen, but portable programs cannot rely on this.
In summary, a portable multi-threaded program, or a multi-process program that shares writable memory between processes, has to use the synchronization primitives to synchronize data access. It cannot rely on modifications to memory being observed by other threads in the order written in the application or even on modification of a single variable being seen atomically.
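A minimal sketch of the portable approach is the circular buffer protected by a mutex, with condition variables replacing the busy-wait loops; pthread_mutex_lock(), pthread_mutex_unlock(), pthread_cond_wait(), and pthread_cond_signal() are among the functions that synchronize memory. The type `struct bbuf`, the constant `BBUF_SIZE`, and the function names are hypothetical.

```c
#include <pthread.h>

#define BBUF_SIZE 8   /* hypothetical capacity; one slot always stays empty */

struct bbuf {
    pthread_mutex_t lock;      /* protects every field below */
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
    int rdptr, wrptr;
    int buf[BBUF_SIZE];
};

int bbuf_init(struct bbuf *b)
{
    b->rdptr = b->wrptr = 0;
    if (pthread_mutex_init(&b->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->not_empty, NULL) != 0) {
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    if (pthread_cond_init(&b->not_full, NULL) != 0) {
        pthread_cond_destroy(&b->not_empty);
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    return 0;
}

/* Producer: blocks while the buffer is full. */
void bbuf_put(struct bbuf *b, int value)
{
    pthread_mutex_lock(&b->lock);
    while ((b->wrptr + 1) % BBUF_SIZE == b->rdptr)
        pthread_cond_wait(&b->not_full, &b->lock);
    b->buf[b->wrptr] = value;
    b->wrptr = (b->wrptr + 1) % BBUF_SIZE;
    pthread_cond_signal(&b->not_empty);
    pthread_mutex_unlock(&b->lock);
}

/* Consumer: blocks while the buffer is empty. */
int bbuf_get(struct bbuf *b)
{
    int value;

    pthread_mutex_lock(&b->lock);
    while (b->rdptr == b->wrptr)
        pthread_cond_wait(&b->not_empty, &b->lock);
    value = b->buf[b->rdptr];
    b->rdptr = (b->rdptr + 1) % BBUF_SIZE;
    pthread_cond_signal(&b->not_full);
    pthread_mutex_unlock(&b->lock);
    return value;
}
```

The while loops around pthread_cond_wait() re-check the condition because a wakeup does not by itself guarantee that the buffer state has changed.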
Conforming applications may only use the functions listed to synchronize threads of control with respect to memory access. There are many other candidates for functions that might also be used. Examples are: signal sending and reception, or pipe writing and reading. In general, any function that allows one thread of control to wait for an action caused by another thread of control is a candidate. POSIX.1-2024 does not require these additional functions to synchronize memory access since this would imply the following:
Formal definitions of the memory model were rejected as unreadable by the vast majority of programmers. In addition, most of the formal work in the literature has concentrated on the memory as provided by the hardware as opposed to the application programmer through the compiler and runtime system. It was believed that a simple statement intuitive to most programmers would be most effective. POSIX.1-2024 defines functions that can be used to synchronize access to memory, but it leaves open exactly how one relates those functions to the semantics of each function as specified elsewhere in POSIX.1-2024. POSIX.1-2024 also does not make a formal specification of the partial ordering in time that the functions can impose, as that is implied in the description of the semantics of each function. It simply states that the programmer has to ensure that modifications do not occur "simultaneously" with other access to a memory location.
IEEE Std 1003.1-2001/Cor 1-2002, item XBD/TC1/D6/4 is applied, adding a new paragraph beneath the table of functions: "The pthread_once() function shall synchronize memory for the first call in each thread for a given pthread_once_t object.".
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0022 [863] is applied.
Austin Group Defect 1216 is applied, adding pthread_cond_clockwait(), pthread_mutex_clocklock(), pthread_rwlock_clockrdlock(), pthread_rwlock_clockwrlock(), and sem_clockwait() to the list of functions that synchronize memory.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
Austin Group Defect 1426 is applied, clarifying under what conditions the functions named in this section are required to synchronize memory, and adding pthread_mutex_setprioceiling() to the named functions.
Austin Group Defect 1625 is applied, adding waitid() to the list of functions that synchronize memory with respect to other threads on all successful calls.
It is necessary to differentiate between the definition of pathname and the concept of pathname resolution with respect to the handling of trailing <slash> characters. By specifying the behavior here, it is not possible to provide an implementation that is conforming but extends all interfaces that handle pathnames to also handle strings that are not legal pathnames (because they have trailing <slash> characters).
Pathnames that end with one or more trailing <slash> characters must refer to directory paths. Earlier versions of this standard were not specific about the distinction between trailing <slash> characters on files and directories, and both were permitted.
Two types of implementation have been prevalent: those that ignored trailing <slash> characters on all pathnames regardless of the type of file named, and those that permitted them only on existing directories.
An earlier version of this standard required that a pathname with a trailing <slash> character be treated as if it had a trailing "/." everywhere. This specification was ambiguous. In situations where the intent was that the application wanted to require the implementation to accept the pathname only if it named a directory (existing or to be created as a result of the call performing pathname resolution), literally adding a '.' after the trailing <slash> could be interpreted to require use of that pathname to fail. Some of the uses that created ambiguous requirements included mkdir("newdir/") and rmdir("existing-dir/"). POSIX.1-2024 requires that a pathname with a trailing <slash> be rejected unless it refers to a file that is a directory or to a file that is to be created as a directory. The rename() function and the mv utility further specify that a trailing <slash> cannot be used on a pathname naming a file that does not exist when used as the last argument to rename() or renameat(), or as the last operand to mv.
Note that this change does not break any conforming applications; since there were two different types of implementation, no application could have portably depended on either behavior. This change does, however, require some implementations to be altered to remain compliant. Substantial discussion over a three-year period has shown that the benefits to application developers outweigh the disadvantages for some vendors.
On a historical note, some early applications automatically appended a '/' to every path. Rather than fix the applications, the system implementation was modified to accept this behavior by ignoring any trailing <slash>.
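The rejection rule can be observed with a small probe. The hypothetical `trailing_slash_errno()` below appends a <slash> to a pathname and reports what stat() does; on an implementation following the requirement above, a regular file yields ENOTDIR while a directory resolves normally. The fixed 4096-byte buffer is an assumption for illustration only.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical probe: append a trailing <slash> to path and return the
   errno value from stat(), or 0 if stat() succeeds. */
int trailing_slash_errno(const char *path)
{
    char buf[4096];            /* assumed large enough for this sketch */
    struct stat st;

    if (snprintf(buf, sizeof buf, "%s/", path) >= (int)sizeof buf)
        return ENAMETOOLONG;
    return stat(buf, &st) == -1 ? errno : 0;
}
```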
Each directory has exactly one parent directory which is represented by the name dot-dot in the first directory. No other directory, regardless of linkages established by symbolic links, is considered the parent directory by POSIX.1-2024.
There are two general categories of interfaces involving pathname resolution: those that follow the symbolic link, and those that do not. There are several exceptions to this rule; for example, open(path,O_CREAT|O_EXCL) will fail when path names a symbolic link. However, in all other situations, the open() function will follow the link.
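The O_CREAT|O_EXCL exception is what makes exclusive creation safe against a planted symbolic link: open() fails with EEXIST even when the link is dangling, instead of following it and creating the target. The wrapper name `create_exclusive()` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wrapper: create path exclusively, returning a file
   descriptor or -1 with errno set. Because O_CREAT|O_EXCL does not follow
   a symbolic link, an existing link at path (even a dangling one) causes
   failure with EEXIST rather than creation of the link's target. */
int create_exclusive(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
}
```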
What the filename dot-dot refers to relative to the root directory is implementation-defined. In Version 7 it refers to the root directory itself; this is the behavior mentioned in POSIX.1-2024. In some networked systems the construction /../hostname/ is used to refer to the root directory of another host, and POSIX.1 permits this behavior.
Other networked systems use the construct //hostname for the same purpose; that is, a double initial <slash> is used. There is a potential problem with existing applications that create full pathnames by taking a trunk and a relative pathname and making them into a single string separated by '/', because they can accidentally create networked pathnames when the trunk is '/'. This practice is not prohibited because such applications can be made to conform by simply changing to use "//" as a separator instead of '/'. If the trunk is '/', the result then begins with three <slash> characters, and more than two leading <slash> characters are required to be treated as a single <slash>; elsewhere in a pathname, successive <slash> characters are equivalent to one.
Application developers should avoid generating pathnames that start with "//". Implementations are strongly encouraged to avoid using this special interpretation since a number of applications currently do not follow this practice and may inadvertently generate "//...".
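Instead of relying on the "//" separator, an application can avoid generating a leading "//" at all by inserting a separator only when the trunk does not already end in '/'. The sketch below uses the hypothetical name `join_path()`; an empty trunk leaves the name relative rather than turning it into an absolute pathname.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join trunk and a relative name into out (of size
   outsz). Never produces "//" at the join point, so a trunk of "/" yields
   "/name", not "//name". Returns 0, or -1 if the result does not fit. */
int join_path(char *out, size_t outsz, const char *trunk, const char *name)
{
    size_t len = strlen(trunk);
    /* Add a separator only when trunk is non-empty and lacks a final '/'. */
    const char *sep = (len == 0 || trunk[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, outsz, "%s%s%s", trunk, sep, name);

    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```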
The term "root directory" is only defined in POSIX.1 relative to the process. In some implementations, there may be no absolute root directory. The initialization of the root directory of a process is implementation-defined.
When the standard says: "Pathname resolution for a given pathname shall yield the same results when used by any interface in POSIX.1-2024 as long as there are no changes to any files evaluated during pathname resolution for the given pathname between resolutions", this applies to absolute pathnames or to relative pathnames from the same current working directory. Using the same relative pathname from two different working directories may yield different results.
Earlier versions of this standard were unclear as to whether a pathname was required to be a character string or just a string. This standard is now clear that filenames are just strings, and that pathname processing is locale-independent.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0023 [541,649,825] and XBD/TC2-2008/0024 [825] are applied.
Austin Group Defect 1603 is applied, making a wording improvement related to symbolic links to directories.
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
Coordinated Universal Time (UTC) includes leap seconds. However, in POSIX time (seconds since the Epoch), leap seconds are ignored (not applied) to provide an easy and compatible method of computing time differences. Broken-down POSIX time is therefore not necessarily UTC, despite its appearance.
As of December 2007, 23 leap seconds had been added to UTC since the Epoch, 1 January, 1970. Historically, one leap second is added every 15 months on average, so this offset can be expected to grow with time.
Most systems' notion of "time" is that of a continuously increasing value, so this value should increase even during leap seconds. However, not only do most systems not keep track of leap seconds, but most systems are probably not synchronized to any standard time reference. Therefore, it is inappropriate to require that a time represented as seconds since the Epoch precisely represent the number of seconds between the referenced time and the Epoch.
It is sufficient to require that applications be allowed to treat this time as if it represented the number of seconds between the referenced time and the Epoch. It is the responsibility of the vendor of the system, and the administrator of the system, to ensure that this value represents the number of seconds between the referenced time and the Epoch as closely as necessary for the application being run on that system.
It is important that the interpretation of time names and seconds since the Epoch values be consistent across conforming systems; that is, it is important that all conforming systems interpret "536457599 seconds since the Epoch" as 59 seconds, 59 minutes, 23 hours 31 December 1986, regardless of the accuracy of the system's idea of the current time. The expression is given to ensure a consistent interpretation, not to attempt to specify the calendar. The relationship between tm_yday and the day of week, day of month, and month is in accordance with the Gregorian calendar, and so is not specified in POSIX.1.
Consistent interpretation of seconds since the Epoch can be critical to certain types of distributed applications that rely on such timestamps to synchronize events. The accrual of leap seconds in a time standard is not predictable. The number of leap seconds since the Epoch will likely increase. POSIX.1 is more concerned about the synchronization of time between applications of astronomically short duration.
Note that tm_yday is zero-based, not one-based, so the day number in the example above is 364. Note also that the division is an integer division (discarding remainder) as in the C language.
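The expression referred to above can be transcribed directly into C, using integer division as noted. The function name `posix_seconds()` is hypothetical; it takes the broken-down fields as-is (no normalization) and, like the standard's expression, is intended for tm_year values of 70 and later. The arithmetic is done in long long so that dates past 2038 do not overflow.

```c
#include <time.h>

/* Hypothetical helper: seconds since the Epoch for a broken-down UTC
   time, using the standard's expression. tm_yday is zero-based and all
   divisions are C integer divisions. */
long long posix_seconds(const struct tm *tm)
{
    long long y = tm->tm_year;          /* years since 1900 */

    return tm->tm_sec + tm->tm_min * 60LL + tm->tm_hour * 3600LL
         + tm->tm_yday * 86400LL
         + (y - 70) * 31536000LL
         + ((y - 69) / 4) * 86400LL     /* add a day every 4th year...   */
         - ((y - 1) / 100) * 86400LL    /* ...except century years...    */
         + ((y + 299) / 400) * 86400LL; /* ...unless divisible by 400    */
}
```

For tm_year 86, tm_yday 364, 23:59:59 this evaluates to 536457599, the value cited above; for 19 January 2038, 03:14:08 (tm_year 138, tm_yday 18) it gives 2147483648, one more than the largest signed 32-bit value.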
Note also that the meaning of gmtime(), localtime(), and mktime() is specified in terms of this expression. However, the ISO C standard computes tm_yday from tm_mday, tm_mon, and tm_year in mktime(). Because it is stated as a (bidirectional) relationship, not a function, and because the conversion between month-day-year and day-of-year dates is presumed well known and is also a relationship, this is not a problem.
The number of seconds since the epoch overflows a signed 32-bit integer in 2038. This standard requires that time_t is an integer type with a width of at least 64 bits (in conforming programming environments). The requirement that time_t is an integer type is an additional constraint beyond the ISO C standard, which allows a real-floating time_t. Implementation practice has shown that much existing code is unprepared to deal with a floating-point time_t, and that use of struct timespec is a more uniform way to provide sub-second time manipulation within applications.
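A sketch of both points follows. The static assertion checks the 64-bit requirement at compile time; note that it measures the size of time_t in bits, which bounds the width from above, and that older or 32-bit environments may fail it. The hypothetical `timespec_sub()` shows sub-second arithmetic on struct timespec, assuming both inputs are normalized (0 <= tv_nsec < 1000000000).

```c
#include <limits.h>
#include <time.h>

/* Compile-time check of the time_t requirement (size in bits, which is
   an upper bound on the width). */
_Static_assert(sizeof(time_t) * CHAR_BIT >= 64, "time_t narrower than 64 bits");

/* Hypothetical helper: a - b, keeping tv_nsec in [0, 1000000000). */
struct timespec timespec_sub(struct timespec a, struct timespec b)
{
    struct timespec d;

    d.tv_sec = a.tv_sec - b.tv_sec;
    d.tv_nsec = a.tv_nsec - b.tv_nsec;
    if (d.tv_nsec < 0) {               /* borrow one second */
        d.tv_sec -= 1;
        d.tv_nsec += 1000000000L;
    }
    return d;
}
```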
See also Epoch.
The topic of whether seconds since the Epoch should account for leap seconds has been debated on a number of occasions, and each time consensus was reached (with acknowledged dissent each time) that the majority of users are best served by treating all days identically. (That is, the majority of applications were judged to assume a single length—as measured in seconds since the Epoch—for all days. Thus, leap seconds are not applied to seconds since the Epoch.) Those applications which do care about leap seconds can determine how to handle them in whatever way those applications feel is best. This was particularly emphasized because there was disagreement about what the best way of handling leap seconds might be. It is a practical impossibility to mandate that a conforming implementation must have a fixed relationship to any particular official clock (consider isolated systems, or systems performing "reruns" by setting the clock to some arbitrary time).
Note that as a practical consequence of this, the length of a second as measured by some external standard is not specified. This unspecified second is nominally equal to an International System (SI) second in duration. Applications must be matched to a system that provides the particular handling of external time in the way required by the application.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/12 is applied, making an editorial correction to the paragraph commencing "How any changes to the value of seconds ...".
Austin Group Defect 1627 is applied, clarifying that the relationship between the actual date and time in Coordinated Universal Time, as determined by the International Earth Rotation Service, and the system's current value for seconds since the Epoch is unspecified.
Austin Group Defect 502 is applied, clarifying the range of values that an XSI semaphore can have.
Austin Group Defect 1116 is applied, removing a reference to the Semaphores option that existed in earlier versions of this standard.
POSIX systems interact with their physical environment using a variety of devices (such as analog-digital converters, digital-analog converters, counters, and video graphic equipment), which provide a set of services that cannot be fully utilized in terms of read and/or write semantics. Traditional practice uses a single function, called ioctl(), to encapsulate all the control operations on the different devices connected to the system, both special and common devices. The POSIX.1-1988 standard developers decided not to standardize this interface because it was not type safe, it had a variable number of parameters, and it had behaviors that could not be specified by the standard because they were driver-dependent. Instead, the POSIX.1-1988 standard defined a device-specific application program interface (API) for a common class of drivers, Terminals. Later, The Single UNIX Specification, Version 1 included the ioctl() function, but restricted it to control of STREAMS devices.
Although the POSIX.1-1988 standard's solution for common classes of devices is the best from the point of view of application portability, there is still a need for a way to interact with special, or even common devices, for which developing a full standard API is not practical. The device control option standardized in POSIX.26 and now included in this standard is a general method for interfacing to the widest possible range of devices, through a new service to pass control information and commands between the application and the device drivers.
A driver for a special device will normally not be portable between POSIX implementations, but an application that uses such a driver can be made portable if all functions calling the driver are well defined and standardized. Users and integrators of realtime systems often add drivers for special devices, and a standardized function format for interfacing with these devices greatly simplifies this process.
Austin Group Defect 729 is applied, adding this subsection.
Where the interface of a function required by POSIX.1-2024 precludes thread-safety, an alternate thread-safe form is provided. The names of these thread-safe forms are the same as the non-thread-safe forms with the addition of the suffix "_r". The suffix "_r" is historical, where the 'r' stood for "reentrant".
In some cases, thread-safety is provided by restricting the arguments to an existing function.
See also B.2.9.1 Thread-Safety.
It is intended that undeserved underflow and inexact floating-point exceptions are raised only if avoiding them would be too costly.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0025 [543] is applied.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
Austin Group Defect 351 is applied, adding a requirement relating to declaration utilities.
The notation for spaces allows some flexibility for application output. Note that an empty character position in format represents one or more <blank> characters on the output (not white space, which can include <newline> characters). Therefore, another utility that reads that output as its input must be prepared to parse the data using scanf(), awk, and so on. The 'Δ' character is used when exactly one <space> is output.
The treatment of integers and spaces is different from the printf() function in that they can be surrounded with <blank> characters. This was done so that, given a format such as:
"%d\n",<foo>
the implementation could use a printf() call such as:
printf("%6d\n", foo);
and still conform. This notation is thus somewhat like scanf() in addition to printf().
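On the reading side, a consumer that parses with scanf()-style conversions already tolerates the extra <blank> characters, because "%d" skips leading white space. The helper `read_count()` below is hypothetical and accepts both "42\n" and the padded form that printf("%6d\n", 42) would produce.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer field from a line of utility
   output, ignoring any leading <blank> padding. Returns 1 on success,
   0 otherwise. */
int read_count(const char *line, int *value)
{
    return sscanf(line, "%d", value) == 1;
}
```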
The printf() function was chosen as a model because most of the standard developers were familiar with it. One difference from the C function printf() is that the l and h length modifiers are not used. As expressed by the Shell and Utilities volume of POSIX.1-2024, there is no differentiation between decimal values for type int, type long, or type short. The conversion specifications %d or %i should be interpreted as an arbitrary length sequence of digits. Also, no distinction is made between single precision and double precision numbers (float or double in C). These are simply referred to as floating-point numbers.
Many of the output descriptions in the Shell and Utilities volume of POSIX.1-2024 use the term "line", such as:
"%s", <input line>
Since the definition of line includes the trailing <newline> already, there is no need to include a '\n' in the format; a double <newline> would otherwise result.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0026 [584] is applied.
Austin Group Defect 1205 is applied, changing the description of the % conversion specifier.
Austin Group Defect 1687 is applied, clarifying the references to <blank> characters to specify they are from the portable character set.
The portable character set is listed in full so there is no dependency on the ISO/IEC 646:1991 standard (or historically ASCII) encoded character set, although the set is identical to the characters defined in the International Reference version of the ISO/IEC 646:1991 standard.
POSIX.1-2024 poses no requirement that multiple character sets or codesets be supported, leaving this as a marketing differentiation for implementors. Although multiple charmap files are supported, it is the responsibility of the implementation to provide the file(s); if only one is provided, only that one will be accessible using the localedef -f option.
The statement about invariance in codesets for the portable character set is worded to avoid precluding implementations where multiple incompatible codesets are available (for instance, ASCII and EBCDIC). The standard utilities cannot be expected to produce predictable results if they access portable characters that vary on the same implementation.
Not all character sets need include the portable character set, but each locale must include it. For example, a Japanese-based locale might be supported by a mixture of character sets: JIS X 0201 Roman (a Japanese version of the ISO/IEC 646:1991 standard), JIS X 0208, and JIS X 0201 Katakana. Not all of these character sets include the portable characters, but at least one does (JIS X 0201 Roman).
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0027 [584,967] and XBD/TC2-2008/0028 [745] are applied.
Encoding mechanisms based on single shifts, such as the EUC encoding used in some Asian and other countries, can be supported via the current charmap mechanism. With single-shift encoding, each character is preceded by a shift code (SS2 or SS3). A complete EUC code, consisting of the portable character set (G0) and up to three additional character sets (G1, G2, G3), can be described using the current charmap mechanism; the encoding for each character in additional character sets G2 and G3 must then include their single-shift code. Other mechanisms to support locales based on encoding mechanisms such as locking shift are not addressed by this volume of POSIX.1-2024.
The encodings for <slash> and <period> are required to be the same across all locales, in part because pathname resolution requires recognition of these bytes. It is a fortunate accident that none of the common shift-based encodings uses either <slash> or <period> as a valid second byte in a multi-byte character.
The encodings for <newline> and <carriage-return> are required to be the same across all locales since they are special to the general terminal interface and cannot be changed (see XBD 11.1.9 Special Characters).
Earlier versions of this standard did not state the requirement that the POSIX locale contains 256 single-byte characters. This was an oversight; the intention was always that the POSIX locale should have an 8-bit-clean single-byte encoding.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0029 [663,967] and XBD/TC2-2008/0030 [745] are applied.
The standard does not specify how wide characters are encoded or provide a method for defining wide characters in a charmap. It specifies ways of translating between wide characters and multi-byte characters. The standard does not prevent an extension from providing a method to define wide characters.
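As a sketch of such a translation, the hypothetical `count_chars()` below walks a multi-byte string with mbrtowc() and an explicit conversion state, counting characters rather than bytes. The result depends on the LC_CTYPE category of the current locale; the sketch never calls setlocale(), so unless the caller changes it, the POSIX locale applies.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of characters (not bytes) in s under the
   current LC_CTYPE, or (size_t)-1 if an invalid or incomplete multi-byte
   sequence is found. */
size_t count_chars(const char *s)
{
    mbstate_t state;
    size_t count = 0;
    size_t len = strlen(s);

    memset(&state, 0, sizeof state);   /* initial conversion state */
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;
        if (n == 0)                    /* null wide character reached */
            break;
        s += n;
        len -= n;
        count++;
    }
    return count;
}
```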
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/13 is applied, adding a statement that the standard has no means of defining a wide-character codeset.
Austin Group Defect 1302 is applied, aligning this section with the ISO/IEC 9899:2018 standard.
IEEE PASC Interpretation 1003.2 #196 is applied, removing three lines of text dealing with ranges of symbolic names using position constant values which had been erroneously included in the final IEEE P1003.2b draft standard.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/14 is applied, correcting the example and adding a statement that the standard provides no means of defining a wide-character codeset.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/15 is applied, allowing the value zero for the width value of WIDTH and WIDTH_DEFAULT. This is required to cover some existing locales.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0031 [967] is applied.
A requirement was considered that would force utilities to eliminate any redundant locking shifts, but this was left as a quality of implementation issue.
This change satisfies the following requirement from the ISO POSIX-2:1993 standard, Annex H.1:
The support of state-dependent (shift encoding) character sets should be addressed fully. See descriptions of these in XBD 6.2 Character Encoding. If such character encodings are supported, it is expected that this will impact XBD 6.2 Character Encoding, 7. Locale, 9. Regular Expressions, and the comm, cut, diff, grep, head, join, paste, and tail utilities.
The character set description file provides:
Implementations are free to choose their own symbolic names, as long as the names identified by the Base Definitions volume of POSIX.1-2024 are also defined; this provides support for already existing "character names".
The names selected for the members of the portable character set follow the ISO/IEC 8859-1:1998 standard and the ISO/IEC 10646-1:2020 standard. However, several commonly used UNIX system names occur as synonyms in the list:
The names for the control characters in XBD 6. Character Set were taken from the ISO/IEC 4873:1991 standard.
The charmap file was introduced to resolve problems with the portability of, especially, localedef sources. POSIX.1-2024 assumes that the portable character set is constant across all locales, but does not prohibit implementations from supporting two incompatible codings, such as both ASCII and EBCDIC. Such dual-support implementations should have all charmaps and localedef sources encoded using one portable character set, in effect cross-compiling for the other environment. Naturally, charmaps (and localedef sources) are only portable without transformation between systems using the same encodings for the portable character set. They can, however, be transformed between two sets using only a subset of the actual characters (the portable character set). However, the particular coded character set used for an application or an implementation does not necessarily imply different characteristics or collation; on the contrary, these attributes should in many cases be identical, regardless of codeset. The charmap provides the capability to define a common locale definition for multiple codesets (the same localedef source can be used for codesets with different extended characters; the ability in the charmap to define empty names allows for characters missing in certain codesets).
The <escape_char> declaration was added at the request of the international community to ease the creation of portable charmap files on terminals not implementing the default <backslash>-escape. The <comment_char> declaration was added at the request of the international community to eliminate the potential confusion between the <number-sign> and the hash sign.
The octal number notation with no leading zero required was selected to match those of awk and tr and is consistent with that used by localedef. To avoid confusion between an octal constant and the back-references used in localedef source, the octal, hexadecimal, and decimal constants must contain at least two digits. As single-digit constants are relatively rare, this should not impose any significant hardship. Provision is made for more digits to account for systems in which the byte size is larger than 8 bits. For example, a Unicode (ISO/IEC 10646-1:2020 standard) system that has defined 16-bit bytes may require six octal, four hexadecimal, and five decimal digits.
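A minimal sketch of reading one byte constant in the three notations, assuming the default <backslash> escape character. The function names are hypothetical. It enforces the two-digit minimum, so `\7` is rejected, and it accepts the six octal digits a 16-bit value such as `\177777` needs; the six-digit cap itself is an arbitrary limit chosen for the sketch, not a rule from the standard.

```c
#include <stddef.h>
#include <string.h>

/* Value of a hexadecimal digit character, or -1. Explicit tables are used
 * instead of <ctype.h> so the result does not depend on the locale. */
static int digit_value(char c)
{
    static const char lower[] = "0123456789abcdef";
    static const char upper[] = "0123456789ABCDEF";
    const char *p;

    if (c == '\0')
        return -1;
    if ((p = strchr(lower, c)) != NULL)
        return (int)(p - lower);
    if ((p = strchr(upper, c)) != NULL)
        return (int)(p - upper);
    return -1;
}

/* Parse one constant such as \x20 (hexadecimal), \101 (octal, no leading
 * zero) or \d65 (decimal). On success returns the value and, if end is
 * not NULL, stores a pointer just past the constant; returns -1 if the
 * constant is malformed. */
static long parse_byte_constant(const char *s, const char **end)
{
    int base, ndigits = 0;
    long value = 0;
    const char *p;

    if (s[0] != '\\')
        return -1;
    if (s[1] == 'x')      { base = 16; p = s + 2; }
    else if (s[1] == 'd') { base = 10; p = s + 2; }
    else                  { base = 8;  p = s + 1; }

    for (;; p++) {
        int d = digit_value(*p);
        if (d < 0 || d >= base)
            break;
        value = value * base + d;
        if (++ndigits > 6)      /* arbitrary cap for this sketch */
            return -1;
    }
    if (ndigits < 2)            /* at least two digits are required */
        return -1;
    if (end != NULL)
        *end = p;
    return value;
}
```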
The decimal notation is supported because some newer international standards define character values in decimal, rather than in the old column/row notation.
The charmap identifies the coded character sets supported by an implementation. At least one charmap must be provided, but no implementation is required to provide more than one. Likewise, implementations can allow users to generate new charmaps (for instance, for a new version of the ISO 8859 family of coded character sets), but are not required to do so. If users are allowed to create new charmaps, the system documentation describes the rules that apply (for instance, "only coded character sets that are supersets of the ISO/IEC 646:1991 standard IRV, no multi-byte characters").
This addition of the WIDTH specification satisfies the following requirement from the ISO POSIX-2:1993 standard, Annex H.1:
- (9)
- The definition of column position relies on the implementation's knowledge of the integral width of the characters. The charmap or LC_CTYPE locale definitions should be enhanced to allow application specification of these widths.
The character "width" information was first considered for inclusion under LC_CTYPE but was moved because it is more closely associated with the information in the charmap than information in the locale source (cultural conventions information). Concerns were raised that formalizing this type of information is moving the locale source definition from the codeset-independent entity that it was designed to be to a repository of codeset-specific information. A similar issue occurred with the <code_set_name>, <mb_cur_max>, and <mb_cur_min> information, which was resolved to reside in the charmap definition.
The width definition was added to the IEEE P1003.2b draft standard with the intent that the wcswidth() and/or wcwidth() functions (currently specified in the System Interfaces volume of POSIX.1-2024) be the mechanism to retrieve the character width information.
The description of locales is based on work performed in the UniForum Technical Committee, Subcommittee on Internationalization. Wherever appropriate, keywords are taken from the ISO C standard or the X/Open Portability Guide.
The value used to specify a locale with environment variables is the name specified as the name operand to the localedef utility when the locale was created. This provides a verifiable method to create and invoke a locale.
The "object" definitions need not be portable, as long as "source" definitions are. Strictly speaking, source definitions are portable only between implementations using the same character set(s). Such source definitions, if they use symbolic names only, can easily be ported between systems using different codesets, as long as the characters in the portable character set (see XBD 6.1 Portable Character Set) have common values between the codesets; this is frequently the case in historical implementations. Of course, this requires that the symbolic names used for characters outside the portable character set be identical between character sets. The definition of symbolic names for characters is outside the scope of POSIX.1-2024, but is certainly within the scope of other standards organizations.
Applications can select the desired locale by invoking the setlocale() function (or equivalent) with the appropriate value. If the function is invoked with an empty string, the value of the corresponding environment variable is used. If the environment variable is not set or is set to the empty string, the implementation sets the appropriate environment as defined in XBD 8. Environment Variables.
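The usual C idiom for the behavior just described is `setlocale(LC_ALL, "")`. In the sketch below the helper name is hypothetical, and falling back to the POSIX locale when the environment names an unavailable locale is an application policy, not something the standard requires.

```c
#include <locale.h>
#include <stddef.h>

/* Select every category from the environment (LC_ALL, LC_*, LANG).
 * If the environment names a locale that is not available, setlocale()
 * returns NULL and leaves the locale unchanged; this sketch then selects
 * the POSIX locale explicitly. */
static const char *init_locale_from_env(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        (void)setlocale(LC_ALL, "POSIX");
    /* A NULL locale argument queries the current setting without changing it. */
    return setlocale(LC_ALL, NULL);
}
```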
The locale settings of individual categories cannot be truly independent and still guarantee correct results. For example, when collating two strings, characters must first be extracted from each string (governed by LC_CTYPE) before being mapped to collating elements (governed by LC_COLLATE) for comparison. That is, if LC_CTYPE is causing parsing according to the rules of a large, multi-byte code set (potentially returning 20000 or more distinct character codeset values), but LC_COLLATE is set to handle only an 8-bit codeset with 256 distinct characters, meaningful results are obviously impossible.
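`strcoll()` performs both steps internally. This sketch (helper name hypothetical) makes the two stages visible: `mbstowcs()`, governed by LC_CTYPE, extracts the characters, and `wcscoll()`, governed by LC_COLLATE, compares them. The fixed 256-character buffers are a simplification.

```c
#include <stdlib.h>
#include <wchar.h>

/* Returns <0, 0 or >0 like strcoll(). *err is set to nonzero if either
 * string is not valid under LC_CTYPE or does not fit in the buffers. */
static int collate_mb(const char *a, const char *b, int *err)
{
    wchar_t wa[256], wb[256];
    size_t na = mbstowcs(wa, a, 256);   /* stage 1: LC_CTYPE */
    size_t nb = mbstowcs(wb, b, 256);

    /* A result of 256 means the output was not NUL-terminated. */
    *err = (na == (size_t)-1 || nb == (size_t)-1 || na == 256 || nb == 256);
    if (*err)
        return 0;
    return wcscoll(wa, wb);             /* stage 2: LC_COLLATE */
}
```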
Earlier versions of this standard stated that if different character sets are used by the locale categories, the results achieved by an application utilizing these categories are undefined. This was felt to be overly restrictive. For example, when setting:
LANG=en_US.utf8 LC_TIME=POSIX
on a system where the codeset for the POSIX locale is ASCII and the codeset for en_US.utf8 is UTF-8, all of the characters used in the LC_TIME locale data exist, with the same encoding, in the codeset used for LC_CTYPE (via LANG), so there is no reason for the behavior to be undefined in this case. This standard now has more precise requirements in this area.
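In a program, the same mixed setting can be made per category with `setlocale()`. This sketch (function name hypothetical) takes every category from the environment and then forces LC_TIME to the POSIX locale, so day and month names come out in the POSIX spelling whatever LANG selects.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Format *tm with "%a %b %d" using POSIX LC_TIME data while the other
 * categories follow the environment. Returns strftime()'s byte count,
 * or 0 on failure. */
static size_t format_with_posix_time(char *buf, size_t size, const struct tm *tm)
{
    (void)setlocale(LC_ALL, "");               /* may fail; locale then unchanged */
    if (setlocale(LC_TIME, "POSIX") == NULL)   /* the POSIX locale always exists */
        return 0;
    return strftime(buf, size, "%a %b %d", tm);
}
```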
Austin Group Defect 1122 is applied, adding item 3 to the list of ways to select the locale to be used by some C-language functions.
Austin Group Defect 1477 is applied, clarifying the behavior when locale categories have different character sets.
On POSIX.1 implementations the POSIX locale is equal to the C locale, even though the requirements for the POSIX locale are more extensive than the ISO C standard requirements for the C locale. To avoid being classified as a C-language function, the name has been changed to the POSIX locale; the environment variable value can be either "POSIX" or, for historical reasons, "C".
The POSIX definitions mirror the historical UNIX system behavior.
The use of symbolic names for characters in the tables does not imply that the POSIX locale must be described using symbolic character names, but merely that it may be advantageous to do so.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0032 [796] and XBD/TC2-2008/0033 [663] are applied.
The decision to separate the file format from the localedef utility description was only partially editorial. Implementations may provide interfaces other than localedef. Requirements on "the utility", mostly concerning error messages, are described in this way because they are meant to affect the other interfaces implementations may provide as well as localedef.
The text about POSIX2_LOCALEDEF does not mean that internationalization is optional; only that the functionality of the localedef utility is. REs, for instance, must still be able to recognize character class expressions such as "[[:alpha:]]". A possible analogy is with an applications development environment; while all conforming implementations must be capable of executing applications, not all need to have the development environment installed. The assumption is that the capability to modify the behavior of utilities (and applications) via locale settings must be supported. If the localedef utility is not present, then the only choice is to select an existing (presumably implementation-documented) locale. An implementation could, for example, choose to support only the POSIX locale, which would in effect limit the number of changes from historical implementations quite drastically. The localedef utility is still required, but would always terminate with an exit code indicating that no locale could be created. Supported locales must be documented using the syntax defined in this chapter. (This ensures that users can accurately determine what capabilities are provided. If the implementation decides to provide additional capabilities to the ones in this chapter, that is already provided for.)
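Whether the option is supported can be queried at run time. This is a minimal sketch using `sysconf()`; the helper name is hypothetical, and treating a missing `_SC_2_LOCALEDEF` symbol as "unsupported" is an assumption.

```c
#include <unistd.h>

/* Return 1 if the implementation claims the POSIX2_LOCALEDEF option
 * (localedef can create locales), 0 otherwise. sysconf() returns -1
 * for an option that is not supported. */
static int localedef_supported(void)
{
#ifdef _SC_2_LOCALEDEF
    return sysconf(_SC_2_LOCALEDEF) > 0;
#else
    return 0;   /* assumption: no symbol means the option is not supported */
#endif
}
```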
If the option is present (that is, locales can be created), then the localedef utility must be capable of creating locales based on the syntax and rules defined in this chapter. This does not mean that the implementation cannot also provide alternate means for creating locales.
The octal, decimal, and hexadecimal notations are the same employed by the charmap facility (see XBD 6.4 Character Set Description File). To avoid confusion between an octal constant and a back-reference, the octal, hexadecimal, and decimal constants must contain at least two digits. As single-digit constants are relatively rare, this should not impose any significant hardship. Provision is made for more digits to account for systems in which the byte size is larger than 8 bits. For example, a Unicode (see the ISO/IEC 10646-1:2020 standard) system that has defined 16-bit bytes may require six octal, four hexadecimal, and five decimal digits. As with the charmap file, multi-byte characters are described in the locale definition file using "big-endian" notation for reasons of portability. There is no requirement that the internal representation in the computer memory be in this same order.
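A small illustration of the big-endian notation: the byte constants of a multi-byte character, listed most significant first, combine into a single value. The helper is hypothetical and limits the result to 32 bits. The value it computes describes the source notation only and says nothing about byte order in memory.

```c
#include <stddef.h>

/* Combine n byte values given most significant first into one number.
 * bits is the byte size. Returns 0, or -1 if a byte does not fit in
 * bits or the total exceeds the 32 bits an unsigned long always holds. */
static int be_value(const unsigned long *bytes, size_t n, unsigned bits,
                    unsigned long *out)
{
    unsigned long v = 0;

    if (bits == 0 || bits > 31 || n * bits > 32)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if ((bytes[i] >> bits) != 0)   /* wider than one byte */
            return -1;
        v = (v << bits) | bytes[i];
    }
    *out = v;
    return 0;
}
```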
One of the guidelines used for the development of this volume of POSIX.1-2024 is that characters outside the invariant part of the ISO/IEC 646:1991 standard should not be used in portable specifications. The <backslash> character is not in the invariant part; the <number-sign> is, but with multiple representations: as a <number-sign>, and as a hash sign. As far as general usage of these symbols is concerned, they are covered by the "grandfather clause", but for newly defined interfaces, the WG15 POSIX working group has requested that POSIX provide alternate representations. Consequently, while the default escape character remains the <backslash> and the default comment character is the <number-sign>, implementations are required to recognize alternative representations, identified in the applicable source file via the <escape_char> and <comment_char> keywords.
The LC_CTYPE category is primarily used to define the encoding-independent aspects of a character set, such as character classification. In addition, certain encoding-dependent characteristics are also defined for an application via the LC_CTYPE category. POSIX.1-2024 does not mandate that the encoding used in the locale is the same as the one used by the application because an implementation may decide that it is advantageous to define locales in a system-wide encoding rather than having multiple, logically identical locales in different encodings, and to convert from the application encoding to the system-wide encoding on usage. Other implementations could require encoding-dependent locales.
In either case, the LC_CTYPE attributes that are directly dependent on the encoding, such as <mb_cur_max> and the display width of characters, are not user-specifiable in a locale source and are consequently not defined as keywords.
Implementations may define additional keywords or extend the LC_CTYPE mechanism to allow application-defined keywords.
The text "The ellipsis specification shall only be valid within a single encoded character set" is present because it is possible to have a locale supported by multiple character encodings, as explained in the rationale for XBD 6.1 Portable Character Set. An example given there is of a possible Japanese-based locale supported by a mixture of the character sets JIS X 0201 Roman, JIS X 0208, and JIS X 0201 Katakana. Attempting to express a range of characters across these sets is not logical and the implementation is free to reject such attempts.
As the LC_CTYPE character classes are based on the ISO C standard character class definition, the category does not support multi-character elements. For instance, the German character <sharp-s> is traditionally classified as a lowercase letter. There is no corresponding uppercase letter; in proper capitalization of German text, the <sharp-s> will be replaced by "SS"; that is, by two characters. This kind of conversion is outside the scope of the toupper and tolower keywords.
Where POSIX.1-2024 specifies that only certain characters can be specified, as for the keywords digit and xdigit, the specified characters must be from the portable character set, as shown. As an example, only the Arabic digits 0 through 9 are acceptable as digits.
The character classes digit, xdigit, lower, upper, and space have a set of automatically included characters. These only need to be specified if the character values (that is, encoding) differ from the implementation default values. It is not possible to define a locale without these automatically included characters unless some implementation extension is used to prevent their inclusion. Such a definition would not be a proper superset of the C locale, and thus, it might not be possible for the standard utilities to be implemented as programs conforming to the ISO C standard.
The definition of character class digit requires that only ten characters—the ones defining digits—can be specified; alternate digits (for example, Hindi or Kanji) cannot be specified here. However, the encoding may vary if an implementation supports more than one encoding.
The definition of character class xdigit requires that the characters included in character class digit are included here also and allows for different symbols for the hexadecimal digits 10 through 15.
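The fixed membership of these classes is visible through the ISO C `isdigit()` and `isxdigit()` functions, which accept exactly the ten digits and the 22 hexadecimal-digit characters (0-9, a-f, A-F) in every locale. The counting helper below is hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Count how many single-byte values a <ctype.h> predicate accepts in
 * the current locale. */
static int count_in_class(int (*pred)(int))
{
    int n = 0;

    for (int c = 0; c <= UCHAR_MAX; c++)
        if (pred(c))
            n++;
    return n;
}
```

Calling `count_in_class(isdigit)` gives 10 and `count_in_class(isxdigit)` gives 22, matching the restriction that only the portable digits may appear in these classes.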
The inclusion of the charclass keyword satisfies the following requirement from the ISO POSIX-2:1993 standard, Annex H.1:
This keyword was previously included in The Open Group specifications and is now mandated in the Shell and Utilities volume of POSIX.1-2024.
The symbolic constant {CHARCLASS_NAME_MAX} was also adopted from The Open Group specifications. Application portability is enhanced by the use of symbolic constants.
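Class names, including any defined with the charclass keyword in the current LC_CTYPE, are looked up at run time with `wctype()` and tested with `iswctype()`. In this sketch the helper name is hypothetical, and falling back to 14 (the POSIX minimum) when `CHARCLASS_NAME_MAX` is not visible, for example in a strict ISO C compilation, is an assumption.

```c
#include <limits.h>
#include <string.h>
#include <wctype.h>

#if defined(CHARCLASS_NAME_MAX)
#  define CLASS_NAME_LIMIT CHARCLASS_NAME_MAX
#elif defined(_POSIX2_CHARCLASS_NAME_MAX)
#  define CLASS_NAME_LIMIT _POSIX2_CHARCLASS_NAME_MAX
#else
#  define CLASS_NAME_LIMIT 14   /* assumption: the POSIX minimum */
#endif

/* Test wc against a character class looked up by name. Returns 1 or 0,
 * or -1 if the name is too long or unknown to the current LC_CTYPE. */
static int in_named_class(wint_t wc, const char *name)
{
    wctype_t t;

    if (strlen(name) > CLASS_NAME_LIMIT)
        return -1;
    if ((t = wctype(name)) == 0)
        return -1;
    return iswctype(wc, t) != 0;
}
```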
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0033 [663], XBD/TC2-2008/0034 [663], XBD/TC2-2008/0035 [584], and XBD/TC2-2008/0036 [584] are applied.
Austin Group Defect 1078 is applied, clarifying that only the specified set of characters can be included in the digit and xdigit classes in all locales, not just the POSIX locale.
Austin Group Defect 1589 is applied, disallowing some characters from being included in the blank class.
The rules governing collation depend to some extent on the use. At least five different levels of increasingly complex collation rules can be distinguished:
While the historical collation order formally is at level 1, for the English language it corresponds roughly to elements at level 2. The user expects to see the output from the ls utility sorted very much as it would be in a dictionary. While telephone book ordering would be an optimal goal for standard collation, this was ruled out as the order would be language-dependent. Furthermore, a requirement was that the order must be determined solely from the text string and the collation rules; no external information (for example, "pronunciation dictionaries") could be required.
As a result, the goal for the collation support is at level 3. This also matches the requirements for the Canadian collation order, as well as other, known collation requirements for alphabetic scripts. It specifically rules out collation based on pronunciation rules or based on semantic analysis of the text.
The syntax for the LC_COLLATE category source meets the requirements for level 3 and has been verified to produce the correct result with examples based on French, Canadian, and Danish collation order. Because it supports multi-character collating elements, it is also capable of supporting collation in codesets where a character is expressed using non-spacing characters followed by the base character (such as the ISO/IEC 6937:2001 standard).
The directives that can be specified in an operand to the order_start keyword are based on the requirements specified in several proposed standards and in customary use. The following is a rephrasing of rules defined for "lexical ordering in English and French" by the Canadian Standards Association (the text in square brackets is rephrased):
It is estimated that this part of POSIX.1-2024 covers the requirements for all European languages, and no particular problems are anticipated with Slavic or Middle East character sets.
The Far East (particularly Japanese/Chinese) collations are often based on contextual information and pronunciation rules (the same ideogram can have different meanings and different pronunciations). Such collation, in general, falls outside the desired goal of POSIX.1-2024. There are, however, several other collation rules (stroke/radical or "most common pronunciation") that can be supported with the mechanism described here.
The character order is defined by the order in which characters and elements are specified between the order_start and order_end keywords. Weights assigned to the characters and elements define the collation sequence; in the absence of weights, the character order is also the collation sequence.
The position keyword provides the capability to consider, in a comparison, the relative position of characters not subject to IGNORE. As an example, consider the two strings "o-ring" and "or-ing". Assuming the <hyphen-minus> is subject to IGNORE on the first pass, the two strings compare equal, and the position of the <hyphen-minus> is immaterial. On the second pass, all characters except the <hyphen-minus> are subject to IGNORE, and in the normal case the two strings would again compare equal. By taking position into account, the first collates before the second.
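The following toy comparison reproduces the "o-ring"/"or-ing" example. It is not how an LC_COLLATE implementation works: it hard-codes <hyphen-minus> as the only ignored character, uses byte values as first-pass weights, and its rule for strings with different numbers of hyphens is an assumption. All names are hypothetical.

```c
#include <stddef.h>

/* Pass 1: compare with '-' treated as IGNORE. */
static int pass1(const char *a, const char *b)
{
    for (;;) {
        while (*a == '-') a++;
        while (*b == '-') b++;
        if (*a != *b)
            return (unsigned char)*a - (unsigned char)*b;
        if (*a == '\0')
            return 0;
        a++;
        b++;
    }
}

/* Pass 2 with "position": only '-' counts, and the string whose '-'
 * occurs at the lower position collates first. */
static int pass2_position(const char *a, const char *b)
{
    size_t i = 0, j = 0;

    for (;;) {
        while (a[i] != '\0' && a[i] != '-') i++;
        while (b[j] != '\0' && b[j] != '-') j++;
        if (a[i] == '\0' || b[j] == '\0')
            /* assumption: the string that runs out of '-' first sorts first */
            return (a[i] != '\0') - (b[j] != '\0');
        if (i != j)
            return i < j ? -1 : 1;
        i++;
        j++;
    }
}

static int toy_collate(const char *a, const char *b)
{
    int r = pass1(a, b);
    return r != 0 ? r : pass2_position(a, b);
}
```

Here `toy_collate("o-ring", "or-ing")` is negative: the first pass ties, and the second pass finds the <hyphen-minus> at position 1 versus position 2.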
This standard requires that all implementation-provided locales define a collation sequence that has a total ordering of all characters unless the locale name has an '@' modifier indicating that it has a special collation sequence. Defining locales in this way eliminates unexpected behavior when non-identical strings can collate equally (for example, sort -u and sort | uniq are not equivalent). The exception for locales with a suitable '@' modifier in the name allows implementations to supply locales which do not have a total ordering of all characters provided that they draw attention to it in the modifier name. For example, @icase could indicate that each upper and lowercase character pair collates equally. Even with an '@' modifier, total ordering is preferred when possible; for example, characters that are "ignored" in dictionary order need not be completely ignored (by using IGNORE for all collation weights), but can instead be given a unique weight after one or more IGNORE weights.
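A locale that lacks a total ordering shows up as a pair of strings that differ byte-wise yet compare equal under `strcoll()`; such a pair is exactly what makes `sort -u` and `sort | uniq` disagree. The helper name is hypothetical.

```c
#include <string.h>

/* Return 1 if a and b are different byte strings that nevertheless
 * collate equally in the current LC_COLLATE. */
static int collates_equal_but_differs(const char *a, const char *b)
{
    return strcoll(a, b) == 0 && strcmp(a, b) != 0;
}
```

In the POSIX locale, where collation is byte order, this never returns 1; a locale such as a hypothetical one with an @icase modifier could return 1 for "a" and "A".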
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0037 [938], XBD/TC2-2008/0038 [663], and XBD/TC2-2008/0039 [584] are applied.
Austin Group Defect 948 is applied, requiring that all implementation-provided locales define a collation sequence that has a total ordering of all characters unless the locale name has an '@' modifier indicating that it has a special collation sequence.
Austin Group Defect 1740 is applied, noting that it is the responsibility of the locale writer to ensure <NUL> has the lowest primary weight in a collation ordering.
The currency symbol does not appear in LC_MONETARY because it is not defined in the C locale of the ISO C standard.
The ISO C standard limits the size of decimal points and thousands delimiters to single-byte values. In locales based on multi-byte coded character sets, this cannot be enforced; POSIX.1-2024 does not prohibit such characters, but makes the behavior unspecified (in the text "In contexts where other standards ...").
The grouping specification is based on, but not identical to, the ISO C standard. The -1 indicates that no further grouping is performed; this is the equivalent of {CHAR_MAX} in the ISO C standard.
The text "the value is not available in the locale" is taken from the ISO C standard and is used instead of the "unspecified" text in early proposals. There is no implication that omitting these keywords or assigning them values of "" or -1 produces unspecified results; such omissions or assignments eliminate the effects described for the keyword or produce zero-length strings, as appropriate.
The locale definition is an extension of the ISO C standard localeconv() specification. In particular, rules on how currency_symbol is treated are extended to also cover int_curr_symbol, and p_sep_by_space and n_sep_by_space have been augmented with the value 2, which places a <space> between the sign and the symbol. This has been updated to match the ISO/IEC 9899:1999 standard requirements and is an incompatible change from the requirements of UNIX 98, the ISO POSIX-2 standard, and the ISO POSIX-1:1996 standard. The following table shows the result of various combinations:
p_cs_precedes | p_sign_posn | p_sep_by_space = 2 | p_sep_by_space = 1 | p_sep_by_space = 0 |
---|---|---|---|---|
1 | 0 | ($1.25) | ($ 1.25) | ($1.25) |
1 | 1 | + $1.25 | +$ 1.25 | +$1.25 |
1 | 2 | $1.25 + | $ 1.25+ | $1.25+ |
1 | 3 | + $1.25 | +$ 1.25 | +$1.25 |
1 | 4 | $ +1.25 | $+ 1.25 | $+1.25 |
0 | 0 | (1.25 $) | (1.25 $) | (1.25$) |
0 | 1 | +1.25 $ | +1.25 $ | +1.25$ |
0 | 2 | 1.25$ + | 1.25 $+ | 1.25$+ |
0 | 3 | 1.25+ $ | 1.25 +$ | 1.25+$ |
0 | 4 | 1.25$ + | 1.25 $+ | 1.25$+ |
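The three keywords in the table are available to C programs through `localeconv()`. This sketch (type and function names hypothetical) copies the positive-value fields and reports when any of them is {CHAR_MAX}, meaning the value is not available in the locale, as is the case in the C locale.

```c
#include <limits.h>
#include <locale.h>

/* Hypothetical summary of how a positive monetary amount is laid out. */
struct pos_layout {
    int cs_precedes;    /* 1: symbol before the value, 0: after */
    int sep_by_space;   /* 0, 1 or 2, as in the table above */
    int sign_posn;      /* 0 to 4, as in the table above */
};

/* Copy the p_* fields from localeconv(). Returns 0 if all are available,
 * -1 if any is CHAR_MAX ("not available in the locale"). */
static int positive_layout(struct pos_layout *out)
{
    const struct lconv *lc = localeconv();

    out->cs_precedes = lc->p_cs_precedes;
    out->sep_by_space = lc->p_sep_by_space;
    out->sign_posn = lc->p_sign_posn;
    if (out->cs_precedes == CHAR_MAX || out->sep_by_space == CHAR_MAX ||
        out->sign_posn == CHAR_MAX)
        return -1;
    return 0;
}
```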
The following is an example of the interpretation of the mon_grouping keyword. Assuming that the value to be formatted is 123456789 and the mon_thousands_sep is <apostrophe>, then the following table shows the result. The third column shows the equivalent string in the ISO C standard that would be used by the localeconv() function to accommodate this grouping.
mon_grouping | Formatted Value | ISO C String |
---|---|---|
3;-1 | 123456'789 | "\3\177" |
3 | 123'456'789 | "\3" |
3;2;-1 | 1234'56'789 | "\3\2\177" |
3;2 | 12'34'56'789 | "\3\2" |
-1 | 123456789 | "\177" |
In these examples, the octal value of {CHAR_MAX} is 177.
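The following sketch applies an ISO C grouping string, as found in the `grouping` or `mon_grouping` member of `struct lconv`, to a string of digits. It counts groups from the right, repeats the last size when the string ends, and stops at {CHAR_MAX}. The function names are hypothetical. Writing {CHAR_MAX} symbolically, rather than as \177, keeps such strings correct on systems where char is unsigned.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Group size for one grouping element, or -1 for "no further grouping". */
static int group_size(char c)
{
    return (c == CHAR_MAX || c <= 0) ? -1 : c;
}

/* Insert sep into digits according to grouping. Returns 0, or -1 if the
 * result does not fit in out (or in the internal buffer). */
static int apply_grouping(const char *digits, const char *grouping, char sep,
                          char *out, size_t outsz)
{
    char tmp[256];                      /* built right to left, then reversed */
    size_t t = 0, n = strlen(digits);
    int size = group_size(grouping[0]), count = 0;

    for (size_t i = n; i > 0; i--) {
        if (size > 0 && count == size) {
            if (t >= sizeof tmp)
                return -1;
            tmp[t++] = sep;
            count = 0;
            if (grouping[1] != '\0')    /* NUL: keep repeating this size */
                size = group_size(*++grouping);
        }
        if (t >= sizeof tmp)
            return -1;
        tmp[t++] = digits[i - 1];
        count++;
    }
    if (t + 1 > outsz)
        return -1;
    for (size_t k = 0; k < t; k++)
        out[k] = tmp[t - 1 - k];
    out[t] = '\0';
    return 0;
}
```

For example, the grouping `{3, 2, CHAR_MAX, 0}` with separator <apostrophe> turns 123456789 into 1234'56'789, matching the third row of the table.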
IEEE Std 1003.1-2001/Cor 1-2002, item XBD/TC1/D6/6 adds a correction that permits the Euro currency symbol and addresses extensibility. The correction is stated using the term "should" intentionally, in order to make this a recommendation rather than a restriction on implementations. This allows for flexibility in implementations on how they handle future currency symbol additions.
IEEE Std 1003.1-2001/Cor 1-2002, item XBD/TC1/D6/5 is applied, adding the int_[np]_* values to the POSIX locale definition of LC_MONETARY.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/16 is applied, updating the descriptions of p_sep_by_space, n_sep_by_space, int_p_sep_by_space, and int_n_sep_by_space to match the description of these keywords in the ISO C standard and the System Interfaces volume of POSIX.1-2024, localeconv().
Austin Group Defect 1199 is applied, adding a requirement that localedef does not accept certain combinations of *_sign_posn, positive_sign, and negative_sign values.
Austin Group Defect 1241 is applied, clarifying the meaning of empty string values.
See the rationale for LC_MONETARY for a description of the behavior of grouping.
Austin Group Defect 1241 is applied, clarifying the meaning of empty string values.
Although certain of the conversion specifications in the POSIX locale (such as the name of the month) are shown with initial capital letters, this need not be the case in other locales. Programs using these conversion specifications may need to adjust the capitalization if the output is going to be used at the beginning of a sentence.
The LC_TIME descriptions of abday, day, mon, and abmon imply a Gregorian style calendar (7-day weeks, 12-month years, leap years, and so on). Formatting time strings for other types of calendars is outside the scope of POSIX.1-2024.
While the ISO 8601:2019 standard numbers the weekdays starting with Monday, historical practice is to use Sunday as the first day. Rather than change the order and introduce potential confusion, the days must be specified beginning with Sunday; previous references to "first day" have been removed. Note also that the Shell and Utilities volume of POSIX.1-2024 date utility supports numbering compliant with the ISO 8601:2019 standard.
As specified under date in the Shell and Utilities volume of POSIX.1-2024 and strftime() in the System Interfaces volume of POSIX.1-2024, the conversion specifications corresponding to the optional keywords consist of a modifier followed by a traditional conversion specification (for instance, %Ex). If the optional keywords are not supported by the implementation or are unspecified for the current locale, these modified conversion specifications are treated as the traditional conversion specifications. For example, assume the following keywords:
alt_digits "0th";"1st";"2nd";"3rd";"4th";"5th";\
           "6th";"7th";"8th";"9th";"10th"
d_fmt "The %Od day of %B in %Y"
On July 4th 1776, the %x conversion specification would yield "The 4th day of July in 1776", while on July 14th 1789 it would yield "The 14 day of July in 1789", because alt_digits defines strings only for the values 0 through 10. Note that the above example is for illustrative purposes only; the %O modifier is primarily intended to provide for Kanji or Hindi digits in date formats.
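Because the O modifier falls back to the traditional conversion when no alternative digits are defined, `%Od` and `%d` produce the same text in the POSIX locale. A minimal sketch (helper name hypothetical):

```c
#include <stddef.h>
#include <time.h>

/* Format a day of the month with %Od (alternative != 0) or %d.
 * Returns strftime()'s byte count, or 0 on failure. */
static size_t format_day(char *buf, size_t size, int mday, int alternative)
{
    struct tm tm = {0};

    tm.tm_mday = mday;
    return strftime(buf, size, alternative ? "%Od" : "%d", &tm);
}
```

With the alt_digits definition above in effect, the alternative form of day 4 would instead be "4th".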
The following is an example for Japan that supports the current plus last three Emperors and reverts to Western style numbering for years prior to the Meiji era. The example also allows for the custom of using a special name for the first year of an era instead of using 1. (The examples substitute romaji where kanji should be used.)
era_d_fmt "%EY%mgatsu%dnichi (%a)"
era "+:2:1990/01/01:+*:Heisei:%EC%Eynen";\
    "+:1:1989/01/08:1989/12/31:Heisei:%ECgannen";\
    "+:2:1927/01/01:1989/01/07:Shouwa:%EC%Eynen";\
    "+:1:1926/12/25:1926/12/31:Shouwa:%ECgannen";\
    "+:2:1913/01/01:1926/12/24:Taishou:%EC%Eynen";\
    "+:1:1912/07/30:1912/12/31:Taishou:%ECgannen";\
    "+:2:1869/01/01:1912/07/29:Meiji:%EC%Eynen";\
    "+:1:1868/09/08:1868/12/31:Meiji:%ECgannen";\
    "-:1868:1868/09/07:-*::%Ey"
Assuming that the current date is September 21, 1991, a request to date or strftime() would yield the following results:
%Ec - Heisei3nen9gatsu21nichi (Sat) 14:39:26
%EC - Heisei
%Ex - Heisei3nen9gatsu21nichi (Sat)
%Ey - 3
%EY - Heisei3nen
Example era definitions for the Republic of China:
era "+:2:1913/01/01:+*:ChungHwaMingGuo:%EC%EyNen";\
    "+:1:1912/1/1:1912/12/31:ChungHwaMingGuo:%ECYuenNen";\
    "+:1:1911/12/31:-*:MingChien:%EC%EyNen"
Example definitions for the Christian Era:
era "+:1:0001/01/01:+*:AD:%EC %Ey";\
    "+:1:-0001/12/31:-*:BC:%Ey %EC"
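A sketch of splitting an era segment into its six colon-separated fields (direction, offset, start date, end date, era name, era format) and computing the %Ey year from the direction rule: with '+', years closer to the start date have lower numbers; with '-', higher numbers. The structure, field sizes, and function names are hypothetical, and `era_year()` does not check that the date actually falls inside the segment. With the Heisei segment above, 1991 yields 3, as in the %Ey result shown earlier.

```c
#include <stdlib.h>
#include <string.h>

struct era_seg {
    char dir;                               /* '+' or '-' */
    int offset;                             /* era year at start_date */
    char start[16], end[16], name[32], fmt[32];
};

/* Copy the text up to the next ':' into dst and advance *p past it. */
static int take_field(const char **p, char *dst, size_t size)
{
    const char *colon = strchr(*p, ':');
    size_t len;

    if (colon == NULL || (len = (size_t)(colon - *p)) >= size)
        return -1;
    memcpy(dst, *p, len);
    dst[len] = '\0';
    *p = colon + 1;
    return 0;
}

/* Parse one segment; era_format is the remainder after the fifth ':'. */
static int parse_era(const char *s, struct era_seg *e)
{
    char dir[2], off[12];
    const char *p = s;

    if (take_field(&p, dir, sizeof dir) != 0 ||
        (dir[0] != '+' && dir[0] != '-') ||
        take_field(&p, off, sizeof off) != 0 ||
        take_field(&p, e->start, sizeof e->start) != 0 ||
        take_field(&p, e->end, sizeof e->end) != 0 ||
        take_field(&p, e->name, sizeof e->name) != 0 ||
        strlen(p) >= sizeof e->fmt)
        return -1;
    strcpy(e->fmt, p);
    e->dir = dir[0];
    e->offset = atoi(off);
    return 0;
}

/* Era year (%Ey) for a Gregorian year inside the segment. */
static int era_year(const struct era_seg *e, int year)
{
    int start_year = atoi(e->start);   /* "1990/01/01" -> 1990, "-0001/12/31" -> -1 */
    int distance = abs(year - start_year);

    return e->dir == '+' ? e->offset + distance : e->offset - distance;
}
```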
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0040 [912] is applied.
Austin Group Defects 258 and 1166 are applied, adding the alt_mon and ab_alt_mon locale keywords.
Austin Group Defect 1307 is applied, changing the am_pm and t_fmt_ampm keywords and the AM_STR, PM_STR, and T_FMT_AMPM constants in relation to locales that do not support the 12-hour clock format.
The yesstr and nostr locale keywords and the YESSTR and NOSTR langinfo items were formerly used to match user affirmative and negative responses. In POSIX.1-2024, the yesexpr, noexpr, YESEXPR, and NOEXPR extended regular expressions have replaced them. Applications should use the general locale-based messaging facilities to issue prompting messages which include sample desired responses.
Affirmative responses like:
y
Yes
Yes!
and negative responses like:
N
No
Never
No way!
should all be recognized as affirmative and negative responses, respectively, by the EREs identified by the yesexpr and noexpr keywords for English language-based locales. There is no requirement that multi-line responses or ambiguous responses like:
no or yes
yes or no
maybe
be correctly classified by either of these EREs. Application writers are encouraged to include locale-specific suggestions for affirmative and negative responses in prompts.
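Applications can apply these EREs through `nl_langinfo()` and the POSIX regular expression functions. This is a sketch; `classify_response()` and its helper are hypothetical, and trying the affirmative ERE first is a design choice of the sketch.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if resp matches the locale's ERE for item, 0 if not, and -1
 * if the ERE cannot be compiled. nl_langinfo()'s result may be
 * overwritten by a later call, so it is compiled immediately. */
static int matches_langinfo_ere(nl_item item, const char *resp)
{
    regex_t re;
    int rc;

    if (regcomp(&re, nl_langinfo(item), REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, resp, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* 1 = affirmative, 0 = negative, -1 = neither (or an ERE error). */
static int classify_response(const char *resp)
{
    int r = matches_langinfo_ere(YESEXPR, resp);

    if (r != 0)
        return r;
    r = matches_langinfo_ere(NOEXPR, resp);
    return r == 1 ? 0 : -1;
}
```

In the POSIX locale, where yesexpr is ^[yY] and noexpr is ^[nN], "Yes!" classifies as affirmative, "No way!" as negative, and "maybe" as neither.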
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
Austin Group Defects 258 and 1166 are applied, adding the alt_mon and ab_alt_mon locale keywords.
The following is an example of a locale definition file that could be used as input to the localedef utility. It assumes that the utility is executed with the -f option, naming a charmap file with (at least) the following content:
CHARMAP
<space>        \x20
<dollar>       \x24
<A>            \101
<a>            \141
<A-acute>      \346
<a-acute>      \365
<A-grave>      \300
<a-grave>      \366
<b>            \142
<C>            \103
<c>            \143
<c-cedilla>    \347
<d>            \x64
<H>            \110
<h>            \150
<eszet>        \xb7
<s>            \x73
<z>            \x7a
END CHARMAP
It should not be taken as complete or to represent any actual locale, but only to illustrate the syntax.
#
LC_CTYPE
lower      <a>;<b>;<c>;<c-cedilla>;<d>;...;<z>
upper      A;B;C;Ç;...;Z
space      \x20;\x09;\x0a;\x0b;\x0c;\x0d
blank      \040;\011
toupper    (<a>,<A>);(b,B);(c,C);(ç,Ç);(d,D);(z,Z)
END LC_CTYPE
#
LC_COLLATE
#
# The following example of collation is based on
# Canadian standard Z243.4.1-1998, "Canadian Alphanumeric
# Ordering Standard for Character Sets of CSA Z234.4 Standard".
# (Other parts of this example locale definition file do not
# purport to relate to Canada, or to any other real culture.)
# The proposed standard defines a 4-weight collation, such that
# in the first pass, characters are compared without regard to
# case or accents; in the second pass, backwards-compare without
# regard to case; in the third pass, forwards-compare without
# regard to diacriticals. In the 3 first passes, non-alphabetic
# characters are ignored; in the fourth pass, only special
# characters are considered, such that "The string that has a
# special character in the lowest position comes first. If two
# strings have a special character in the same position, the
# collation value of the special character determines ordering.
#
# Only a subset of the character set is used here; mostly to
# illustrate the set-up.
#
collating-symbol <NULL>
collating-symbol <LOW_VALUE>
collating-symbol <LOWER-CASE>
collating-symbol <SUBSCRIPT-LOWER>
collating-symbol <SUPERSCRIPT-LOWER>
collating-symbol <UPPER-CASE>
collating-symbol <NO-ACCENT>
collating-symbol <PECULIAR>
collating-symbol <LIGATURE>
collating-symbol <ACUTE>
collating-symbol <GRAVE>
# Further collating-symbols follow.
#
# Properly, the standard does not include any multi-character
# collating elements; the one below is added for completeness.
#
collating-element <ch> from "<c><h>"
collating-element <CH> from "<C><H>"
collating-element <Ch> from "<C><h>"
#
order_start forward;backward;forward;forward,position
#
# Collating symbols are specified first in the sequence to allocate
# basic collation values to them, lower than that of any character.
<NULL>
<LOW_VALUE>
<LOWER-CASE>
<SUBSCRIPT-LOWER>
<SUPERSCRIPT-LOWER>
<UPPER-CASE>
<NO-ACCENT>
<PECULIAR>
<LIGATURE>
<ACUTE>
<GRAVE>
<RING-ABOVE>
<DIAERESIS>
<TILDE>
# Further collating symbols are given a basic collating value here.
#
# Here follow special characters.
<space>      IGNORE;IGNORE;IGNORE;<space>
# Other special characters follow here.
#
# Here follow the regular characters.
<a>          <a>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
<A>          <a>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
<a-acute>    <a>;<ACUTE>;<LOWER-CASE>;IGNORE
<A-acute>    <a>;<ACUTE>;<UPPER-CASE>;IGNORE
<a-grave>    <a>;<GRAVE>;<LOWER-CASE>;IGNORE
<A-grave>    <a>;<GRAVE>;<UPPER-CASE>;IGNORE
<ae>         "<a><e>";"<LIGATURE><LIGATURE>";\
             "<LOWER-CASE><LOWER-CASE>";IGNORE
<AE>         "<a><e>";"<LIGATURE><LIGATURE>";\
             "<UPPER-CASE><UPPER-CASE>";IGNORE
<b>          <b>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
<B>          <b>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
<c>          <c>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
<C>          <c>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
<ch>         <ch>;<NO-ACCENT>;<LOWER-CASE>;IGNORE
<Ch>         <ch>;<NO-ACCENT>;<PECULIAR>;IGNORE
<CH>         <ch>;<NO-ACCENT>;<UPPER-CASE>;IGNORE
#
# As an example, the strings "Bach" and "bach" could be encoded (for
# compare purposes) as:
# "Bach" <b>;<a>;<ch>;<LOW_VALUE>;<NO_ACCENT>;<NO_ACCENT>;\
#        <NO_ACCENT>;<LOW_VALUE>;<UPPER-CASE>;<LOWER-CASE>;\
#        <LOWER-CASE>;<NULL>
# "bach" <b>;<a>;<ch>;<LOW_VALUE>;<NO_ACCENT>;<NO_ACCENT>;\
#        <NO_ACCENT>;<LOW_VALUE>;<LOWER-CASE>;<LOWER-CASE>;\
#        <LOWER-CASE>;<NULL>
#
# The two strings are equal in pass 1 and 2, but differ in pass 3.
#
# Further characters follow.
#
UNDEFINED    IGNORE;IGNORE;IGNORE;IGNORE
#
order_end
#
END LC_COLLATE
#
LC_MONETARY
int_curr_symbol      "USD "
currency_symbol      "$"
mon_decimal_point    "."
mon_grouping 3;0 positive_sign "" negative_sign "-" p_cs_precedes 1 n_sign_posn 0 END LC_MONETARY # LC_NUMERIC copy "US_en.ASCII" END LC_NUMERIC # LC_TIME abday "Sun";"Mon";"Tue";"Wed";"Thu";"Fri";"Sat" # day "Sunday";"Monday";"Tuesday";"Wednesday";\ "Thursday";"Friday";"Saturday" # abmon "Jan";"Feb";"Mar";"Apr";"May";"Jun";\ "Jul";"Aug";"Sep";"Oct";"Nov";"Dec" # mon "January";"February";"March";"April";\ "May";"June";"July";"August";"September";\ "October";"November";"December" # d_t_fmt "%a %b %d %T %Z %Y\n" END LC_TIME # LC_MESSAGES yesexpr "^([yY][[:alpha:]]*)|(OK)" # noexpr "^[nN][[:alpha:]]*" END LC_MESSAGES
The variable environ is not intended to be declared in any header, but rather to be declared by the user for accessing the array of strings that is the environment. This is the traditional usage of the symbol. Putting it into a header could break some programs that use the symbol for their own purposes.
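A minimal sketch of the traditional usage described above, in which the application declares environ itself and walks the array of "name=value" strings. The helper env_lookup() is hypothetical and only illustrates the declaration; getenv() remains the normal way to look up a single variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>   /* getenv(), setenv(): the usual interfaces */
#include <string.h>

/* Declared by the application itself, as the rationale describes. */
extern char **environ;

/* Hypothetical helper: return the value part of the "name=value" entry
   for name, or NULL if no such entry exists. Similar in effect to
   getenv(), but written directly against environ. */
const char *env_lookup(const char *name)
{
    size_t len = strlen(name);

    for (char **ep = environ; ep != NULL && *ep != NULL; ep++) {
        if (strncmp(*ep, name, len) == 0 && (*ep)[len] == '=')
            return *ep + len + 1;
    }
    return NULL;
}
```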
The decision to restrict conforming systems to the use of digits, uppercase letters, and underscores for environment variable names allows applications to use lowercase letters in their environment variable names without conflicting with any conforming system.
Names that begin with a digit are excluded because, in addition to the obvious conflict with the shell syntax for positional parameter substitution, some historical applications (including some shells) exclude such names from the environment.
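The naming rules above can be checked mechanically. The following sketch assumes the convention that names used by the standard utilities consist solely of uppercase letters, digits, and <underscore> from the portable character set and do not begin with a digit; the function is_portable_env_name() and its upper_only switch are hypothetical. Letters are tested against explicit strings rather than 'A'..'Z' ranges so that the check does not depend on the letters being contiguous in the execution character set.

```c
#include <stdbool.h>
#include <string.h>

static const char UPPER[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
static const char LOWER[] = "abcdefghijklmnopqrstuvwxyz";

/* Hypothetical check: true if s is non-empty, does not begin with a
   digit, and contains only portable letters, digits, and '_'. With
   upper_only set, lowercase letters are rejected, matching the names a
   conforming system reserves; applications may use lowercase names. */
bool is_portable_env_name(const char *s, bool upper_only)
{
    if (s[0] == '\0' || (s[0] >= '0' && s[0] <= '9'))
        return false;  /* empty, or leading digit */
    for (; *s != '\0'; s++) {
        bool ok = strchr(UPPER, *s) != NULL ||
                  (*s >= '0' && *s <= '9') ||  /* digits are contiguous in C */
                  *s == '_' ||
                  (!upper_only && strchr(LOWER, *s) != NULL);
        if (!ok)
            return false;
    }
    return true;
}
```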
Some historical implementations removed certain environment variables during program startup when security criteria were not met, instead of just ignoring them at the point of use. The standard developers decided not to allow this behavior because if a process drops all privileges and sets its effective user and group IDs to be the same as its real user and group IDs before executing a program or utility, the behavior should be the same as if the process had originally met the security criteria.
Austin Group Defect 367 is applied, adding requirements relating to the use of readonly on environment variables that are manipulated by shell built-in utilities.
Austin Group Defect 922 is applied, allowing implementations to ignore some environment variables at the point of use for security reasons.
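A sketch of ignoring a variable at the point of use rather than removing it at startup. The security criterion used here (real and effective user and group IDs differ) and the helper name getenv_if_unprivileged() are assumptions for illustration; which variables an implementation ignores, and under what criteria, is up to the implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical policy helper: consult name only when the process is not
   running with IDs that differ from its real IDs. The variable is ignored
   here, not deleted from the environment, so a program executed after the
   process drops privileges still sees it. */
const char *getenv_if_unprivileged(const char *name)
{
    if (getuid() != geteuid() || getgid() != getegid())
        return NULL;  /* security criteria not met: ignore at point of use */
    return getenv(name);
}
```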
Austin Group Defect 1561 is applied, clarifying that environment variable values can contain byte sequences that do not form valid characters.
Utilities conforming to the Shell and Utilities volume of POSIX.1-2024 and written in standard C can access the locale variables by issuing the following call:
setlocale(LC_ALL, "")
If this were omitted, the ISO C standard specifies that the C (or POSIX) locale would be used.
The DESCRIPTION of setlocale() requires that when setting all categories of a locale, if the value of any of the environment variable searches yields a locale that is not supported (and non-null), the setlocale() function returns a null pointer and the global locale is unchanged.
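A minimal sketch of how a utility might act on that null return. The function init_locale(), its warning text, and the choice to continue in the unchanged locale are assumptions of this sketch; the paragraph that follows explains why the standard leaves the utility's behavior in this case unspecified.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical startup helper. Returns 0 if the locale environment was
   honored, or -1 if setlocale() rejected it; in that case setlocale() has
   left the global locale unchanged (the POSIX locale, if this is the first
   call after program startup). */
int init_locale(void)
{
    if (setlocale(LC_ALL, "") == NULL) {
        const char *all = getenv("LC_ALL");
        const char *lang = getenv("LANG");

        /* Wording and the decision to continue are choices of this sketch. */
        fprintf(stderr, "warning: unsupported locale (LC_ALL=%s, LANG=%s); "
                "continuing in the current locale\n",
                all ? all : "(unset)", lang ? lang : "(unset)");
        return -1;
    }
    return 0;
}
```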
For the standard utilities, if any of the environment variables are invalid, it makes sense to default to an implementation-defined, consistent locale environment. It is more confusing for a user to have partial settings occur in case of a mistake. All utilities would then behave in one language/cultural environment. Furthermore, it provides a way of forcing the whole environment to be the implementation-defined default. Disastrous results could occur if a pipeline of utilities partially uses the environment variables in different ways. In this case, it would be appropriate for utilities that use LANG and related variables to exit with an error if any of the variables are invalid. On the other hand, users typing individual commands at a terminal might want date to work even if LC_MONETARY is invalid, as long as LC_TIME is valid. Since these are conflicting reasonable alternatives, POSIX.1-2024 leaves the results unspecified if the locale environment variables would not produce a complete locale matching the specification of the user.
The LC_MESSAGES variable affects the language of messages generated by the standard utilities.
The description of the environment variable names starting with the characters "LC_" acknowledges the fact that the interfaces presented may be extended as new international functionality is required. In the ISO C standard, names preceded by "LC_" are reserved in the name space for future categories.
To avoid name clashes, new categories and environment variables are divided into two classifications: "implementation-independent" and "implementation-defined".
Implementation-independent names will have the following format:
LC_NAME
where NAME is the name of the new category and environment variable. Capital letters must be used for implementation-independent names.
Implementation-defined names must be in lowercase letters, as below:
LC_name
Austin Group Defect 1122 is applied, adding the LANGUAGE, TEXTDOMAIN, and TEXTDOMAINDIR environment variables and updating NLSPATH with requirements relating to the gettext family of functions and the gettext and ngettext utilities.
Austin Group Defect 1477 is applied, moving a paragraph of rationale about incompatible locale categories to A.7.1 General.
Austin Group Defect 1571 is applied, simplifying the final item in the precedence order for internationalization environment variables.
The default values for the number of column positions when COLUMNS is unset or null, and screen height when LINES is unset or null, are unspecified if the terminal window size cannot be obtained (from tcgetwinsize()) because historical implementations use different methods to determine the values. Users should not need to set these variables in the environment unless there is a specific reason to override the default behavior of the implementation, such as to display data in an area arbitrarily smaller than the terminal or window. Values for these variables that are not decimal integers greater than zero are implicitly undefined values; it is unnecessary to enumerate all of the possible values outside of the acceptable set.
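A sketch of the validation described above, applied to COLUMNS (LINES would be handled the same way). The helper columns_from_env() is hypothetical, and its fallback parameter stands in for whatever width the implementation obtains from tcgetwinsize() or uses as its own default; the sketch does not call tcgetwinsize() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: use COLUMNS only if it is a decimal integer
   greater than zero; otherwise return fallback. */
int columns_from_env(int fallback)
{
    const char *s = getenv("COLUMNS");
    char *end;
    long v;

    /* Unset, null, or not starting with a digit (rejects "+80", " 80", "-5"). */
    if (s == NULL || *s < '0' || *s > '9')
        return fallback;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
        return fallback;  /* trailing junk, zero, or out of range */
    return (int)v;
}
```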
Austin Group Defect 1185 is applied, changing the descriptions of the COLUMNS and LINES environment variables.
In most implementations, the value of LOGNAME is easily forged, so security-critical applications should rely on other means of determining user identity. LOGNAME is required to be constructed from the portable filename character set for reasons of interchange. No diagnostic condition is specified for violating this rule, and no requirement for enforcement exists. The intent of the requirement is that if extended characters are used, the "guarantee" of portability implied by a standard is void.
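A sketch of checking a LOGNAME value against the portable filename character set (A to Z, a to z, 0 to 9, <period>, <underscore>, and <hyphen-minus>). The function is_portable_logname() is hypothetical. Passing this check says nothing about whether the value identifies the real user.

```c
#include <stdbool.h>
#include <string.h>

/* The portable filename character set. */
static const char PORTABLE_FILENAME_CHARS[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789._-";

/* Hypothetical check: true if s is non-empty and every byte belongs to
   the portable filename character set. */
bool is_portable_logname(const char *s)
{
    return s[0] != '\0' && strspn(s, PORTABLE_FILENAME_CHARS) == strlen(s);
}
```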
Many historical implementations of the Bourne shell do not interpret a trailing <colon> to represent the current working directory and are thus non-conforming. The C Shell and the KornShell conform to POSIX.1-2024 on this point. The usual name of dot may also be used to refer to the current working directory.
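A sketch of splitting a PATH value into prefixes, treating a zero-length prefix (a leading or trailing <colon>, or two adjacent colons) as the current working directory. The function split_path() and the PATH_ENTRY_MAX limit are hypothetical. An unset or null PATH is returned as zero entries because the resulting search is implementation-defined.

```c
#include <string.h>

#define PATH_ENTRY_MAX 256  /* hypothetical per-entry limit for this sketch */

/* Split path into at most max directory prefixes stored in dirs.
   Zero-length prefixes are stored as "."; longer entries are truncated
   to PATH_ENTRY_MAX - 1 bytes. Returns the number of entries stored. */
int split_path(const char *path, char dirs[][PATH_ENTRY_MAX], int max)
{
    int n = 0;

    if (path == NULL || *path == '\0')
        return 0;  /* unset or null PATH: implementation-defined search */
    for (const char *p = path; n < max; n++) {
        const char *colon = strchr(p, ':');
        size_t len = colon != NULL ? (size_t)(colon - p) : strlen(p);

        if (len == 0) {
            strcpy(dirs[n], ".");  /* empty prefix means current directory */
        } else {
            if (len >= PATH_ENTRY_MAX)
                len = PATH_ENTRY_MAX - 1;
            memcpy(dirs[n], p, len);
            dirs[n][len] = '\0';
        }
        if (colon == NULL) {
            n++;  /* count the final entry before leaving the loop */
            break;
        }
        p = colon + 1;
    }
    return n;
}
```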
Many implementations have historically used a default value of /bin and /usr/bin for the PATH variable. POSIX.1-2024 does not mandate that this default path be identical to the one retrieved from getconf PATH, because the standard utilities may be provided in a directory separate from the directories used by some historical applications.
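From C, the value that getconf PATH reports is available through confstr() with _CS_PATH, the value of PATH that finds all standard utilities. The wrapper standard_utilities_path() is a hypothetical convenience around that call.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Return a malloc'd copy of the _CS_PATH value, or NULL on failure.
   The caller frees the result. */
char *standard_utilities_path(void)
{
    size_t len = confstr(_CS_PATH, NULL, 0);  /* size including the null */
    char *buf;

    if (len == 0)
        return NULL;
    buf = malloc(len);
    if (buf != NULL && confstr(_CS_PATH, buf, len) == 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```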
The standard specifies that (when no <slash> character is included in a command pathname) special built-in utilities and intrinsic utilities are not subject to a search using PATH. All other standard utilities, even if implemented as shell built-ins, are required to be found by searching PATH. This means that if a shell includes a built-in for a standard utility that is not intrinsic, a user can write a utility that will override that built-in. The standard also requires that all standard utilities can be executed by commands like:
find . -type d -exec printf 'Found directory: %s\n' '{}' +
So, other than differences caused by using different shell execution environments, a standard utility that is implemented as a built-in and the non-built-in version of that standard utility are both required to behave as the standard specifies. However, if a non-standard utility is found in PATH before the standard utility's location in PATH, the non-standard utility must be invoked rather than the built-in. For instance, if the shell includes a built-in printf utility (which most shells do), PATH is initialized using:
PATH="$HOME/bin:$(command -p getconf PATH)"
and $HOME/bin/printf is an executable file containing:
command -p printf 'In %s with args:\n' "${0##*/}" >&2
command -p printf ' %s\n' "$@" >&2
command -V printf >&2
command -Vp printf >&2
command -p printf "$@"
then the command:
printf '%s %s\n' HOME "$HOME" PATH "$PATH"
should produce output similar to:
In printf with args:
 %s %s\n
 HOME
 /Users/dwc
 PATH
 /Users/dwc/bin:/usr/bin:/bin:/usr/sbin:/sbin
printf is a tracked alias for /Users/dwc/bin/printf
printf is a shell builtin
HOME /Users/dwc
PATH /Users/dwc/bin:/usr/bin:/bin:/usr/sbin:/sbin
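The rule that a non-standard utility found earlier in PATH wins can be illustrated with a sketch of the PATH search itself: with PATH set as in the example, the search below would return $HOME/bin/printf before reaching /usr/bin. The function path_search() is hypothetical. It models only the file search (prefixes in order, a zero-length prefix meaning the current directory, the first executable regular file winning) and ignores built-ins, special built-ins, and intrinsic utilities. It uses access(), which checks the real rather than the effective IDs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: store the first executable regular file named
   name found through PATH in out. Returns 0 on success, -1 otherwise.
   A name containing a <slash> is not searched; the caller runs it
   directly. */
int path_search(const char *name, char *out, size_t outsz)
{
    const char *path = getenv("PATH");

    if (strchr(name, '/') != NULL || path == NULL || *path == '\0')
        return -1;  /* no search; unset or null PATH is implementation-defined */
    for (const char *p = path;;) {
        const char *colon = strchr(p, ':');
        int len = colon != NULL ? (int)(colon - p) : (int)strlen(p);
        int n = len == 0 ? snprintf(out, outsz, "./%s", name)
                         : snprintf(out, outsz, "%.*s/%s", len, p, name);
        struct stat st;

        if (n > 0 && (size_t)n < outsz && stat(out, &st) == 0 &&
            S_ISREG(st.st_mode) && access(out, X_OK) == 0)
            return 0;  /* first match in PATH order wins */
        if (colon == NULL)
            return -1;
        p = colon + 1;
    }
}
```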
The current version of the Korn shell installs built-ins into the shell using a builtin utility that allows the built-in to be associated with the pathname of the non-built-in version of that utility. (Unfortunately, some implementations that use ksh93 as their standard sh utility do not make use of this feature and install built-ins for standard utilities that are not associated with a PATH search. Most other shells incorrectly always use a built-in utility if one is installed, even when it should be overridden by a PATH search that would find a non-standard utility with the name of that built-in.) Some other shells use a <percent-sign> character in a directory pathname in PATH to indicate one or more directories that should be used when processing PATH to determine when non-intrinsic standard utilities should be found. The POSIX.1-2024 revision of the standard allows either of these methods to be used to install built-ins that meet the requirements stated in XCU 2.9.1.4 Command Search and Execution by making the behavior of the built-in path search implementation-defined when a <percent-sign> character is found in PATH.
Austin Group Defect 854 is applied, changing how PATH searching applies to built-in utilities.
Austin Group Defect 1340 is applied, clarifying the description of PATH.
The SHELL variable names the preferred shell of the user; it is a guide to applications. There is no direct requirement that that shell conform to POSIX.1-2024; that decision should rest with the user. It is the intention of the standard developers that alternative shells be permitted, if the user chooses to develop or acquire one. An operating system that builds its shell into the "kernel" in such a manner that alternative shells would be impossible does not conform to the spirit of POSIX.1-2024.
The quoted form of the timezone variable allows timezone names of the form UTC+1 (or any name that contains the <plus-sign> ('+'), the <hyphen-minus> ('-'), or digits), which may be appropriate for countries that do not have an official timezone name. It would be coded as <UTC+1>+1<UTC+2>, which would cause std to have a value of UTC+1 and dst a value of UTC+2, each with a length of 5 characters. This does not appear to conflict with any existing usage. The characters '<' and '>' were chosen for quoting because they are easier to parse visually than a quoting character that does not provide some sense of bracketing (and in a string like this, such bracketing is helpful). They were also chosen because they do not need special treatment when assigning to the TZ variable. Users are often confused by embedding quotes in a string. Because '<' and '>' are meaningful to the shell, the whole string would have to be quoted, but that is easily explained. (Parentheses would have presented the same problems.) Although the '>' symbol could have been permitted in the string by either escaping it or doubling it, it seemed of little value to require that. This could be provided as an extension if there was a need. Timezone names of this new form lead to a requirement that the value of {_POSIX_TZNAME_MAX} change from 3 to 6.
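The two forms of the std and dst fields can be told apart by their first character. The following sketch of a name parser assumes these rules: the unquoted form is three or more alphabetic characters; the quoted form is '<', then three or more characters drawn from the alphanumerics, '+', and '-', then '>'. The function parse_tz_name() is hypothetical; it stops where the offset begins and checks only the size of the caller's buffer, not {TZNAME_MAX}.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: parse the std or dst name at the start of s.
   On success, copy the name (without '<' and '>') to out and return the
   number of bytes consumed from s; return 0 if there is no valid name.
   Assumes the POSIX locale for isalpha() and isalnum(). */
size_t parse_tz_name(const char *s, char *out, size_t outsz)
{
    size_t n = 0;

    if (s[0] == '<') {
        const char *p = s + 1;

        while (isalnum((unsigned char)p[n]) || p[n] == '+' || p[n] == '-')
            n++;
        if (p[n] != '>' || n < 3 || n >= outsz)
            return 0;
        memcpy(out, p, n);
        out[n] = '\0';
        return n + 2;  /* the name plus '<' and '>' */
    }
    while (isalpha((unsigned char)s[n]))
        n++;
    if (n < 3 || n >= outsz)
        return 0;
    memcpy(out, s, n);
    out[n] = '\0';
    return n;
}
```

For the value <UTC+1>+1<UTC+2>, the parser consumes seven bytes and yields UTC+1; a caller would then read the offset +1 and call the parser again at <UTC+2>.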
Since the TZ environment variable is usually inherited by all applications started by a user after the value of the TZ environment variable is changed and since many applications run using the C or POSIX locale, using characters that are not in the portable character set in the std and dst fields could cause unexpected results.
Implementations are encouraged to incorporate the IANA timezone database into the timezone database used for TZ values specifying geographical and special timezones, and to provide a method to allow it to be updated in accordance with RFC 6557.
The TZ format beginning with <colon> was originally introduced as a way for implementations to support geographical timezones in the form :Area/Location as an extension, but implementations started to support them without the leading <colon> (as well as with it), and their use without the <colon> became the de-facto standard. Consequently, when geographical timezones were added to this standard, they were added without the <colon>.
The format of the TZ environment variable is changed in Issue 6 to allow for the quoted form, as defined in earlier versions of the ISO POSIX-1 standard.
IEEE Std 1003.1-2001/Cor 1-2002, item XBD/TC1/D6/7 is applied, adding the ctime_r() and localtime_r() functions to the list of functions that use the TZ environment variable.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0041 [584] is applied.
Austin Group Defect 1030 is applied, making it implementation-defined when the changes to and from Daylight Saving Time occur if the dst field is specified in TZ and the rule field is not.
Austin Group Defect 1252 is applied, changing the time field to allow the hour to range from zero to 167 and allowing a leading sign.
Austin Group Defect 1253 is applied, changing "alternative time" to "Daylight Saving Time".
Austin Group Defect 1410 is applied, removing the ctime_r() function.
Austin Group Defect 1619 is applied, adding support for a third TZ format with values specifying geographical and special timezones.
Austin Group Defects 1638 and 1639 are applied, clarifying the length limits for the std and dst fields of TZ.
Rather than repeating the description of REs for each utility supporting REs, the standard developers preferred a common, comprehensive description of regular expressions in one place. The most common behavior is described here, and exceptions or extensions to this are documented for the respective utilities, as appropriate.
The BRE corresponds to the ed or historical grep type, and the ERE corresponds to the historical egrep type (now grep -E).
The text is based on the ed description and substantially modified, primarily to aid developers and others in the understanding of the capabilities and limitations of REs. Much of this was influenced by internationalization requirements.
It should be noted that the definitions in this section do not cover the tr utility; the tr syntax does not employ REs.
The specification of REs is particularly important to internationalization because pattern matching operations are very basic operations in business and other operations. The syntax and rules of REs are intended to be as intuitive as possible to make them easy to understand and use. The historical rules and behavior do not provide that capability to non-English language users, and do not provide the necessary support for commonly used characters and language constructs. It was necessary to provide extensions to the historical RE syntax and rules to accommodate other languages.
As they are limited to bracket expressions, the rationale for these modifications is in XBD 9.3.5 RE Bracket Expression.
It is possible to determine what strings correspond to subexpressions by recursively applying the leftmost longest rule to each subexpression, but only with the proviso that the overall match is leftmost longest. For example, matching "\(ac*\)c*d[ac]*\1" against acdacaaa matches acdacaaa (with \1=a); simply matching the longest match for "\(ac*\)" would yield \1=ac, but the overall match would be smaller (acdac). Conceptually, the implementation must examine every possible match and among those that yield the leftmost longest total matches, pick the one that does the longest match for the leftmost subexpression, and so on. Note that this means that matching by subexpressions is context-dependent: a subexpression within a larger RE may match a different string from the one it would match as an independent RE, and two instances of the same subexpression within the same larger RE may match different lengths even in similar sequences of characters. For example, in the ERE "(a.*b)(a.*b)", the two identical subexpressions would match four and six characters, respectively, of accbaccccb.
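Both examples can be checked with regcomp() and regexec(), which report subexpression offsets in the regmatch_t array. The wrapper match_spans() is hypothetical. The expected offsets in the test follow from the rules above: in each case, only one assignment of the subexpressions is consistent with the longest overall match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper: compile pattern with cflags, match it against
   subject, and fill pm[0..n-1]. Returns 0 on a match, nonzero otherwise. */
int match_spans(const char *pattern, int cflags, const char *subject,
                regmatch_t pm[], size_t n)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, subject, n, pm, 0);
    regfree(&re);
    return rc;
}
```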
The definition of single character has been expanded to also include collating elements consisting of two or more characters; this expansion is applicable only when a bracket expression is included in the BRE or ERE. An example of such a collating element may be the Dutch ij, which collates as a 'y'. In some encodings, a ligature "i with j" exists as a character and would represent a single-character collating element. In another encoding, no such ligature exists, and the two-character sequence ij is defined as a multi-character collating element. Outside brackets, the ij is treated as a two-character RE and matches the same characters in a string. Historically, a bracket expression only matched a single character. The ISO POSIX-2:1993 standard required bracket expressions like "[^[:lower:]]" to match multi-character collating elements such as "ij". However, this requirement led to behavior that many users did not expect and that could not feasibly be mimicked in user code, and it was rarely if ever implemented correctly. The current standard leaves it unspecified whether a bracket expression matches a multi-character collating element, allowing both historical and ISO POSIX-2:1993 standard implementations to conform.
Also, in the current standard, it is unspecified whether character class expressions like "[:lower:]" can include multi-character collating elements like "ij"; hence "[[:lower:]]" can match "ij", and "[^[:lower:]]" can fail to match "ij". Common practice is for a character class expression to match a collating element if it matches the collating element's first character.
Austin Group Defect 1329 is applied, adding a definition of "leftmost" and updating the definition of "matched" to include an example ERE using the repetition modifier '?'.
Austin Group Defect 1546 is applied, adding a definition of "escape sequence".
The definition of which sequence is matched when several are possible is based on the leftmost-longest rule historically used by deterministic recognizers. This rule is easier to define and describe, and arguably more useful, than the first-match rule historically used by non-deterministic recognizers. It is thought that dependencies on the choice of rule are rare; carefully contrived examples are needed to demonstrate the difference.
A formal expression of the leftmost-longest rule is:
The search is performed as if all possible suffixes of the string were tested for a prefix matching the pattern; the longest suffix containing a matching prefix is chosen, and the longest possible matching prefix of the chosen suffix is identified as the matching sequence.
EREs can optionally use a leftmost-shortest rule for repetitions (enabled via the REG_MINIMAL flag or the '?' repetition modifier), in which case the shortest possible matching prefix is instead identified as the matching sequence for the affected repetition(s).
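The leftmost-longest rule can be observed through regexec(): the earliest starting position wins, and among matches starting there, the longest is reported. The helper leftmost_longest() is hypothetical, and the sketch does not exercise REG_MINIMAL or the '?' repetition modifier, which implementations predating POSIX.1-2024 may not provide. A first-match recognizer would report "a" for the first test case.

```c
#include <regex.h>

/* Hypothetical helper: report the start and length of the match of the
   ERE pattern in subject. Returns 0 on a match, nonzero otherwise. */
int leftmost_longest(const char *pattern, const char *subject,
                     int *start, int *len)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, subject, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return rc;
    *start = (int)m.rm_so;
    *len = (int)(m.rm_eo - m.rm_so);
    return 0;
}
```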
Historically, most RE implementations only match lines, not strings. However, that is more an effect of the usage than of an inherent feature of REs themselves. Consequently, POSIX.1-2024 does not regard <newline> characters as special; they are ordinary characters, and both a <period> and a non-matching list can match them. Those utilities (like grep) that do not allow <newline> characters to match are responsible for eliminating any <newline> from strings before matching against the RE. The regcomp() function, however, can provide support for such processing without violating the rules of this section.
Some implementations of egrep have had very limited flexibility in handling complex EREs. POSIX.1-2024 does not attempt to define the complexity of a BRE or ERE, but does place a lower limit on it—any RE must be handled, as long as it can be expressed in 256 bytes or less. (Of course, this does not place an upper limit on the implementation.) There are historical programs using a non-deterministic-recognizer implementation that should have no difficulty with this limit. It is possible that a good approach would be to attempt to use the faster, but more limited, deterministic recognizer for simple expressions and to fall back on the non-deterministic recognizer for those expressions requiring it. Non-deterministic implementations must be careful to observe the rules on which match is chosen; the longest match, not the first match, starting at a given character is used.
The term "invalid" highlights a difference between this section and some others: POSIX.1-2024 frequently avoids mandating errors for syntax violations because they can be used by implementors to trigger extensions. However, the authors of the internationalization features of REs wanted to mandate errors for certain conditions to identify usage problems or non-portable constructs. These are identified within this rationale as appropriate. The remaining syntax violations have been left implicitly or explicitly undefined. For example, the BRE construct "\{1,2,3\}" does not comply with the grammar. A conforming application cannot rely on it producing an error nor matching the literal characters "\{1,2,3\}".
The term "undefined" was used in favor of "unspecified" because many of the situations are considered errors on some implementations, and the standard developers considered that consistency throughout the section was preferable to mixing undefined and unspecified.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0042 [554] is applied.
Austin Group Defect 1031 is applied, replacing text relating to case insensitive comparisons with a reference to XBD 4.1 Case Insensitive Comparisons.
Austin Group Defect 1139 is applied, making minor editorial changes to several subsections of this section and changing them to require that, when not inside a bracket expression, "\]" matches ']'.
There is no additional rationale provided for this section.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0043 [554] is applied.
Austin Group Defect 1546 is applied, adding optional support for "\?", "\+", and "\|".
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0043 [554] is applied.
Austin Group Defect 1546 is applied, adding optional support for "\?", "\+", and "\|".
There is no additional rationale provided for this section.
Range expressions are, historically, an integral part of REs. However, the requirements of "natural language behavior" and portability do conflict. In the POSIX locale, ranges must be treated according to the collating sequence and include those characters that fall within the range based on that collating sequence, regardless of character values. In other locales, ranges have unspecified behavior.
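A sketch of a range expression used where its meaning is specified, namely in the POSIX locale, where "[a-e]" covers exactly the lowercase letters a through e. The helper matches_a_to_e() is hypothetical and assumes its caller has selected the POSIX locale, for example with setlocale(LC_ALL, "POSIX").

```c
#include <locale.h>   /* setlocale(LC_ALL, "POSIX") selects the locale */
#include <regex.h>
#include <stdbool.h>

/* Hypothetical helper: true if the whole string s matches the BRE
   "^[a-e]$". Only meaningful portably in the POSIX locale. */
bool matches_a_to_e(const char *s)
{
    regex_t re;
    bool ok;

    if (regcomp(&re, "^[a-e]$", REG_NOSUB) != 0)
        return false;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```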
Some historical implementations allow range expressions where the ending range point of one range is also the starting point of the next (for instance, "[a-m-o]"). This behavior should not be permitted, but to avoid breaking historical implementations, it is now undefined whether it is a valid expression and how it should be interpreted.
Current practice in awk and lex is to accept escape sequences in bracket expressions as per XBD Escape Sequences and Associated Actions, while the normal ERE behavior is to regard such a sequence as consisting of two characters. Allowing the awk/lex behavior in EREs would change the normal behavior in an unacceptable way; it is expected that awk and lex will decode escape sequences in EREs before passing them to regcomp() or comparable routines. Each utility describes the escape sequences it accepts as an exception to the rules in this section; the list is not the same, for historical reasons.
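A sketch of the kind of pre-pass a utility such as awk or lex might run before handing an ERE to regcomp(). The function decode_ere_escapes() is hypothetical and handles only \t, \n, and \/; every other backslash pair, including an escaped <backslash>, is copied unchanged so that regcomp() still treats it as escaped. A fuller decoder would also need octal escapes and would have to quote any decoded character that is special in an ERE.

```c
#include <stddef.h>

/* Hypothetical pre-pass: decode \t, \n, and \/ in the ERE text in and
   write the result to out. Returns 0 on success, -1 if out is too small. */
int decode_ere_escapes(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    for (size_t i = 0; in[i] != '\0'; i++) {
        if (in[i] == '\\' && in[i + 1] != '\0') {
            char e = in[++i];  /* the character after the backslash */
            char decoded = e == 't' ? '\t' : e == 'n' ? '\n' : e == '/' ? '/' : '\0';

            if (decoded == '\0') {
                /* Not handled here: keep the pair so regcomp() sees it. */
                if (o + 2 >= outsz)
                    return -1;
                out[o++] = '\\';
                out[o++] = e;
            } else {
                if (o + 1 >= outsz)
                    return -1;
                out[o++] = decoded;
            }
            continue;
        }
        if (o + 1 >= outsz)
            return -1;
        out[o++] = in[i];
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```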
As noted previously, the new syntax and rules have been added to accommodate languages other than English. The remainder of this section describes the rationale for these modifications.
In the POSIX locale, a regular expression that starts with a range expression matches a set of strings that are contiguously sorted, but this is not necessarily true in other locales. For example, a French locale might have the following behavior:
$ ls
alpha  Alpha  estimé  ESTIMÉ  été  eurêka
$ ls [a-e]*
alpha  Alpha  estimé  eurêka
Such disagreements between matching and contiguous sorting are unavoidable because POSIX sorting cannot be implemented in terms of a deterministic finite-state automaton (DFA), but range expressions by design are implementable in terms of DFAs.
Historical implementations used native character order to interpret range expressions. The ISO POSIX-2:1993 standard instead required collating element order (CEO): the order that collating elements were specified between the order_start and order_end keywords in the LC_COLLATE category of the current locale. CEO had some advantages in portability over the native character order, but it also had some disadvantages:
Because of these problems, some implementations of regular expressions continued to use native character order. Others used the collation sequence, which is more consistent with sorting than either CEO or native order, but which departs further from the traditional POSIX semantics because it generally requires "[a-e]" to match either 'A' or 'E' but not both. As a result of this kind of implementation variation, programmers who wanted to write portable regular expressions could not rely on the ISO POSIX-2:1993 standard guarantees in practice.
While revising the standard, lengthy consideration was given to proposals to attack this problem by adding an API for querying the CEO to allow user-mode matchers, but none of these proposals had implementation experience and none achieved consensus. Leaving the standard alone was also considered, but rejected due to the problems described above.
The current standard leaves unspecified the behavior of a range expression outside the POSIX locale. This makes it clearer that conforming applications should avoid range expressions outside the POSIX locale, and it allows implementations and compatible user-mode matchers to interpret range expressions using native order, CEO, collation sequence, or other, more advanced techniques. The concerns which led to this change were raised in IEEE PASC interpretation 1003.2 #43 and others, and related to ambiguities in the specification of how multi-character collating elements should be handled in range expressions. These ambiguities had led to multiple interpretations of the specification, in conflicting ways, which led to varying implementations. As noted above, efforts were made to resolve the differences, but no solution has been found that would be specific enough to allow for portable software while not invalidating existing implementations.
The standard developers recognize that multi-character collating elements are important; such elements are common in several European languages, for example 'ch' or 'll' in traditional Spanish and 'aa' in several Scandinavian languages. Existing internationalized implementations have processed, and continue to process, these elements in range expressions. Efforts are expected to continue in the future to find a way to define the behavior of these elements precisely and portably.
The ISO POSIX-2:1993 standard required "[b-a]" to be an invalid expression in the POSIX locale, but this requirement has been relaxed in this version of the standard so that "[b-a]" can instead be treated as a valid expression that does not match any string.
The standard specifies three possible behaviors for regular expressions such as "[:alpha:]". One behavior is the traditional implementation, which behaves like "[:ahlp]". Another, for alignment with the tr utility, is to treat it like "[[:alpha:]]". And finally, the standard allows rejecting the regular expression as invalid, as a means of alerting a user to the non-portable aspect of that regular expression. The set of regular expressions with this undefined behavior is limited solely to the expressions where the outer '[' and ']' of the bracket expression can be confused with the missing bracket pair '[' and ']' necessary to form a collating symbol, equivalence class, or character class; thus "[_:alpha:]" or "[::]" do not trigger the unspecified behavior.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0044 [938], XBD/TC2-2008/0045 [872], XBD/TC2-2008/0046 [938], XBD/TC2-2008/0047 [584], and XBD/TC2-2008/0048 [584] are applied.
Austin Group Defect 948 is applied, requiring that an ordinary character in a matching list only matches that character.
Austin Group Defect 1190 is applied, clarifying which characters lose their special meaning inside a bracket expression.
Austin Group Defect 1288 is applied, changing "rejected as an error" to "treated as an invalid bracket expression".
The limit of nine back-references to subexpressions in the RE is based on the use of a single-digit identifier; increasing this to multiple digits would break historical applications. This does not imply that only nine subexpressions are allowed in REs. The following is a valid BRE with ten subexpressions:
\(\(\(ab\)*c\)*d\)\(ef\)*\(gh\)\{2\}\(ij\)*\(kl\)*\(mn\)*\(op\)*\(qr\)*
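The count can be confirmed through the re_nsub member that regcomp() fills in. Back-references in this BRE could still name only \1 through \9. The helper count_subexpressions() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile bre and return its number of
   parenthesized subexpressions, or (size_t)-1 if regcomp() fails. */
size_t count_subexpressions(const char *bre)
{
    regex_t re;
    size_t n;

    if (regcomp(&re, bre, 0) != 0)
        return (size_t)-1;
    n = re.re_nsub;
    regfree(&re);
    return n;
}
```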
The standard developers regarded the common historical behavior, which supported "\n*", but not "\n\{min,max\}", "\(...\)*", or "\(...\)\{min,max\}", as a non-intentional result of a specific implementation, and they supported both duplication and interval expressions following subexpressions and back-references.
The changes to the processing of the back-reference expression remove an unspecified or ambiguous behavior in the Shell and Utilities volume of POSIX.1-2024, aligning it with the requirements specified for regcomp(), and are the result of PASC Interpretation 1003.2-92 #43 submitted for the ISO POSIX-2:1993 standard.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0049 [595] is applied.
There is no additional rationale provided for this section.
Often, the <dollar-sign> is viewed as matching the ending <newline> in text files. This is not strictly true; the <newline> is typically eliminated from the strings to be matched, and the <dollar-sign> matches the terminating null character.
The ability of '^', '$', and '*' to be non-special in certain circumstances may be confusing to some programmers, but this situation was changed only in a minor way from historical practice to avoid breaking many historical scripts. Some consideration was given to making the use of the anchoring characters undefined if not escaped and not at the beginning or end of strings. This would cause a number of historical BREs, such as "2^10", "$HOME", and "$1.35", that relied on the characters being treated literally, to become invalid.
However, one relatively uncommon case was changed to allow an extension used on some implementations. Historically, the BREs "^foo" and "\(^foo\)" did not match the same string, despite the general rule that subexpressions and entire BREs match the same strings. To increase consensus, POSIX.1-2024 has allowed an extension on some implementations to treat these two cases in the same way by declaring that anchoring may occur at the beginning or end of a subexpression. Therefore, portable BREs that require a literal <circumflex> at the beginning or a <dollar-sign> at the end of a subexpression must escape them. Note that a BRE such as "a\(^bc\)" will either match "a^bc" or nothing on different systems under the rules.
ERE anchoring has been different from BRE anchoring in all historical systems. An unescaped anchor character has never matched its literal counterpart outside a bracket expression. Some implementations treated "foo$bar" as a valid expression that never matched anything; others treated it as invalid. POSIX.1-2024 mandates the former, valid unmatched behavior.
Some implementations have extended the BRE syntax to add alternation. For example, the subexpression "\(foo$\|bar\)" would match either "foo" at the end of the string or "bar" anywhere. The extension is triggered by the use of the undefined "\|" sequence. Because the BRE is undefined for portable scripts, the extending system is free to make other assumptions, such as that the '$' represents the end-of-line anchor in the middle of a subexpression. Without the extension, the '$' would match a literal <dollar-sign> under the rules.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0049 [595] is applied.
Austin Group Defect 1546 is applied, adding optional support for "\?", "\+", and "\|".
Austin Group Defect 1579 is applied, eliminating an inconsistency between the list items relating to <circumflex> and <dollar-sign>.
As with BREs, the standard developers decided to make the interpretation of escaped ordinary characters undefined.
The <right-parenthesis> is not listed as an ERE special character because it is only special in the context of a preceding <left-parenthesis>. If found without a preceding <left-parenthesis>, the <right-parenthesis> has no special meaning.
The interval expression, "{m,n}", has been added to EREs. Historically, the interval expression has only been supported in some ERE implementations. The standard developers estimated that the addition of interval expressions to EREs would not decrease consensus and would also make BREs more of a subset of EREs than in many historical implementations.
It was suggested that, in addition to interval expressions, back-references ('\n') should also be added to EREs. This was rejected by the standard developers as likely to decrease consensus.
In historical implementations, multiple duplication symbols are usually interpreted from left to right and treated as additive. As an example, "a+*b" matches zero or more instances of 'a' followed by a 'b'. In POSIX.1-2024, multiple duplication symbols are undefined; that is, they cannot be relied upon for conforming applications. One reason for this is to provide some scope for future enhancements.
The precedence of operations differs between EREs and those in lex; in lex, for historical reasons, interval expressions have a lower precedence than concatenation.
Austin Group Defect 1139 is applied, making minor editorial changes to several subsections of this section and changing them to require that, when not inside a bracket expression, "\]" matches ']' and "\}" matches '}'.
There is no additional rationale provided for this section.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0050 [554] is applied.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0050 [554] is applied.
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
Austin Group Defects 793 and 1329 are applied, adding the repetition modifier '?' and the REG_MINIMAL flag.
There is no additional rationale provided for this section.
There is no additional rationale provided for this section.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0051 [595] is applied.
The grammars are intended to represent the range of acceptable syntaxes available to conforming applications. There are instances in the text where undefined constructs are described; as explained previously, these allow implementation extensions. There is no intended requirement that an implementation extension must somehow fit into the grammars shown here.
The BRE grammar does not permit L_ANCHOR or R_ANCHOR inside "\(" and "\)" (which implies that '^' and '$' are ordinary characters). This reflects the semantic limits on the application, as noted in XBD 9.3.8 BRE Expression Anchoring. Implementations are permitted to extend the language to interpret '^' and '$' as anchors in these locations, and as such, conforming applications cannot use unescaped '^' and '$' in positions inside "\(" and "\)" that might be interpreted as anchors.
The ERE grammar does not permit several constructs that XBD 9.4.2 ERE Ordinary Characters and 9.4.3 ERE Special Characters specify as having undefined results:
Implementations are permitted to extend the language to allow these. Conforming applications cannot use such constructs.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0052 [554] is applied.
Austin Group Defect 1139 is applied, updating QUOTED_CHAR to add \] to the BRE list and add \] and \} to the ERE list, and changing "outside bracket expressions" to "except inside bracket expressions".
Austin Group Defect 1546 is applied, adding optional support for \?, \+, and \| in BREs.
The removal of the Back_open_paren Back_close_paren option from the nondupl_RE specification is the result of PASC Interpretation 1003.2-92 #43 submitted for the ISO POSIX-2:1993 standard. Although the grammar required support for null subexpressions, this section does not describe the meaning of, and historical practice did not support, this construct.
Austin Group Defect 1546 is applied, adding optional support for \?, \+, and \| in BREs.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0052 [554] and XBD/TC2-2008/0053 [916] are applied.
A description of the historical /usr/tmp was omitted, removing any concept of differences in emphasis between the / and /usr directories. The descriptions of /bin, /usr/bin, /lib, and /usr/lib were omitted because they are not useful for applications. In an early draft, a distinction was made between system and application directory usage, but this was not found to be useful.
The directories / and /dev are included because the notion of a hierarchical directory structure is key to other information presented elsewhere in POSIX.1-2024. In early drafts, it was argued that special devices and temporary files could conceivably be handled without a directory structure on some implementations. For example, the system could treat the characters "/tmp" as a special token that would store files using some non-POSIX file system structure. This notion was rejected by the standard developers, who required that all the files in this section be implemented via POSIX file systems.
The /tmp directory is retained in POSIX.1-2024 to accommodate historical applications that assume its availability. Implementations are encouraged to provide suitable directory names in the environment variable TMPDIR and applications are encouraged to use the contents of TMPDIR for creating temporary files.
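An application following this recommendation might choose its temporary directory as in the sketch below. The function names and the "tmp.XXXXXX" name prefix are hypothetical choices. mkstemp() is the standard way to create the file without a name race:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Returns TMPDIR if it is set and non-empty, otherwise /tmp. */
static const char *temp_dir(void)
{
    const char *dir = getenv("TMPDIR");
    return (dir != NULL && dir[0] != '\0') ? dir : "/tmp";
}

/* Creates a uniquely named file in temp_dir() and stores its pathname in
   buf. Returns an open descriptor, or -1 on error (including a buf that
   is too small for the pathname). */
static int make_temp_file(char *buf, size_t size)
{
    const char *dir = temp_dir();
    size_t len = strlen(dir);                       /* len >= 1 here */
    const char *sep = (dir[len - 1] == '/') ? "" : "/";  /* avoid "//" */
    int n = snprintf(buf, size, "%s%stmp.XXXXXX", dir, sep);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return mkstemp(buf);             /* replaces XXXXXX and opens O_EXCL */
}

/* Closes and removes a file created by make_temp_file(). */
static int remove_temp_file(const char *path, int fd)
{
    int rc = close(fd);
    if (unlink(path) != 0)
        rc = -1;
    return rc;
}
```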
The standard files /dev/null and /dev/tty are required to be both readable and writable to allow applications to have the intended historical access to these files.
The standard file /dev/console has been added for alignment with the Single UNIX Specification.
IEEE Std 1003.1-2001/Cor 2-2004, item XBD/TC2/D6/17 is applied, making it clear that the requirements for documenting terminal support are in the system documentation.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0054 [967] is applied.
If the implementation does not support this interface on any device type, it should behave as if it were being used on a device that is not a terminal device (in most cases errno will be set to [ENOTTY] on return from functions defined by this interface). This is based on the fact that many applications are written to run both interactively and in some non-interactive mode, and they adapt themselves at runtime. Requiring that they all be modified to test an environment variable to determine whether they should try to adapt is unnecessary. On a system that provides no general terminal interface, providing all the entry points as stubs that return [ENOTTY] (or an equivalent, as appropriate) has the same effect and requires no changes to the application.
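Runtime adaptation of this kind might look like the sketch below. The helper name terminal_kind() is hypothetical. It relies only on tcgetattr() failing with [ENOTTY] for a non-terminal, which covers both a real non-terminal file and a system whose terminal interface consists only of stubs:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Returns 1 if fd refers to a terminal, 0 if it does not (ENOTTY),
   and -1 on any other error (for example EBADF). */
static int terminal_kind(int fd)
{
    struct termios t;
    if (tcgetattr(fd, &t) == 0)
        return 1;                 /* interactive: terminal attributes exist */
    return errno == ENOTTY ? 0    /* not a terminal: non-interactive mode */
                           : -1;  /* a genuine error */
}
```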
Although the needs of both interface implementors and application developers were addressed throughout POSIX.1-2024, this section pays more attention to the needs of the latter. This is because, while many aspects of the programming interface can be hidden from the user by the application developer, the terminal interface is usually a large part of the user interface. Although to some extent the application developer can build missing features or work around inappropriate ones, the difficulties of doing that are greater in the terminal interface than elsewhere. For example, efficiency prohibits the average program from interpreting every character passing through it in order to simulate character erase, line kill, and so on. These functions should usually be done by the operating system, possibly at the interrupt level.
The tc*() functions were introduced as a way of avoiding the problems inherent in the traditional ioctl() function and in variants of it that were proposed. For example, tcsetattr() is specified in place of the use of the TCSETA ioctl() command function. This allows specification of all the arguments in a manner consistent with the ISO C standard unlike the varying third argument of ioctl(), which is sometimes a pointer (to any of many different types) and sometimes an int.
The advantages of this new method include:
The disadvantages include:
The issue of modem control was excluded from POSIX.1-2024 on the grounds that:
The O_TTY_INIT flag for open() has been added to POSIX.1-2024 to solve a problem encountered by applications written for earlier versions of this standard which need to open a modem or similar device and initialize all of the parameter settings. Using the tcgetattr()-modify-tcsetattr() method mandated by the standard could result in non-conforming behavior if the device had previously been used with non-conforming parameter settings, on implementations which do not reset the parameter settings in between the last close of the device by one application and the first open by another application. To avoid this problem, some application developers were resorting to using memset() to zero the termios structure before setting all of the standard parameters, but this risks non-conforming behavior on systems where some non-standard parameter needs a non-zero value in order for the terminal to behave in a conforming manner.
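The intended usage pattern is to open with O_TTY_INIT and then apply tcgetattr()-modify-tcsetattr() to the known-conforming settings. The sketch below assumes a C library that may not define O_TTY_INIT; it falls back to zero, which the following paragraph says is acceptable on systems that reset settings anyway. The function name and the 9600 baud choice are hypothetical. O_NOCTTY keeps the open() from allocating a controlling terminal:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

#ifndef O_TTY_INIT      /* assumption: not every C library defines it yet */
#define O_TTY_INIT 0
#endif

/* Opens a terminal device with conforming initial settings, then changes
   only what the application needs (here, an example speed of 9600 baud).
   Returns the descriptor, or -1 with errno set. */
static int open_tty_initialized(const char *path)
{
    int fd = open(path, O_RDWR | O_NOCTTY | O_TTY_INIT);
    if (fd == -1)
        return -1;

    struct termios t;
    if (tcgetattr(fd, &t) == -1 ||
        cfsetispeed(&t, B9600) == -1 ||
        cfsetospeed(&t, B9600) == -1 ||
        tcsetattr(fd, TCSANOW, &t) == -1) {
        int saved = errno;        /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```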
On systems which do reset the parameter settings to defaults between uses of a terminal device, it is expected that either O_TTY_INIT will have the value zero or open(ttypath, O_RDWR|O_TTY_INIT) will do nothing additional.
The standard developers considered an alternative solution of a special fildes argument for the tcgetattr() call to obtain default parameters. However, this would not be adequate if a system supports several different types of terminal device and the default settings need to differ between the different types. With the O_TTY_INIT open flag, the implementor can determine which device type is being opened.
The standard developers also considered a special POSIX_TTY_INIT value for the termios structure used in tcsetattr(), which would reset the values if used immediately after an open() call. However, it was felt that this would lead to confusion amongst application developers who wanted to reset the parameters at other points, and implementations might diverge.
Austin Group Defect 1466 is applied, changing the terminology used for pseudo-terminal devices.
There is a potential race when the members of the foreground process group on a terminal leave that process group, either by exit or by changing process groups. After the last process exits the process group, but before the foreground process group ID of the terminal is changed (usually by a job control shell), it would be possible for a new process to be created with its process ID equal to the terminal's foreground process group ID. That process might then become the process group leader and accidentally be placed into the foreground on a terminal that was not necessarily its controlling terminal. As a result of this problem, the controlling terminal is defined to not have a foreground process group during this time.
The cases where a controlling terminal has no foreground process group occur when all processes in the foreground process group either terminate and are waited for or join other process groups via setpgid() or setsid(). If the process group leader terminates, this is the first case described; if it leaves the process group via setpgid(), this is the second case described (a process group leader cannot successfully call setsid()). When one of those cases causes a controlling terminal to have no foreground process group, it has two visible effects on applications. The first is the value returned by tcgetpgrp(). The second (which occurs only in the case where the process group leader terminates) is the sending of signals in response to special input characters. The intent of POSIX.1-2024 is that no process group be wrongly identified as the foreground process group by tcgetpgrp() or unintentionally receive signals because of placement into the foreground.
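An application can use tcgetpgrp() to test whether it is in the foreground, as in the sketch below. The helper in_foreground() is hypothetical. tcgetpgrp() only succeeds on the caller's controlling terminal and otherwise fails with [ENOTTY]. When the terminal has no foreground process group, tcgetpgrp() returns a value greater than 1 that matches no existing process group. The comparison therefore correctly reports "not in the foreground" and does not misidentify a group:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Returns 1 if the caller is in the foreground process group of the
   terminal on fd, 0 if it is not (including when there is no foreground
   process group), or -1 with errno set on error. */
static int in_foreground(int fd)
{
    pid_t fg = tcgetpgrp(fd);
    if (fg == (pid_t)-1)
        return -1;
    return fg == getpgrp();
}
```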
In 4.3 BSD, the old process group ID continues to be used to identify the foreground process group and is returned by the function equivalent to tcgetpgrp(). In that implementation it is possible for a newly created process to be assigned the same value as a process ID and then form a new process group with the same value as a process group ID. The result is that the new process group would receive signals from this terminal for no apparent reason, and POSIX.1-2024 precludes this by forbidding a process group from entering the foreground in this way. It would be more direct to place part of the requirement made by the last sentence under fork(), but there is no convenient way for that section to refer to the value that tcgetpgrp() returns, since in this case there is no process group and thus no process group ID.
One possibility for a conforming implementation is to behave similarly to 4.3 BSD, but to prevent this reuse of the ID, probably in the implementation of fork(), as long as it is in use by the terminal.
Another possibility is to recognize when the last process stops using the terminal's foreground process group ID, which is when the process group lifetime ends, and to change the terminal's foreground process group ID to a reserved value that is never used as a process ID or process group ID. (See the definition of process group lifetime in the definitions section.) The process ID can then be reserved until the terminal has another foreground process group.
The 4.3 BSD implementation permits the leader (and only member) of the foreground process group to leave the process group by calling the equivalent of setpgid() and to later return, expecting to return to the foreground. There are no known application needs for this behavior, and POSIX.1-2024 neither requires nor forbids it (except that it is forbidden for session leaders) by leaving it unspecified.
POSIX.1-2024 does not specify a mechanism by which to allocate a controlling terminal. This is normally done by a system utility (such as getty) and is considered an administrative feature outside the scope of POSIX.1-2024.
Historical implementations allocate controlling terminals on certain open() calls. Since open() is part of POSIX.1, its behavior had to be dealt with. The traditional behavior is not required because it is not very straightforward or flexible for either implementations or applications. However, because of its prevalence, it was not practical to disallow this behavior either. Thus, a mechanism was standardized to ensure portable, predictable behavior in open().
Some historical implementations deallocate a controlling terminal on the last system-wide close. This behavior is neither required nor prohibited. Even on implementations that do provide this behavior, applications generally cannot depend on it due to its system-wide nature.
The access controls described in this section apply only to a process that is accessing its controlling terminal. A process accessing a terminal that is not its controlling terminal is effectively treated the same as a member of the foreground process group. While this may seem unintuitive, note that these controls are for the purpose of job control, not security, and job control relates only to the controlling terminal of a process. Normal file access permissions handle security.
If the process calling read() or write() is in a background process group that is orphaned, it is not desirable to stop the process group, as it is no longer under the control of a job control shell that could put it into the foreground again. Accordingly, calls to read() or write() functions by such processes receive an immediate error return. This is different from 4.2 BSD, which kills orphaned processes that receive terminal stop signals.
The foreground/background/orphaned process group check performed by the terminal driver must be repeatedly performed until the calling process moves into the foreground or until the process group of the calling process becomes orphaned. That is, when the terminal driver determines that the calling process is in the background and should receive a job control signal, it sends the appropriate signal (SIGTTIN or SIGTTOU) to every process in the process group of the calling process and then it allows the calling process to immediately receive the signal. The latter is typically performed by blocking the process so that the signal is immediately noticed. Note, however, that after the process finishes receiving the signal and control is returned to the driver, the terminal driver must re-execute the foreground/background/orphaned process group check. The process may still be in the background, either because it was continued in the background by a job control shell, or because it caught the signal and did nothing.
The terminal driver repeatedly performs the foreground/background/orphaned process group checks whenever a process is about to access the terminal. In the case of write() or the control tc*() functions, the check is performed at the entry of the function. In the case of read(), the check is performed not only at the entry of the function, but also after blocking the process to wait for input characters (if necessary). That is, once the driver has determined that the process calling the read() function is in the foreground, it attempts to retrieve characters from the input queue. If the queue is empty, it blocks the process waiting for characters. When characters are available and control is returned to the driver, the terminal driver must perform the foreground/background/orphaned process group check again, because the process may have moved from the foreground to the background while it was blocked waiting for input characters.
Austin Group Defect 1151 is applied, adding tcsetwinsize().
There is no additional rationale provided for this section.
The term "character" is intended here. ERASE should erase the last character, not the last byte. In the case of multi-byte characters, these two may be different.
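The difference between erasing a character and erasing a byte is easy to see in UTF-8. The sketch below assumes the line buffer holds valid UTF-8; a real terminal driver would follow the encoding actually in use. It steps back over continuation bytes (bit pattern 10xxxxxx) to the start of the last character. The function name is hypothetical:

```c
#include <stddef.h>

/* Returns the new length of a UTF-8 line buffer after erasing its last
   character, which may be several bytes long. Like ERASE, it never
   goes back past the start of the line. */
static size_t erase_last_char(const unsigned char *buf, size_t len)
{
    if (len == 0)
        return 0;
    size_t i = len - 1;
    while (i > 0 && (buf[i] & 0xC0) == 0x80)   /* skip continuation bytes */
        i--;
    return i;
}
```

For the five-byte UTF-8 encoding of "café", one ERASE removes two bytes and leaves "caf".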
4.3 BSD has a WERASE character that erases the last "word" typed (but not any preceding <blank> or <tab> characters). A word is defined as a sequence of non-<blank> characters, with <tab> characters counted as <blank> characters. Like ERASE, WERASE does not erase beyond the beginning of the line. This WERASE feature has not been specified in POSIX.1 because it is difficult to define in the international environment. It is only useful for languages where words are delimited by <blank> characters. In some ideographic languages, such as Japanese and Chinese, words are not delimited at all. The WERASE character should presumably go back to the beginning of a sentence in those cases; practically, this means it would not be used much for those languages.
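For blank-delimited languages, the 4.3 BSD rule can be sketched as follows. The sketch assumes a single-byte line buffer and assumes, as the 4.3 BSD terminal driver did, that any <blank> or <tab> characters typed after the last word are erased first. The blanks in front of the word are kept. The function name is hypothetical, and the sketch only applies where words are separated by blanks, which is exactly the limitation described above:

```c
#include <stddef.h>

/* Returns the new line length after a 4.3 BSD style WERASE: trailing
   <blank>/<tab> characters are erased, then the preceding word. The
   erase never goes back past the start of the line. */
static size_t werase(const char *line, size_t len)
{
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;                                   /* trailing blanks */
    while (len > 0 && line[len - 1] != ' ' && line[len - 1] != '\t')
        len--;                                   /* the word itself */
    return len;
}
```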
It should be noted that there is a possible inherent deadlock if the application and implementation conflict on the value of {MAX_CANON}. With ICANON and IXOFF set, if more than {MAX_CANON} characters are transmitted without a <linefeed>, transmission will be stopped, the <linefeed> (or <carriage-return> when ICRNL is set) will never arrive, and the read() will never be satisfied.
An application should not set IXOFF if it is using canonical mode unless it knows that (even in the face of a transmission error) the conditions described previously cannot be met or unless it is prepared to deal with the possible deadlock in some other way, such as timeouts.
It should also be noted that this can be made to happen in non-canonical mode if the input-queue threshold at which IXOFF flow control sends the STOP character is less than MIN and TIME is zero.
Some points to note about MIN and TIME:
These two points highlight the dual purpose of the MIN/TIME feature. Cases A and B, where MIN>0, exist to handle burst-mode activity (for example, file transfer programs) where a program would like to process at least MIN characters at a time. In case A, the inter-character timer is activated by a user as a safety measure; in case B, it is turned off.
Cases C and D exist to handle single-character timed transfers. These cases are readily adaptable to screen-based applications that need to know if a character is present in the input queue before refreshing the screen. In case C, the read is timed; in case D, it is not.
Another important note is that MIN is always just a minimum. It does not denote a record length. That is, if a program does a read of 20 bytes, MIN is 10, and 25 characters are present, 20 characters are returned to the user. In the special case of MIN=0, this still applies: if more than one character is available, they all will be returned immediately.
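Selecting one of the four cases is a matter of clearing ICANON and setting c_cc[VMIN] and c_cc[VTIME]. The sketch below uses a hypothetical helper that modifies the inherited settings rather than replacing them. It hands back the previous settings so the caller can restore the whole structure later. Restoring the whole structure also restores VEOF and VEOL on systems where they share slots with VMIN and VTIME:

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

/* Puts fd into non-canonical mode with the given MIN and TIME (TIME is in
   tenths of a second):
     min > 0,  time > 0  -> case A: inter-character timer
     min > 0,  time == 0 -> case B: wait for at least MIN bytes
     min == 0, time > 0  -> case C: timed read
     min == 0, time == 0 -> case D: return whatever is available
   The previous settings are stored in *saved. Returns 0 on success,
   -1 with errno set on failure. */
static int set_min_time(int fd, cc_t min, cc_t time, struct termios *saved)
{
    struct termios t;
    if (tcgetattr(fd, &t) == -1)
        return -1;
    *saved = t;
    t.c_lflag &= ~(tcflag_t)ICANON;
    t.c_cc[VMIN] = min;
    t.c_cc[VTIME] = time;
    return tcsetattr(fd, TCSANOW, &t);
}
```

For example, set_min_time(STDIN_FILENO, 0, 5, &saved) selects case C with a half-second timeout, and tcsetattr(STDIN_FILENO, TCSANOW, &saved) undoes it.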
There is no additional rationale provided for this section.
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0055 [745] is applied.
There is no additional rationale provided for this section.
POSIX.1-2024 does not specify that a close() on a terminal device file include the equivalent of a call to tcflow(fd,TCOON).
An implementation that reports to the caller of write() that data was written, and then discards that output when close() is called, does not conform with POSIX.1-2024. An application has functions such as tcdrain(), tcflush(), and tcflow() available to obtain the detailed behavior it requires with respect to flushing of output.
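An application that wants its output actually transmitted before closing a terminal can call tcdrain() first. The following is a minimal sketch with a hypothetical drain_and_close() helper. It treats [ENOTTY] from tcdrain() as harmless, because a non-terminal has no output queue to wait for:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Waits until all output written to fd has been transmitted, then closes
   fd. Returns 0 on success, -1 with errno set on failure. */
static int drain_and_close(int fd)
{
    int rc = 0, saved = 0;
    if (tcdrain(fd) == -1 && errno != ENOTTY) {
        saved = errno;
        rc = -1;
    }
    if (close(fd) == -1 && rc == 0) {
        saved = errno;
        rc = -1;
    }
    if (rc == -1)
        errno = saved;            /* report the first failure */
    return rc;
}
```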
At the time of the last close on a terminal device, an application relinquishes any ability to exert flow control via tcflow().
This structure is part of an interface that, in general, retains the historic grouping of flags. Although a more optimal structure for implementations may be possible, the degree of change to applications would be significantly larger.
Some historical implementations treated a long break as multiple events, as many as one per character time. The wording in POSIX.1 explicitly prohibits this.
Although the ISTRIP flag is normally superfluous with today's terminal hardware and software, it is historically supported. Therefore, applications may be using ISTRIP, and there is no technical problem with supporting this flag. Also, applications may wish to receive only 7-bit input bytes and may not be connected directly to the hardware terminal device (for example, when a connection traverses a network).
Also, there is no requirement in general that the terminal device ensures that high-order bits beyond the specified character size are cleared. ISTRIP provides this function for 7-bit characters, which are common.
In dealing with multi-byte characters, the consequences of a parity error in such a character, or in an escape sequence affecting the current character set, are beyond the scope of POSIX.1 and are best dealt with by the application processing the multi-byte characters.
POSIX.1 does not describe post-processing of output to a terminal or detailed control of that from a conforming application (that is, translation of <newline> to <carriage-return> followed by <linefeed>, or <tab> processing). There is nothing that a conforming application should do to its output for a terminal because that would require knowledge of the operation of the terminal. It is the responsibility of the operating system to provide post-processing appropriate to the output device, whether it is a terminal or some other type of device.
Extensions to POSIX.1 to control the type of post-processing already exist and are expected to continue into the future. The control of these features is primarily to adjust the interface between the system and the terminal device so the output appears on the display correctly. This should be set up before use by any application.
In general, neither the input nor the output modes should be set absolutely; rather, they should be modified from the inherited state.
This section could be misread as implying that the symbol "CSIZE" is merely a title for the character-size bits in the termios c_cflag field. Although it does serve that function, it is also a required symbol, as a literal reading of POSIX.1 (and the caveats about typography) would indicate.
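CSIZE is used as a mask, which also illustrates modifying inherited modes rather than setting them absolutely. The sketch below changes only the character-size bits and leaves the rest of c_cflag alone. The function name is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>

/* Sets the character size (5 to 8 bits) in *t, touching only the bits
   under the CSIZE mask. Returns 0, or -1 for an unsupported size. */
static int set_char_size(struct termios *t, int bits)
{
    tcflag_t cs;
    switch (bits) {
    case 5: cs = CS5; break;
    case 6: cs = CS6; break;
    case 7: cs = CS7; break;
    case 8: cs = CS8; break;
    default: return -1;
    }
    t->c_cflag = (t->c_cflag & ~(tcflag_t)CSIZE) | cs;
    return 0;
}
```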
Non-canonical mode is provided to allow fast bursts of input to be read efficiently while still allowing single-character input.
The ECHONL function has historically been provided by many implementations. Since there seems to be no technical problem with supporting ECHONL, it is included in POSIX.1 to increase consensus.
The alternate behavior possible when ECHOK or ECHOE are specified with ICANON is permitted as a compromise depending on what the actual terminal hardware can do. Erasing characters and lines is preferred, but is not always possible.
Permitting VMIN and VTIME to overlap with VEOF and VEOL was a compromise for historical implementations. Only when backwards-compatibility of object code is a serious concern to an implementor should an implementation continue this practice. Correct applications that work with the overlap (at the source level) should also work if it is not present, but not the reverse.
The standard developers considered that recent trends toward diluting the SYNOPSIS sections of historical reference pages to the equivalent of:
command [options] [operands]
were a disservice to the reader. Therefore, considerable effort was placed into rigorous definitions of all the command line arguments and their interrelationships. The relationships depicted in the synopses are normative parts of POSIX.1-2024; this information is sometimes repeated in textual form, but that is only for clarity within context.
The use of "undefined" for conflicting argument usage and for repeated usage of the same option is meant to prevent conforming applications from using conflicting arguments or repeated options unless specifically allowed (as is the case with ls, which allows simultaneous, repeated use of the -C, -l, and -1 options). Many historical implementations will tolerate this usage, choosing either the first or the last applicable argument. This tolerance can continue, but conforming applications cannot rely upon it. (Other implementations may choose to print usage messages instead.)
The use of "undefined" for conflicting argument usage also allows an implementation to make reasonable extensions to utilities where the implementor considers options that are mutually exclusive according to POSIX.1-2024 to have a sensible meaning and result.
POSIX.1-2024 does not define the result of a command when an option-argument or operand is not followed by ellipses and the application specifies more than one of that option-argument or operand. This allows an implementation to define valid (although non-standard) behavior for the utility when more than one such option or operand is specified.
The requirements for option-arguments are summarized as follows:
| SYNOPSIS Shows: | -a arg | -c[arg] |
|---|---|---|
| Conforming application uses: | -a arg | -carg or -c |
| System supports: | -a arg and -aarg | -carg and -c |
| Non-conforming applications may use: | -aarg | N/A |
Earlier versions of this standard included obsolescent syntax which showed some options with (mandatory) adjacent option-arguments in the SYNOPSIS for some utilities. These have since been removed. For all options with mandatory option-arguments, the SYNOPSIS now shows <blank> characters between the option and the option-argument. However, historical usage has not been consistent in this area. Therefore, conforming applications are required to use <blank> characters and all implementations are required to handle them, but implementations are also required to handle an adjacent option-argument in order to preserve backwards-compatibility for old scripts. One of the justifications for selecting the multiple-argument method was that the single-argument case is inherently ambiguous when the option-argument can legitimately be a null string.
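getopt() already provides the "system supports" row of the table for mandatory option-arguments. The sketch below uses a hypothetical parse_a_option() helper. Both "-a value" and "-avalue" yield the same optarg, and only the separate form can carry a null-string option-argument. The leading ':' in the option string makes getopt() report a missing option-argument without printing a diagnostic:

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Looks for -a with a mandatory option-argument and stores it in *arg.
   Returns 0 if -a was found, 1 if absent, -1 on a usage error. */
static int parse_a_option(int argc, char *argv[], const char **arg)
{
    int c, found = 1;
    optind = 1;                           /* start again at argv[1] */
    while ((c = getopt(argc, argv, ":a:")) != -1) {
        switch (c) {
        case 'a':
            *arg = optarg;                /* "-a value" or "-avalue" */
            found = 0;
            break;
        default:                          /* '?' unknown, ':' missing arg */
            return -1;
        }
    }
    return found;
}
```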
POSIX.1-2024 explicitly states that digits are permitted as operands and option-arguments. The lower and upper bounds for the values of the numbers used for operands and option-arguments were derived from the ISO C standard values for {LONG_MIN} and {LONG_MAX}. The requirement on the standard utilities is that numbers in the specified range do not cause a syntax error, although the specification of a number need not be semantically correct for a particular operand or option-argument of a utility. For example, the specification of:
dd obs=3000000000
would yield undefined behavior for the application and could be a syntax error, because the number 3000000000 is outside the range -2147483647 to +2147483647 that every implementation is guaranteed to support. On the other hand:
dd obs=2000000000
may cause some error, such as "blocksize too large", rather than a syntax error.
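A utility can separate these two kinds of error by converting the argument first and checking its meaning afterwards. The following is a minimal sketch with a hypothetical parse_long() helper built on strtol(). Here, a "syntax error" means non-numeric text or a value outside the range of long. A check such as "blocksize too large" would follow as a separate step:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>

/* Converts a decimal operand or option-argument to a long. Rejects empty
   strings and trailing characters (errno EINVAL) and values outside the
   range of long (errno ERANGE). Returns 0 on success, -1 on error. */
static int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        errno = EINVAL;
        return -1;
    }
    if (errno == ERANGE)
        return -1;
    *out = v;
    return 0;
}
```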
POSIX.1-2008, Technical Corrigendum 2, XBD/TC2-2008/0056 [584] and XBD/TC2-2008/0057 [813] are applied.
Austin Group Defect 1062 is applied, correcting the spacing in some example SYNOPSIS lines.
This section is based on the rules listed in the SVID. It was included for two reasons:
It is recommended that all future utilities and applications use these guidelines to enhance "user portability". The fact that some historical utilities could not be changed (to avoid breaking historical applications) should not deter this future goal.
The voluntary nature of the guidelines is highlighted by repeated uses of the word should throughout. This usage should not be misinterpreted to imply that utilities that claim conformance in their OPTIONS sections do not always conform.
Guidelines 1 and 2 encourage utility writers to use only characters from the portable character set because use of locale-specific characters may make the utility inaccessible from other locales. Use of uppercase letters is discouraged due to problems associated with porting utilities to systems that do not distinguish between uppercase and lowercase characters in filenames. Use of non-alphanumeric characters is discouraged due to the number of utilities that treat non-alphanumeric characters in "special" ways depending on context (such as the shell using white-space characters to delimit arguments, various quote characters for quoting, the <dollar-sign> to introduce variable expansion, etc.).
In XCU 2.9.1 Simple Commands, it is further stated that a command used in the Shell Command Language cannot be named with a trailing <colon>.
Guideline 3 was changed to allow alphanumeric characters (letters and digits) from the character set for compatibility with historical usage. Historical practice allows the use of digits wherever practical, and there are no portability issues that would prohibit the use of digits. In fact, from an internationalization viewpoint, digits (being non-language-dependent) are preferable over letters (a -2 is intuitively self-explanatory to any user, while in the -f filename the letter 'f' is a mnemonic aid only to speakers of Latin-based languages where "filename" happens to translate to a word that begins with 'f'). Since Guideline 3 still retains the word "single", multi-digit options are not allowed. Instances of historical utilities that used them have been marked obsolescent, with the numbers being changed from option names to option-arguments.
It was difficult to achieve a satisfactory solution to the problem of name space in option characters. When the standard developers desired to extend the historical cc utility to accept ISO C standard programs, they found that all of the portable alphabet was already in use by various vendors. Thus, they had to devise a new name, c89 (subsequently superseded by c99 and now by c17), rather than something like cc -X. There were suggestions that implementors be restricted to providing extensions through various means (such as using a <plus-sign> as the option delimiter or using option characters outside the alphanumeric set) that would reserve all of the remaining alphanumeric characters for future POSIX standards. These approaches were resisted because they lacked the historical style of UNIX systems. Furthermore, if a vendor-provided option should become commonly used in the industry, it would be a candidate for standardization. It would be desirable to standardize such a feature using historical practice for the syntax (the semantics can be standardized with any syntax). This would not be possible if the syntax was one reserved for the vendor. However, since the standardization process may lead to minor changes in the semantics, it may prove to be better for a vendor to use a syntax that will not be affected by standardization.
Guideline 8 includes the concept of <comma>-separated lists in a single argument. It is up to the utility to parse such a list itself because getopt() just returns the single string. This situation was retained so that certain historical utilities would not violate the guidelines. Applications preparing for international use should be aware of an occasional problem with <comma>-separated lists: in some locales, the <comma> is used as the radix character. Thus, if an application is preparing operands for a utility that expects a <comma>-separated list, it should avoid generating non-integer values through any of the means that are influenced by the setting of the LC_NUMERIC variable (such as awk, bc, printf, or printf()).
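Because getopt() hands back the whole list as one optarg string, the utility has to split it. The following C sketch shows one way to do that; split_commas() is a hypothetical helper, not a standard interface. It keeps empty fields (strtok() would silently drop them), and it also shows the LC_NUMERIC hazard: a value written as "1,5" in a locale whose radix character is <comma> becomes two fields.

```c
#include <stddef.h>
#include <string.h>

/* Split a <comma>-separated option-argument in place.  Each <comma> is
 * overwritten with a NUL and fields[] receives a pointer to the start of
 * each field.  Empty fields (as in "a,,b") are kept.  Returns the total
 * number of fields; only the first max pointers are stored. */
size_t split_commas(char *list, char *fields[], size_t max)
{
    size_t n = 0;
    char *p = list;

    for (;;) {
        char *comma = strchr(p, ',');
        if (n < max)
            fields[n] = p;
        n++;
        if (comma == NULL)
            break;
        *comma = '\0';
        p = comma + 1;
    }
    return n;
}
```

For example, splitting a copy of "user,pid,,comm" yields four fields, the third of which is empty.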
Unless explicitly stated otherwise in the utility description, Guideline 9 requires applications to put options before operands, and requires utilities to accept any such usage without misinterpreting operands as options. For example, if an implementation of the printf utility supports a -e option as an extension, the command:
printf %s -e
must output the string "-e" without interpreting the -e as an option. Similarly, the command:
ls myfile -l
must interpret the -l argument as a second file operand, not as a -l option.
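The scanning order that Guidelines 5, 6, 9, and 10 imply can be sketched in C as follows. The sketch is hand-written rather than built on getopt() because some getopt() implementations (GNU getopt(), for example) permute arguments by default unless POSIXLY_CORRECT is set, and would then report the -l in ls myfile -l as an option. first_operand() is a hypothetical helper, and the optstring values in the examples are assumptions; it does not diagnose unknown options.

```c
#include <stddef.h>
#include <string.h>

/* Return the index in argv[] of the first operand.  optstring follows the
 * getopt() convention: a letter followed by ':' takes an option-argument.
 * Scanning stops at the first argument that is not an option, so a later
 * "-l" is an operand (Guideline 9); "--" ends the options and is skipped
 * (Guideline 10); a lone "-" is an operand. */
int first_operand(int argc, char *argv[], const char *optstring)
{
    int i = 1;

    while (i < argc) {
        const char *arg = argv[i];
        if (strcmp(arg, "--") == 0)
            return i + 1;              /* end of options */
        if (arg[0] != '-' || arg[1] == '\0')
            return i;                  /* first operand */
        i++;
        /* Walk a group of options such as "-ld" (Guideline 5). */
        for (const char *p = arg + 1; *p != '\0'; p++) {
            const char *spec = strchr(optstring, *p);
            if (*p != ':' && spec != NULL && spec[1] == ':') {
                /* The option-argument is the rest of this argument or,
                 * if nothing follows, the whole next argument. */
                if (p[1] == '\0')
                    i++;
                break;
            }
        }
    }
    return argc;                       /* no operands */
}
```

With this sketch, ls myfile -l and printf %s -e both report argv[1] as the first operand, and ls -l -d x reports argv[3], treating -l and -d as two options.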
Applications calling any utility with a first operand starting with '-' should usually specify --, as indicated by Guideline 10, to mark the end of the options. This is true even if the SYNOPSIS in the Shell and Utilities volume of POSIX.1-2024 does not specify any options; implementations may provide options as extensions to the Shell and Utilities volume of POSIX.1-2024. The standard utilities that do not support Guideline 10 indicate that fact in the OPTIONS section of the utility description.
Guideline 7 allows any string to be an option-argument; an option-argument can begin with any character, can be - or --, and can be an empty string. For example, the commands pr -h -, pr -h --, pr -h -d, pr -h +2, and pr -h "" contain the option-arguments -, --, -d, +2, and an empty string, respectively. Conversely, the command pr -h -- -d treats -d as an option, not as an argument, because the -- is an option-argument here, not a delimiter.
Guideline 11 was modified to clarify that the order of different options should not matter relative to one another. However, the order of repeated options that also have option-arguments may be significant; therefore, such options are required to be interpreted in the order that they are specified. The make utility is an instance of a historical utility that uses repeated options in which the order is significant. Multiple files are specified by giving multiple instances of the -f option; for example:
make -f common_header -f specific_rules target
Guideline 13 does not imply that all of the standard utilities automatically accept the operand '-' to mean standard input or output, nor does it specify the actions of the utility upon encountering multiple '-' operands. It simply says that, by default, '-' operands are not used for other purposes in the file reading or writing (but not when using stat(), unlink(), touch, and so on) utilities. In earlier versions of this standard, all information concerning actual treatment of the '-' operand was found in the individual utility sections. Many implementations, however, treated '-' as standard input or output and many applications depended on this behavior even though it was not standard. This behavior is now implementation-defined. Portable applications should not use '-' to mean standard input or output unless it is explicitly stated to do so in the utility description and they should always use './-' if they intend to refer to a file named - in the current working directory.
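The two sides of this advice can be sketched in C. open_input() models a hypothetical utility whose description does state that '-' means standard input; safe_operand() models an application that prefixes "./" to any pathname beginning with '-', so that neither a file named - nor one named -l can be taken for the '-' operand or for an option. Both names are assumptions made for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Utility side: '-' means standard input only because this hypothetical
 * utility documents it; any other operand, including "./-", is a file. */
FILE *open_input(const char *operand)
{
    if (strcmp(operand, "-") == 0)
        return stdin;
    return fopen(operand, "r");
}

/* Application side: return a newly allocated copy of path, prefixed with
 * "./" if it begins with '-'.  The caller frees the result; NULL is
 * returned if memory is exhausted. */
char *safe_operand(const char *path)
{
    size_t len = strlen(path);
    size_t prefix = (path[0] == '-') ? 2 : 0;
    char *out = malloc(prefix + len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, "./", prefix);
    memcpy(out + prefix, path, len + 1);   /* copies the terminating NUL */
    return out;
}
```

Pathnames starting with '/' or any other character are copied unchanged, since they cannot be mistaken for an option.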
Guideline 14 is intended to prohibit implementations that would treat the command ls -l -d as if it were ls -- -l -d or ls -l -- -d.
The standard permits implementations to have extensions that violate the Utility Syntax Guidelines so long as, when the utility is used in line with the forms defined by the standard, it follows the Utility Syntax Guidelines. Thus, head -42 file and ls --help are permitted extensions. The intent is to allow extensions so long as the standard form is accepted and follows the guidelines.
An area of concern was that as implementations mature, implementation-defined utilities and implementation-defined utility options will result. The idea was expressed that there needed to be a standard way, say an environment variable or some such mechanism, to identify implementation-defined utilities separately from standard utilities that may have the same name. It was decided that there already exist several ways of dealing with this situation and that it is outside of the scope to attempt to standardize in the area of non-standard items. A method that exists on some historical implementations is the use of the so-called /local/bin or /usr/local/bin directory to separate local or additional copies or versions of utilities. Another method that is also used is to isolate utilities into completely separate domains. Still another method to ensure that the desired utility is being used is to request the utility by its full pathname. There are many approaches to this situation; the examples given above serve to illustrate that there is more than one.
Each header reference page has a common layout of sections describing the interface. This layout is similar to the manual page or "man" page format shipped with most UNIX systems, and each header has sections describing the SYNOPSIS and DESCRIPTION. These are the two sections that relate to conformance.
Additional sections are informative, and add considerable information for the application developer. APPLICATION USAGE sections provide additional caveats, issues, and recommendations to the developer. RATIONALE sections give additional information on the decisions made in defining the interface.
FUTURE DIRECTIONS sections act as pointers to related work that may impact the interface in the future, and often caution the developer to architect the code to account for a change in this area. Note that a future directions statement should not be taken as a commitment to adopt a feature or interface in the future.
The CHANGE HISTORY section describes when the interface was introduced, and how it has changed.
Option labels and margin markings in the page can be useful in guiding the application developer.
The headers removed in Issue 8 (from the Issue 7 base document) are as follows:
Removed Headers in Issue 8