Realtime
This section defines system interfaces to support the source portability of applications with realtime requirements. The definition of realtime used in defining the scope of XSI provisions is:
- Realtime in operating systems: the ability of the operating system to provide a required level of service in a bounded response time.
The key elements of defining the scope are:
- defining a sufficient set of functionality to cover a significant part of the realtime application program domain, and
- defining sufficient performance constraints and performance-related functions to allow a realtime application to achieve deterministic response from the system.
Specifically within the scope, it is required to define interfaces that do not preclude high-performance implementations on traditional uniprocessor realtime systems.
Wherever possible, the requirements of other application environments are included in this interface definition. It is beyond the scope of these interfaces to support networking or multiprocessor functionality.
The specific functional areas included in this section and their scope include:
- Semaphores: A minimum synchronisation primitive to serve as a basis for more complex synchronisation mechanisms to be defined by the application program.
- Process memory locking: A performance improvement facility to bind application programs into the high-performance random access memory of a computer system. This avoids potential latencies introduced when the operating system stores, on secondary memory devices, parts of a program that were not recently referenced.
- Memory mapped files and shared memory objects: A performance improvement facility to allow for programs to access files as part of the address space and for separate application programs to have portions of their address space commonly accessible.
- Priority scheduling: A performance and determinism improvement facility to allow applications to determine the order in which threads that are ready to run are granted access to processor resources.
- Realtime signal extension: A determinism improvement facility that augments the BASE signals mechanism to enable asynchronous signal notifications to an application to be queued without impacting compatibility with the existing signals interface.
- Timers: A functionality and determinism improvement facility to increase the resolution and capabilities of the time-base interface.
- POSIX Interprocess communication: A functionality enhancement to add a high-performance, deterministic interprocess communication facility for local communication. Network transparency is beyond the scope of this interface.
- Synchronised input and output: A determinism and robustness improvement mechanism to enhance the data input and output mechanisms, so that an application can ensure that the data being manipulated is physically present on secondary mass storage devices.
- Asynchronous input and output: A functionality enhancement to allow an application process to queue data input and output commands with asynchronous notification of completion. This facility includes in its scope the requirements of supercomputer applications.
All the interfaces defined in the Realtime Feature Group will be portable, although some of the numeric parameters used by an implementation may have hardware dependencies.
Signal Generation and Delivery
Some signal-generating functions, such as high-resolution timer expiration, asynchronous I/O completion, interprocess message arrival, and the sigqueue() function, support the specification of an application-defined value, either explicitly as a parameter to the function or in a sigevent structure parameter. The sigevent structure is defined in <signal.h> and contains at least the following members:
Member Type               Member Name               Description
int                       sigev_notify              Notification type
int                       sigev_signo               Signal number
union sigval              sigev_value               Signal value
void (*)(union sigval)    sigev_notify_function     Notification function
pthread_attr_t *          sigev_notify_attributes   Notification attributes

The sigev_notify member specifies the notification mechanism to use when an asynchronous event occurs. This document defines the following values for the sigev_notify member:
- SIGEV_NONE
- No asynchronous notification will be delivered when the event of interest occurs.
- SIGEV_SIGNAL
- A queued signal, with an application-defined value, will be generated when the event of interest occurs.
- SIGEV_THREAD
- A notification function will be called to perform notification.
An implementation may define additional notification mechanisms.
The sigev_signo member specifies the signal to be generated. The sigev_value member is the application-defined value to be passed to the signal-catching function at the time of the signal delivery as the si_value member of the siginfo_t structure.
The sigval union is defined in <signal.h> and contains at least the following members:
Member Type    Member Name    Description
int            sival_int      Integer signal value
void *         sival_ptr      Pointer signal value

The sival_int member is used when the application-defined value is of type int; the sival_ptr member is used when the application-defined value is a pointer.

When a signal is generated by the sigqueue() function or any signal-generating function that supports the specification of an application-defined value, the signal is marked pending and, if the SA_SIGINFO flag is set for that signal, the signal is queued to the process along with the application-specified signal value. Multiple occurrences of signals so generated are queued in FIFO order. It is unspecified whether signals so generated are queued when the SA_SIGINFO flag is not set for that signal.
Signals generated by the kill() function or other events that cause signals to occur, such as detection of hardware faults, alarm() timer expiration, or terminal activity, and for which the implementation does not support queuing, have no effect on signals already queued for the same signal number.
When multiple unblocked signals, all in the range SIGRTMIN to SIGRTMAX, are pending, the behaviour is as if the implementation delivers the pending unblocked signal with the lowest signal number within that range. No other ordering of signal delivery is specified.
If, when a pending signal is delivered, there are additional signals queued to that signal number, the signal remains pending. Otherwise, the pending indication is reset.
Asynchronous I/O
An asynchronous I/O control block structure aiocb is used in many asynchronous I/O function interfaces. It is defined in <aio.h> and has at least the following members:
Member Type        Member Name      Description
int                aio_fildes       File descriptor
off_t              aio_offset       File offset
volatile void *    aio_buf          Location of buffer
size_t             aio_nbytes       Length of transfer
int                aio_reqprio      Request priority offset
struct sigevent    aio_sigevent     Signal number and value
int                aio_lio_opcode   Operation to be performed

The aio_fildes element is the file descriptor on which the asynchronous operation is to be performed.
If O_APPEND is not set for the file descriptor aio_fildes, and if aio_fildes is associated with a device that is capable of seeking, then the requested operation takes place at the absolute position in the file as given by aio_offset, as if lseek() were called immediately prior to the operation with an offset argument equal to aio_offset and a whence argument equal to SEEK_SET. If O_APPEND is set for the file descriptor, or if aio_fildes is associated with a device that is incapable of seeking, write operations append to the file in the same order as the calls were made, with the following exception. Under implementation-dependent circumstances, such as operation on a multiprocessor or when requests of differing priorities are submitted at the same time, the ordering restriction may be relaxed.

After a successful call to enqueue an asynchronous I/O operation, the value of the file offset for the file is unspecified. The aio_nbytes and aio_buf elements are the same as the nbyte and buf arguments defined by read() and write() respectively.
If _POSIX_PRIORITIZED_IO and _POSIX_PRIORITY_SCHEDULING are defined, then asynchronous I/O is queued in priority order, with the priority of each asynchronous operation based on the current scheduling priority of the calling process. The aio_reqprio member can be used to lower (but not raise) the asynchronous I/O operation priority and will be within the range zero through AIO_PRIO_DELTA_MAX, inclusive. The order of processing of requests submitted by processes whose schedulers are not SCHED_FIFO or SCHED_RR is unspecified. The priority of an asynchronous request is computed as (process scheduling priority) minus aio_reqprio.

The priority assigned to each asynchronous I/O request is an indication of the desired order of execution of the request relative to other asynchronous I/O requests for this file. If _POSIX_PRIORITIZED_IO is defined, requests issued with the same priority to a character special file will be processed by the underlying device in FIFO order; the order of processing of requests of the same priority issued to files that are not character special files is unspecified. Numerically higher priority values indicate requests of higher priority. The value of aio_reqprio has no effect on process scheduling priority.

When prioritized asynchronous I/O requests to the same file are blocked waiting for a resource required for that I/O operation, the higher-priority I/O requests will be granted the resource before lower-priority I/O requests are granted the resource. The relative priority of asynchronous I/O and synchronous I/O is implementation-dependent. If _POSIX_PRIORITIZED_IO is defined, the implementation defines for which files I/O prioritization is supported.
The aio_sigevent member determines how the calling process will be notified upon I/O completion, as specified in Signal Generation and Delivery. If aio_sigevent.sigev_notify is SIGEV_NONE, then no signal will be posted upon I/O completion, but the error status for the operation and the return status for the operation will be set appropriately.

The aio_lio_opcode field is used only by the lio_listio() call. The lio_listio() call allows multiple asynchronous I/O operations to be submitted at a single time. The function takes as an argument an array of pointers to aiocb structures. Each aiocb structure indicates the operation to be performed (read or write) via the aio_lio_opcode field.
The address of the aiocb structure is used as a handle for retrieving the error status and return status of the asynchronous operation while it is in progress.
The aiocb structure and the data buffers associated with the asynchronous I/O operation are being used by the system for asynchronous I/O while, and only while, the error status of the asynchronous operation is equal to EINPROGRESS. Applications must not modify the aiocb structure while the structure is being used by the system for asynchronous I/O.
The return status of the asynchronous operation is the number of bytes transferred by the I/O operation. If the error status is set to indicate an error completion, then the return status is set to the return value that the corresponding read(), write(), or fsync() call would have returned. When the error status is not equal to EINPROGRESS, the return status reflects the return status of the corresponding synchronous operation.
Memory Management
Range memory locking and memory mapping operations are defined in terms of pages. Implementations may restrict the size and alignment of range lockings and mappings to be on page-size boundaries. The page size, in bytes, is the value of the configurable system variable {PAGESIZE}. If an implementation has no restrictions on size or alignment, it may specify a 1-byte page size.

Memory locking guarantees the residence of portions of the address space. It is implementation-dependent whether locking memory guarantees fixed translation between virtual addresses (as seen by the process) and physical addresses. Per-process memory locks are not inherited across a fork(), and all memory locks owned by a process are unlocked upon exec or process termination. Unmapping of an address range removes any memory locks established on that address range by this process.
Memory Mapped Files provide a mechanism that allows a process to access files by directly incorporating file data into its address space. Once a file is mapped into a process address space, the data can be manipulated as memory. If more than one process maps a file, its contents are shared among them. If the mappings allow shared write access then data written into the memory object through the address space of one process appears in the address spaces of all processes that similarly map the same portion of the memory object.
Shared memory objects are named regions of storage that may be independent of the file system and can be mapped into the address space of one or more processes to allow them to share the associated memory.
An unlink() of a file or shm_unlink() of a shared memory object, while causing the removal of the name, does not unmap any mappings established for the object. Once the name has been removed, the contents of the memory object are preserved as long as it is referenced. The memory object remains referenced as long as a process has the memory object open or has some area of the memory object mapped.
Mapping may be restricted to disallow some types of access. References to whole pages within the mapping but beyond the current length of an object result in a SIGBUS signal. SIGBUS is used in this context to indicate an error using the object. The size of the object is unaffected by access beyond the end of the object. Write attempts to memory that was mapped without write access, or any access to memory mapped PROT_NONE, results in a SIGSEGV signal. SIGSEGV is used in this context to indicate a mapping error. References to unmapped addresses result in a SIGSEGV signal.
Scheduling Policies
The scheduling semantics described in this specification are defined in terms of a conceptual model that contains a set of thread lists. No implementation structures are necessarily implied by the use of this conceptual model. It is assumed that no time elapses during operations described using this model, and therefore no simultaneous operations are possible. This model discusses only processor scheduling for runnable threads, but it should be noted that greatly enhanced predictability of realtime applications will result if the sequencing of other resources takes processor scheduling policy into account.
There is, conceptually, one thread list for each priority. Any runnable thread may be on any thread list. Multiple scheduling policies are provided. Each non-empty thread list is ordered, contains a head as one end of its order, and a tail as the other. The purpose of a scheduling policy is to define the allowable operations on this set of lists (for example, moving threads between and within lists).
Each process is controlled by an associated scheduling policy and priority. These parameters may be specified by explicit application execution of the sched_setscheduler() or sched_setparam() functions.
Each thread is controlled by an associated scheduling policy and priority. These parameters may be specified by explicit application execution of the pthread_setschedparam() function.
Associated with each policy is a priority range. Each policy definition specifies the minimum priority range for that policy. The priority ranges for each policy may or may not overlap the priority ranges of other policies.
A conforming implementation selects the thread that is defined as being at the head of the highest priority non-empty thread list to become a running thread, regardless of its associated policy. This thread is then removed from its thread list.
Three scheduling policies are specifically required. Other implementation-dependent scheduling policies may be defined. The following symbols are defined in the header <sched.h>:
Symbol         Description
SCHED_FIFO     First in-first out (FIFO) scheduling policy
SCHED_RR       Round robin scheduling policy
SCHED_OTHER    Another scheduling policy

The values of these symbols will be distinct.
SCHED_FIFO
Conforming implementations include a scheduling policy called the FIFO scheduling policy. Threads scheduled under this policy are chosen from a thread list that is ordered by the time its threads have been on the list without being executed; generally, the head of the list is the thread that has been on the list the longest time, and the tail is the thread that has been on the list the shortest time.
Under the SCHED_FIFO policy, the modification of the definitional thread lists is as follows:
- When a running thread becomes a preempted thread, it becomes the head of the thread list for its priority.
- When a blocked thread becomes a runnable thread, it becomes the tail of the thread list for its priority.
- When a running thread calls the sched_setscheduler() function, the process specified in the function call is modified to the specified policy and the priority specified by the param argument.
- When a running thread calls the sched_setparam() function, the priority of the process specified in the function call is modified to the priority specified by the param argument.
- When a running thread calls the pthread_setschedparam() function, the thread specified in the function call is modified to the specified policy and the priority specified by the param argument.
- If a thread whose policy or priority has been modified is a running thread or is runnable, it then becomes the tail of the thread list for its new priority.
- When a running thread issues the sched_yield() function, the thread becomes the tail of the thread list for its priority.
- At no other time will the position of a thread with this scheduling policy within the thread lists be affected.
For this policy, valid priorities shall be within the range returned by the functions sched_get_priority_max() and sched_get_priority_min() when SCHED_FIFO is provided as the parameter. Conforming implementations provide a priority range of at least 32 priorities for this policy.
SCHED_RR
Conforming implementations include a scheduling policy called the round robin scheduling policy. This policy is identical to the SCHED_FIFO policy with the additional condition that when the implementation detects that a running thread has been executing as a running thread for a time period of the length returned by the function sched_rr_get_interval() or longer, the thread becomes the tail of its thread list and the head of that thread list is removed and made a running thread.
The effect of this policy is to ensure that if there are multiple SCHED_RR threads at the same priority, one of them will not monopolise the processor. An application should not rely only on the use of SCHED_RR to ensure application progress among multiple threads if the application includes threads using the SCHED_FIFO policy at the same or higher priority levels or SCHED_RR threads at a higher priority level.
A thread under this policy that is preempted and subsequently resumes execution as a running thread completes the unexpired portion of its round-robin-interval time period.
For this policy, valid priorities will be within the range returned by the functions sched_get_priority_max() and sched_get_priority_min() when SCHED_RR is provided as the parameter. Conforming implementations will provide a priority range of at least 32 priorities for this policy.
SCHED_OTHER
Conforming implementations include one scheduling policy identified as SCHED_OTHER (which may execute identically with either the FIFO or round robin scheduling policy). The effect of scheduling threads with the SCHED_OTHER policy in a system in which other threads are executing under SCHED_FIFO or SCHED_RR is implementation-dependent.
This policy is defined to allow conforming applications to be able to indicate that they no longer need a realtime scheduling policy in a portable manner.
For threads executing under this policy, the implementation uses only priorities within the range returned by the functions sched_get_priority_max() and sched_get_priority_min() when SCHED_OTHER is provided as the parameter.
Clocks and Timers
The header file <time.h> defines the types and manifest constants used by the timing facility.

Time Value Specification Structures
Many of the timing facility functions accept or return time value specifications. A time value structure timespec specifies a single time value and includes at least the following members:
Member Type    Member Name    Description
time_t         tv_sec         Seconds
long           tv_nsec        Nanoseconds

The tv_nsec member is only valid if greater than or equal to zero, and less than the number of nanoseconds in a second (1000 million). The time interval described by this structure is (tv_sec * 10^9 + tv_nsec) nanoseconds.
A time value structure itimerspec specifies an initial timer value and a repetition interval for use by the per-process timer functions. This structure includes at least the following members:
Member Type        Member Name    Description
struct timespec    it_interval    Timer period
struct timespec    it_value       Timer expiration

If the value described by it_value is non-zero, it indicates the time to or time of the next timer expiration (for relative and absolute timer values, respectively). If the value described by it_value is zero, the timer is disarmed.
If the value described by it_interval is non-zero, it specifies an interval to be used in reloading the timer when it expires; that is, a periodic timer is specified. If the value described by it_interval is zero, the timer will be disarmed after its next expiration; that is, a one-shot timer is specified.
Timer Event Notification Control Block
Per-process timers may be created that notify the process of timer expirations by queuing a realtime extended signal. The sigevent structure, defined in <signal.h>, is used in creating such a timer. The sigevent structure contains the signal number and an application-specific data value to be used when notifying the calling process of timer expiration events.
Manifest Constants
The following constants are defined in <time.h>:
- CLOCK_REALTIME
- The identifier for the systemwide realtime clock.
- TIMER_ABSTIME
- Flag indicating time is absolute with respect to the clock associated with a timer.
The maximum allowable resolution for the CLOCK_REALTIME clock and all timers based on this clock, including the nanosleep() function, is represented by {_POSIX_CLOCKRES_MIN} and is defined as 20 ms (1/50 of a second). Implementations may support smaller values of resolution for the CLOCK_REALTIME clock to provide finer granularity time bases.
The minimum allowable maximum value for the CLOCK_REALTIME clock and absolute timers based on it is the same as that defined by the ISO C standard for the time_t type.