This section summarizes the deliberations of the IEEE P1003.15 (Batch Environment) working group in the development of the Batch Environment Services and Utilities option, which covers a set of services and utilities defining a batch processing system.
This informative section contains historical information concerning the contents of the amendment and describes why features were included or discarded by the working group.
The supercomputing technical committee began as a "Birds Of a Feather" (BOF) at the January 1987 Usenix meeting. There was enough general interest to form a supercomputing attachment to the /usr/group working groups. Several subgroups rapidly formed. Of those subgroups, the batch group was the most ambitious. The early meetings were spent evaluating user needs and existing batch implementations.
To evaluate user needs, individuals from the supercomputing community presented their requirements. Common requests were flexibility, interoperability, control of resources, and ease-of-use. Backward-compatibility was not an issue. The working group then evaluated some existing systems. The following systems were evaluated:
Convex Distributed Batch
MDQS from Ballistics Research Laboratory (BRL)
Finally, NQS was chosen as a model not only because it satisfied the most user requirements, but also because it was in the public domain, already implemented on a variety of hardware platforms, and network-based.
Deferred processing of work under the control of a scheduler has been a feature of most proprietary operating systems from the earliest days of multi-user systems in order to maximize utilization of the computer.
The arrival of UNIX systems posed a dilemma for many hardware providers and users because it did not include the sophisticated batch facilities offered by the proprietary systems. This omission was rectified in 1986 by NASA Ames Research Center, which developed the Network Queuing System (NQS) as a portable UNIX application that allowed the routing and processing of batch "jobs" in a network. To encourage its usage, the product was later put into the public domain. It was promptly picked up by UNIX hardware providers, and ported and developed for their respective hardware and UNIX implementations.
Many major vendors, who traditionally offer a batch-dominated environment, ported the public-domain product to their systems, customized it to support the capabilities of their systems, and added many customer-requested features.
Due to the strong hardware provider and customer acceptance of NQS, it was decided to use NQS as the basis for the POSIX Batch Environment amendment in 1987. Other batch systems considered at the time included CTSS, MDQS (a forerunner of NQS from the Ballistics Research Laboratory), and PROD (a Los Alamos Labs development). None were thought to have both the functionality and acceptability of NQS.
The base standard and its batch utilities are not sufficient to meet the batch processing needs of a supercomputing environment; additional functionality is required in the areas of resource management, job scheduling, system management, and control of output.
The concept of a batch job is closely related to a session with a session leader. The main difference is that a batch job does not have a controlling terminal. There has been much debate over whether to use the term "request" or "job". Job was the final choice because of the historical use of this term in the batch environment.
The current definition of a job identifier, a sequence number combined with the name of the originating host, is not sufficient under the model of destinations.
Using the model of destination, a host may include multiple batch nodes, the location of which is identified uniquely by a name or directory service. If the current definition is used, batch nodes running on the same host would have to coordinate their use of sequence numbers, as sequence numbers are assigned by the originating host. The alternative is to use the originating batch node name instead of the originating host name.
The reasons for wishing to run more than one batch system per host include the following:
A test and production batch system are maintained on a single host. This is most likely in a development facility, but could also arise when a site is moving from one version to another. The new batch system could be installed as a test version that is completely separate from the production batch system, so that problems can be isolated to the test system. Requiring the batch nodes to coordinate their use of sequence numbers creates a dependency between the two nodes, and that defeats the purpose of running two nodes.
A site has multiple departments using a single host, with different management policies. An example of contention might be in job selection algorithms. One group might want a FIFO type of selection, while another group wishes to use a more complex algorithm based on resource availability. Again, requiring the batch nodes to coordinate is an unnecessary binding.
The proposal eventually accepted was to replace originating host with originating batch node. This supplies sufficient granularity to ensure unique job identifiers. If more than one batch node is on a particular host, they each have their own unique name.
The queue portion of a destination is not part of the job identifier as these are not required to be unique between batch nodes. For instance, two batch nodes may both have queues called small, medium, and large. It is only the batch node name that is uniquely identifiable throughout the batch system. The queue name has no additional function in this context.
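As an illustration, assume a job identifier of the form sequence_number.batch_node_name; the exact syntax and the helper names in this Python sketch are assumptions for illustration, not normative text:

```python
def make_job_id(seq_number: int, batch_node: str) -> str:
    """Form a job identifier from a sequence number and the
    originating batch node name (form assumed for illustration)."""
    return f"{seq_number}.{batch_node}"

def parse_job_id(job_id: str) -> tuple[int, str]:
    """Split a job identifier into its sequence number and
    originating batch node name."""
    seq, _, node = job_id.partition(".")
    return int(seq), node

# Two batch nodes on the same host may hand out the same sequence
# number without collision, because the node names differ.
assert make_job_id(42, "prod_node") != make_job_id(42, "test_node")
```

Because the batch node name, not the host name, provides the uniqueness, the test and production nodes of the earlier example need no coordination of sequence numbers.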
Assume there are three batch nodes, each of which has its own name server. On batch node one, there are no queues. On batch node two, there are fifty queues. On batch node three, there are forty queues. The system administrator for batch node one does not have to configure queues, because there are none implemented. However, if a user wishes to send a job to either batch node two or three, the system administrator for batch node one must configure a destination that maps to the appropriate batch node and queue. If every queue is to be made accessible from batch node one, the system administrator has to configure ninety destinations.
To avoid requiring this, there should be a mechanism to allow a user to separate the destination into a batch node name and a queue name. Then, an implementation that is configured to get to all the batch nodes does not need any more configuration to allow a user to get to all of the queues on all of the batch nodes. The node name is used to locate the batch node, while the queue name is sent unchanged to that batch node.
The following are requirements that a destination identifier must be capable of providing:
The ability to direct a job to a queue in a particular batch node.
The ability to direct a job to a particular batch node.
The ability to group at a higher level than just one queue. This includes grouping similar queues across multiple batch nodes (this is a pipe queue).
The ability to group batch nodes. This allows a user to submit a job to a group name with no knowledge of the batch node configuration. This also provides aliasing as a special case. Aliasing is a group containing only one batch node name. The group name is the alias.
In addition, the administrator has the following requirements:
The ability to control access to the queues.
The ability to control access to the batch nodes.
The ability to control access to groups of queues (pipe queues).
The ability to configure retry time intervals and durations.
The requirements of the user are met by destination as explained in the following.
The user has the ability to specify a queue name, which is known only to the batch node specified. There is no configuration of these queues required on the submitting node.
The user has the ability to specify a batch node whose name is network-unique. The configuration required is that the batch node be defined as an application, just as other applications such as FTP are configured.
Once a job reaches a queue, the batch node holding that queue can itself act as a user of the batch system: it can choose to send the job to another batch node, another queue, or both. In other words, the routing is at an application level, and it is up to the batch system to choose where the job will be sent. Configuration is up to the batch node where the queue resides. This provides grouping of queues across batch nodes or within a batch node. The user submits the job to a queue, which by definition routes the job to other queues or nodes or both.
A node name may be given to a naming service, which returns multiple addresses as opposed to just one. This provides grouping at a batch node level. This is a local issue, meaning that the batch node must choose only one of these addresses. The list of addresses is not sent with the job, and once the job is accepted on another node, there is no connection between the list and the job. The requirements of the administrator are met by destination as explained in the following.
The control of queues is a batch system issue, and will be done using the batch administrative utilities.
The control of nodes is a network issue, and will be done through whatever network facilities are available.
The control of access to groups of queues (pipe queues) is covered by the control of any other queue. The fact that the job may then be sent to another destination is not relevant.
The propagation of a job across more than one point-to-point connection was dropped because of its complexity and because all of the issues arising from this capability could not be resolved. It could be provided as additional functionality at some time in the future.
The addition of network as a defined term was done to clarify the difference between a network of batch nodes as opposed to a network of hosts. A network of batch nodes is referred to as a batch system. The network refers to the actual host configuration. A single host may have multiple batch nodes.
In the absence of a standard network naming convention, this option establishes its own convention for the sake of consistency and expediency. This is subject to change, should a future working group develop a standard naming convention for network pathnames.
During the development of the Batch Environment Services and Utilities option, a number of topics were discussed at length which influenced the wording of the normative text but could not be included in the final text. The following items are some of the most significant terms and concepts of those discussed:
Small and Consistent Command Set
Conventional utilities from UNIX systems often have a very complicated syntax and usage, which can result in confusion and errors. The Batch Environment Services and Utilities option utility set, on the other hand, has been pared down to a small set of robust utilities with an orthogonal calling sequence.
This feature permits an already executing process to checkpoint or save its contents. Some implementations permit this both at the batch utility level (for example, checkpointing a job upon its abnormal termination) and from within the job itself via a system call. Support of checkpoint/restart is optional. A conscious, careful effort was made to make the qsub utility consistently refer to checkpoint/restart as optional functionality.
When a user submits a job for batch processing, the user can designate it "rerunnable", meaning that it will automatically resume execution from the start of the job if the machine on which it was executing crashes for some reason. The decision on whether the job is rerun is entirely up to the submitter of the job; no decisions are made within the batch system. A job that is rerunnable and has been submitted with the appropriate checkpoint/restart switch will first be checkpointed, and execution will resume from that point. Use of the implementation-defined checkpoint/restart feature is not further defined in this context.
All utilities exit with error status zero (0) if successful, one (1) if a user error occurred, and two (2) for an internal Batch Environment Services and Utilities option error.
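The convention can be expressed directly; a small Python sketch, in which only the status values and their meanings come from the text above and the mapping function itself is hypothetical:

```python
# Exit-status convention for the batch utilities, as stated above.
EXIT_MEANINGS = {
    0: "success",
    1: "user error",
    2: "internal batch-system error",
}

def classify_exit(status: int) -> str:
    """Map a batch utility's exit status to its documented meaning."""
    return EXIT_MEANINGS.get(status, "unknown status")

assert classify_exit(0) == "success"
assert classify_exit(2) == "internal batch-system error"
```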
Level of Portability
Portability is specified at the user, operator, and administrator levels. A conforming batch implementation provides identical functionality and behavior at all of these levels. Additionally, portable batch shell scripts with embedded Batch Environment Services and Utilities option utilities add a further level of portability.
A small set of globally understood resources, such as memory and CPU time, is specified. All conforming batch implementations are able to process them in a manner consistent with the yet-to-be-developed resource management model. Resources not in this amendment set are ignored and passed along as part of the argument stream of the utility.
Queue position is the place a job occupies in a queue. It is dependent on a variety of factors such as submission time and priority. Since priority may be affected by the implementation of fair share scheduling, the definition of queue position is implementation-defined.
A numerical queue ID is an external requirement for purposes of accounting. The identification number was chosen over queue name for processing convenience.
A common notion of "jobs" is a collection of processes whose process group cannot be altered and is used for resource management and accounting. This concept is implementation-defined and, as such, has been omitted from the batch amendment.
Bytes versus Words
Except in one case, bytes are used as the standard unit of memory size. Because the definition of a word varies from machine to machine, bytes are the default unit of memory size.
The standard definition of regular expressions is much too broad to be used in the batch utility syntax. All that is needed is a simple concept of "all"; for example, deleting all of a user's jobs from the named queue. For this reason, regular expressions have been eliminated from the batch amendment.
How much data should be displayed locally through functions is a matter of local policy, which dictates the degree of privacy. Library functions must be used to create and enforce local policy. Both network and local qstat displays must reflect the policy of the server machine.
Remote Host Naming Convention
It was decided that host names would be a maximum of 255 characters in length, with at most 15 characters shown in displays. The 255-character limit was chosen because it is consistent with BSD. The 15-character limit was an arbitrary decision.
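These limits can be enforced mechanically; a minimal Python sketch in which the function name and the use of simple truncation are illustrative assumptions, not part of the option:

```python
MAX_HOST_NAME = 255   # consistent with BSD
DISPLAY_WIDTH = 15    # arbitrary display limit

def display_host(name: str) -> str:
    """Validate a host name against the 255-character limit and
    truncate it to the 15-character display width."""
    if len(name) > MAX_HOST_NAME:
        raise ValueError("host name exceeds 255 characters")
    return name[:DISPLAY_WIDTH]

# A long name survives validation but is cut for display.
assert display_host("supercomputing-center.example.gov") == "supercomputing-"
```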
Network administration is important, but is outside the scope of the batch amendment. Network administration could be done with rsh; however, authentication then becomes a two-sided problem.
Network Administration Philosophy
Keep it simple. Centralized management should be possible. For example, Los Alamos needs a dumb set of CPUs to be managed by a central system, as opposed to the several independently managed systems that are the general case for the Batch Environment Services and Utilities option.
Operator Utility Defaults (that is, Default Host, User, Account, and so on)
It was decided that usability would override orthogonality and syntactic consistency.
The Batch System Manager and Operator Distinction
The distinction between manager and operator is that operators can only control the flow of jobs. A manager can alter the batch system configuration in addition to job flow. POSIX makes a distinction between user and system administrator but goes no further. The concepts of manager and operator privileges fall under local policy. The distinction between manager and operator is historical in batch environments, and the Batch Environment Services and Utilities option has continued that distinction.
The Batch System Administrator
An administrator is equivalent to a batch system manager.
This rationale is provided as informative rather than normative text, to avoid placing requirements on implementors regarding the use of symbolic constants, but at the same time to give implementors a preferred practice for assigning values to these constants to promote interoperability.
The Checkpoint and Minimum_Cpu_Interval attributes induce a variety of behaviors depending upon their values. Some jobs cannot or should not be checkpointed. Other users will simply need to ensure job continuation across planned downtimes; for example, scheduled preventive maintenance. For users consuming expensive resources, or for jobs that run longer than the mean time between failures, however, periodic checkpointing may be essential. System administrators must therefore be able to set minimum checkpoint intervals on a queue-by-queue basis to guard against, for example, naive users specifying interval values that are too small for memory-intensive jobs; otherwise, system overhead would adversely affect performance.
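One way a server might reconcile a job's requested checkpoint interval with the queue's configured floor is a simple clamp; a hedged Python sketch in which the function name and the units are assumptions:

```python
def effective_checkpoint_interval(requested: int, queue_minimum: int) -> int:
    """Clamp a job's requested checkpoint interval (in CPU minutes,
    units assumed for illustration) to the queue's configured
    Minimum_Cpu_Interval, so a naive request cannot force checkpoints
    more often than the administrator allows."""
    return max(requested, queue_minimum)

# A naive 1-minute request on a queue with a 10-minute floor is
# raised to the floor; a request above the floor is honored.
assert effective_checkpoint_interval(1, 10) == 10
assert effective_checkpoint_interval(30, 10) == 30
```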
The use of symbolic constants, such as NO_CHECKPOINT, was introduced to lend a degree of formalism and portability to this option.
Support for checkpointing is optional for servers. However, clients must provide for the -c option, since in a distributed environment the job may run on a server that does provide such support, even if the host of the client does not support the checkpoint feature.
If the user does not specify the -c option, the default action is left unspecified by this option. Some implementations may wish to do checkpointing by default; others may wish to checkpoint only under an explicit request from the user.
The Priority attribute has been made non-optional. All clients already had been required to support the -p option. The concept of prioritization is common in historical implementations. The default priority is left to the server to establish.
The Hold_Types attribute has been modified to allow for implementation-defined hold types to be passed to a batch server.
It was the intent of the IEEE P1003.15 working group to mandate the support for the Resource_List attribute in this option by referring to another amendment, specifically the IEEE P1003.1a draft standard. However, during the development of the IEEE P1003.1a draft standard this was excluded. As such this requirement has been removed from the normative text.
The Shell_Path attribute has been modified to accept a list of shell paths that are associated with a host. The name of the attribute has been changed to Shell_Path_List.
This section was defined to meet the goal of a "Small and Consistent Command Set" for this option.