Universal Measurement Architecture

Universal Measurement Architecture Guide
Copyright © 1997 The Open Group

Introduction

Performance Measurement in Open Systems

The commercialisation of POSIX-based computing is continuing at a rapid pace, adding capabilities that are not just expected but urgently needed by MIS shops and new commercial users. One such capability is performance management. Users familiar with mainframe data processing environments are used to having sophisticated tools available to determine resource utilisation, predict system capacities and growth paths, and even to compare CPU models when making purchase decisions.

Although the open system concept is creating a revolution in applications development and system migration paths, certain capabilities (such as performance management) have not been standardised. Currently there is generally insufficient performance management functionality in Open Systems, and even where it does exist it is often provided in a different way on systems from different vendors.

Key areas of work in the development of the UMA specifications include performance data availability and interfaces for its collection. Until the data and interfaces are standardised, each computer vendor, performance software vendor, or large end user is faced with the task of kernel modification to collect the necessary data, development of a proprietary kernel interface to move the data to user-space, and development of custom performance monitoring and management software. Until such interfaces are standardised, few performance management tools will be built because of the cost of their migration between operating system versions or POSIX-based system implementations.

As open systems become the operating systems of choice for larger, faster, and more complex computer systems, there is an increased need to manage these systems effectively. Yet little software exists to support performance management of these complex systems. For example, administrators of standard UNIX systems must rely on the system activity reporter (sar) data to manage their systems. However, such information is often insufficient in scope, inadequate in depth, and cannot be properly controlled, especially by multiple performance management applications in distributed environments. Performance management of large applications, including databases, often has to rely on accounting data to measure activity, but such data can be inappropriate since it was intended for a different purpose.

There are several reasons for the lack of performance management software. One reason is that many of the desired metrics are not available. Another reason is the fear that release-to-release kernel changes will make it necessary to frequently modify performance-related applications. This discourages developers from using any but the most basic metrics or developing any but the most basic applications, particularly in cases where the application must execute on platforms supplied by different vendors. There are, furthermore, no well-defined interfaces for obtaining even the existing performance data from the kernel, and the current access methods are restrictive and expensive.

Issues Addressed by the UMA

In this section, the reasons for the definition of the UMA are outlined in terms of the issues that have arisen with existing performance facilities in Open Systems.

Kernel Data

Extracting performance data from the kernel of an Open System has traditionally been done by user-level utilities accessing the kernel data structures directly. An example of this is the UNIX /dev/kmem interface, which has historically been the primary interface used by UNIX System performance measurement utilities for extracting data from the kernel. This mechanism generally relies on the user-level performance utility using the name of a particular data structure to derive the virtual address of the structure from the symbol table. It can then access the kernel data (using /dev/kmem in the case of UNIX) to seek to and read the value of that data structure. The advantage of this approach is its generality: if the address of a data structure can be found, its value can be read. But its generality is also a disadvantage. Since almost any data structure can be used to provide performance data, the tendency is to use it without regard to whether it is supported. This makes it very difficult to maintain a performance application across releases when data structures change. For example, programs such as ps and sadc have been notoriously difficult to maintain from release to release.
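The lookup, seek and read sequence described above can be sketched as follows. This is a minimal simulation only: an in-memory byte buffer stands in for /dev/kmem, a dictionary stands in for the kernel symbol table, and the symbol names, addresses and 32-bit little-endian counter format are all assumptions made for illustration.

```python
import io
import struct

# Hypothetical stand-ins: a byte buffer plays the role of /dev/kmem and a
# dict plays the role of the kernel symbol table (name -> virtual address).
kernel_memory = io.BytesIO(bytearray(64))
symbol_table = {"cpu_ticks": 16, "disk_reads": 40}

# Plant two 32-bit counters at the addresses the symbol table advertises.
kernel_memory.seek(16)
kernel_memory.write(struct.pack("<I", 1234))
kernel_memory.seek(40)
kernel_memory.write(struct.pack("<I", 56))


def read_kernel_counter(name):
    """Look up a symbol, seek to its address, and read a 32-bit value."""
    address = symbol_table[name]   # step 1: name -> address via symbol table
    kernel_memory.seek(address)    # step 2: seek to the data structure
    raw = kernel_memory.read(4)    # step 3: read its bytes
    return struct.unpack("<I", raw)[0]


cpu_ticks = read_kernel_counter("cpu_ticks")
disk_reads = read_kernel_counter("disk_reads")
```

Nothing in this pattern checks whether a symbol is a supported interface. Any name that resolves to an address can be read, which is exactly the generality, and the maintenance risk, that the text describes.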

Processing Cost

The retrieval of each virtually contiguous piece of information requires a seek system call and a read system call to extract the information from the kernel. If there are many such pieces, the central processing unit (CPU) cost of gathering the information can be very high. Also, since each piece requires a separate seek and read, and the kernel may update its data between one read and the next, it is very hard to guarantee that the data obtained is consistent.
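A small simulation illustrates both costs. The counter names and the update pattern below are hypothetical; the point is only that two pieces cost four system calls and that an update arriving between the reads yields a pair of values that never existed together in the kernel.

```python
# Hypothetical kernel state: two counters that the kernel always updates
# together, so started >= finished holds at every instant.
kernel = {"requests_started": 100, "requests_finished": 100}
syscalls = 0


def kernel_tick():
    """The kernel completes five more requests while we are collecting."""
    kernel["requests_started"] += 5
    kernel["requests_finished"] += 5


def seek_and_read(name):
    """Model one seek plus one read for a single piece of data."""
    global syscalls
    syscalls += 2
    return kernel[name]


started = seek_and_read("requests_started")
kernel_tick()  # the kernel keeps running between the two reads
finished = seek_and_read("requests_finished")

# The two values come from different instants, so the derived figure is
# impossible: a negative number of requests in flight.
in_flight = started - finished
```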

Access Permissions

For security reasons, kernel data is not readable by ordinary users. Thus performance utilities (such as ps and sadc in the case of UNIX) must be run as privileged programs. Ordinary programs must invoke the performance utilities and read data either through pipes or files. This adds to the cost of accessing this information.

Binary Compatibility

In order to reduce the number of seeks and reads necessary to obtain the data, many metrics are combined into a single data structure (for example, sysinfo in UNIX). The result is that programs must be aware of the layout and contents of the data structure. If the data structure layout or content change significantly between releases, binary compatibility cannot be maintained; the programs must be recompiled with new headers that reflect the new data structure layout and contents.
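The effect of a layout change on an already compiled program can be sketched with Python's struct module. The two layouts and field names here are invented for illustration and do not reproduce the real sysinfo structure.

```python
import struct

# Hypothetical layouts for a combined metrics structure.
# Release 1: three 32-bit counters (cpu_user, cpu_sys, cpu_idle).
LAYOUT_R1 = "<III"
# Release 2: a new field (cpu_wait) is inserted before cpu_idle.
LAYOUT_R2 = "<IIII"

# A release 2 kernel exports user=10, sys=20, wait=7, idle=63.
blob = struct.pack(LAYOUT_R2, 10, 20, 7, 63)

# A program built against the release 1 headers reads only the bytes it
# knows about and interprets them with the old layout.
old_size = struct.calcsize(LAYOUT_R1)
cpu_user, cpu_sys, cpu_idle = struct.unpack(LAYOUT_R1, blob[:old_size])

# The old program now reports the wait counter as idle time.
true_idle = struct.unpack(LAYOUT_R2, blob)[3]
```

The read succeeds without any error, which is what makes this failure hard to detect: the old binary simply reports a wrong value until it is recompiled against the new headers.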

Data Synchronisation

Using a variety of user space collectors to gather data can result in skewed collection times due to the scheduling delays for each process (see Collection Time Skew from Separate Collection Components for a UNIX example). Hence if two user level utilities (for example sar and stats in the case of UNIX) obtain performance information that is then analysed as if it refers to the same time period, this skew means that the usefulness of the data is impaired. A common source of user level collection would reduce such time skews.


 

 sar   |______|_______|_____|______|______|_____|______|_____|_______|
 stats |______|______|_______|_____|______|______|_______|_______|______|
 clock |______|______|______|______|______|______|______|______|______|

Figure: Collection Time Skew from Separate Collection Components
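The skew shown in the figure can be reproduced numerically. In this sketch the two collectors share a nominal 10-second interval, and the per-sample scheduling delays are made-up values chosen only to show how samples that are nominally simultaneous drift apart.

```python
INTERVAL_MS = 10_000  # nominal collection interval shared by both utilities

# Hypothetical scheduling delays (milliseconds) at each wake-up.
sar_delays_ms = [0, 400, 100, 900]
stats_delays_ms = [200, 0, 700, 300]


def sample_times(delays_ms):
    """Actual sample time = nominal tick time + scheduling delay."""
    return [i * INTERVAL_MS + d for i, d in enumerate(delays_ms)]


sar_times = sample_times(sar_delays_ms)
stats_times = sample_times(stats_delays_ms)

# Skew between samples that an analyst would treat as simultaneous.
skews_ms = [abs(a - b) for a, b in zip(sar_times, stats_times)]
```

With a single common collector there would be one list of sample times rather than two, so this particular source of skew would not arise.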

Data Applicability

The privileged utilities that collect the kernel information needed for performance analysis are often oriented towards a particular use for the data. An example of this is the use of accounting information for performance analysis. The effect is that performance applications often get information they do not want, get it in the wrong form, or cannot get it at all.

Measurement Applications

Existing performance measurement applications suffer from the lack of facilities specific to their requirements to obtain performance information. The issues in the previous section concerning kernel data obviously contribute to the problems faced by these applications but in addition there are general issues that apply.

Multiple Data Collection

There may be several measurement applications running, performing different analyses of performance information. It is commonly the case that there is no common collection mechanism between such applications, resulting in the same data being collected, distributed and stored separately by each application.

Control of Collection

Where there are several measurement applications running, each may try to control the way in which performance data is collected resulting in a conflict. So, for example, where a privileged program is invoked to collect performance information, one application may set the collection interval to one value and another may set it to a different value.
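The conflict can be shown with a toy collector that keeps a single global interval. The class and its method are hypothetical and are not drawn from any real utility or from the UMA specifications.

```python
class SharedCollector:
    """Hypothetical privileged collector with one global sampling interval."""

    def __init__(self):
        self.interval_seconds = None

    def set_interval(self, seconds):
        # There is no arbitration: the new value silently replaces the old.
        self.interval_seconds = seconds


collector = SharedCollector()
collector.set_interval(5)    # application A asks for 5-second samples
collector.set_interval(60)   # application B asks for 60-second samples

# Application A is now being served at B's interval without being told.
interval_seen_by_a = collector.interval_seconds
```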

Methods of Collection

Where measurement applications have to use a variety of mechanisms to collect performance information, writing such applications is unnecessarily complex. Different methods have to be written to collect very similar data from different sources, and provision must be made for additional methods to appear for different systems and new releases.

Real Time Data

Measurement applications that wish to have access to real time data as opposed to historical data have to use different mechanisms. The effect of this is that data may be collected, distributed and stored more than once and that it is difficult to write an application that will work on both real time and historical data.

Events

By the nature of the mechanisms that are used to obtain performance information it is difficult to integrate events, and the information they contain, into the pool of performance information. Measurement applications should be able, if they wish, to access events as well as synchronously requested data.

New Information

Increasingly, systems are becoming capable of dynamic reconfiguration (for example, hot-pull discs), and measurement applications need to be able to discover dynamically which objects exist and what performance information they can supply. Measurement applications also need a mechanism by which they can be notified of changes that have occurred (that is, an event mechanism).

Data Description

Generic measurement applications need to be able to handle classes of objects without necessarily being aware of detailed differences between different classes of the same general type. So, for example, it should be possible for a measurement application to be able to use the performance information from any make and type of disc device. However, specialised applications should be able to make use of detailed information from a particular make of device.
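One way to picture this requirement is a general class of data that every disc device supplies, extended by a more detailed class for a particular make. The class and field names below are purely illustrative assumptions and are not taken from the UMA data pool.

```python
from dataclasses import dataclass


@dataclass
class DiscMetrics:
    """Fields assumed to be common to every disc device (hypothetical)."""
    name: str
    reads: int
    writes: int


@dataclass
class VendorXDiscMetrics(DiscMetrics):
    """Extra detail that only one hypothetical make of device reports."""
    cache_hits: int = 0


def total_io(devices):
    """Generic application: relies only on the common fields."""
    return sum(d.reads + d.writes for d in devices)


devices = [
    DiscMetrics("disc0", reads=100, writes=50),
    VendorXDiscMetrics("disc1", reads=30, writes=20, cache_hits=12),
]

total = total_io(devices)

# Specialised application: picks out the detail for one make of device.
vendor_cache_hits = [
    d.cache_hits for d in devices if isinstance(d, VendorXDiscMetrics)
]
```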

Figure: Components of a Distributed Transaction

Distributed Systems

Finally, we must consider the distributed environment. In the past, analysing the performance of a single platform at a time was meaningful because most, if not all, of the processing of a user interaction took place on a single platform. In the emerging open systems environment, however, this is no longer the case. Components of a Distributed Transaction illustrates the situation where a user interaction is serviced by processing on a number of platforms which, in addition, may be supplied by a variety of vendors. In this case, the response time experienced by the user depends on the response times of the individual service platforms and on the response times of the various network components. Analysing response time therefore requires that data be captured and tagged with identification at least at a transaction level, and that there be a mechanism that can gather this data from the distributed systems where it is captured¹.
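As a rough sketch of why transaction-level tagging matters, the following groups measurements gathered from several platforms by a transaction identifier. The record format, component names and timings are invented, and the components are assumed to run one after another so that their elapsed times can simply be added.

```python
from collections import defaultdict

# Hypothetical tagged records gathered from different platforms:
# (transaction_id, component, elapsed_ms)
records = [
    ("tx1", "client", 12),
    ("tx2", "client", 10),
    ("tx1", "network", 30),
    ("tx1", "db_server", 85),
    ("tx2", "network", 25),
    ("tx2", "app_server", 40),
]

# Without the transaction tag, the per-platform figures could not be
# attributed to a particular user interaction.
response_ms = defaultdict(int)
breakdown = defaultdict(dict)
for tx_id, component, elapsed_ms in records:
    response_ms[tx_id] += elapsed_ms
    breakdown[tx_id][component] = elapsed_ms

slowest_component_tx1 = max(breakdown["tx1"], key=breakdown["tx1"].get)
```

Where components overlap in time, a simple sum overstates the response time, so a real analysis would need start and end timestamps rather than elapsed values alone.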

Scope and Purpose of UMA

To help address the above data collection issues and limitations, the following three specifications for Universal Measurement Architecture (UMA) have been developed:

UMA Performance Measurement Data Pool (DPD)
UMA Data Capture Interface (DCI)
UMA Measurement Layer Interface (MLI).

This Guide describes the benefits and features of the Universal Measurement Architecture, and serves as an introduction to these UMA specification documents for those new to this architecture.

The Universal Measurement Architecture (UMA) provides support for the collection, management and reporting of performance data and events.

Its goals include:

standardisation and portability of interfaces and data
collection from both kernel and application sources
distributed access - multiple system images
control of collection overhead through common collection, configurable metrics and threshold filtering of data
improved data capture synchronisation
scalable and extensible services
seamless access between historical and current data
simple specification of interval and event data reporting.

UMA, therefore, may be considered as a powerful agent for collecting and managing performance data.

The following chapters describe the interfaces and services in more detail.

Footnotes

1. The tagging of workload components is predominantly the concern of provider instrumentation, and the analysis of performance data is an issue for measurement applications; both are formally outside the scope of UMA itself, which is focused on the control of data acquisition and on the delivery and management of performance data. UMA does provide a mechanism (UMAWorkInfo instances) for containing and transmitting a flexible number of workload identifiers, which may include a transaction ID. It will be necessary to track emerging instrumentation methodologies and standards efforts from DCE, ISO, and OMG working groups to ensure that UMA remains capable of appropriate functionality in this area.
