Universal Measurement Architecture - Features and Benefits of the UMA Interfaces

Universal Measurement Architecture Guide
Copyright © 1997 The Open Group

Features and Benefits of the UMA Interfaces

This chapter describes in more detail the interfaces defined in the two UMA interface specifications:

the UMA Data Capture Layer Interface (DCI) specification (see reference DCI)
the UMA Measurement Layer Interface (MLI) specification (see reference MLI).

It also describes how they relate to one another.

Features and Benefits of the DCI

The Data Capture Interface (DCI) is the lowest architectural layer in the Universal Measurement Architecture (UMA). This section describes the DCI and the services it provides, and explains which problems this layer solves and which problems it was not meant to solve.

The DCI is a collection of programming interfaces. The DCI specification defines the set of DCI interfaces and the arguments and return values for those interfaces. The DCI specification also defines the service provided by these programming interfaces.

Performance Management and the DCI

The DCI addresses several important problems in the performance management arena:

it provides a consistent interface between system functions that are providing metrics and those functions that consume these metrics
it allows any system entity (applications, daemons, or the operating system) to provide metrics
it separates the metric source from the method for acquiring the metrics. This allows metric consumers to use a uniform acquisition method regardless of source.

One of the problems the DCI was not meant to solve is the transmission of data across the network. The DCI interfaces explicitly limit their scope to metric transmission between providers and consumers on the same system. The reason for this limitation is that the intersystem metric transmission problem is already addressed both by the higher UMA architectural levels (the MLI and the Data Services Layer) and by existing solutions, such as SNMP.

In summary, the DCI is a relatively simple collection of interfaces that provides a uniform mechanism for transmitting and collecting performance information from any system entity, from the operating system to applications. Its primary benefits are the standardisation of the collection interface, the elimination of the need for prior knowledge of the metrics being collected, and the use of a uniform access mechanism regardless of metric source.

DCI Service

The service provided by this API (Application Programming Interface) is twofold. First, the DCI acts as a connection broker between those system components which produce metrics (metric providers) and those system components which consume metrics (metric consumers). Second, the DCI provides a repository, called the DCI name space, in which metric providers store information about the set of available metrics. Metric consumers can traverse and interrogate the DCI name space to find out about the available metric set. The name space does not store the metrics themselves, only information about them; the metrics are managed and supplied by the individual metric providers. The DCI structure and the client/server relationships are illustrated in DCI Structure and Client/Server Relationships.
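The two roles can be pictured with a small Python sketch. This is not the DCI programming interface, which is defined in the DCI specification; the class `Broker` and its methods `register`, `describe` and `read` are hypothetical names chosen purely for illustration.

```python
from typing import Callable, Dict


class Broker:
    """Toy stand-in for the DCI service; every name here is hypothetical."""

    def __init__(self) -> None:
        # Name space: metric name -> descriptive information only.
        self._namespace: Dict[str, Dict[str, str]] = {}
        # Broker table: metric name -> provider callback that supplies the value.
        self._providers: Dict[str, Callable[[], float]] = {}

    def register(self, name: str, info: Dict[str, str],
                 supplier: Callable[[], float]) -> None:
        """Called by a metric provider to advertise a metric."""
        self._namespace[name] = dict(info)
        self._providers[name] = supplier

    def describe(self, name: str) -> Dict[str, str]:
        """Called by a consumer to interrogate the name space."""
        return dict(self._namespace[name])

    def read(self, name: str) -> float:
        """Called by a consumer; the value comes from the provider."""
        return self._providers[name]()


broker = Broker()
# A provider (a lambda standing in for a daemon) registers one metric.
broker.register("disk.reads", {"units": "operations", "type": "counter"},
                lambda: 1234.0)

info = broker.describe("disk.reads")   # information about the metric
value = broker.read("disk.reads")      # the metric itself, via the provider
```

The sketch keeps descriptive information and value suppliers in separate tables to mirror the distinction drawn above: the name space holds information about metrics, while the values stay with the providers.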

Figure: DCI Structure and Client/Server Relationships

To relate this figure to the earlier figure showing the full UMA architecture, note that when an implementation includes the MLI, the UMA Data Services and Measurement Control Layers (which are the MLI service layers) play the role of a DCI metric consumer.

Name Space

Through use of the DCI, performance applications and MLI service consumers can traverse the name space and find out what types of metrics are available, the units and data types of individual metrics, the number and type of available measured objects, and human-readable descriptive labels for both the metrics and measured objects.

Through the use of wildcards in the description of a metric, a metric consumer can request multiple metrics in one call. This enables the cost of delivering the metrics to be reduced and the skew between metrics to be minimised.
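As an illustration of the idea, the following Python sketch selects several metrics with one pattern. The dotted metric names and the shell-style `*` wildcard are assumptions made for the example; the actual wildcard syntax is defined by the DCI specification.

```python
from fnmatch import fnmatchcase
from typing import Dict


def read_matching(metrics: Dict[str, float], pattern: str) -> Dict[str, float]:
    """Return every metric whose name matches the wildcard pattern, in one call."""
    return {name: value for name, value in metrics.items()
            if fnmatchcase(name, pattern)}


# Hypothetical snapshot of metric values held by a provider.
snapshot = {
    "disk.0.reads": 10.0,
    "disk.1.reads": 25.0,
    "disk.1.writes": 7.0,
    "cpu.0.busy": 0.4,
}

reads = read_matching(snapshot, "disk.*.reads")
```

Because all matching values are taken from the same snapshot, they share one capture time, which is the skew reduction the text refers to.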

In the terminology used by the DCI specification, the metrics name space contains names for metric classes and instances of those classes. A metric class is a grouping of metrics together with the information used to describe that metric set. These metric attributes are such things as units and data types. A metric instance is a representation of a measured object, such as a disk. Thus there can be a metric class which describes disk I/O metrics, and that class can have five instances, one for each of the system's disk drives. A metric consumer can find out about the metrics by reading the metric class attributes. The consumer can then read the disk performance data for one or all class instances. It is at this point that the DCI's connection broker service comes into play. The metric consumer does not require prior knowledge of which provider supports the desired metric set, nor does it need to know the mechanics of how those metrics are delivered. This is all handled by the DCI service and the relationship it maintains with the set of system providers.
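The disk example can be modelled roughly as follows. The `MetricClass` type, its fields, and the `"*"` convention for "all instances" are hypothetical and are not taken from the DCI specification; the values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricClass:
    """Hypothetical model of a metric class: attributes plus measured instances."""
    name: str
    attributes: Dict[str, Dict[str, str]]            # metric -> units, data type
    instances: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def read(self, metric: str, instance: str = "*") -> Dict[str, int]:
        """Read one metric for a single instance, or for all when instance is '*'."""
        selected: List[str] = (list(self.instances) if instance == "*"
                               else [instance])
        return {inst: self.instances[inst][metric] for inst in selected}


disk_io = MetricClass(
    name="disk_io",
    attributes={"reads": {"units": "operations", "type": "uint64"}},
)
# Five instances, one per disk drive, with made-up counter values.
for n in range(5):
    disk_io.instances[f"disk{n}"] = {"reads": 100 * n}

units = disk_io.attributes["reads"]["units"]   # learn about the metric first
one = disk_io.read("reads", "disk3")           # then read one instance
all_disks = disk_io.read("reads")              # or all instances at once
```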

Polled Metric and Event Support

There are two types of data supported by the DCI: polled metrics and events. The distinction between the two is whether the consumer or provider is primarily responsible for metric delivery. In the case of polled metrics, the consumer gets metrics at whatever rate is convenient. In the case of event metrics, consumers must wait for providers to deliver events as these events occur. Traces in UMA are implemented as high frequency events and are normally directed to a file by the DCI consumer.
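The difference in who drives delivery can be sketched in Python. The `Provider` class and its `poll`, `subscribe` and `emit` methods are hypothetical names for illustration only and do not correspond to DCI routines.

```python
from typing import Callable, List


class Provider:
    """Hypothetical provider offering one polled metric and one event stream."""

    def __init__(self) -> None:
        self.counter = 0
        self._listeners: List[Callable[[str], None]] = []

    def poll(self) -> int:
        """Polled metric: the consumer decides when to ask."""
        return self.counter

    def subscribe(self, listener: Callable[[str], None]) -> None:
        """Event: the consumer registers once, then waits for the provider."""
        self._listeners.append(listener)

    def emit(self, event: str) -> None:
        """Called on the provider side when the event occurs."""
        for listener in self._listeners:
            listener(event)


provider = Provider()
received: List[str] = []
provider.subscribe(received.append)

provider.counter = 42
polled = provider.poll()         # consumer-driven delivery
provider.emit("process_end")     # provider-driven delivery
```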

Features and Benefits of the MLI

The Measurement Layer Interface (MLI) is an application programming interface (API) and a set of services (the UMA Data Services Layer) that simplify the implementation of measurement application programs (MAPs) in a distributed environment.

Note: In the following discussion we refer to both polled data and events as data.

The MLI implements the following aspects of the UMA architecture:

allows simple specification of polled data and event collection parameters
establishes a consistent message architecture for UMA data, with data available in simply parsed structures
manages the distributed collection, reporting and recording of current and historical data
provides synchronised capture of data
implements filtering of data based on selection criteria and thresholds to minimise network traffic
implements seamless switching between current and historical data.

Data Collection, Reporting and Recording

Through the MLI, a MAP can specify the types and characteristics of data to be reported to a MAP. UMA distinguishes between the reporting of data to a MAP and data collection. A MAP requests data from a specified source to be reported to a specified destination. The UMA services act on behalf of a MAP to perform the actual data collection through the Data Capture Interface (DCI). Performance overhead is minimised by making use of existing collections in progress for other MAPs that have requested the same performance measurement data or events.
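One way to picture the reuse of collections already in progress is a table that starts a collection only for the first requester of a metric. This is a sketch under that assumption; `CollectionManager` and its methods are hypothetical names, not part of the MLI.

```python
from typing import Dict, Set


class CollectionManager:
    """Hypothetical sketch: one underlying collection per metric, shared by MAPs."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[str]] = {}
        self.collections_started = 0

    def request(self, map_id: str, metric: str) -> None:
        """Attach a MAP to a metric, starting a collection only if none exists."""
        if metric not in self._subscribers:
            self._subscribers[metric] = set()
            self.collections_started += 1   # the only point where new work begins
        self._subscribers[metric].add(map_id)

    def release(self, map_id: str, metric: str) -> None:
        """Detach a MAP; drop the collection when the last MAP leaves."""
        subscribers = self._subscribers.get(metric, set())
        subscribers.discard(map_id)
        if not subscribers:
            self._subscribers.pop(metric, None)

    def active(self) -> int:
        """Number of collections currently in progress."""
        return len(self._subscribers)


manager = CollectionManager()
manager.request("map-A", "cpu.busy")
manager.request("map-B", "cpu.busy")   # reuses the collection started for map-A
```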

UMA Messages

UMA messages provide the basis for transmitting existing notifications and data from UMA to a MAP. In addition, they are the default basis for transmitting requests between Data Services Layers on distributed nodes. The data in UMA messages is identified by classes and subclasses; this is defined in detail in UMA Data Pool.

Control Messages

UMA control messages include MAP requests to the UMA facility, UMA condition notifications from UMA to a MAP, and in distributed environments, request and acknowledgment messages between remote and local Data Services Layers.

Data Messages

Data Messages contain either interval data, event data or configuration data:

interval data is requested by a MAP for capture at the end of a specified time interval. The data reported through the MLI is the difference in value of the requested metrics over the interval, or absolute counter values
event data consists of notification messages to the MAP indicating that some predefined set of system events has occurred. The system events include UMA configuration (for example, the availability of metrics), system configuration (for example, hardware information - such as number/type of processors), and process-end summaries.
configuration data contains data informing an MLI-based application of the data classes and subclasses available for each registered provider to the DCI.
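The two forms of interval data described above (difference over the interval, or absolute counter values) amount to the following arithmetic. The function name, the `absolute` flag and the sample values are invented for this sketch.

```python
from typing import Dict


def interval_data(start: Dict[str, int], end: Dict[str, int],
                  absolute: bool = False) -> Dict[str, int]:
    """Report per-metric change over an interval, or the end-of-interval counters."""
    if absolute:
        return dict(end)
    return {name: end[name] - start[name] for name in end}


# Made-up counter samples taken at the start and end of one interval.
at_start = {"disk.reads": 1000, "disk.writes": 400}
at_end = {"disk.reads": 1150, "disk.writes": 420}

delta = interval_data(at_start, at_end)
raw = interval_data(at_start, at_end, absolute=True)
```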

Certain message subclasses have both interval and event forms. This permits the MAP to select whether data is to be reported at each interval end, at an event (for example, the termination of a process), or both.

Depending on the specified destination, a data message may be directed to the MAP itself, to UMADS (a common UMA data storage facility), or to a private file for later processing.

Screening and Filtering of Data

UMA provides two means by which the message traffic to a MAP (and possibly connected network traffic) can be reduced:

establish threshold settings, thereby preventing the transmission of data messages unless the threshold conditions are satisfied (for example when the runqueue length reaches a particular value)
adjust the granularity of the collected data (for example, by restricting reporting to a particular process or user id).
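Both means of reducing traffic can be expressed as simple filters. The sketch below assumes a "reaches or exceeds" threshold test and a per-process record carrying a user name; these details and all names are illustrative only.

```python
from typing import List, NamedTuple


class ProcessRecord(NamedTuple):
    user: str
    cpu_seconds: float


def above_threshold(runqueue_lengths: List[int], threshold: int) -> List[int]:
    """Threshold screening: forward a sample only when it reaches the threshold."""
    return [length for length in runqueue_lengths if length >= threshold]


def for_user(records: List[ProcessRecord], user: str) -> List[ProcessRecord]:
    """Granularity adjustment: report only the records for one user."""
    return [record for record in records if record.user == user]


sent = above_threshold([1, 2, 8, 3, 9], threshold=8)
records = [ProcessRecord("albert", 1.5), ProcessRecord("betty", 0.5)]
alberts = for_user(records, "albert")
```

In this example only two of the five runqueue samples would be transmitted, and only one of the two process records.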

Constructed Workloads and Summarisation

The UMA MLI lets a MAP request the construction of workloads by permitting workloads to be labelled. These constructed workloads typically represent the result of a request for filtering and/or summarisation of workload metric subclasses. For example, a MAP could request the selection of all commands starting with the letters "abc", and could additionally request that a specific per-work-unit metric subclass report its process metrics as the sum over all processes whose command names start with these same letters.

A constructed workload is assigned an identifier by the caller which can then be used to tag this workload for later reference.
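The "abc" example can be sketched as a selection followed by a summation. The record layout, the identifier `"wl-abc"` and the function name are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Process(NamedTuple):
    command: str
    cpu_seconds: float


def construct_workload(processes: List[Process], prefix: str) -> float:
    """Sum a per-process metric over all processes whose command starts with prefix."""
    return sum(p.cpu_seconds for p in processes if p.command.startswith(prefix))


processes = [Process("abcd", 2.0), Process("abcsort", 3.0), Process("xyz", 5.0)]

# The caller assigns the identifier used to tag the workload for later reference.
workloads: Dict[str, float] = {"wl-abc": construct_workload(processes, "abc")}
```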

A special constructed workload that is the complement of a specified workload is also available. The complement workload metrics are derived by subtracting the selected per-work-unit workload data values from the available global equivalents. For example, for reporting at the process level, if the selection criterion is "User Name: Albert", then the CPU utilisation metric for the complement workload would consist of the global CPU utilisation minus the usage for all processes running under the user name "Albert".
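The subtraction can be worked through with made-up numbers. The function and record names below are hypothetical, and the percentages are invented for the example.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    user: str
    cpu_percent: float


def complement(global_cpu_percent: float, procs: List[Proc], user: str) -> float:
    """Global value minus the summed values of the selected processes."""
    selected = sum(p.cpu_percent for p in procs if p.user == user)
    return global_cpu_percent - selected


procs = [Proc("Albert", 20.0), Proc("Albert", 10.0), Proc("Betty", 25.0)]

# Global utilisation of 70% minus Albert's 30% leaves 40% for the complement.
rest = complement(70.0, procs, "Albert")
```

Note that the complement is derived from the global figure, so it also includes any usage not attributed to a listed process (here 15 percentage points beyond Betty's 25).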

UMA Data Storage

UMA provides for the reading and writing of messages to and from conventional (private) files. In addition, UMA provides UMADS, a common facility for access and maintenance of historical data.

UMADS maintains individually accessible collections of data by host, but there is no requirement that data for a specific host be kept on that host. Instead, a systems administrator can arrange to have UMADS collections for any number of hosts stored on performance data servers.

Seamless Access

UMA provides seamless access between historical and recent (live) data. This means that a MAP may be receiving UMADS historical data until the time reaches the present, at which time UMA automatically switches its source to provide live data (see Seamless Switch - Historical to Recent Data).

Figure: Seamless Switch - Historical to Recent Data
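The switch from historical to live data can be pictured as one stream drawn from two sources. The sketch below assumes timestamped records in time order and a hypothetical `seamless` generator; how UMA performs the switch internally is not described here.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[int, str]   # (timestamp, payload)


def seamless(historical: Iterable[Record], live: Iterable[Record],
             start: int) -> Iterator[Record]:
    """Yield stored records from `start` onwards, then continue with live records."""
    last = start - 1
    for record in historical:
        if record[0] >= start:
            last = record[0]
            yield record
    # Switch source; skip any live record already delivered from storage.
    for record in live:
        if record[0] > last:
            last = record[0]
            yield record


stored = [(1, "a"), (2, "b"), (3, "c")]
incoming = iter([(3, "c"), (4, "d"), (5, "e")])   # overlaps storage at t=3

stream = list(seamless(stored, incoming, start=2))
```

The consumer iterates over a single stream and never sees the change of source, nor the record that appears in both.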

UMA also provides a seek mechanism, so that a MAP can navigate through time and can seamlessly access UMADS data from the present time (or the reverse). A seek from RECENT (current data) to UMADS (historical data) is illustrated in Backwards Seek - Recent to Historical Data.

Figure: Backwards Seek - Recent to Historical Data

Data Capture Synchronisation

The UMA Data Services and Measurement Control Layers enable better synchronised data capture in two ways:

first, the UMA Data Services can utilise global time synchronisation facilities, if they are available, to ensure that polled data collections on different platforms occur at the same time
second, on each individual platform, the UMA Measurement Control Layer merges all measurement requests for polled data so that they may be requested at one time by a single process. This reduces the time skew of the data to the length of the collection time itself. UMA provides an optional additional check on the time skew at the UMA subclass collection level. If the collection time duration for the subclass is inordinately long, the capture can be re-attempted immediately.
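The second mechanism (merging requests, then checking the duration of the collection pass) might look like the following sketch. The function names, the `max_skew` and `retries` parameters, and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Iterable, Set


def merge_requests(requests: Iterable[Set[str]]) -> Set[str]:
    """Union the metrics wanted by all requesters so they are polled in one pass."""
    merged: Set[str] = set()
    for wanted in requests:
        merged |= wanted
    return merged


def collect(readers: Dict[str, Callable[[], float]], names: Set[str],
            max_skew: float, retries: int = 1,
            clock: Callable[[], float] = time.monotonic) -> Dict[str, float]:
    """Read all metrics back to back; retry if the pass took longer than max_skew."""
    for _ in range(retries + 1):
        began = clock()
        values = {name: readers[name]() for name in sorted(names)}
        if clock() - began <= max_skew:
            break   # the pass was short enough, so the skew is acceptable
    return values


readers = {"cpu.busy": lambda: 0.5, "disk.reads": lambda: 120.0,
           "net.rx": lambda: 9.0}
names = merge_requests([{"cpu.busy", "disk.reads"}, {"disk.reads", "net.rx"}])
values = collect(readers, names, max_skew=1.0)
```

Two requests that overlap on `disk.reads` result in three reads rather than four, all taken in the same pass.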
