Systems Management: Application Response Measurement (ARM) API
Copyright © 1998 The Open Group
Scope and Purpose
The applications that are used to run businesses have changed
dramatically over the past few years. In the early 1980s, large
applications generally executed on large computers, and were accessed
from "dumb" terminals. Non-networked applications executing on
personal computers were just beginning to be widely used. Since then,
these two application models have moved steadily towards each other,
fusing together to form distributed (networked) applications.
The most common programming model for distributed applications is the
client/server model. In a client/server application, the application
is split into two or more parts. One part is the user or "client"
part, and this part generally executes on a personal computer or
workstation. The "server" parts execute on computers that provide
functions for the client part, that is, they serve the client
application. The client and server can run on the same system, but
generally they are on different systems. The client part of an
application may invoke one or more functions on one or more servers,
and it may do a significant amount of processing itself, combining,
manipulating, or analyzing the data provided by the servers.
An example of a client/server application might be processing a sales
order by retrieving inventory information from one database, sales
information from another database, and pricing information from a
third. The client part of the application determines if there is
sufficient inventory to accept the order, calculates the price based
on current market conditions, factors in price discounts for this
particular customer, and then invokes more server functions to
complete processing of the order.
By contrast, host-centric applications contain all the application
logic in one computer system, and users connect through "dumb"
terminals to use the application. Examples of the protocols used by
these applications are 3270, Telnet, and the X Window System. The response time
as seen by a user for a transaction can generally be broken down into
two components: the time to process the transaction on the host, and
the time for the input message and the output response. Processing
time at the terminal is usually trivial.
Measuring Service Levels
A monitoring product running at the host is able to measure the
service levels of host-centric applications. The monitor observes the
input request message that starts the transaction, and then observes
the outbound response back to the terminal. The difference between the
two times is the amount of time to process the transaction on the
host. The monitor generally also measures the time for the outbound
response to be sent to the terminal and an acknowledgment to be
received, using this as an approximation of the transit time. The
combination of the host and transit times is an approximation of the
service level seen by the user.
Monitoring the performance and the availability of distributed
applications has not proved to be easy to do. Some of the fundamental
assumptions that the host-centric methods depend on do not hold true.
Some examples showing why this is so are:
The user is typically running an application on a multitasking PC or
workstation. When the user presses a key or the mouse button, the
specified transaction starts, but the user may be able to continue
doing other operations. Put another way, there is no reliable way to
correlate keyboard or mouse input operations with business transactions.
One user transaction (which would be classified as a business
transaction) may spawn several other component transactions, some of
which may execute locally and some remotely. Any measurement agents
that exist only in the network layer or in a host (server) will not
see the entire picture.
The data may be sent through the network using various protocols, not
just one, making the task of packet decoding and correlation much more
difficult.
Client/server applications can be complex, taking different execution
paths and spawning different component transactions, depending on the
results of previous component transactions. Every permutation could
take a different form when it goes across the communication link,
making it that much harder to reliably correlate network or host
(server) observations with what the user sees.
In spite of these difficulties, the need to monitor distributed
applications has never been greater. They are increasingly being used
in mission-critical roles. An approach that solves the problems listed
above is to let the application itself participate in the process. A
developer knows unambiguously when transactions begin and end, both
those that are visible to the user, and the component transactions
that invoke transactions on remote servers.
ARMing Your Applications
With the Application Response Measurement (ARM) API, sections of an
application can be marked to define business transactions. By invoking
ARM API function calls at the beginning and end of each transaction,
the application can be monitored by any of the measurement agents that
use data generated by the ARM API. Programs executing on client or
server systems can be instrumented.
By instrumenting an application to call the ARM API, that application
can be managed by any of the measurement agents that implement ARM.
The advantage of this approach is that the user of the application can
choose the measurement agent that best meets their needs, without
needing to change the application.
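The instrumentation pattern described above can be sketched in C. The call names and parameter lists below follow the ARM 2.0 C bindings (arm_init and arm_getid at application start-up, arm_start and arm_stop bracketing each business transaction, arm_end at shutdown), but the stub bodies are illustrative stand-ins for a measurement agent's library; a real application would link against the agent's implementation instead.

```c
/* Sketch of the basic ARM instrumentation pattern.  The stub bodies
 * stand in for a measurement agent; a real agent records timestamps
 * inside these calls and computes response times from them. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef int arm_int32_t;

#define ARM_GOOD   0   /* transaction completed successfully */
#define ARM_ABORT  1   /* transaction was aborted */
#define ARM_FAILED 2   /* transaction failed */

/* ---- stand-in agent library ---- */
static arm_int32_t arm_init(char *appl_name, char *appl_user_id,
                            arm_int32_t flags, char *data, arm_int32_t data_size)
{
    (void)appl_user_id; (void)flags; (void)data; (void)data_size;
    printf("agent: application \"%s\" registered\n", appl_name);
    return 1;                         /* application id */
}

static arm_int32_t arm_getid(arm_int32_t appl_id, char *tran_name,
                             char *tran_detail, arm_int32_t flags,
                             char *data, arm_int32_t data_size)
{
    (void)appl_id; (void)tran_detail; (void)flags; (void)data; (void)data_size;
    printf("agent: transaction class \"%s\" registered\n", tran_name);
    return 1;                         /* transaction id */
}

static arm_int32_t arm_start(arm_int32_t tran_id, arm_int32_t flags,
                             char *data, arm_int32_t data_size)
{
    (void)flags; (void)data; (void)data_size;
    return 100 + tran_id;             /* start handle; agent notes start time */
}

static arm_int32_t arm_stop(arm_int32_t start_handle, arm_int32_t tran_status,
                            arm_int32_t flags, char *data, arm_int32_t data_size)
{
    (void)flags; (void)data; (void)data_size;
    printf("agent: handle %d stopped, status %d\n", start_handle, tran_status);
    return 0;                         /* agent computes response time here */
}

static arm_int32_t arm_end(arm_int32_t appl_id, arm_int32_t flags,
                           char *data, arm_int32_t data_size)
{
    (void)appl_id; (void)flags; (void)data; (void)data_size;
    return 0;
}

/* ---- an instrumented business transaction ---- */
int process_order(void)
{
    arm_int32_t appl_id = arm_init("SalesApp", "*", 0, NULL, 0);
    arm_int32_t tran_id = arm_getid(appl_id, "ProcessOrder", NULL, 0, NULL, 0);

    arm_int32_t handle = arm_start(tran_id, 0, NULL, 0);
    /* ... business logic: check inventory, price the order ... */
    arm_int32_t status = ARM_GOOD;
    arm_stop(handle, status, 0, NULL, 0);

    arm_end(appl_id, 0, NULL, 0);
    return status;
}
```

Because the agent, not the application, records the timestamps, the same instrumented program can be measured by any agent that implements the API.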
Using ARM, system administrators will be able to answer key questions
such as:
Is the application working correctly (available)?
How is the application performing? What is the response time? What is
the workload throughput? You will be measuring the actual service
levels experienced by your users.
Why is an application not available or performing poorly? What
operation was the application performing when the problem occurred? If
a remote server/application was being invoked when the problem
occurred, which one?
Who is using the application, how much are they using it, and what
kind of operations are being performed? Which servers are providing
the services? This information is useful for purposes such as capacity
planning.
Figure: ARM in the Enterprise
This diagram shows how enterprise management applications, measurement
agents that implement the ARM API, and business applications that call
the ARM API, work together to provide a robust way to monitor
distributed applications.
ARM Version 1.0 and Version 2.0
The ARM version 1.0 API was not adopted or published by The Open Group.
Nevertheless, since the ARM version 1.0 API was released
by the ARM working group of the CMG,
it is appropriate to position this ARM version 2.0 API
in the context of its predecessor.
Several additional features in the ARM version 2.0 API improve the ways
applications can be managed, compared to the ARM version 1.0 API:
You can indicate that a transaction is a component of another
transaction. Also, you can do transaction correlation within one
system or across multiple systems. This permits a better understanding
of the overall transaction, how much time each part of the transaction
is taking, and where problems are occurring.
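The parent/component relationship can be sketched as follows. In the real ARM 2.0 API the correlator travels inside the data buffer arguments of arm_start(); the `corr_t` structure and the helper names here are hypothetical stand-ins for that buffer format, used only to show the flow of the correlator from parent to child.

```c
/* Illustrative sketch of transaction correlation: a parent transaction
 * obtains a correlator when it starts, hands it to each component
 * transaction (possibly on another system), and an agent can later
 * reassemble the transaction tree from the recorded relationships.
 * corr_t and these helpers are illustrative, not the spec's layout. */
#include <assert.h>
#include <stdio.h>

typedef int arm_int32_t;

typedef struct {
    arm_int32_t parent_handle;   /* identifies the parent transaction */
} corr_t;

static arm_int32_t next_handle = 1;

/* Start a top-level (business) transaction and produce a correlator
 * that component transactions can cite. */
static arm_int32_t start_parent(corr_t *corr_out)
{
    arm_int32_t handle = next_handle++;
    corr_out->parent_handle = handle;    /* a real agent generates this */
    return handle;
}

/* Start a component transaction that records which parent spawned it. */
static arm_int32_t start_component(const corr_t *parent)
{
    arm_int32_t handle = next_handle++;
    printf("transaction %d is a component of %d\n",
           handle, parent->parent_handle);
    return handle;
}

/* One business transaction spawning two components, e.g. an inventory
 * query and a pricing query on different servers. */
int components_of_business_transaction(void)
{
    corr_t corr;
    start_parent(&corr);
    start_component(&corr);   /* inventory lookup */
    start_component(&corr);   /* pricing lookup   */
    return 2;                 /* number of component transactions */
}
```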
You can provide additional information about the transaction, such as
the number of bytes or records being processed, or about the state of
the application at the moment that the transaction is being processed,
such as the length of a work queue. This information (called
application-defined metrics) is useful to better understand response
times, and how the application can be tuned to perform better.
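A sketch of how application-defined metrics might accompany a transaction. In the real API the metric values are packed into the data buffer passed on arm_start(), arm_update(), and arm_stop(); the `metrics_t` structure and `report_metrics` helper here are simplified stand-ins for that buffer format.

```c
/* Simplified sketch of application-defined metrics: the application
 * reports a byte count and a work-queue length alongside its
 * transaction.  metrics_t is illustrative; the real ARM 2.0 API packs
 * metric values into the data buffer arguments of its calls. */
#include <assert.h>
#include <stdio.h>

typedef int arm_int32_t;

typedef struct {
    arm_int32_t bytes_processed;  /* bytes or records handled so far */
    arm_int32_t queue_length;     /* application state: pending items */
} metrics_t;

/* Stand-in for arm_update(): a real agent would record these values
 * against the transaction identified by start_handle. */
static void report_metrics(arm_int32_t start_handle, const metrics_t *m)
{
    printf("handle %d: %d bytes processed, queue length %d\n",
           start_handle, m->bytes_processed, m->queue_length);
}

/* Process a batch of records, reporting progress mid-transaction. */
int process_batch(int records)
{
    metrics_t m = { 0, records };
    arm_int32_t handle = 42;          /* would come from arm_start() */

    for (int i = 0; i < records; i++) {
        m.bytes_processed += 100;     /* assume 100 bytes per record */
        m.queue_length--;
        report_metrics(handle, &m);   /* arm_update() in the real API */
    }
    return m.bytes_processed;
}
```

Metrics like these let a monitoring tool explain a response time (for example, a long transaction that processed many bytes) rather than merely report it.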
You can use the new logging agent to do simple verification of your
instrumentation. It allows you to determine if the correct parameters
are being passed on each call, but it does not function as a full
measurement agent.
The ARM version 2.0 API is backward compatible with ARM version 1.0.
Applications instrumented to the ARM 1.0 API will continue to function
correctly with agents that implement the additional features of the
ARM 2.0 API. Applications instrumented with ARM 2.0 will function
correctly with agents that implement the features of ARM 1.0.