Systems Management: Application Response Measurement (ARM) API
Copyright © 1998 The Open Group
Advanced Topics
The following topics provide information on more advanced
implementations using the ARM 2.0 API.
Additional Data Passed in ARM Function Calls
The following two types of additional data can now be provided via the
ARM 2.0 API:
-
Transaction correlation data
You can indicate that a transaction is a component of another
transaction. You can do transaction correlation within one system
or across multiple systems. This permits a better understanding
of the overall transaction, how much time each part of the
transaction is taking, and where problems are occurring.
-
Application-defined metrics
Application-defined metrics provide additional information about the
transaction, such as the number of bytes or records being
processed, or about the state of the application at the moment
that the transaction is being processed, such as the length
of a work queue. This information is useful to better
understand response times, and how the application can be
tuned to perform better.
Transaction Correlation
Many client/server transactions consist of one transaction visible to
the user, and any number of nested component transactions that are invoked by the one
visible transaction. These component transactions are children of the parent
transaction (or of another component transaction, when nesting is deeper). It's very useful to
know how much each component transaction contributes to the total response time of the
visible transaction. Similarly, a failure in one of the component transactions will often
lead to a failure in the visible transaction, and this information is also very useful.
There are two facilities that the application developer can use to
provide this information to measurement agents that implement
the ARM 2.0 API:
-
On the
arm_start()
that begins a transaction,
the application can request that the measurement agent assign
and return a correlator for this instance of the transaction
(that is, a parent correlator). Note that the agent has the
option of not providing the correlator, because it may not
support the capability (ARM Version 1.0 agents do not
support correlators), or because it is operating under
a policy to suppress generating them.
-
When indicating the start of a child transaction with an
arm_start(),
the application can provide a
correlator received from a parent transaction.
This allows the measurement agent to know
the parent/child relationship.
Figure: Transaction Response Time Correlation
The figure shows the concept for a simple model.
The principle can be extended to a model
of arbitrary complexity.
-
Client A starts transaction T1, requesting a correlator via arm_start(),
and is assigned C1.
-
Client A sends a request (T1) to Server B, and includes C1 in the
request.
-
Server B starts transaction T2, passing C1 as the parent. At the same
time it requests a correlator and is assigned C2.
-
Server B sends a request (T2) to Server C, and includes C2 in the
request.
-
Server C starts transaction T3, passing C2 as the parent.
-
T3 stops, T2 stops, and T1 stops.
If the correlation application collects all the data about these
transactions, it can put together the total picture, knowing that T1 is the parent of T2
(via C1), and T2 is the parent of T3 (via C2). The parent/child relationship could be from
a client to a server, or within one program.
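To make the reconstruction step concrete, here is a minimal C sketch of how a correlation application might link collected records into parent/child pairs. The `TxnRecord` type and `find_parent()` are hypothetical, and correlators are shown as short strings such as "C1" purely for readability; real correlators are opaque byte sequences generated by the agent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record collected for one transaction instance. */
typedef struct {
    const char *name;    /* e.g. "T2" */
    const char *own;     /* correlator the agent assigned, or NULL */
    const char *parent;  /* correlator passed in as parent, or NULL */
} TxnRecord;

/* Index of the parent of records[i], or -1 if it has none or the parent's
   record was not collected. */
static int find_parent(const TxnRecord *records, size_t n, size_t i)
{
    assert(i < n);
    if (records[i].parent == NULL)
        return -1;
    for (size_t j = 0; j < n; j++)
        if (records[j].own != NULL && strcmp(records[j].own, records[i].parent) == 0)
            return (int)j;
    return -1;
}
```

With the records from the figure (T1 assigned C1, T2 assigned C2 with parent C1, T3 with parent C2), `find_parent()` reports T2 as the parent of T3 and T1 as the parent of T2, while T1 has no parent.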
An application using the ARM API need not be concerned with the format
of the correlators. Measurement agents generate correlators.
Changes Needed for Transaction Correlation
Each application responsible for a component of the overall transaction
(client and server) will require some modifications. Applications have three
responsibilities:
-
Request correlators for transactions with one or more child transactions
(via
arm_start())
by setting the appropriate flag.
-
Send the assigned correlators to the child transaction(s) along with the
data needed to invoke the child transaction(s) themselves.
This is done by first checking that the agent assigned a
correlator (Flags First Byte c = 1), and then sending the number of
bytes given in the Correlator Length field, which the agent fills in.
-
Pass correlators received from parent transactions to the measurement
agents (via
arm_start())
by storing the correlator in the optional buffer and setting
the appropriate flag.
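The second responsibility can be sketched in C as follows. This is an illustration under stated assumptions, not part of the ARM API: the helper name `correlator_to_forward()` is hypothetical, the offsets come from the Format 1 layout (4 + 4 + 6 x 8 + 32 = 88 bytes precede the Correlator field), bit c of the first Flags byte is assumed to be the third most significant bit of "abcd0000", and the 2-byte Correlator Length is assumed to be in native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets from the Format 1 layout:
   Format (4) + Flags (4) + Metrics #1-#6 (6 x 8) + String #1 (32) = 88. */
enum {
    ARM_FLAGS_OFFSET      = 4,
    ARM_CORRELATOR_OFFSET = 88,
    ARM_CORRELATOR_MAX    = 168,  /* 2-byte length + up to 166 data bytes */
    ARM_BUFFER_MAX        = ARM_CORRELATOR_OFFSET + ARM_CORRELATOR_MAX
};
static_assert(ARM_BUFFER_MAX == 256, "a correlator request needs 256 bytes");

/* Assumption: bit c of "abcd0000" is the third most significant bit. */
#define ARM_FLAG_CORR_RETURNED 0x20u

/* Hypothetical helper, called after arm_start() returns. Copies the
   agent-assigned correlator, including its 2-byte length field, into out
   and returns the number of bytes to send to the child transaction, or 0
   if there is nothing to forward. */
static size_t correlator_to_forward(const unsigned char *buf, size_t buf_len,
                                    unsigned char out[ARM_CORRELATOR_MAX])
{
    uint16_t len;

    if (buf_len < ARM_CORRELATOR_OFFSET + 2)
        return 0;
    if ((buf[ARM_FLAGS_OFFSET] & ARM_FLAG_CORR_RETURNED) == 0)
        return 0;                     /* agent did not return a correlator */

    memcpy(&len, buf + ARM_CORRELATOR_OFFSET, sizeof len); /* native order assumed */
    if (len < 2 || len > ARM_CORRELATOR_MAX ||
        ARM_CORRELATOR_OFFSET + (size_t)len > buf_len)
        return 0;                     /* zero means "no correlator" */

    memcpy(out, buf + ARM_CORRELATOR_OFFSET, len);
    return len;
}
```

The returned length is what the application would transmit to the child alongside its own request data; the child then passes those bytes back to its agent as the parent correlator.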
To enable a correlation application to analyze the correlators coming
from different systems, measurement agents follow conventions when
creating correlators. Included within the correlator is information
identifying the system, the transaction class (from
arm_getid()),
the
transaction instance (from
arm_start()),
and some flags. The format is
flexible and extendible so more conventions can be added as the need
arises. See
Measurement Agent Information
on Measurement Agent Information,
for information on the correlator format.
Correlators are passed in the
arm_start()
calls by utilizing the data
buffer. This same data buffer is used to pass application-defined
metrics, as described in
Format of Data Buffer in arm_start/arm_update/arm_stop.
Correlators are ignored in
arm_update()
and
arm_stop()
calls.
If a correlator is being requested, the data buffer should be 256
bytes, to allow for a variable size correlator. If a correlator is
being passed to the measurement agent and none is requested, the
data buffer may be shortened to end at the last byte of the
correlator, as given by its Correlator Length field.
If you only want to do transaction correlation in your application
and not provide application-defined metrics, you can zero out the
metrics (set the Flags Second Byte to zero and fill the 80 bytes of
metric and string fields with zeros).
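A correlation-only buffer for arm_start() might be prepared as in the following sketch. It assumes native byte order for the Format field and that a and b in "abcd0000" are the two most significant bits; the helper name is hypothetical, and the returned size is what the application would pass as data_size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FMT1_CORR_OFFSET = 88, FMT1_CORR_MAX = 168, FMT1_BUFFER_SIZE = 256 };
static_assert(FMT1_CORR_OFFSET == 4 + 4 + 6 * 8 + 32, "88 bytes precede the Correlator");
static_assert(FMT1_CORR_OFFSET + FMT1_CORR_MAX == FMT1_BUFFER_SIZE, "256-byte buffer");

/* Assumed bit positions in Flags First Byte "abcd0000" (a is the MSB). */
#define FLAG_PARENT_PASSED 0x80u  /* a: parent correlator in Correlator field */
#define FLAG_REQUEST_CORR  0x40u  /* b: ask the agent to assign a correlator */

/* Hypothetical helper: prepare a Format 1 buffer that carries no metrics.
   parent may be NULL; otherwise parent_len is the forwarded correlator
   length (including its 2-byte length field). Returns the data_size. */
static int32_t build_correlation_only(unsigned char buf[FMT1_BUFFER_SIZE],
                                      const unsigned char *parent,
                                      size_t parent_len,
                                      int request_correlator)
{
    int32_t format = 1;                   /* native byte order assumed */
    size_t used = FMT1_CORR_OFFSET;

    memset(buf, 0, FMT1_BUFFER_SIZE);     /* flags, 80 bytes of metric/string fields */
    memcpy(buf, &format, sizeof format);

    if (parent != NULL && parent_len >= 2 && parent_len <= FMT1_CORR_MAX) {
        memcpy(buf + FMT1_CORR_OFFSET, parent, parent_len);
        buf[4] |= FLAG_PARENT_PASSED;
        used += parent_len;
    }
    if (request_correlator) {
        buf[4] |= FLAG_REQUEST_CORR;
        used = FMT1_BUFFER_SIZE;          /* room for a variable size correlator */
    }
    return (int32_t)used;
}
```

When a correlator is requested, the buffer must stay writable, because the agent writes the new correlator (and sets bit c) into the same 256 bytes.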
Note: Other than the length, the correlator format need not be
understood by the application developer, as it is opaque.
Application-Defined Metrics
Application-defined metrics can tell you more about the transaction or
about the state of the application at the moment that the transaction
is being processed. Three likely uses are envisioned as described
below:
-
Specify characteristics of the transaction that will affect the
response time, or that are useful for workload planning. Examples are
the number of bytes in a file transfer or print job, or the number of
records being processed. A file transfer of 100 megabytes would
certainly be expected to take longer than a transfer of 100 kilobytes.
-
Specify information about the current state of the application.
Examples would be the length of a workload queue, the amount of memory
allocated, or the number of threads being used. This information is
useful for adjusting workloads by shifting work between systems, or
tuning the application. If a comparison of response times versus
threads shows that congestion builds and response times increase
dramatically if, for example, eight threads are used instead of
twelve, the application can be recompiled or instructed to use more
threads, which may result in a dramatic improvement in performance.
-
Specify information that can be used in diagnosing problems. Examples
are error codes returned from services invoked by the application, or
information about the transaction itself such as the part number being
processed.
In setting up application-defined metrics,
arm_getid()
is used to define
the context (or
meta-data)
for a buffer of values that can be passed
at
arm_start(),
arm_update()
or
arm_stop().
Actual values are passed in
arm_start(),
arm_update()
and
arm_stop().
The length of the buffer is
specified in the
data_size
parameter.
Choosing a Data Type
The additional data provided in the data buffer uses metric and/or
string fields. (See later sections of this Chapter for information
on the format of the data buffer.)
Four general data types can be specified for each
field:
-
Counter
-
Gauge
-
Numeric id
-
String
This section provides some suggestions about which data type to use.
Counter
A counter should be used when it makes sense to sum up the values over
an interval. Examples are bytes printed and records written. The
values can also be averaged, maximums and minimums (per transaction)
can be calculated, and other kinds of statistical calculations can be
performed.
If a counter is used, its initial value must be set in the
arm_start()
call. The difference between the value in the
arm_start()
and the
arm_stop()
(or the value in the last
arm_update()
call if no metric value
is passed in
arm_stop()),
equals the amount attributed to this
transaction. Similarly, the difference between successive
arm_update()
calls, or from the
arm_start()
to the first
arm_update()
call, or from the last
arm_update()
to the
arm_stop()
call, equals the value for the time period between the calls.
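Because ARM_Counter32 and ARM_Counter64 values only decrease when they reset to zero, the attribution described above reduces to an unsigned subtraction. In this minimal C sketch (the function names are hypothetical), unsigned arithmetic wraps modulo 2^32 or 2^64, so one reset between two readings is handled automatically; more than one reset between readings cannot be detected this way.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: amount attributed between two readings of the
   same counter (arm_start() -> arm_stop(), or between successive
   arm_update() calls). Unsigned subtraction wraps modulo 2^32 / 2^64, so a
   single reset to zero between the readings still gives the right answer. */
static uint32_t counter32_delta(uint32_t earlier, uint32_t later)
{
    return (uint32_t)(later - earlier);
}

static uint64_t counter64_delta(uint64_t earlier, uint64_t later)
{
    return (uint64_t)(later - earlier);
}
```

For instance, readings of 1000 at arm_start() and 1750 at arm_stop() attribute 750 to the transaction, and readings of UINT32_MAX - 9 and 5 attribute 15.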
Here are three examples of how a counter would probably be used:
-
The counter is set to zero at
arm_start()
and to some value at
arm_stop()
(or the last arm_update call). In this case, the application probably
measured the value for this transaction and provided that value in the
arm_stop()
call. The application always sets the value to zero in the
arm_start()
call, so the value at
arm_stop()
reflects both the difference from the
arm_start()
value and the absolute value.
-
The counter is x1 at
arm_start(),
x2 at its arm_stop, x2 at the next
arm_start(),
and x3 at its
arm_stop().
In this case, the application is
probably keeping a rolling counter. Perhaps this is a server
application that counts the total workload. The application simply
takes a snapshot of the counter at the start of a transaction and
another snapshot at the end of the transaction. The agent determines
the difference attributed to this transaction.
-
The counter is x1 at
arm_start(),
x2 at
arm_stop(),
x3 (not equal to x2) at the next
arm_start(),
and x4 at
arm_stop().
In this case, the
application is probably keeping a rolling counter as in the previous
example. But in this case the measurement represents a value affected
by other users or transaction classes, so the value often changes from
one
arm_stop()
to the next
arm_start()
for the same transaction class.
Gauge
A gauge should be used instead of a counter when it is not meaningful
to sum up the values over an interval. An example is the amount of
memory used. If you were measuring the amount of memory used over 20
transactions in an interval and the average usage for each of these
transactions was 15 MB, it does not make sense to say that 20 x 15 = 300
MB of memory was used over the interval. It would make sense to say that
the average was 15 MB, that the median was 12 MB, and that the
standard deviation was 8 MB. These are the kinds of operations that an
agent will typically apply to gauges. The values can also be averaged,
maximums and minimums per transaction calculated, and other kinds of
statistical calculations performed.
Gauges can be provided on
arm_start(),
arm_update(),
and
arm_stop()
calls.
This creates the potential for different interpretations. If several
values are provided for a transaction (one on an
arm_start(),
one on each
arm_update(),
and one on an
arm_stop()),
which one(s) should be used?
In order to have consistent interpretation, the following conventions
apply. Measurement agents are free to process the data in any way
within these guidelines.
-
The maximum value for a transaction will be the largest valid value
passed at any time during the transaction.
-
The minimum value for a transaction will be the smallest valid value
passed at any time during the transaction.
-
The mean value for a transaction will be the mean of all valid values
passed at any time during the transaction. All values will be weighted
equally.
-
The median value for a transaction will be the median of all valid
values passed at any time during the transaction. All values will be
weighted equally.
-
The last value for a transaction will be the last valid value passed
at any time during the transaction.
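One way an agent might apply these conventions is sketched below in C. The `GaugeSummary` type and `summarize_gauge()` are hypothetical, and the caller is assumed to pass only the valid values, in call order. Because the text does not say how to take the median of an even number of values, this sketch averages the two middle values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int64_t max, min, last;
    double mean, median;
} GaugeSummary;

static int cmp_i64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Summarize the valid gauge values passed during one transaction, in call
   order; all values weighted equally. Returns 0 on success, or -1 if n is
   0 or exceeds the fixed scratch size used by this sketch. */
static int summarize_gauge(const int64_t *vals, size_t n, GaugeSummary *out)
{
    int64_t sorted[64];
    double sum = 0.0;

    if (n == 0 || n > 64)
        return -1;

    out->max = out->min = vals[0];
    for (size_t i = 0; i < n; i++) {
        if (vals[i] > out->max) out->max = vals[i];
        if (vals[i] < out->min) out->min = vals[i];
        sum += (double)vals[i];
    }
    out->mean = sum / (double)n;
    out->last = vals[n - 1];

    memcpy(sorted, vals, n * sizeof *vals);
    qsort(sorted, n, sizeof *sorted, cmp_i64);
    out->median = (n % 2) ? (double)sorted[n / 2]
                          : ((double)sorted[n / 2 - 1] + (double)sorted[n / 2]) / 2.0;
    return 0;
}
```

For values 12, 20, 8, 16 this gives a maximum of 20, a minimum of 8, a mean and median of 14, and a last value of 16.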
Numeric ID
A numeric id is simply a numeric value that is used as an identifier,
and not as a measurement value. Examples are message numbers and error
codes. It is not meaningful to sum, average, or manipulate these
values in any arithmetic way. By using numeric id instead of a gauge
or counter, the application indicates this to the measurement agent.
An agent could create statistical summaries based on these values,
such as generating a frequency histogram by error code, but this is
done by counting the numbers, not by summing them or performing any
other arithmetic operation.
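Counting rather than summing is the whole difference, as this C sketch of a frequency histogram for ARM_NumericID32 values shows. The names are hypothetical, and a linear search over a small fixed table stands in for whatever data structure a real agent would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t id; unsigned count; } IdCount;

/* Count occurrences of each numeric id (e.g. error codes). Ids are only
   compared for equality, never summed or averaged. Returns the number of
   distinct ids stored in hist; ids beyond cap distinct values are dropped. */
static size_t id_histogram(const uint32_t *ids, size_t n, IdCount *hist, size_t cap)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < distinct && hist[j].id != ids[i])
            j++;
        if (j == distinct) {
            if (distinct == cap)
                continue;          /* table full in this sketch */
            hist[distinct].id = ids[i];
            hist[distinct].count = 0;
            distinct++;
        }
        hist[j].count++;
    }
    return distinct;
}
```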
String
A measurement agent should process a string in the same way as a
numeric id. As with numeric ids, it is not meaningful to do arithmetic
operations on a string value.
Format of Data Buffer in arm_getid
Format
| 4 bytes
| 101 (int32) (identifies "meta-data" format)
|
Flags:
The flags indicate which Metric and String descriptions are included
in the buffer.
| 4 bytes
| First Byte (bit8) = 0
Second Byte (bit8)
abcdefg0, where a through g each denote the value of a bit flag:
a = 1 if there is a description for Metric #1, otherwise a = 0
b = 1 if there is a description for Metric #2, otherwise b = 0
c = 1 if there is a description for Metric #3, otherwise c = 0
d = 1 if there is a description for Metric #4, otherwise d = 0
e = 1 if there is a description for Metric #5, otherwise e = 0
f = 1 if there is a description for Metric #6, otherwise f = 0
g = 1 if there is a description for String #1, otherwise g = 0
Third Byte (bit8) = 0
Fourth Byte (bit8) = 0
|
Metric #1 desc.
| 48 bytes
| The first 4 bytes (int32) define the type of data that will be passed
in the 8 byte field. See the description below this table for an
explanation of the different data types.
1 = ARM_Counter32
2 = ARM_Counter64
3 = ARM_CntrDivr32
4 = ARM_Gauge32
5 = ARM_Gauge64
6 = ARM_GaugeDivr32
7 = ARM_NumericID32
8 = ARM_NumericID64
9 = ARM_String8
The last 44 bytes (char*) are the name of the metric. This is a NULL
terminated character string. A possible use of this name is to display
it along with the current value, either on a user interface or in a
report.
|
Metric #2 desc.
| 48 bytes
| Same as Metric description #1.
|
Metric #3 desc.
| 48 bytes
| Same as Metric description #1.
|
Metric #4 desc.
| 48 bytes
| Same as Metric description #1.
|
Metric #5 desc.
| 48 bytes
| Same as Metric description #1.
|
Metric #6 desc.
| 48 bytes
| Same as Metric description #1.
|
String #1 desc.
| 48 bytes
| The first 4 bytes (int32) define the type of data that will be in the
field. Only one data type is valid in this field.
10 = ARM_String32
The last 44 bytes (char*) are the name of the String #1 field. It is a
NULL terminated character string. A possible use of this name is to
display it along with the current value, either on a user interface or
in a report.
|
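A C sketch of building this meta-data buffer follows. The helper names are hypothetical; the sketch assumes that integers are written in native byte order and that flag a (Metric #1) is the most significant bit of the Flags Second Byte, as the "abcdefg0" notation suggests. The enumerator names mirror the data type codes listed in the table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    META_FORMAT      = 101,
    META_DESC_SIZE   = 48,   /* 4-byte type code + 44-byte name */
    META_NAME_SIZE   = 44,
    META_SLOTS       = 7,    /* Metric #1-#6, then String #1 */
    META_BUFFER_SIZE = 4 + 4 + META_SLOTS * META_DESC_SIZE
};
static_assert(META_BUFFER_SIZE == 344, "Format + Flags + 7 descriptions");

/* Type codes from the table (1 through 10). */
enum {
    ARM_Counter32 = 1, ARM_Counter64, ARM_CntrDivr32, ARM_Gauge32,
    ARM_Gauge64, ARM_GaugeDivr32, ARM_NumericID32, ARM_NumericID64,
    ARM_String8, ARM_String32
};

static void init_meta_buffer(unsigned char buf[META_BUFFER_SIZE])
{
    int32_t format = META_FORMAT;          /* native byte order assumed */
    memset(buf, 0, META_BUFFER_SIZE);      /* no flags, empty descriptions */
    memcpy(buf, &format, sizeof format);
}

/* slot 0-5 = Metric #1-#6, slot 6 = String #1. Returns 0, or -1 if the
   slot, type code, or name (must fit in 44 bytes with its NULL) is invalid. */
static int describe_slot(unsigned char buf[META_BUFFER_SIZE], int slot,
                         int32_t type, const char *name)
{
    unsigned char *desc;

    if (slot < 0 || slot >= META_SLOTS || strlen(name) >= META_NAME_SIZE)
        return -1;
    if (slot == META_SLOTS - 1 ? type != ARM_String32
                               : (type < ARM_Counter32 || type > ARM_String8))
        return -1;

    desc = buf + 8 + slot * META_DESC_SIZE;
    memcpy(desc, &type, sizeof type);
    memset(desc + 4, 0, META_NAME_SIZE);
    memcpy(desc + 4, name, strlen(name) + 1);   /* NULL terminated name */
    buf[5] |= (unsigned char)(0x80u >> slot);   /* Flags Second Byte */
    return 0;
}
```

The resulting 344-byte buffer (4 + 4 + 7 x 48) would then be passed to arm_getid() with a matching data_size.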
Data Type Definitions
- ARM_Counter32
An unsigned32 value that increases up to the maximum value that the
counter can hold, at which point it resets to zero and continues
counting up from zero. Except for the reset back to zero, the value
can never decrease. The counter is in the first four bytes, and the
second four bytes are unused.
- ARM_Counter64
An unsigned64 counter (see ARM_Counter32, except it is 64 bits long).
- ARM_CntrDivr32
A combination of two unsigned32 integers, with ARM_Counter32 in the
first four bytes, and an unsigned32 divisor in the second four bytes.
The value represented is the counter divided by the divisor. The purpose of this format is to be
able to represent decimal values without using floating point formats.
- ARM_Gauge32
An int32 (signed) value that can increase or decrease. The gauge is in
the first four bytes, and the second four bytes are unused.
- ARM_Gauge64
An int64 (signed) gauge (see ARM_Gauge32, except it is 64 bits long).
- ARM_GaugeDivr32
A combination of two integers, one an int32 (signed) and one an
unsigned32. ARM_Gauge32 is in the first four bytes, and an unsigned32
divisor in the second four bytes. The value represented is the gauge divided by the divisor.
The purpose of this format is to be able to represent decimal values
without using floating point formats.
- ARM_NumericID32
An unsigned32 value that should not be used in arithmetic operations
because it is used as an identifier, not as a measurement. For
example, a message number or error code. The numeric id is in the
first four bytes, and the second four bytes are unused.
- ARM_NumericID64
An unsigned64 value that should not be used in arithmetic operations
because it is used as an identifier, not as a measurement. An example
is a message number or error code.
- ARM_String8
An 8 byte string that is
not
NULL terminated. If the string is less
than eight bytes long, it must be padded with blanks. The character
set is ASCII or EBCDIC, depending on whatever is standard for that
platform. Unlike the NULL terminated character strings passed in
various places in the API, these strings cannot be reliably converted
to other code pages, so it is suggested you use only the common
characters in the first 128 characters of the Latin code pages. See
Internationalization
for more information on internationalization.
- ARM_String32
A 32 byte string that is
not
NULL terminated. If the string is less
than 32 bytes long, it must be padded with blanks. The character set
is ASCII or EBCDIC, depending on whatever is standard on that
platform. Unlike the NULL terminated character strings passed in
various places in the API, these strings cannot be reliably converted
to other code pages, so it is suggested you use only the common
characters in the first 128 characters of the Latin code pages. See
Internationalization
for more information.
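The integer-pair types and the fixed-width string types are the easiest to get wrong, so the following C sketch shows both. The helper names are hypothetical, native byte order is assumed for the integers, and the 7-bit check assumes an ASCII platform, where "the first 128 characters" are exactly the bytes below 0x80.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack an ARM_GaugeDivr32 field: signed gauge in bytes 0-3, unsigned
   divisor in bytes 4-7 (native byte order assumed). */
static void pack_gauge_divr32(unsigned char field[8], int32_t gauge, uint32_t divisor)
{
    memcpy(field, &gauge, 4);
    memcpy(field + 4, &divisor, 4);
}

/* Value represented by the field: gauge / divisor. Floating point is used
   only on the reading side; the application supplied two integers. */
static double gauge_divr32_value(const unsigned char field[8])
{
    int32_t gauge;
    uint32_t divisor;

    memcpy(&gauge, field, 4);
    memcpy(&divisor, field + 4, 4);
    assert(divisor != 0);
    return (double)gauge / (double)divisor;
}

/* Fill an ARM_String8 (width 8) or ARM_String32 (width 32) field: no NULL
   terminator, blank padded. Returns -1 (field untouched) if src is too
   long or contains bytes >= 0x80 (ASCII platform assumed). */
static int set_fixed_string(char *field, size_t width, const char *src)
{
    size_t len = strlen(src);

    if (len > width)
        return -1;
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)src[i] >= 0x80)
            return -1;
    memcpy(field, src, len);
    memset(field + len, ' ', width - len);   /* blank padding, no NULL */
    return 0;
}
```

For example, -12.5 can be supplied without floating point as the gauge -125 with divisor 10, and "PN-42" becomes the eight bytes "PN-42" followed by three blanks.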
Format of Data Buffer in arm_start/arm_update/arm_stop
Format 1
Format
| 4 bytes
| 1 (int32). The value 2 is reserved for the special Format 2 buffer, which is used only on arm_update().
|
Flags:
The flags indicate which fields are included in the buffer.
| 4 bytes
| First Byte (bit8): Only valid for
arm_start().
Ignored on
arm_update()
and
arm_stop().
abcd0000, where a, b, c, and d each denote the value of a bit flag. Bits a, b, and d are
set by the application; bit c is set by the measurement agent.
a = 1 if the application is passing the correlator from a parent
transaction in the Correlator field; otherwise a = 0.
b = 1 if the application is requesting that the agent generate a
correlator for the transaction (the one indicated by this
arm_start()),
otherwise b = 0. If a correlator is being requested, the
data buffer should be 256 bytes, to allow for a variable size
correlator.
c = 1 if the agent is returning a correlator in the Correlator field.
When set, the value in the Correlator field overlays any previous
value. This flag will only be set when three conditions are met,
otherwise c=0:
-
The application has set bit b = 1.
-
The agent supports this function (agents that only support version 1.0
of the ARM API do not).
-
The agent is running in a mode where the generation of correlators is
enabled (that is, there might be an installation policy to disable the
generation of correlators, either temporarily or permanently).
If this bit is
not
set to 1, there is no correlator, and therefore the
application should not forward the contents of the Correlator field.
d = 1 if the application is requesting that the agent trace this
transaction. This might be done when a dummy test transaction is being
executed, or when an error has occurred. Each agent can choose whether
and how to honor the request, and administrators who configure the
agent may establish the policy.
Second Byte (bit8)
abcdefg0, where a through g each denote the value of a bit flag:
a = 1 if a value is passed in Metric #1, otherwise a = 0
b = 1 if a value is passed in Metric #2, otherwise b = 0
c = 1 if a value is passed in Metric #3, otherwise c = 0
d = 1 if a value is passed in Metric #4, otherwise d = 0
e = 1 if a value is passed in Metric #5, otherwise e = 0
f = 1 if a value is passed in Metric #6, otherwise f = 0
g = 1 if a value is passed in String #1, otherwise g = 0
It is perfectly permissible for an application to pass none or some of
the metrics on each call, and to change which metrics are passed from
call to call. This holds true for
arm_start(),
arm_update(),
and
arm_stop()
calls. The one requirement that must be adhered to is that the meaning
and position of the field must have been defined with the
arm_getid()
call (see
Format of Data Buffer in arm_getid
for the format of data buffer in
arm_getid()).
Third Byte (bit8) = 0
Fourth Byte (bit8) = 0
|
Metric #1
| 8 bytes
| The metric fields are used by the application to pass useful
information about the transaction or the state of the application to
the measurement agent. The field contains one or two integers, or a
string variable. The use of the field and the format of the field are
determined by the buffer passed on the
arm_getid()
call (see
Format of Data Buffer in arm_getid
for the format of data buffer in
arm_getid()).
See
Choosing a Data Type
for more information on choosing a data type, and
Data Type Definitions
for data type definitions.
|
Metric #2
| 8 bytes
| Same as Metric #1.
|
Metric #3
| 8 bytes
| Same as Metric #1.
|
Metric #4
| 8 bytes
| Same as Metric #1.
|
Metric #5
| 8 bytes
| Same as Metric #1.
|
Metric #6
| 8 bytes
| Same as Metric #1.
|
String #1
| 32 bytes
| A string variable of up to 32 characters. The string is not NULL
terminated, and is padded with blanks if it is less than 32
characters. Any information can be included in the string. Examples
would be a part number being processed, or an error code.
|
Correlator
|
| The field has two different uses depending on whether it is passed on
the call from the application to the measurement agent, or if it is
passed in the return from the agent:
-
The application can pass in the correlator from a parent transaction
to the agent. This allows the agent to correlate the parent
transaction to the component transaction being started with this
arm_start()
call.
-
The agent can return a correlator for the transaction being started by
this
arm_start()
call. The application could then pass this correlator
to applications that it invokes, and they in turn could pass it as the
parent correlator in
arm_start()
calls that they make.
If the correlator returned bit is set (Flags First Byte c=1), the
application can pass the entire 168-byte correlator. Alternatively, as an
optimization, the application can read the Correlator Length field and
pass only the number of bytes containing data, starting with the 2 bytes
of the Correlator Length field itself.
See
Transaction Correlation
for more information on correlating transactions.
Also, see
Measurement Agent Information
for more information on the content of the correlator.
|
| Length
2 bytes
| The Correlator length field (unsigned 16) specifies the length of a
correlator (including this field) generated by a measurement agent
(when bit c is set in the first Flags byte).
If this value is zero, it means that the agent is not returning a
correlator, and therefore there is no reason to pass this
correlator on to other parts of the application (or servers that it
calls).
This field is considered a part of the correlator and must be included
in the forwarded correlator data.
|
| Data
0-166 bytes
| The Correlator data field is used to show the parent/child
relationship between transactions. (Note: the application instrumenter
has no need to understand the correlator format, as it is
opaque).
|
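The following C sketch fills metric and string fields of a Format 1 buffer and sets the matching Flags Second Byte bits. The helper names are hypothetical; the sketch assumes native byte order, that Metric #1 corresponds to the most significant bit of "abcdefg0", and that each slot's data type was already declared through arm_getid().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { F1_METRICS_OFFSET = 8, F1_STRING_OFFSET = 56, F1_BASE_SIZE = 88 };

static void init_format1(unsigned char buf[F1_BASE_SIZE])
{
    int32_t format = 1;                    /* native byte order assumed */
    memset(buf, 0, F1_BASE_SIZE);
    memcpy(buf, &format, sizeof format);
}

/* Store a 32-bit value (ARM_Counter32, ARM_NumericID32, or a signed
   ARM_Gauge32 cast to uint32_t) in Metric #n, n = 1..6: value in the first
   four bytes, second four zeroed. Sets flag n in the Flags Second Byte. */
static int set_metric32(unsigned char *buf, int n, uint32_t value)
{
    unsigned char *field;

    if (n < 1 || n > 6)
        return -1;
    field = buf + F1_METRICS_OFFSET + (n - 1) * 8;
    memcpy(field, &value, 4);
    memset(field + 4, 0, 4);
    buf[5] |= (unsigned char)(0x80u >> (n - 1));
    return 0;
}

/* Fill String #1: 32 bytes, blank padded, not NULL terminated; sets flag g. */
static int set_string1(unsigned char *buf, const char *s)
{
    size_t len = strlen(s);

    if (len > 32)
        return -1;
    memcpy(buf + F1_STRING_OFFSET, s, len);
    memset(buf + F1_STRING_OFFSET + len, ' ', 32 - len);
    buf[5] |= 0x02u;
    return 0;
}
```

Because no correlator is involved, an application could pass this buffer with a data_size of 88, which ends exactly after String #1.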
Format 2
In the
arm_update()
calls with a Format field containing the value 2,
the buffer may have the following format:
Format
| 4 bytes
| 2 (int32)
|
Data
| 1020 bytes
(maximum)
| Contains the data. The length of the buffer is determined by the
data_size
parameter. The format of the data is not defined, but it is
suggested that the data be formatted as plain-text characters so it
can be read by a person without requiring a special formatting
program. NULL termination is not required. The agent cannot summarize
the data over an interval; it must be treated as trace data.
Note that because the data in an opaque buffer cannot be summarized,
and processing by the agent may consist of logging the data to a trace
file, many calls at a high frequency could result in a loss of data or
a slowing down of the system, most likely due to an excessive amount
of file I/O. Therefore it is recommended that the call be used only in
special situations.
|
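A Format 2 buffer is simply the int32 value 2 followed by free-form data. The C sketch below builds one from plain text; the helper is hypothetical, native byte order is assumed, and it assumes that data_size counts the 4-byte Format field as well as the data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { F2_DATA_MAX = 1020 };

/* Build a Format 2 buffer for arm_update(): int32 format value 2 followed
   by up to 1020 bytes of opaque data (plain text here, no NULL copied).
   Returns the data_size to pass, or -1 if the text does not fit. */
static int32_t build_format2(unsigned char buf[4 + F2_DATA_MAX], const char *text)
{
    size_t len = strlen(text);
    int32_t format = 2;                  /* native byte order assumed */

    if (len > F2_DATA_MAX)
        return -1;
    memcpy(buf, &format, sizeof format);
    memcpy(buf + 4, text, len);
    return (int32_t)(4 + len);
}
```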
Three Ways to Instrument within a Transaction Instance
There are three methodologies for instrumenting within a transaction
instance. The first two are useful when the transaction is within one
application. The last one is useful when the transaction is
distributed across applications or systems.
-
Instrument a transaction using
arm_update()
as a
heartbeat,
when it is
an operation that takes a long time to complete (several minutes or
hours) and you want to show the overall progress of the transaction in
numeric form.
If these transactions have different steps associated with processing
each record, you may want to instrument these steps with component
transactions (as described below), or use repeated calls to
arm_update()
to show the overall progress of the transaction. For example, the
transaction may process a million records. A call to
arm_update()
could
be made for every 1000 records or every minute of processing. This
could show the progress of the transaction based on the number of
times
arm_update()
was called or with one or more application-defined metrics.
-
Instrument a transaction using component transactions when it is a
long transaction that has many steps. A transaction can be defined for
the overall transaction and then nested transactions can be defined
for each of the steps. A step might represent a single discrete
operation, or it could represent a large number of operations, such as
copying 1000 files. This allows for the monitoring of each of the
steps as well as the overall transaction.
For example, step 1 takes about 20 minutes, step 2 takes about 40
minutes, and step 3 takes about 10 minutes. Each step can have a
defined transaction as well as the overall transaction. So you would
define 3 component transactions monitoring each step, plus one
transaction that monitors the overall transaction.
-
Instrument using transaction correlation when the transaction has
components that span several applications or systems. This approach is
more complex than the previous two as it requires changes to all the
applications involved in processing components of the transaction, but
it is the most accurate way to track transaction response time
spanning systems.
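The heartbeat approach can be sketched in C as below. To keep the example self-contained, the `heartbeat_fn` callback is a hypothetical stand-in for an arm_update() call on the running transaction; a real program would call arm_update() with its start handle, optionally passing the progress count as an application-defined metric.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an arm_update() call. */
typedef void (*heartbeat_fn)(void *ctx, size_t records_done);

/* Process n records, sending a heartbeat every `every` records and once
   more at the end if the last batch was partial. Returns the number of
   heartbeats sent. */
static size_t process_with_heartbeat(size_t n, size_t every,
                                     heartbeat_fn beat, void *ctx)
{
    size_t beats = 0;

    assert(every > 0);
    for (size_t done = 1; done <= n; done++) {
        /* ... process record number `done` here ... */
        if (done % every == 0 || done == n) {
            beat(ctx, done);
            beats++;
        }
    }
    return beats;
}

/* Example callback: remember the latest progress value. */
static void record_progress(void *ctx, size_t records_done)
{
    *(size_t *)ctx = records_done;
}
```

With a million records and a heartbeat every 1000 records, this sends 1000 heartbeats, the last one reporting 1000000.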
Internationalization
The ARM API is designed to enable applications to use native code
pages and languages, and for measurement agents to be able to support
many different languages. Users of agents should contact the providers
to see if the agent supports the needed code pages and languages.
The ARM API supports any code page as long as no characters are
encoded with binary zero bytes (octets). This is because most strings
are passed as NULL terminated strings, and the NULL terminator
character is a binary zero byte. If a binary zero byte is encountered
before the end of the string, the agent would interpret the zero byte
as the NULL terminator and truncate the string. Most code pages meet
this requirement.
Some code pages do contain binary zero bytes, but for some of them there
are alternate ways to encode the characters. A well-known example is the
Unicode standard. In its native format using 16-bit characters
(UCS-2), there are binary zero bytes. However, the UTF-8 encoding of
the same Unicode characters does not contain binary zero bytes, and
this format is entirely compatible with the ARM API.
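The effect is easy to check directly. This C sketch (the helper name is hypothetical) scans an encoded string for binary zero bytes, which an agent would treat as an early NULL terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the len encoded bytes (excluding any terminator) contain a
   binary zero byte, which would truncate the string at that point. */
static int has_zero_byte(const unsigned char *text, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (text[i] == 0)
            return 1;
    return 0;
}
```

Applied to the text "Aé", the UCS-2 little-endian encoding (41 00 E9 00) contains zero bytes, while the UTF-8 encoding (41 C3 A9) does not.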
Agents that support native languages will often use the following
technique. When the application links to the agent it links to a part
of the agent that executes in the same process space as the
application. Typically this small part of the agent communicates with
the main part of the agent across an inter-process communications
(IPC) channel. The small part of the agent that executes in the same
process as the application can issue an operating system call to find
out what code page and language the process is using. It can then pass
this information to the main part of the agent, and the main part of
the agent can convert from the native code page as necessary.
There are the following three restrictions on the use of native
languages.
-
The strings can contain no binary zero bytes except for the NULL
terminator character (as was mentioned above).
-
All the strings should be encoded using the same code page and
language information as the process that executes the
arm_init()
call.
This also implies that the code page and language information should
not change after the
arm_init()
call.
-
This technique does not apply to any string data passed within the
optional buffers on
arm_start(),
arm_update(),
and
arm_stop().
This is
because these strings are not NULL terminated (note that it
does
apply to the metric descriptions passed within the optional buffer on
arm_getid()).
Further, these strings are often about things that are
external to the program, such as a part number or an error code, so
the requirement to use the same code page and language information as
the process would be unacceptable. Application developers are strongly
recommended to restrict these strings to the first 128 characters of the
standard Latin code pages for ASCII and EBCDIC (depending on the
platform).