
Systems Management: Application Response Measurement (ARM) API
Copyright © 1998 The Open Group

Advanced Topics

The following topics provide information on more advanced implementations using the ARM 2.0 API.

Additional Data Passed in ARM Function Calls

The following two types of additional data can now be provided via the ARM 2.0 API:

Transaction Correlation

Many client/server transactions consist of one transaction visible to the user, and any number of nested component transactions that are invoked by the one visible transaction. These component transactions are the children of the parent transaction (or the child of another child component transaction). It's very useful to know how much each component transaction contributes to the total response time of the visible transaction. Similarly, a failure in one of the component transactions will often lead to a failure in the visible transaction, and this information is also very useful.

There are two facilities that the application developer can use to provide this information to measurement agents that implement the ARM 2.0 API:

Changes Needed for Transaction Correlation
Each application responsible for a component of the overall transaction (client and server) will require some modifications. Applications have three responsibilities:

To enable a correlation application to analyze the correlators coming from different systems, measurement agents follow conventions when creating correlators. Included within the correlator is information identifying the system, the transaction class (from arm_getid()), the transaction instance (from arm_start()), and some flags. The format is flexible and extensible, so more conventions can be added as the need arises. See Measurement Agent Information for information on the correlator format.

Correlators are passed in the arm_start() calls by utilizing the data buffer. This same data buffer is used to pass application-defined metrics; its format is described in Format of Data Buffer in arm_start/arm_update/arm_stop. Correlators are ignored in arm_update() and arm_stop() calls.

If a correlator is being requested, the data buffer should be 256 bytes, to allow for a variable size correlator. If a correlator is being passed to the measurement agent, and none is requested, the length may be truncated based on the correlator length.

If you only want to do transaction correlation in your application and not provide application-defined metrics, you can zero out the metrics (set the Flags Second Byte to zero and fill the 80 bytes of the metric and string fields with zeros).

Note:
Other than the length, the correlator format need not be understood by the application developer, as it is opaque.
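
As an illustration of the correlation-only case, the following is a minimal sketch in C of a 256 byte buffer laid out as described in Format of Data Buffer in arm_start/arm_update/arm_stop. The names start_buffer, REQUEST_CORRELATOR, and init_correlation_only are hypothetical and are not taken from arm.h. The bit value 0x40 assumes that, in the pattern abcd0000, a is the most significant bit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the Format 1 buffer (illustrative names only).
   Every member is naturally aligned, so the struct has no padding. */
typedef struct {
    int32_t       format;          /* 1 for arm_start/arm_update/arm_stop */
    unsigned char flags[4];        /* [0] correlator flags, [1] metric flags */
    unsigned char metrics[6][8];   /* Metric #1 to #6, 8 bytes each */
    unsigned char string1[32];     /* String #1 */
    unsigned char correlator[168]; /* 2 byte length + up to 166 bytes of data */
} start_buffer;

/* Assumption: bit b of "abcd0000" is 0x40 (a is the most significant bit). */
#define REQUEST_CORRELATOR 0x40

/* Prepare a buffer that requests a correlator and passes no metrics. */
void init_correlation_only(start_buffer *buf)
{
    /* Zeroes the flags, the 80 metric/string bytes, and the correlator. */
    memset(buf, 0, sizeof *buf);
    buf->format = 1;
    buf->flags[0] = REQUEST_CORRELATOR;
}
```

The field sizes add up to 4 + 4 + 48 + 32 + 168 = 256 bytes, which matches the recommended buffer size when a correlator is requested.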

Application-Defined Metrics

Application-defined metrics can tell you more about the transaction or about the state of the application at the moment that the transaction is being processed. Three likely uses are envisioned as described below:

  1. Specify characteristics of the transaction that will affect the response time, or that are useful for workload planning. Examples are the number of bytes in a file transfer or print job, or the number of records being processed. A file transfer of 100 megabytes would certainly be expected to take longer than a transfer of 100 kilobytes.

  2. Specify information about the current state of the application. Examples would be the length of a workload queue, the amount of memory allocated, or the number of threads being used. This information is useful for adjusting workloads by shifting work between systems, or tuning the application. If a comparison of response times versus threads shows that congestion builds and response times increase dramatically if, for example, eight threads are used instead of twelve, the application can be recompiled or instructed to use more threads, which may result in a dramatic improvement in performance.

  3. Specify information that can be used in diagnosing problems. Examples are error codes returned from services invoked by the application, or information about the transaction itself such as the part number being processed.

In setting up application-defined metrics, arm_getid() is used to define the context (or meta-data) for a buffer of values. The actual values are passed in arm_start(), arm_update(), and arm_stop(). The length of the buffer is specified in the data_size parameter.

Choosing a Data Type

The additional data provided in the data buffer uses metric and/or string fields. (See later sections of this chapter for information on the format of the data buffer.)

Four general data types can be specified for each field: counter, gauge, numeric id, and string.

This section provides some suggestions about which data type to use.

Counter
A counter should be used when it makes sense to sum up the values over an interval. Examples are bytes printed and records written. The values can also be averaged, maximums and minimums (per transaction) can be calculated, and other kinds of statistical calculations can be performed.

If a counter is used, its initial value must be set in the arm_start() call. The difference between the value in the arm_start() and the arm_stop() (or the value in the last arm_update() call if no metric value is passed in arm_stop()), equals the amount attributed to this transaction. Similarly, the difference between successive arm_update() calls, or from the arm_start() to the first arm_update() call, or from the last arm_update() to the arm_stop() call, equals the value for the time period between the calls.
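
The difference calculation can be sketched as follows for a 32 bit counter. The function name counter32_delta is hypothetical. The sketch assumes that at most one reset to zero occurs between the two samples, in which case unsigned arithmetic gives the right answer without a special case.

```c
#include <stdint.h>

/* Amount attributed to the period between two counter samples.
   Unsigned subtraction is performed modulo 2^32, so a single reset
   to zero between the samples still yields the correct difference. */
uint32_t counter32_delta(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

For example, the same function can be applied to the values from arm_start() and arm_stop() to get the total for the transaction, or to two successive arm_update() values to get the amount for the period between them.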

Here are three examples of how a counter would probably be used:

Gauge
A gauge should be used instead of a counter when it is not meaningful to sum up the values over an interval. An example is the amount of memory used. If you were measuring the amount of memory used over 20 transactions in an interval and the average usage for each of these transactions was 15 MB, it does not make sense to say that 20*15=300 MB of memory was used over the interval. It would make sense to say that the average was 15 MB, that the median was 12 MB, and that the standard deviation was 8 MB. These are the kinds of operations that an agent will typically apply to gauges. Maximums and minimums per transaction can also be calculated, and other kinds of statistical calculations performed.

Gauges can be provided on arm_start(), arm_update(), and arm_stop() calls. This creates the potential for different interpretations. If several values are provided for a transaction (one on an arm_start(), one on each arm_update(), and one on an arm_stop()), which one(s) should be used? In order to have consistent interpretation, the following conventions apply. Measurement agents are free to process the data in any way within these guidelines.

Numeric ID
A numeric id is simply a numeric value that is used as an identifier, and not as a measurement value. Examples are message numbers and error codes. It is not meaningful to sum, average, or manipulate these values in any arithmetic way. By using numeric id instead of a gauge or counter, the application indicates this to the measurement agent. An agent could create statistical summaries based on these values, such as generating a frequency histogram by error code, but this is done by counting the numbers, not by summing them or performing any other arithmetic operation.

String
A measurement agent should process a string in the same way as a numeric id. As with numeric ids, it is not meaningful to do arithmetic operations on a string value.
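
The kind of summary an agent might build from numeric ids (a frequency count per error code) can be sketched as below. The function count_occurrences is hypothetical and only illustrates that ids are compared for equality and counted, never summed or averaged.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts how many of the n reported numeric ids are equal to `code`.
   This is one bar of a frequency histogram by error code. */
size_t count_occurrences(const uint32_t *ids, size_t n, uint32_t code)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == code)
            hits++;
    }
    return hits;
}
```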

Format of Data Buffer in arm_getid

Format Size 101 (int32) (identifies "meta-data" format)
Flags:

The flags indicate which Metric and String descriptions are included in the buffer.

4 bytes First Byte (bit8) = 0

Second Byte (bit8)
abcdefg0, where a through g each denote the value of a bit flag:

a = 1 if there is a description for Metric #1, otherwise a = 0
b = 1 if there is a description for Metric #2, otherwise b = 0
c = 1 if there is a description for Metric #3, otherwise c = 0
d = 1 if there is a description for Metric #4, otherwise d = 0
e = 1 if there is a description for Metric #5, otherwise e = 0
f = 1 if there is a description for Metric #6, otherwise f = 0
g = 1 if there is a description for String #1, otherwise g = 0

Third Byte (bit8) = 0

Fourth Byte (bit8) = 0

Metric #1 desc. 48 bytes The first 4 bytes (int32) define the type of data that will be passed in the 8 byte field. See the description below this table for an explanation of the different data types.
1 = ARM_Counter32
2 = ARM_Counter64
3 = ARM_CntrDivr32
4 = ARM_Gauge32
5 = ARM_Gauge64
6 = ARM_GaugeDivr32
7 = ARM_NumericID32
8 = ARM_NumericID64
9 = ARM_String8

The last 44 bytes (char*) are the name of the metric. This is a NULL terminated character string. A possible use of this name is to display it along with the current value, either on a user interface or in a report.
Metric #2 desc. 48 bytes Same as Metric description #1.
Metric #3 desc. 48 bytes Same as Metric description #1.
Metric #4 desc. 48 bytes Same as Metric description #1.
Metric #5 desc. 48 bytes Same as Metric description #1.
Metric #6 desc. 48 bytes Same as Metric description #1.
String #1 desc. 48 bytes The first 4 bytes (int32) define the type of data that will be in the field. Only one data type is valid in this field.

10 = ARM_String32

The last 44 bytes (char*) are the name of the String #1 field. It is a NULL terminated character string. A possible use of this name is to display it along with the current value, either on a user interface or in a report.
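
A minimal sketch of how an application might fill this buffer follows. The type and function names (field_desc, getid_buffer, describe_metric) are hypothetical and do not come from arm.h. The flag computation assumes that, in the pattern abcdefg0, a is the most significant bit (0x80). The caller is assumed to have zeroed the buffer and set format to 101 beforehand.

```c
#include <stdint.h>
#include <string.h>

/* One 48 byte description: 4 byte type followed by a 44 byte name. */
typedef struct {
    int32_t type;      /* 1..9 for metrics, 10 for String #1 */
    char    name[44];  /* NULL terminated */
} field_desc;

/* Hypothetical layout of the arm_getid() data buffer (344 bytes). */
typedef struct {
    int32_t       format;     /* 101 */
    unsigned char flags[4];   /* second byte marks which descriptions exist */
    field_desc    metric[6];  /* Metric #1 to #6 */
    field_desc    string1;    /* String #1 */
} getid_buffer;

/* Describe the metric in slot 0..5 and set its flag bit.
   Returns 0 on success, -1 if an argument is out of range. */
int describe_metric(getid_buffer *buf, int slot, int32_t type, const char *name)
{
    if (slot < 0 || slot > 5 || type < 1 || type > 9 || strlen(name) > 43)
        return -1;
    buf->metric[slot].type = type;
    strcpy(buf->metric[slot].name, name);   /* fits: length checked above */
    /* Assumption: Metric #1 maps to 0x80, Metric #2 to 0x40, and so on. */
    buf->flags[1] |= (unsigned char)(0x80 >> slot);
    return 0;
}
```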

Data Type Definitions

ARM_Counter32

An unsigned32 value that increases up to the maximum value that the counter can hold, at which point it resets to zero and continues counting up from zero. Except for the reset back to zero, the value can never decrease. The counter is in the first four bytes, and the second four bytes are unused.

ARM_Counter64

An unsigned64 counter (see ARM_Counter32, except it is 64 bits long).

ARM_CntrDivr32

A combination of two unsigned32 integers, with ARM_Counter32 in the first four bytes, and an unsigned32 divisor in the second four bytes. The total value is the ARM_Counter32 value divided by the divisor. The purpose of this format is to be able to represent decimal values without using floating point formats.

ARM_Gauge32

An int32 (signed) value that can increase or decrease. The gauge is in the first four bytes, and the second four bytes are unused.

ARM_Gauge64

An int64 (signed) gauge (see ARM_Gauge32, except it is 64 bits long).

ARM_GaugeDivr32

A combination of two integers, one an int32 (signed) and one an unsigned32. ARM_Gauge32 is in the first four bytes, and an unsigned32 divisor in the second four bytes. The total value is the ARM_Gauge32 value divided by the divisor. The purpose of this format is to be able to represent decimal values without using floating point formats.
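
How a reader of the two divisor formats might recover the represented value can be sketched as below, assuming the total value is the first four byte field divided by the divisor. For instance, a value of 1234 with a divisor of 100 would stand for 12.34. The function names are hypothetical, and returning 0.0 for a zero divisor is an arbitrary choice made only for this sketch.

```c
#include <stdint.h>

/* Value represented by an ARM_CntrDivr32 field (counter / divisor). */
double cntr_divr32_value(uint32_t counter, uint32_t divisor)
{
    if (divisor == 0)
        return 0.0;   /* sketch only: behavior for a zero divisor is assumed */
    return (double)counter / (double)divisor;
}

/* Value represented by an ARM_GaugeDivr32 field (signed gauge / divisor). */
double gauge_divr32_value(int32_t gauge, uint32_t divisor)
{
    if (divisor == 0)
        return 0.0;   /* sketch only: behavior for a zero divisor is assumed */
    return (double)gauge / (double)divisor;
}
```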

ARM_NumericID32

An unsigned32 value that should not be used in arithmetic operations because it is used as an identifier, not as a measurement. For example, a message number or error code. The numeric id is in the first four bytes, and the second four bytes are unused.

ARM_NumericID64

An unsigned64 value that should not be used in arithmetic operations because it is used as an identifier, not as a measurement. An example is a message number or error code.

ARM_String8

An 8 byte string that is not NULL terminated. If the string is less than eight bytes long, it must be padded with blanks. The character set is ASCII or EBCDIC, depending on whatever is standard for that platform. Unlike the NULL terminated character strings passed in various places in the API, these strings cannot be reliably converted to other code pages, so it is suggested you use only the common characters in the first 128 characters of the Latin code pages. See Internationalization for more information.

ARM_String32

A 32 byte string that is not NULL terminated. If the string is less than 32 bytes long, it must be padded with blanks. The character set is ASCII or EBCDIC, depending on whatever is standard on that platform. Unlike the NULL terminated character strings passed in various places in the API, these strings cannot be reliably converted to other code pages, so it is suggested you use only the common characters in the first 128 characters of the Latin code pages. See Internationalization for more information.
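
The blank padding rule for both string types can be sketched with one helper. The name pad_with_blanks is hypothetical. Truncating text that is longer than the field is a choice made for this sketch, since the text above does not say how over-length input should be handled.

```c
#include <string.h>

/* Copies `text` into a fixed-width field with no NULL terminator,
   padding on the right with blanks. Use width 8 for ARM_String8 and
   width 32 for ARM_String32. Longer text is truncated (sketch choice). */
void pad_with_blanks(char *field, size_t width, const char *text)
{
    size_t len = strlen(text);
    if (len > width)
        len = width;
    memcpy(field, text, len);
    memset(field + len, ' ', width - len);
}
```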

Format of Data Buffer in arm_start/arm_update/arm_stop

Format 1

Format Size 1 (int32) (2 is special, for arm_update())
Flags:

The flags indicate which fields are included in the buffer.

4 bytes First Byte (bit8): Only valid for arm_start(). Ignored on arm_update() and arm_stop().

abcd0000, where a,b,c,d each denote the value of a bit flag. a,b,d are set by the application. c is set by the measurement agent.

a = 1 if the application is passing the correlator from a parent transaction in the Correlator field; otherwise a = 0.

b = 1 if the application is requesting that the agent generate a correlator for the transaction (the one indicated by this arm_start()), otherwise b = 0. If a correlator is being requested, the data buffer should be 256 bytes, to allow for a variable size correlator.

c = 1 if the agent is returning a correlator in the Correlator field. When set, the value in the Correlator field overlays any previous value. This flag will only be set when three conditions are met, otherwise c=0:

  1. The application has set bit b = 1.

  2. The agent supports this function (agents that only support version 1.0 of the ARM API do not).

  3. The agent is running in a mode where the generation of correlators is enabled (that is, there might be an installation policy to disable the generation of correlators, either temporarily or permanently).

If this bit is not set to 1, there is no correlator, and therefore the application should not forward the contents of the Correlator field.

d = 1 if the application is requesting that the agent trace this transaction. This might be done when a dummy test transaction is being executed, or when an error has occurred. Each agent can choose how and if it should honor the request, and administrators who configure the agent may establish the policy.
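
The four flags of the first byte can be written as bit masks, as in the sketch below. The constant and function names are hypothetical, and the values assume that a is the most significant bit of the byte. The helper shows the check an application would make after arm_start() returns, before forwarding the Correlator field.

```c
/* Assumption: in "abcd0000", a is the most significant bit. */
enum {
    FLAG_PARENT_CORRELATOR   = 0x80,  /* a: application passes parent correlator */
    FLAG_REQUEST_CORRELATOR  = 0x40,  /* b: application requests a correlator */
    FLAG_CORRELATOR_RETURNED = 0x20,  /* c: set by the measurement agent */
    FLAG_TRACE_REQUEST       = 0x10   /* d: application requests a trace */
};

/* Returns nonzero if the agent set bit c, meaning the Correlator field
   holds a correlator that the application may forward. */
int correlator_returned(unsigned char first_flag_byte)
{
    return (first_flag_byte & FLAG_CORRELATOR_RETURNED) != 0;
}
```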

Second Byte (bit8)

abcdefg0, where a through g each denote the value of a bit flag:
a = 1 if a value is passed in Metric #1, otherwise a = 0
b = 1 if a value is passed in Metric #2, otherwise b = 0
c = 1 if a value is passed in Metric #3, otherwise c = 0
d = 1 if a value is passed in Metric #4, otherwise d = 0
e = 1 if a value is passed in Metric #5, otherwise e = 0
f = 1 if a value is passed in Metric #6, otherwise f = 0
g = 1 if a value is passed in String #1, otherwise g = 0

It is perfectly permissible for an application to pass none or some of the metrics on each call, and to change which metrics are passed from call to call. This holds true for arm_start(), arm_update(), and arm_stop() calls. The one requirement that must be adhered to is that the meaning and position of the field must have been defined with the arm_getid() call (see Format of Data Buffer in arm_getid for the format of data buffer in arm_getid()).

Third Byte (bit8) = 0

Fourth Byte (bit8) = 0

Metric #1 8 bytes The metric fields are used by the application to pass useful information about the transaction or the state of the application to the measurement agent. The field contains one or two integers, or a string variable. The use of the field and the format of the field are determined by the buffer passed on the arm_getid() call (see Format of Data Buffer in arm_getid for the format of data buffer in arm_getid()).

See Choosing a Data Type for more information on choosing a data type, and Data Type Definitions for data type definitions.

Metric #2 8 bytes Same as Metric #1.
Metric #3 8 bytes Same as Metric #1.
Metric #4 8 bytes Same as Metric #1.
Metric #5 8 bytes Same as Metric #1.
Metric #6 8 bytes Same as Metric #1.
String #1 32 bytes A string variable of up to 32 characters. The string is not NULL terminated, and is padded with blanks if it is less than 32 characters. Any information can be included in the string. Examples would be a part number being processed, or an error code.
Correlator   The field has two different uses depending on whether it is passed on the call from the application to the measurement agent, or if it is passed in the return from the agent:

  1. The application can pass in the correlator from a parent transaction to the agent. This allows the agent to correlate the parent transaction to the component transaction being started with this arm_start() call.

  2. The agent can return a correlator for the transaction being started by this arm_start() call. The application could then pass this correlator to applications that it invokes, and they in turn could pass it as the parent correlator in arm_start() calls that they make.

If the correlator returned bit is set (Flags First Byte c=1), the application can pass the entire 168 byte correlator. Alternatively, to optimize, the application can read the correlator length field and pass only the number of bytes containing data, starting with the 2 bytes of the correlator length.

See Transaction Correlation for more information on correlating transactions. Also, see Measurement Agent Information for more information on the content of the correlator.

  Length
2 bytes
The Correlator length field (unsigned 16) specifies the length of a correlator (including this field) generated by a measurement agent (when bit c is set in the first Flags byte).

If this value is zero, it means that the agent is not returning a correlator, and therefore there is no reason to pass this correlator on to other parts of the application (or servers that it calls).

This field is considered a part of the correlator and must be included in the forwarded correlator data.

  Data
0-166 bytes
The Correlator data field is used to show the parent/child relationship between transactions. (Note: the application instrumenter has no need to understand the correlator format, as it is opaque).
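
The optimization of forwarding only the used part of the correlator can be sketched as follows. The function name is hypothetical. The sketch assumes the 2 byte length is stored most significant byte first (network byte order); this is an assumption that should be checked against the correlator format in Measurement Agent Information.

```c
#include <stddef.h>

/* Returns the number of bytes of the Correlator field to forward,
   counted from the start of the length field, or 0 if there is no
   correlator. Assumption: the length is stored in network byte order. */
size_t correlator_forward_length(const unsigned char *correlator)
{
    size_t len = ((size_t)correlator[0] << 8) | (size_t)correlator[1];
    /* A valid length covers at least the 2 length bytes and at most
       the whole 168 byte field; anything else is treated as "none". */
    if (len < 2 || len > 168)
        return 0;
    return len;
}
```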


Format 2
In the arm_update() calls with a Format field containing the value 2, the buffer may have the following format:

Format Size 2 (int32)
Data 1020 bytes (maximum) Contains the data. The length of the buffer is determined by the data_size parameter. The format of the data is not defined, but it is suggested that the data be formatted as plain-text characters so that a person can read it without a special formatting program. The agent cannot summarize the data over an interval; it must be treated as trace data.

Note that because the data in an opaque buffer cannot be summarized, and processing by the agent may consist of logging the data to a trace file, many calls at a high frequency could result in a loss of data or a slowing down of the system, most likely due to an excessive amount of file I/O. Therefore it is recommended that the call be used only in special situations. NULL termination is not required.
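
A minimal sketch of filling a Format 2 buffer with plain text follows. The names trace_buffer and fill_trace_buffer are hypothetical. The sketch assumes that the data_size value passed to arm_update() counts the whole buffer, including the 4 byte Format field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the Format 2 buffer used with arm_update(). */
typedef struct {
    int32_t format;      /* 2 */
    char    data[1020];  /* opaque trace data, NULL termination not required */
} trace_buffer;

/* Copies up to 1020 bytes of plain text into the buffer and returns the
   total length in bytes (assumed to be the value for data_size). */
size_t fill_trace_buffer(trace_buffer *buf, const char *text)
{
    size_t len = strlen(text);
    if (len > sizeof buf->data)
        len = sizeof buf->data;
    buf->format = 2;
    memcpy(buf->data, text, len);
    return sizeof buf->format + len;
}
```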

Three Ways to Instrument within a Transaction Instance

There are three methodologies for instrumenting within a transaction instance. The first two are useful when the transaction is within one application. The last one is useful when the transaction is distributed across applications or systems.

  1. Instrument a transaction using arm_update() as a heartbeat, when it is an operation that takes a long time to complete (several minutes or hours) and you want to show the overall progress of the transaction in numeric form.

    If these transactions have different steps associated with processing each record, you may want to instrument these steps with component transactions (as described below), or use repeated calls to arm_update() to show the overall progress of the transaction. For example, the transaction may process a million records. A call to arm_update() could be made for every 1000 records or every minute of processing. This could show the progress of the transaction based on the number of times arm_update() was called or with one or more application-defined metrics.

  2. Instrument a transaction using component transactions when it is a long transaction that has many steps. A transaction can be defined for the overall transaction and then nested transactions can be defined for each of the steps. A step might represent a single discrete operation, or it could represent a large number of operations, such as copying 1000 files. This allows for the monitoring of each of the steps as well as the overall transaction.

    For example, step 1 takes about 20 minutes, step 2 takes about 40 minutes, and step 3 takes about 10 minutes. Each step can have a defined transaction as well as the overall transaction. So you would define 3 component transactions monitoring each step, plus one transaction that monitors the overall transaction.

  3. Instrument using transaction correlation when the transaction has components that span several applications or systems. This approach is more complex than the previous two as it requires changes to all the applications involved in processing components of the transaction, but it is the most accurate way to track transaction response time spanning systems.
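
The heartbeat approach in the first item can be sketched as a loop that reports progress at a fixed record interval. The names heartbeat_fn and process_with_heartbeat are hypothetical. In a real application the callback would be the place to call arm_update(), possibly with an application-defined metric carrying the number of records processed so far.

```c
#include <stddef.h>

/* Hypothetical callback type: invoked at each heartbeat point. */
typedef void (*heartbeat_fn)(size_t records_done, void *ctx);

/* Processes `total` records and reaches a heartbeat point after every
   `every` records. Calls `beat` there if it is not NULL. Returns the
   number of heartbeat points reached. */
size_t process_with_heartbeat(size_t total, size_t every,
                              heartbeat_fn beat, void *ctx)
{
    size_t beats = 0;
    for (size_t done = 1; done <= total; done++) {
        /* ... process one record here ... */
        if (every != 0 && done % every == 0) {
            if (beat != NULL)
                beat(done, ctx);
            beats++;
        }
    }
    return beats;
}
```

With a million records and an interval of 1000, as in the example above, the loop reaches 1000 heartbeat points.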

Internationalization

The ARM API is designed to enable applications to use native code pages and languages, and for measurement agents to be able to support many different languages. Users of agents should contact the providers to see if the agent supports the needed code pages and languages.

The ARM API supports any code page as long as no characters are encoded with binary zero bytes (octets). This is because most strings are passed as NULL terminated strings, and the NULL terminator character is a binary zero byte. If a binary zero byte is encountered before the end of the string, the agent would interpret the zero byte as the NULL terminator and truncate the string. Most code pages meet this requirement.

There are code pages that contain binary zero bytes, but for which there are alternate ways to encode the characters. A well-known example is the Unicode standard. In its native format using 16 bit characters (UCS-2), there are binary zero bytes. However, the UTF-8 encoding of the same Unicode characters does not contain binary zero bytes, and this format is entirely compatible with the ARM API.

Agents that support native languages will often use the following technique. When the application links to the agent it links to a part of the agent that executes in the same process space as the application. Typically this small part of the agent communicates with the main part of the agent across an inter-process communications (IPC) channel. The small part of the agent that executes in the same process as the application can issue an operating system call to find out what code page and language the process is using. It can then pass this information to the main part of the agent, and the main part of the agent can convert from the native code page as necessary.

There are the following three restrictions on the use of native languages.

