In addition to its responsibility for transferring system metrics, the DCI design must also meet the goals laid down in the introduction. The design of the interface must be especially concerned with extensibility, portability, and the gathering of metrics from multiple sources.
The problem to be solved by the DCI is a communications problem: the communication of system metrics between multiple information providers and multiple information consumers, where all of the providers and consumers reside on a single system. Although metrics information comes from a single system, the "system" is not a monolithic entity but a collection of services. These services include the operating system (what is traditionally thought of as the "system"), user space server programs, and applications. Thus it is useful, in spite of the DCI's single-system scope, to describe the solution in networking terms.
The DCI solution can be seen in the initial architecture diagram.
Both the providers and consumers use the same interface - the DCI - to perform their tasks, although they use different aspects of that interface. The "service" provided by the DCI is a set of functions that can be used by the metrics providers and consumers. These functions give providers the ability to transmit metrics and consumers to receive metrics without prior knowledge of each other's existence or of the underlying transport mechanism. This work is carried out invisibly by the underlying service, referred to as the DCI metrics service or server.
Thus the Data Capture Interface can be completely described by its set
of functions, the structure of the transmitted data, and the behaviour
of the DCI Server.
The details of this specification can be found in the reference chapters that follow.
The relationships between consumers, providers, and the DCI service can also be seen in the architecture diagram. In its role as a metrics server, the DCI service has three responsibilities: transporting metric data, storing metric attributes, and storing access control information.
Most systems have multiple transport mechanisms that can be used to pass data between the metrics providers and the metrics consumers. These mechanisms can be divided into two classes. The first class requires the metrics provider to actively participate in the transport. The second class requires no action on the part of the metric provider other than the registration and maintenance of the statistics.
An example of the latter class is shared memory: the metrics provider indicates the location of its metric to the DCI service, and the service then provides this address to interested metrics consumers. An example of a transport mechanism which requires provider action is sockets: the metrics provider would have to wait for and reply to consumer requests for its metrics.
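To make the distinction concrete, the following sketch shows the passive style using POSIX shared memory. The segment name and the publish_counter() helper are illustrative assumptions, not part of the DCI; the point is that, after setup, the provider only updates the value in place.

/* A minimal sketch of the passive transport style, assuming POSIX
 * shared memory. The segment name and helper are hypothetical.
 */
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
int publish_counter(unsigned long **counter_out)
{
    int fd = shm_open("/dci_example_counter", O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, sizeof(unsigned long)) < 0) {
        close(fd);
        return -1;
    }
    *counter_out = mmap(0, sizeof(unsigned long),
        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (*counter_out == MAP_FAILED)
        return -1;
    /* The provider would register this address with the DCI service
     * and thereafter simply increment the counter; the service hands
     * the mapping to consumers without further provider action.
     */
    return 0;
}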
The type of transport is unspecified and completely up to the DCI implementor. Possible transport mechanisms include shared memory, the proc file system, STREAMS, Mach messages, sockets, pipes, remote procedure calls, and files. An implementation could even use more than one type of transport mechanism, for example, system calls to acquire kernel metrics and a user space IPC mechanism, such as shared memory, to acquire application metrics.
It is very important to realise that the type of transport mechanism is invisible to the providers and consumers. What is specified by the DCI is a small set of generic methods that providers can use to send their metrics. When a provider registers its metrics in the name space, it specifies the method to be used by that provider to supply metric values. When a consumer queries a metric's value, the particular method registered by the metric provider is invoked to obtain the desired data.
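As a sketch of this flow, consider the following. The dci_register_class() routine here is a hypothetical stand-in stub, not the normative DCI registration interface; the actual routine names and signatures are given in the reference section.

/* Hypothetical sketch of method registration; dci_register_class()
 * is a stub standing in for the real registration routine.
 */
typedef enum { METHOD_PASSIVE_SHM, METHOD_ACTIVE_MESSAGE } ProviderMethod;
static int dci_register_class(const char *class_name, ProviderMethod method)
{
    (void)class_name;  /* a real implementation would record the */
    (void)method;      /* method in the metric's name space entry */
    return 0;
}
int provider_startup(void)
{
    /* The provider declares how values for this class are supplied.
     * Consumers never see this choice: a later value query simply
     * invokes whatever method was registered here.
     */
    return dci_register_class("datapool cpu per_thread", METHOD_PASSIVE_SHM);
}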
The DCI Server stores metric attribute structures. These structures describe the characteristics of each metric and allow the metrics to be self-identifying: metrics consumers need no prior knowledge of metric characteristics such as data type, units, and so on.
The DCI Server
stores access control information. Different
systems have varying requirements for how strictly system information
should be restricted. The specification of a hierarchical name space
and the storage and use of access control information in that name
space allow the DCI to meet this range of requirements.
The choice of how much access control is used is entirely up
to the implementation; this specification simply makes access control
possible. This subject is described in more detail in the security
discussion later in this section.
The metric server's maintenance of a name space implies that a long term relationship has been established between a provider and the server. The extent to which this relationship is preserved in the event of subsystem failure is implementation defined. In particular, the impact of the abnormal termination of the server (or any provider) on metrics that have been registered in the namespace must be defined by the implementation.
The boundary drawn in the diagram very definitely does not indicate the division of functions between system and user address spaces. Any system specific mechanisms necessary to provide the service functions and metrics transport are not within the scope of this specification. This restriction of scope makes it possible to meet the goals of implementation on a wide range of systems and of support for application metrics.
Similarly, the appearance of distinct components in the diagram does not require an implementation to realise them as separate entities.
Metrics are grouped into metric classes. For example, all per-thread cpu statistics could be grouped into a single metric class. A metric class is only a template: it must be instantiated by a metric provider before the actual data available for the class's metrics can be identified. For example, the provider of per-thread cpu statistics would instantiate this class for each thread. The thread_id would then provide the additional information required to identify a specific thread's cpu metrics.
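In C terms, the template idea can be pictured as follows. The struct is purely illustrative (the DCI describes classes through attribute structures rather than compile-time types); the field names are taken from the examples in this section.

typedef unsigned int UMAUint4;   /* 4-byte unsigned integer */
struct per_thread_cpu {          /* the metric class: a template only */
    UMAUint4 dispatch_count;     /* metric datum */
    UMAUint4 queue_length;       /* metric datum */
};
/* Instantiation: the provider creates one record per thread; the
 * instance identifier (the thread_id) then selects a thread's copy.
 */
struct per_thread_cpu instance_for_thread_8145;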
The purpose of the DCI metric name space is to establish a unique name for a metric. A fully qualified metric name consists of three parts: a metric class identifier, a metric instance identifier, and a metric datum identifier. Once a name for a metric has been registered by a metrics provider, this name can be returned by the library routines which list registered metrics, and can also be used as an argument to routines that return values for the metric. The textual form of the metric name can be written as:

{ { class identifiers } { instance identifiers } { datum identifier } }
The metric class identifier names an abstraction, meaning that the name has no physical representation in the system. Metric class identifiers indicate a unique location in the metric class hierarchy. For example, a metric class identifier of {datapool cpu per_thread} might indicate the class of per-thread cpu statistics. This name does not indicate any particular data (since the metric instance identifier, thread_id, has not been specified). Nor is a specific datum indicated (for example, dispatch_count, queue_length). The metric class identifier identifies a class (or template, record or struct) containing metric datum identifiers. This class is instantiated by the provider for a particular set of instance identifiers, that is, threads.
When used with a metric class identifier, the metric instance identifier uniquely identifies a specific metric class instantiation. The instance identifier can correspond directly to some system object, such as a process identifier, a device number, etc. Like the metric class identifier, the instance identifier can have multiple levels. This flexibility allows for multi-dimensional metric classes, for example, per-thread/per-disk I/O statistics. The metric class identifier for such a class might be {datapool io per_thread per_disk}; a metric instance identifier would consist of two parts: a thread_id and a disk_id - for example, {8145 disk0}.
Note that there is a special kind of instance identifier that is used for classes that have a single instantiation. Such classes typically include metrics of a global (or system-wide) nature; for example, refer to the global physical I/O counters class in the Data Pool Definitions specification. Such classes are registered as having UMA_SINGLEINST instance types. The consumer then provides an instance identifier of 0 to reference this single instance. See the interface definitions later in this specification for details.
When used with a metric class identifier and a metric instance identifier, the metric datum identifier serves to uniquely name a metric. This metric corresponds either to a statistic (for example, dispatch_count) or to an event (for example, thread termination). A special value (or wildcard) for the metric datum identifier can be provided to indicate that all metrics within an instantiated class are involved in a particular DCI operation.
Although DCI operations are optimised for transfer of information at the instantiated class level, one can also perform operations at the metric datum (within an instantiated class) level. This ability is used, for example, to wait for individual events or collections of individual events which are not members of the same class. The datum identifier specifies a metric within a fully qualified class and instance identifier.
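The full consumer example later in this section requests an entire instantiated class by setting the wildcard; a datum-level request differs only in the datumId field of the metric identifier. In the following sketch, BLOCKS_READ is a hypothetical datum identifier:

#include <sys/dci.h>
#define BLOCKS_READ 3                /* hypothetical datum identifier */
void select_datum(DCIMetricId *midp, int whole_class)
{
    if (whole_class)
        midp->datumId = DCI_ALL;     /* wildcard: every datum in the class */
    else
        midp->datumId = BLOCKS_READ; /* one datum within the class */
}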
Consider the following complete name space example illustrating the class "I/O Device Data" and subclass "Disk Device Data". This example is drawn from the companion specification, UMA Performance Measurement Data Pool Definition (see reference DPD). The disk device data subclass could have a metric datum that returns "Number of Blocks Read". This metric class could be represented symbolically as:

{ datapool io_data disk_data }

or by the corresponding numeric name, such as 1.11.2. Note that only a single 4-byte integer is used for each level in the metric class hierarchy.
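In C terms, such a numeric name is simply an array holding one 4-byte integer per hierarchy level, much as the consumer example later in this section builds for its own class:

UMAUint4 disk_data_class[] = { 1, 11, 2 }; /* datapool io_data disk_data;
                                            * UMAUint4 is the specification's
                                            * 4-byte unsigned type */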
If one were interested in collecting only the number of blocks read for disk0, then the metric identifier used in the DCI routines must explicitly list the class, instance, and datum. This can be represented symbolically as:

{ { datapool io_data disk_data } { disk0 } { blocks_read } }
Like the metric class identifier, the instance identifier can have multiple levels. Unlike the metric class identifier, each instance identifier level represents an instance of an existing system value and can be multiple bytes long.
Extending the above example, if a machine has a complicated I/O system that consists of a channel/bus/controller/disk hierarchy, then the number of blocks read on the first channel, second bus, first controller, and second drive would be identified by:
{ { datapool io_data disk_data_by_busaddress } { chan1 bus2 cont1 disk2 } { blocks_read } }
These examples serve to introduce, but not explicitly define, the DCI name space. Refer to the reference section of this specification for the complete definition.
There is an important point to keep in mind here. Although this specification describes how to implement a secure version of the programming interface, it does not mandate a particular security level. The choice of how secure a particular implementation should be is for the designer to decide. This programming interface should be flexible enough so that security levels can range from no security checks at all to the highest mandated standards without any modification to the interfaces.
In summary, a secure implementation must be able to allow for discretionary and mandatory access control, and it must prevent the creation of covert timing and storage channels. To this end, this specification adopts a key principle in security: economy of mechanism. This principle means that a system's existing security mechanisms should be used for the implementation of secure access control to metrics. The implication is that this specification will allow for the use of those controls but not specify what those controls are nor how they are implemented. The advantage of this approach, from a security perspective, is that this interface will use known, proven access control subsystems.
Several features have been added to the design to explicitly support secure implementations. The class/subclass hierarchy fits nicely into a security model. It allows the hierarchy to be ordered from least to most privileged information. There are some side effects to this hierarchical ordering. First, there must be separate metrics hierarchies for each subject. To allow multiple subjects with different access levels to register metrics at the same hierarchical level would create a security nightmare. This implies that the total metrics name space consists of a root, a level containing each class of provider's metrics, and sublevels for the metrics' classes and subclasses.
Another aspect of the design that is affected by security is the requirement that a chosen transport mechanism must be capable of asynchronously notifying the DCI Server that a provider has exited without unregistering metrics. This notification is necessary to allow the server to collect and discard defunct branches of the metrics name space.
From a practical point of view, most designers will choose to use file system access control mechanisms for their secure implementations. The file system used for the metrics hierarchy should be modelled after mechanisms used to implement secure temporary file systems. The latter solves the problem of how to handle file access to the same root by providing a multi-level directory access mechanism. For example, subjects with the highest access level could see all the files in the temporary file system while those with lower levels could only see those files appropriate to their level. (Some secure systems mandate that not only can a subject not have access to an illegal object but it cannot even know that the object exists.)
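A minimal sketch of this idea using ordinary directory permissions follows; the paths and modes are illustrative assumptions, not mandated by this specification.

#include <sys/stat.h>
int make_namespace_roots(void)
{
    /* world-visible root of a file-system-based metrics name space */
    if (mkdir("/tmp/dci", 0755) < 0)
        return -1;
    /* privileged branch: mode 0700 keeps unprivileged subjects from
     * reading it, or even learning what it contains
     */
    if (mkdir("/tmp/dci/kernel", 0700) < 0)
        return -1;
    /* open branch for unprivileged application metrics */
    if (mkdir("/tmp/dci/apps", 0777) < 0)
        return -1;
    return 0;
}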
A final point in this section is that even though an existing file system is used for access control, there is no requirement that the file system also be used for data transport. There is a separation between the need to verify a consumer's access to metrics and how those metrics are delivered from the provider to the consumer.
As noted earlier in this section, a consumer can query metric values without knowing how the providers supply them. The following example implements a simple consumer that polls the disk configuration class.
Of course, error checking has been minimised to produce a simple coding example.
This example uses macros and function calls which are not part of the API, but if such functionality were available, a sample output for a system with a single disk would look like the following:
****** Polled the metrics @ Wed Sep 14 13:44:45 1994
capacity c3780
sector_size 1000
track_size c000
addr e002
major 1
minor 6200
channel_paths 4
status 1
vendor The Disk Vendor
vendor_designation The Disk Model
cu_vendor_designation The Controller Model
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/dci.h>
extern DCIMetricId *makemetricid();
/* first level */
#define DATAPOOL 2 /* from Data Pool Definition documentation */
/* second level */
#define SYSTEM_CONFIG 11
/* third level */
#define DISK 10
/* macro to extract the classid from a metric id */
#define getDCIClassIdfromDCIMetricId(x) ((DCIClassId*)((char*)(x) + (x)->classId.offset))
int main(void)
{
DCIMetricId *midp=0;
DCIClassId *cidp; /* class for time metrics */
DCIReturn *returneddata=0; /* generic return buffer */
DCIReturn *classattrdata=0; /* classattribute data */
DCIRetval *rtvl; /* individual return status */
DCIStatus status; /* return status of functions */
DCIClassAttr *classattr; /* pointer to the class attributes */
DCIHandle handle; /* descriptor returned from dciOpen */
int class[] = { DATAPOOL, SYSTEM_CONFIG, DISK}; /* metric class desired */
int polling = 1; /* if 1, continue to poll */
void *thedata; /* pointer to the returned data */
/* initialise the connection to the DCI Server */
status = dciInitialize((DCIVersion *) NULL, (DCIVersion *) NULL);
if (!(status & DCI_SUCCESS)) {
dciPerror (status, errno, 0, "dciInitialize failed");
exit(1);
}
/* make a metric id using an application provided function and
* the specified class. The metric id will contain wildcards
* in all instance levels.
*/
midp = makemetricid(class, 3);
/* open up the metric. Let the library allocate the return data buffer */
status = dciOpen(&handle, midp, 1, &returneddata, 0, 0);
if (status & DCI_FATAL)
goto quit;
/*
* obtain the class attributes. These are used to extract particular
* metrics when a whole class of metric data is returned. The DCI
* server will automatically allocate the correct size return
* buffer. Extract the actual class attribute super structure
* from the request returned data.
*/
cidp = getDCIClassIdfromDCIMetricId(midp);
status = dciGetClassAttributes(handle, cidp, 1, &classattrdata, 0);
if (status & DCI_FATAL)
goto quit;
rtvl = (DCIRetval *)(&classattrdata->retval);
classattr = (DCIClassAttr *)((char *)rtvl + rtvl->dataOffset);
while (polling) {
/* free the return buffer for reuse */
dciFree(returneddata); returneddata = 0;
/* poll for the data placing the data and return
* status in the same buffer.
*/
status = dciGetData(handle, midp, 1, &returneddata, 0,
0, 0, (UMATimeVal *)0);
if (status & DCI_FATAL) {
/* application provided error printing call */
dciPerror(status, errno, 0, "dciGetData failed");
goto quit;
}
/* the poll was successful. Call a compute-and-print
* routine. The routine takes the class attribute
* structure and the data returned and derives the
* data desired.
*/
rtvl = (DCIRetval *)(&returneddata->retval);
if (rtvl->dataSize) {
thedata = (char *)((char *)returneddata + rtvl->dataOffset);
computeandprint(classattr, thedata);
}
sleep(10); /* the polling rate */
}
quit:
/* free any buffers the library allocated, close the handle and
* shutdown the connection to the DCI Server.
*/
if (midp)
dciFree(midp);
if (returneddata)
dciFree(returneddata);
if (classattrdata)
dciFree(classattrdata);
status = dciClose (handle);
dciTerminate();
exit(0);
}
/* create a metric Id for the given class with wildcarded datumId
* and instances. Note that memory allocation has been trivialised
* for the example. The caller must free the metric id when done.
*/
DCIMetricId
*makemetricid(int *classarray, int numclasses)
{
DCIClassId *cidp;
DCIInstanceId *iidp;
DCIMetricId *midp;
int size, i;
/* for ease of example, overallocate a DCIMetricId structure */
midp = (DCIMetricId *)malloc(128);
if (!midp)
return(midp);
/* fill in the classid, using macros not part of the DCI API.
* The data for the classid will be appended to the end of the
* base DCIMetricId structure.
*/
midp->classId.offset = sizeof(DCIMetricId);
cidp = (DCIClassId *)((char *)midp + midp->classId.offset);
cidp->identifier.offset = sizeof(DCIClassId);
cidp->identifier.count = numclasses;
cidp->size = dcisizeof (cidp) + (numclasses * sizeof(UMAUint4));
for (i=0;i<numclasses;i++)
dciclassidlevel(cidp,i) = classarray[i];
/* fill in the instanceid, using macros not part of the DCI API */
midp->instanceId.offset = midp->classId.offset + cidp->size;
iidp = (DCIInstanceId *)((char *)midp + midp->instanceId.offset);
iidp->inputMask = DCI_ALL_INSTANCES; /* wildcard overrides actual instances */
iidp->outputMask = DCI_ALL;
iidp->size = sizeof(DCIInstanceId) + 4;
/* fill in the metric id now */
midp->datumId = DCI_ALL; /* all metrics */
midp->size = 128; /* the size allocated */
return(midp);
}
/* Takes a class attribute structure and the data and prints the
* current values. For ease of example, assume all the data is
* 4 bytes in size (except for textstrings) -- normally one
* must check the data type and correctly print the data.
*/
int computeandprint(DCIClassAttr *ca, char *databuf)
{
int numdatums, i, currentdata;
char *labeltext, *currentstr;
struct timeval tv;
DCIDataAttr *da;
gettimeofday(&tv, (struct timezone *)0);
printf("****** Polled the metrics @ %s", ctime(&tv.tv_sec));
/* number of datums in the class */
numdatums = ca->dataAttr.count;
da = (DCIDataAttr *) dciclassattrdataattr(ca);
/* for each datum, print the label and current value of data */
for (i=0; i<numdatums; i++) {
labeltext = (char *) &
((UMATextString *)(dcilabelascii(dcidataattrlabel(da))))->string;
currentdata = *(int *)(databuf + da->offset);
if (da->type == UMA_TEXTSTRING) {
/* the offset actually points to a UMATextString */
currentstr = (char *) &
((UMATextString *)(databuf + currentdata))->string;
printf("%-25.25s %s\n", labeltext, currentstr);
} else {
printf("%-25.25s %8x\n", labeltext, currentdata);
}
/* next data attribute */
da = (DCIDataAttr *)((char *)da + da->size);
}
return(0);
}
These three factors, centralisation, transport, and security, dictate the size and difficulty of the implementation.
Firstly, consider centralisation. The initial architecture diagram
shows all interactions going through a centralised
DCI Server;
however, this is not an implementation requirement. The
specification consists of the routines provided at the boundary layer
and the behaviour of the underlying transport mechanism, and not the
implementation details.
When implementing the DCI, one or more underlying transport mechanisms must be chosen. The choice of transport can be any system specific communications mechanism, such as sockets, shared memory, shared files, and various IPC mechanisms. When choosing a transport mechanism, the designer must consider the number of concurrent operations, speed, portability (if the implementation is intended for more than one operating system type), and the ability to detect inadvertent provider and consumer termination. The latter feature is important if the implementation saves connection related state, such as whether a consumer passed access control.
Finally, an implementor must choose the implementation's security level. This has the largest effect on a secondary choice, the DCI name space implementation. As mentioned in the security discussion above, most implementations will use existing file system access control mechanisms to protect the name space.