Systems Management: Data Storage Management (XDSM) API
Copyright © 1997 The Open Group
Many of the interfaces described in this Chapter accept and return variable length data structures. For
are variable length.
Event messages are also
variable length. For detailed information on accessing individual elements within a data structure, refer
to the Data Structures definitions (see
An application that needs to use the features and functionality of the DMAPI must first call an
initialization function. This allows implementations of the DMAPI to perform internal initialization
procedures before providing service to an application. The DMAPI specification allows undefined
behavior if applications do not use the initialization call. Another purpose of this function is to return a
DMAPI implementation specific version string which may be used to determine at run-time whether the
DM application is running on the correct implementation.
The following function exists for initializing the DMAPI:
- perform implementation-defined initialization.
is an opaque identifier for an entity manipulated by the DMAPI. There are three fundamental
categories of handles; the
file system handles,
is a fixed constant of the implementation, and is used primarily in the
call when setting event disposition for the
event. There is exactly one global
handle in any DMAPI implementation.
File system handles
are used by many DMAPI functions to identify a file system. There is one file
system handle per file system. They are persistent and unique over time (per host) to the extent that a
given file system instance has a unique and persistent identity.
are the most common. These are the handles used to represent all types of file system
objects. Object handles are persistent and unique over time (per host) to the extent that a given file
system instance has a unique and persistent identity. They are governed by the following properties:
A unique object handle shall exist for each object visible in the file system name space.
Each object handle shall be unique within at least the context of a single host.
A unique object handle shall exist for each file system object not visible in the name
space, such as unlinked but still-open files.
Object handles must meet the following specific requirements with regard to uniqueness:
Object handles shall remain valid across any number of system reboots as long as
the corresponding file system object exists.
Object handles shall remain valid if the file object they identify is renamed within
a file system.
Object handles shall remain valid regardless of any operations on the file object
they identify. The only time an object handle becomes invalid is when the object
it identifies ceases to exist.
Object handles may be reused.
Some interfaces can only operate on object handles representing a specific type of file system object. For
can only be used to write to a regular file. When necessary to describe
restricted forms of object handles, the terms
etc. are used
in this document. The taxonomy of handle terminology is shown in the
Figure: Taxonomy of Handles
Handles are opaque; the length of a handle is implementation defined. DM applications should make no
assumptions about the length of a handle, as handles may have different lengths even on the same file
system in some DMAPI implementations. Therefore, functions that use handles in their interfaces
specify two parameters: a
that provides access to the actual handle, and a
specifies the length of the handle. The DMAPI implementation allocates space for handles via the
When a DM application is finished with the handle, it should free the space via the
Most DMAPI functions take a handle as part of their interface.
Many events also provide a handle as part
of their event-specific data.
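Because handles are opaque and variable length, application code that stores or compares them has to carry the length next to the pointer. The following is a minimal sketch of that convention only. The `handle_ref` type and `handle_equal` function are hypothetical names invented for illustration and are not part of the DMAPI, and byte-wise equality is an assumption; real applications should rely on the handle comparison function the DMAPI provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical application-side view of an opaque handle:
 * a pointer to the bytes plus their length. Not a DMAPI type. */
struct handle_ref {
    const void *hanp;  /* opaque handle bytes */
    size_t      hlen;  /* length in bytes; may differ per handle */
};

/* Byte-wise equality that never assumes a fixed handle size. */
static int handle_equal(struct handle_ref a, struct handle_ref b)
{
    assert(a.hanp != NULL && b.hanp != NULL);
    if (a.hlen != b.hlen)
        return 0;
    return memcmp(a.hanp, b.hanp, a.hlen) == 0;
}
```

Two handles of different lengths are never considered equal here, which is why the length must always travel with the pointer.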
To convert from path names and file descriptors to handles, a number of
functions are provided, as described in the man-pages.
They are outlined below:
Create a file handle from a path name.
Create a file handle from a file descriptor.
Create a file system handle from a path name.
File handle comparison.
Free the storage allocated for a handle.
Determine if a handle is valid.
Hashes the contents of a handle.
Some legacy DM applications rely on the fact
that the DMAPI object handles are built from the
combination of file system ID, file inode, and generation number.
Such applications may require the
capability to decompose DMAPI handles into these components
and to build handles from these components.
interfaces are provided for this purpose:
Construct a DMAPI object handle.
Extract the file system ID from a handle.
Extract the inode generation count from a handle.
Extract the file inode number ID from a handle.
Construct a DMAPI file system handle.
Sessions can be thought of as message queues. The implementation of
the DMAPI enqueues messages on a session to make them
available to a
DM application. A DM application can also request the
to enqueue an application defined message on a
session. Sessions provide
a mechanism for a DM application to receive events.
A unique ID is associated with each session. A
session ID is of type
and is used to identify the recipient of an event
message. Session IDs are
opaque to DM applications.
Sessions are also the cornerstone of the recovery mechanism.
Sessions are governed by the following restrictions:
Sessions are not persistent across reboots.
Sessions are unique for as long as the system is up; they are not unique across reboots.
Sessions must be explicitly destroyed. If a process exits or otherwise aborts without doing
then the behavior of the session is dictated by the
constraints in the Sessions and Event Messages section, below.
A session is not tied to a process. Any process that has successfully executed
can use any valid session ID.
Session IDs should not be interpreted as file descriptors. The underlying implementation
may use file descriptors, but the DM application should make no assumptions about the
A session must be created via
before a DM application can communicate
with the DMAPI. When a session is created, it is possible to specify a previous instantiation of a session
that will then be assumed (taken over), which is useful for recovery purposes. The
function is atomic; if the call succeeds, the DMAPI guarantees that all old
messages that were enqueued on the old session are now part of the new session. When assuming an
existing session, the old session becomes invalid when the call returns.
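The takeover semantics can be pictured with a toy model of a session as a message queue. This is a sketch under stated assumptions, not the DMAPI implementation: `toy_session` and `toy_create_session` are hypothetical names, the queue is a small fixed array, and messages are represented by plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MSGS 8

/* Toy model of a session as a message queue; not a DMAPI type. */
struct toy_session {
    int      valid;
    size_t   nmsgs;
    unsigned msgs[MAX_MSGS];  /* stand-ins for enqueued event messages */
};

/* Create `fresh`, optionally assuming (taking over) `old`.
 * On success every message of `old` belongs to `fresh` and `old`
 * is invalid. Returns 0 on success, -1 if `old` is given but invalid. */
static int toy_create_session(struct toy_session *old,
                              struct toy_session *fresh)
{
    fresh->valid = 0;
    fresh->nmsgs = 0;
    if (old != NULL) {
        if (!old->valid)
            return -1;
        assert(old->nmsgs <= MAX_MSGS);
        for (size_t i = 0; i < old->nmsgs; i++)
            fresh->msgs[fresh->nmsgs++] = old->msgs[i];
        old->nmsgs = 0;
        old->valid = 0;  /* the old session is invalid after the call */
    }
    fresh->valid = 1;
    return 0;
}
```

A recovery process would pass the failed application's session as `old`; a second attempt to assume the same old session fails because it is no longer valid.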
To shut down and destroy a session, a DM application may have to perform a number of operations,
such as ensuring that no more events are generated, responding to outstanding messages, and so
forth. If a DM application attempts to destroy a session that has outstanding event messages still
enqueued, an error is returned. It is assumed that
will only be called
after the application has ensured that no more events will be generated on the session.
The following functions are provided for manipulating an instantiation of a session:
Sessions and Event Messages
At any time, a session may have synchronous event messages that are in one of two states:
enqueued and not yet delivered to a DM application.
delivered and awaiting a response from the DM application.
From the standpoint of the DMAPI implementation, synchronous event messages that are in the second
state (delivered and awaiting a response) are outstanding. Asynchronous messages do not require a
response from the DM application, and therefore will never be in the outstanding state.
there are three event messages on the session. Each event message is identified by a unique
token. One synchronous event message has been delivered to a DM application, and therefore the session
has an outstanding message. The event message continues to exist until some DM application
responds to it. The two other event messages on the session are just enqueued and have not yet been
delivered to a DM application.
Figure: Message States
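The notion of an outstanding message can be expressed as a small predicate over message state. The sketch below uses hypothetical types (`toy_msg`, `msg_state`) that are not DMAPI structures, and it assumes that a message is removed from the array once a DM application has responded to it.

```c
#include <assert.h>
#include <stddef.h>

enum msg_state { MSG_ENQUEUED, MSG_DELIVERED };

/* Toy event message; not a DMAPI structure. */
struct toy_msg {
    unsigned       token;        /* identifies the message */
    int            synchronous;  /* nonzero for synchronous events */
    enum msg_state state;
};

/* Outstanding = synchronous, delivered, and not yet responded to.
 * Asynchronous messages are never counted. */
static size_t count_outstanding(const struct toy_msg *msgs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (msgs[i].synchronous && msgs[i].state == MSG_DELIVERED)
            count++;
    return count;
}
```

For a session holding three synchronous messages of which one has been delivered, the function reports one outstanding message.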
As part of sending a synchronous event message, the implementation of the DMAPI may convey access
rights to one or more objects in the message. If a DM application fails (dies, hangs, or otherwise
malfunctions), a recovery process must determine the outstanding event messages and take care of the
associated events to prevent the system from hanging. Since tokens are tied to a session, and are always
associated with a synchronous event message, it is possible to obtain all outstanding event messages
simply by knowing all the tokens.
An active session is not needed to obtain the list of all valid sessions in the system. This allows a
recovery application to interrogate all sessions even in the unlikely event the system runs out of sessions.
Recovering after a DM application failure is very different from recovering from a system crash. The
requirements of each individual DM application will be unique with respect to recovery from a system
crash; it is beyond the scope of the DMAPI to provide all the tools for a DM application to recover itself
in this instance.
The following interfaces exist to manage session and event message recovery.
Tokens are a reference to state associated with a synchronous event message. They are always associated
with one and only one synchronous event message. When responding to an event message, the same
token that was delivered with the message must be supplied. Tokens are the identifier that a DM
application must use to reference a synchronous event message; the DM application presents the token to
the DMAPI and in return, is provided with the state associated with the event message.
Like session IDs, tokens are opaque to DM applications. There is no security expressed or implied by the
possession or use of a token. If a DM application can "guess" the value of a token, then it can use it
(assuming that it can supply the appropriate session ID and has other system-dependent privileges).
Tokens have the following properties:
Tokens are not owned by any particular process.
The DMAPI does not mandate authentication or authorization of the process using the
token; if a process knows a token's value, it can use it.
Tokens are meaningful only within the session under which they were created unless a
session is assumed from a previous session.
There are two primary rights: DM_RIGHT_SHARED and DM_RIGHT_EXCL. The third access right,
DM_RIGHT_NULL, is not considered a primary access right, since it conveys no rights to an object.
Synchronous event messages contain access rights to one or more object handles. Some event messages
contain multiple file handles. The event message contains access rights to all the files in the event
message; the DM application must use
to determine what rights for the given
file handles, if any, are present in the message.
If a DM application needs to obtain access rights for more than one handle, it can use the same token in
repeated function calls to
It is not necessary
for a new message (and its corresponding token) to be created via
each handle the DM application needs to acquire access rights to.
As already noted, tokens do not belong to any particular process. An application presents a token to the
DMAPI to reference and identify a specific access right. When a DM application is informally described
as "holding a token" or "obtaining an access right", a more precise description would be that an
outstanding access right exists, is encapsulated within a synchronous event message, is associated with a
specific session, and is identified by a specific token.
Many DMAPI functions require a token that references a specific access right to an object. In some
cases, it may be advantageous for a DM application not to have to go through the steps of explicitly
creating a token and acquiring the necessary access rights just to call a DMAPI function. Therefore,
many functions accept either a token that references the required rights, or the special value
DM_NO_TOKEN, that indicates the absence of a token.
If a DM application does not pass a token to a DMAPI function that normally requires a token as one of
its parameters, then the function acquires the appropriate rights automatically on behalf of the DM
application. In this case, the DM application must be willing to be blocked. The DM application may or
may not be blocked interruptibly, depending on the implementation
of the DMAPI; see the man-page definition for
for more information.
The DM application must use caution when availing itself of this optimization. If a DM application
holds a token that references a right to an object, but fails to present it when calling a DMAPI function,
then the application is in danger of deadlocking with itself. This is because the DMAPI function will not
be able to acquire the necessary rights on behalf of the DM application since the application already
holds a token referencing those rights. The DM application should also not use this method of acquiring
access rights if it is receiving synchronous events via
Since one of the
synchronous event messages may contain a token that references an access right the DM application may
be trying to obtain, the application will again deadlock with itself.
The existence of any outstanding DM_RIGHT_SHARED access rights for a file system object will block
all attempts from all processes performing the following operations:
all data modification, such as via write(2)
The existence of the DM_RIGHT_EXCL access right will block all attempts to perform any operation on
the file system object, with the sole exception of the stat(2) family (
The locking properties of access rights are summarized in the
Access right || Operations blocked
DM_RIGHT_SHARED || data write, object destruction
DM_RIGHT_EXCL || all but stat(2)
Table: Access Right Properties for Files
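The blocking rules for the three access rights can be written as a small decision function. This is an illustrative sketch only: the enumerations and `right_blocks` are hypothetical names that mirror DM_RIGHT_NULL, DM_RIGHT_SHARED, and DM_RIGHT_EXCL, and the four operation classes are a simplification chosen for the example.

```c
#include <assert.h>

enum toy_right { RIGHT_NULL, RIGHT_SHARED, RIGHT_EXCL };
enum toy_op    { OP_STAT, OP_READ, OP_WRITE, OP_DESTROY };

/* Returns 1 if an outstanding right blocks an operation that does not
 * present the matching token, 0 otherwise. */
static int right_blocks(enum toy_right right, enum toy_op op)
{
    switch (right) {
    case RIGHT_SHARED:
        /* data modification and object destruction are blocked */
        return op == OP_WRITE || op == OP_DESTROY;
    case RIGHT_EXCL:
        /* everything except the stat(2) family is blocked */
        return op != OP_STAT;
    case RIGHT_NULL:
    default:
        return 0;
    }
}
```

The function takes no process argument on purpose: the rules apply to all processes, including the one that requested the right.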
Notice that the above descriptions do not say that other processes are blocked; they say that all processes
are blocked. This is where the distinction that DM applications
do not really "own" access rights comes into play.
The only way a DM application can distinguish itself from other processes that should be blocked is by
value identifying the appropriate token, and passing it in with any
operations that are to be performed on the file. It follows from this
that once a DM application has "obtained" a DM_RIGHT_SHARED
or DM_RIGHT_EXCL access right, either directly via a
call or indirectly via an event message, the DM application must be
extremely cautious when performing operations on file system objects. Generally, it must restrict itself
to using interfaces containing
For example, calling
and requesting DM_RIGHT_EXCL does not make a
DM application the owner of the right;
merely creates the right and
encapsulates it in the synchronous event message referenced by the token. Once that happens, all
operations against the file system object will be blocked as described above, even if they come from the
same process that called
Only operations that are part of the DMAPI and
arguments are safe for a DM application to call at this point, because those
interfaces are the only way DM applications can distinguish themselves
as "owning" the DM_RIGHT_EXCL right.
Upgrading Access Rights
When requesting access rights to an object via
the requested right may not
be immediately available. If the DM application has specified that it wants to block until the right
becomes available, the DM application may or may not be blocked interruptibly. The implementation of the
DMAPI will specify the semantics for interrupting blocked processes.
If a DM application holds a DM_RIGHT_SHARED access right, it can attempt to upgrade the right to a
DM_RIGHT_EXCL in a non-blocking manner via
If the DMAPI
implementation cannot grant the request, however, the DM application will most likely have to release
the DM_RIGHT_SHARED right, and request DM_RIGHT_EXCL access to the object via
in a blocking fashion.
A DM application may also request to upgrade a DM_RIGHT_SHARED access right to a
DM_RIGHT_EXCL in a non-blocking manner via
if the DMAPI
implementation is able to upgrade the right without releasing the DM_RIGHT_SHARED access right.
The state of the object cannot change while the DM application is waiting for an exclusive right via
However, the state of the file may change if the request to upgrade is via
To provide some indication that the file changed while the application was
blocked, the DMAPI provides the notion of a change indicator that can be interrogated via
This change indicator is modified by any operation that modifies file data or
metadata. The change indicator is not persistent and has no meaning across reboots. Its only purpose is
to indicate to the DM application that the file may have changed since the last time the change indicator
The normal sequence of events for attempting a lock upgrade where the current shared lock must be
dropped would be as follows:
Obtain current change indicator.
Release shared right.
Request exclusive right (blocking operation).
Obtain new change indicator to see if the file has changed.
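The four steps above can be sketched against a toy file object. Everything here is hypothetical: `toy_file`, `upgrade_by_release`, and the `interloper_fn` hook stand in for real DMAPI calls and for whatever other processes do during the unlocked window, and acquiring a right is modeled as a simple assignment rather than a blocking call.

```c
#include <assert.h>

enum toy_right { RIGHT_NULL, RIGHT_SHARED, RIGHT_EXCL };

struct toy_file {
    unsigned long  change;  /* stand-in for the change indicator */
    enum toy_right held;    /* right currently referenced by our token */
};

/* Hook simulating what other processes do while we wait; may be null. */
typedef void (*interloper_fn)(struct toy_file *);

/* Drop-and-reacquire upgrade. Returns 1 if the file may have changed
 * while no right was held, 0 if it is known not to have changed. */
static int upgrade_by_release(struct toy_file *f, interloper_fn while_unlocked)
{
    assert(f->held == RIGHT_SHARED);
    unsigned long before = f->change;  /* 1. obtain current change indicator */
    f->held = RIGHT_NULL;              /* 2. release shared right */
    if (while_unlocked)
        while_unlocked(f);             /*    window in which others may run */
    f->held = RIGHT_EXCL;              /* 3. request exclusive right */
    return f->change != before;        /* 4. compare with new indicator */
}

/* Example interloper: any data or metadata change bumps the indicator. */
static void toy_writer(struct toy_file *f) { f->change++; }
```

If the function returns 1, the application should re-read whatever state it cached before releasing the shared right.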
The following functions for manipulating access rights are provided:
Placing Holds on Objects
If a DM application needs to make sure an object does not go away after releasing all access rights to the
may be called to obtain an object hold. The effect is to prevent the
object from being flushed out for the duration of the hold, essentially making non-persistent data
management attributes temporarily persistent. Responding to an event releases all holds associated
with the event.
The following functions are for manipulating object holds:
Place a hold on a file system object.
Release a hold on a file system object.
Query for a hold on a file system object.
Finding Extents and Punching Holes
Data Management applications often need to release the on-disk blocks of a file to free up space on a file
system. Likewise, if a large, but sparsely populated file is to be backed up efficiently, a DM application
needs to know where the file has non-null data and where the file has holes. These operations may not be
supported on all file system types;
can be used to determine if the underlying file
system supports punching holes.
The DM application is responsible for maintaining accurate information about the location of any
holes in the original file when a sparse file is made non-resident. It is assumed that the DM application
to determine where actual storage is located, and only perform
operations on the portions of the file that contain data.
The following functions return information about a file in terms of a
structure, as defined
in the Data Structures chapter (see
These functions, which do not affect any of a file's time stamps, are
provided for managing the storage space for a file:
Return the allocation information for the file specified by the handle.
Interrogate the DMAPI implementation for size and offset around the area where the DM
application wants to punch a hole.
Logically write zeroes in the indicated region of the file identified by the handle, thereby
allowing the DMAPI implementation to release media resources associated with that region.
None of the file's time stamps are updated, but the file's DMAPI
change indicator is updated.
Invisible Read and Write
Many data management applications must be able to access file data without altering the file's access,
modification, and change times, and without generating any events. The operations in this section do not
trigger events; they bypass the normal event delivery mechanism to prevent a DM application from
receiving events generated by itself.
The invisible write function by default writes data asynchronously. If a DM application requires that data
written to a file be flushed at certain times, it can either set a flag specifying that writes happen
synchronously or it can call a separate function to flush the file's contents to media.
The following functions, which do not affect any of a file's time stamps, are provided:
Do a read without updating any of the file's time stamps.
Do a write without updating any of the file's time stamps. The DMAPI change indicator is
updated. This function can execute synchronously or asynchronously.
Synchronize a file's in-memory state with that on the physical medium.
Managed regions provide a mechanism for a data management application to control a specific region of
a file. Managed regions provide granularity finer than the entire file for data events such as read and
write. Their use is particularly important for very large files that may be larger than the actual amount of
available disk space.
A single managed region is represented by a
structure. The set of managed regions for a
file is a collection of these structures. See Data Structures
for a definition of this structure.
The generation of events for a managed region
is controlled by a flags field in the
structure. The possible values for this field are a bitwise OR of one or more of the following:
Generate a synchronous event for a read operation that
overlaps this managed region.
Generate a synchronous event for a write operation that
overlaps this managed region.
Generate a synchronous event for a truncate operation that
overlaps this managed region.
or the following value:
Do not generate any events for this managed region.
The events defined above are the only synchronous data events that are defined for a managed region.
Only one of the above events will be produced for a particular
operation, no matter
how many managed regions the operation may overlap.
The example in
Overlapping of Events across Managed Regions
below shows a read operation that overlaps two managed regions that have read events set.
Figure: Overlapping of Events across Managed Regions
In Overlapping of Events across Managed Regions,
a read event is produced for Managed Region A. The arguments passed to the DM
application in the event message have the offset and length of the
operation; it is up to the
DM application to determine which managed regions the operation will overlap. Once the DM
application responds to the event message, the DMAPI implementation allows the read to continue.
As an example, if a DM application fills the managed region A above, but not B, and continues the
operation, the behavior of the entire read operation is undefined.
Triggering one event per file operation eliminates the necessity
of having the DMAPI implementation re-evaluate all managed regions
involved in a given operation. Otherwise, the DMAPI implementation
could be forced to generate multiple events per managed region
for a single I/O operation.
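The one-event-per-operation rule can be sketched as a search that stops at the first qualifying region. The structure, flag values, and function name below are hypothetical and are not the DMAPI region structure. The sketch assumes regions are listed in offset order, that the operation length is greater than zero, and that offsets do not overflow.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_READ  0x1u   /* hypothetical flag values */
#define REGION_WRITE 0x2u

/* Toy managed region; not a DMAPI structure. */
struct toy_region {
    unsigned long long offset;
    unsigned long long size;
    unsigned           flags;
};

/* Returns the index of the first region that overlaps [off, off+len)
 * and has `flag` set, or -1 if the operation generates no event.
 * At most one event results, however many regions overlap. */
static int first_event_region(const struct toy_region *r, size_t n,
                              unsigned long long off,
                              unsigned long long len, unsigned flag)
{
    for (size_t i = 0; i < n; i++) {
        int overlaps = off < r[i].offset + r[i].size &&
                       r[i].offset < off + len;
        if (overlaps && (r[i].flags & flag))
            return (int)i;
    }
    return -1;
}
```

The event still carries the offset and length of the whole operation, so the application repeats a similar overlap test to find every region it must handle before responding.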
To change the set of managed regions, the DM application must obtain DM_RIGHT_EXCL rights to the
object. Since managed regions may or may not be persistent, the DM application must be prepared to
expect a debut event and to use
to download the set of managed regions for a file.
Managed regions may be constrained by the following restrictions:
Implementations may choose to support only one managed region per file, which may
always be the entire file.
Managed regions may not overlap. Each region is a distinct subset of the file.
Only regular files may be partitioned into multiple managed regions.
A DM application can determine the properties of the DMAPI managed region implementation by
The following functions, which do not affect any of a file's time stamps, are provided for manipulating
the managed regions of a file:
Return the set of managed regions for a file.
Set the managed regions for a file. The DMAPI change indicator is updated.
File Attributes and Bulk Retrieval
Attributes need to be retrieved for a single file, a directory, or a whole file system. The attributes
returned are defined by the
structure. There are a number of methods for obtaining these file
function obtains the attributes for a single file specified by
the file's handle.
The attributes and names for all files in a directory can be obtained through use of the
The basic attributes for all files in a file system can be obtained
through use of the
The basic attributes, plus a named DM attribute, for
all files in a file system can be obtained through the
use of the
For the second, third, and fourth methods, the application either provides a buffer large enough to
contain all retrieved attributes or more commonly (particularly for the last option) the application makes
iterative calls through the interface. A file system must be mounted to have its attributes retrieved via
any of the above methods.
DM applications often need to set a file's metadata to specific values transparently. For example, a
backup application might want to set a file's time stamps to their original value when the file is restored.
Specific fields from the
structure are encapsulated in the
structure is used to set various metadata fields to specific values via
DM application must initialize an opaque "cookie" which
provides location information to the DMAPI. Each
can use this
cookie to determine location information from one call to the next.
The file's change indicator can also be retrieved using
This change indicator
is modified by any operation that modifies file data or metadata. DM applications can use the change
indicator to determine if a file may have changed state; if the indicator is the same between two calls, the
file is guaranteed not to have changed. If the indicator is different, the file may (but need not) have
changed. This is especially useful for attempting lock upgrades, as
described in Upgrading Access Rights.
The following functions, which do not update any of an object's time stamps, are provided for obtaining
Get the specified attributes in bulk for objects in the given file system with a specific DM
Get the specified attributes in bulk for the given file system.
Get the specified file attributes and names in bulk for the given directory.
Initialize the location cookie for successive
The following function, which does not update any of a file's time stamps, is provided for obtaining the
attributes of a single file:
The following function, which does not update any of the file's time stamps (other than those specified)
as a side effect, is provided for metadata modification:
Data Management Attributes
data management attributes is a DMAPI implementation option. Some DMAPI
implementations may not support persistent opaque data management attributes, while others may not
provide support for persistent non-opaque attributes such as event lists. DM applications should use the
function to determine what the implementation provides.
A persistent attribute is one which stays defined across reboots. A non-persistent attribute is one that
may disappear at any time without notice (typically during inode flush). For more information on how to
manage non-persistent attributes, refer to the debut event.
Non-opaque Data Management Attributes
There are two types of non-opaque attributes:
The DMAPI implementation may support persistence of
managed regions. The
function returns the number of persistent managed regions supported.
Event Bit Masks
Event bit masks encode which events are enabled for a
particular file within a finite number of persistent bits.
Opaque Data Management Attributes
The DMAPI persistent opaque attribute mechanism provides a set of (name, value) pairs associated with
a file system object. The name is a fixed-length, 8-byte (defined as DM_ATTR_NAME_SIZE) opaque
value determined by the DM application and is interpreted as a byte sequence. Attribute names starting
with ASCII "_" (0x5F) are reserved for future common attribute labels. In order to prevent name clashes,
the first three bytes of the attribute name are currently assigned through a reservation process.
The prefix should identify the company whose DM
product is using the attribute, for example, Cheyenne has "CYE" reserved.
To register a 3-byte prefix, send e-mail to email@example.com, identifying
the company name and the requested name.
Registered prefixes can be checked
on the World-Wide Web at the following location:
The attribute value is variable length and also opaque. It is recommended that the values be stored in
network byte order to support the movement of media between architectures. These attributes are
persistent across reboots.
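A DM application might build its attribute names and values along the following lines. This is a sketch under assumptions: the helper names are hypothetical, the local constant `ATTR_NAME_SIZE` simply mirrors the 8-byte DM_ATTR_NAME_SIZE, and zero padding of short names is a choice made for the example rather than a requirement stated in the text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ATTR_NAME_SIZE 8   /* mirrors the 8-byte DM_ATTR_NAME_SIZE */
#define PREFIX_SIZE    3   /* registered vendor prefix length */

/* Build an 8-byte attribute name: 3-byte prefix followed by up to
 * 5 bytes of suffix, zero padded. Rejects the reserved leading '_'.
 * Returns 0 on success, -1 on bad input. */
static int make_attr_name(unsigned char out[ATTR_NAME_SIZE],
                          const char *prefix, const char *suffix)
{
    size_t slen = strlen(suffix);
    if (strlen(prefix) != PREFIX_SIZE || prefix[0] == '_' ||
        slen > ATTR_NAME_SIZE - PREFIX_SIZE)
        return -1;
    memset(out, 0, ATTR_NAME_SIZE);
    memcpy(out, prefix, PREFIX_SIZE);
    memcpy(out + PREFIX_SIZE, suffix, slen);
    return 0;
}

/* Encode a 32-bit value in network (big-endian) byte order so the
 * attribute value reads the same on any architecture. */
static void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

Encoding values byte by byte avoids any dependence on the host's endianness, which matters when media move between architectures.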
If the DM implementation supports opaque attributes, a limited number of attributes may be stored
persistently with each file. Each attribute may store up to DM_CONFIG_MAX_ATTRIBUTE_SIZE
bytes of data per file. The value of DM_CONFIG_MAX_ATTRIBUTE_SIZE is obtained via
and has a lower bound of 32 bytes. The total amount of space available for
storage of all persistent attributes on a file system is bounded by
Associated with the file attributes is a per-file time
which is updated when attributes
are created, modified, or deleted, or when a new file
inherits its attributes from the parent directory.
time stamp may be the same as
as determined by the value returned from the
function with DM_CONFIG_DTIME_OVERLOADED.
is not overloaded, then any
operation that manipulates attributes does not modify the file's traditional
time stamps (
If DM_CONFIG_PERS_INHERIT_ATTRIBS (obtainable from
DM applications can mark persistent attributes as inheritable. If a directory has an attribute (such as
that has been marked inheritable and a file is created in the directory, then the file
would inherit the attribute. Attributes that are not marked inheritable are not copied.
DM applications mark an attribute inheritable on a per-file system basis and for specified file types. For
example, a DM application could mark the above attribute (lock_on_magnetic) inheritable for newly
created regular files only. Newly-created directories would not inherit the attribute.
Attribute inheritance is not persistent across reboots. If a DM application marked the
attribute as inheritable and the system were then brought down, the attribute would no longer be
inheritable when the system came back up.
The following functions are provided for attribute management:
The following functions are provided for managing inheritable attributes:
Mark an attribute as inheritable on a file system.
Mark an attribute as no longer being inheritable.
Get all the attributes that have been marked inheritable on a file system. This is especially
useful for application restart after a failure.
The DMAPI provides DM applications with the ability to monitor and manage the data in a file system
without having to export all the file system semantics from kernel space to user space via the event
interface. Events are generated by a DMAPI implementation, and then the messages are enqueued on a
session for delivery to a DM application.
The intent of the DMAPI is to support a single product on
any single file system. The DMAPI does not
preclude different products from different vendors operating on the same file system, but it is not
recommended. Different products on different file systems are fully supported by the DMAPI with
regard to event delivery.
Therefore, the following event restrictions exist:
Multiple sessions cannot register disposition for the same event on the same object.
Event messages are targeted to and enqueued on sessions; there is no explicit targeting of
an event to a specific process.
The behavior of event delivery when no session has requested to receive a particular event
with the given event has not been executed) is DMAPI
implementation-specific. The DMAPI implementation must document the behavior of the
system and has one of these three choices:
block the process that caused the event to be fired
fail the operation
not fire the event and allow the process to proceed as if there is no event
Certain events are optional in the DMAPI specification. It is recommended that, for each file system
being managed by a DM application, the application initially call
to determine which events are supported by the DMAPI implementation for that file system.
Setting Event Disposition
After creating a session, DM applications must register with the DMAPI to establish the disposition of
events for a file system (that is, which session the events will be sent to). The event list is the complete set of
all events, including managed region events, that the DM application is monitoring during the life
of the session. Since registration is on a per-session basis, this event list is not persistent across reboots.
It is not possible to register to receive events on anything other than the file system object.
Once a DM application has registered its event list and session with the DMAPI, it can begin receiving
event messages on a file system. Registration can be thought of as establishing the association between a
file system and a session, as it lets the DMAPI implementation know which session to send specific
event messages to.
The example shown in
Disposition of Event Delivery
illustrates the case where a DM application has registered with the file
system represented by "foo" for the
events. The event messages are delivered to
the application via session 42. The file
has an event list of
was previously set via
Figure: Disposition of Event Delivery
In Disposition of Event Delivery, the
event is delivered to DM application 1, since that is the session for that specific event.
Multiple applications can register their session and event list for a file system. If two applications
attempt to register to receive the same event, the last application to register for the event will receive it;
prior registrations for the event are replaced.
If this were not the case, and replacement were done on an entire
event list, not a per-event basis, then it would not be possible
to have more than one active session registered for a file system.
Having each event in the event list handled individually allows
multiple applications to be active on the same file system
simultaneously, all handling different events.
Duplicate Event Registrations on a File System shows how the registrations in
Disposition of Event Delivery
would change if a second DM application registered for
Figure: Duplicate Event Registrations on a File System
In Duplicate Event Registrations on a File System, the events that the second
application registered for are now sent to DM application 2, via session 69,
not DM application 1.
All other events will still be delivered to DM application 1.
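The per-event replacement rule can be illustrated with a small toy model. This is only a sketch of the semantics described above, not DMAPI code: the class, its methods, and the event names "read" and "write" are hypothetical, while the file system "foo" and sessions 42 and 69 are taken from the example.

```python
# Toy model only: these names are illustrative and are not DMAPI functions.
class DispositionTable:
    """Maps (file system, event) to the single session that receives it."""

    def __init__(self):
        self._target = {}

    def register(self, session, fs, events):
        # Replacement is per event, so registrations for other events survive.
        for event in events:
            self._target[(fs, event)] = session

    def target(self, fs, event):
        # None means no session has registered a disposition for this event.
        return self._target.get((fs, event))


table = DispositionTable()
table.register(session=42, fs="foo", events=["read", "write"])
table.register(session=69, fs="foo", events=["read"])  # later registration wins
```

After the second registration, session 69 is the target for the hypothetical "read" event, while session 42 remains the target for "write".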
The burden is on the system administrator to ensure that two
different DM applications do not attempt to control the same
events on the same file system. In Figure 5, an alternative
would be to return an error
saying that an <event, file system, session> binding already exists.
Another option would be to send a special event to
DM application 1, informing it that it will no longer be receiving
events. While these options could be implemented,
it is believed that the level of complexity is not warranted
for this version of the DMAPI.
The examples given above assume that the file system the DM application is monitoring is already
mounted. However, it is quite possible that a DM application wants to set itself up to monitor a file
system that is not yet mounted.
The "mount" Event
The restriction of only sending synchronous events to one session has special ramifications with regard
to the mount event. It is not the intent of the DMAPI to force a model
of one "super-daemon" that
listens for mount events, and then forwards the event to the appropriate recipient. However, there is a
special bootstrap problem with regard to receiving the
event before a file system handle is
available. To receive
events, a DM application must use the global handle in the
event will be sent serially to each session that has executed
The event is not broadcast to all sessions concurrently. The order in which the
DMAPI implementation sends the event to the sessions is not defined.
event will be sent for all file systems that support the DMAPI. Specifying the event in the
function is not allowed, since the event is not persistent. When the
event is received, the DM application can determine if it is interested in the file system that is specified
in the event message. If a DM application is not interested in the file system, then it must respond to the
event with a response code of DM_RESP_DONTCARE. The first DM application
that responds to the event with DM_RESP_CONTINUE and an error code of zero
prevents the event
from being sent to any of the remaining sessions. If any DM application returns an error
[DM_RESP_ABORT], then the
event will not be sent to any other session.
Figure: Mount Event Propagation
In Mount Event Propagation,
three DM applications have specified via
that they want to receive the
event. The DMAPI implementation sends the
event message to DM application A in step
1, which is not interested in the event, so it responds to the event message with DM_RESP_DONTCARE
in step 2. The DMAPI implementation then sends the
event message to DM application B in step 3,
which determines that it wants to monitor the file system. It responds to the event message with a
DM_RESP_CONTINUE in step 4, so the
event is not sent to the remaining DM application C.
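The serial propagation walked through above can be sketched as a short toy model. It is an illustration under stated assumptions, not DMAPI code: the function name and return values are hypothetical, DM_RESP_CONTINUE is taken to carry an error code of zero, and the sessions are visited in list order even though the specification leaves the order undefined.

```python
# Toy model only: names are illustrative, not DMAPI functions.
DONTCARE = "DM_RESP_DONTCARE"
CONTINUE = "DM_RESP_CONTINUE"  # assumed here to carry an error code of zero
ABORT = "DM_RESP_ABORT"


def propagate_mount(sessions):
    """Offer a mount event to sessions one at a time.

    `sessions` is a list of (name, responder) pairs; each responder returns
    one of the response codes above. Returns a tuple of
    (outcome, deciding session or None, names of the sessions asked).
    """
    asked = []
    for name, responder in sessions:
        asked.append(name)
        response = responder()
        if response == ABORT:
            return "aborted", name, asked
        if response == CONTINUE:
            return "claimed", name, asked
        if response != DONTCARE:
            raise ValueError(f"unexpected response: {response!r}")
        # DM_RESP_DONTCARE: move on to the next session.
    return "unclaimed", None, asked


result = propagate_mount([
    ("A", lambda: DONTCARE),
    ("B", lambda: CONTINUE),
    ("C", lambda: CONTINUE),
])
```

Here application C is never asked, because application B has already responded with DM_RESP_CONTINUE.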
If all of the DM applications receiving
events return DM_RESP_DONTCARE, then the file
For recovery processing, many DM applications will need the name of the file system device and the
directory that it was mounted at. This information is made available via the
application restart, an application can get the same information via
A DM application would determine all the file systems
that were being monitored via
to obtain more information about the file systems.
The following functions are provided for manipulating the disposition of a session's events for a file system:
Set the disposition of a session's events on a file system.
Get the disposition of events for all file systems for a session.
Get the information that was delivered on a
event for the indicated file system.
Setting Event Notification
DM applications can specify that they need to receive certain events on an object. Events will only be
generated for these objects, not for all objects
in the file system (except for the
event, discussed specifically later in this section).
To set event notification on an object, the DM application must specify an event list for the object.
This object is specified via a handle. The handle can be either the file system handle when setting events
on a per file system basis, or a handle to a specific file system object. Executing
may or may not persistently store the eventlist with the object; it is
dependent on the particular implementation of the DMAPI. The persistence characteristic can be
determined via the
The DM application must specify the entire list of events that is to be generated for the object. If an
event list already exists for the object, it is replaced by the new one specified in the
If an event list was previously set for the entire
file system, and a subsequent event list for an object
in that file system includes an event that was set for
the file system (or vice versa), the result is undefined.
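The whole-list replacement rule for an object's event list, in contrast to the per-event replacement used for disposition, can be sketched as follows. This is a toy model, not DMAPI code: the function names, the handle string, and the event names other than "destroy" are hypothetical.

```python
# Toy model only: names are illustrative, not DMAPI functions.
object_event_lists = {}


def set_event_list(handle, events):
    # The caller supplies the entire list; any earlier list is discarded.
    object_event_lists[handle] = frozenset(events)


def get_event_list(handle):
    # An object with no event list set generates no events in this model.
    return object_event_lists.get(handle, frozenset())


set_event_list("file-handle-1", ["destroy", "rename"])
set_event_list("file-handle-1", ["attribute"])  # replaces, does not merge
```

After the second call only "attribute" remains set for the object, so an application that wants to add one event must resend the full list.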
All events, with the exception of the managed region events and the
event, can be
specified in the
function. If the object has multiple managed regions, then
returns the union of all managed region events, in addition to the other
When an event is generated by the file system, the DMAPI implementation uses the session to determine
the recipient. Since DM applications must register with the DMAPI via the
specify the event list and the session, the DMAPI can easily determine the target session for any given event.
Some implementations of the DMAPI may not provide any persistent storage, even for event
notification. For these "zero bit" implementations, the DMAPI provides a
event before any
access is granted to the object. This
event should be specified in the event list when the DM
application sets its event disposition. The
event gives the DM application the ability to
download information (such as event lists and managed region information) that may be needed by the
DMAPI implementation. Most likely, when downloading a new event list for an object, the list will not
event, but only include events that require some action to be performed by the DM application.
event is the first indication given to a DM application that a primitive DMAPI
implementation is going to perform an operation on a file. The DM application can take this opportunity
to download all the necessary information for that particular file, or for other files as well. Alternately,
some DM applications may want to intercept the
event to prime primitive DMAPI
implementations, rather than having to receive many
The following functions for managing event lists on file system objects are provided:
Specify the events, with the exception of the managed region events, to be generated for an
Get the list of events to be generated for an object.
Receiving and Responding to Events
Pending events can be received one at a time or in bulk. For synchronous events, a response to each
event message is required. For all events, the only valid response is an indication of whether the
operation should be continued or aborted. If the operation is to be aborted, an error can also be specified
that will be returned to the user process in the form of an
Event messages are variable length. This is because two of the primary fields of most event messages,
file handles and path names, are variable length. DM applications should use
determine the largest message size to size their buffers for calls to
information on accessing and manipulating variable length message buffers, see
Data Structures definitions in
The process that generated the event is blocked until the response is received by the DMAPI
implementation. The sleep may or may not be interruptible; the implementation of the DMAPI will need
to define the behavior for each synchronous event.
When a synchronous event message is generated, a token is part of the message. The token identifies the
event message, and may reference access rights that are conveyed as part of the event message. No
tokens are passed in asynchronous messages.
When a DM application responds to a data
event message, the token may reference access rights. If a DM
application allows the operation to continue with the DM_RESP_CONTINUE return code, then special
care must be taken by the implementation of the DMAPI to allow the operation that caused the event
generation to continue without another DM application changing the state of the file.
Consider the following example:
Figure: Event Generation with No Rights
In Event Generation with No Rights,
the user process has initiated a write(2) operation in user space, shown as step 1. When
the application begins executing the Operating System code that performs the operation in the kernel, it
detects that it must generate a synchronous managed region
event. The event message is enqueued
on the session in step 2, and the user process is then blocked.
Figure: Requesting Access Rights after Event Generation
In Requesting Access Rights after Event Generation,
the event message has been enqueued on the session in step 2, and is delivered to DM application A
in step 3. Since the event message conveys no rights, DM application A must obtain
access rights to the object. In this example, it requires the DM_RIGHT_EXCL right, which it obtains in
step 4. At the same time, DM application B attempts to get exclusive access to the file in step 5. Since the
access right is not available, DM application B will wait.
Figure: Continuing an Event with Access Rights
In Continuing an Event with Access Rights,
DM application A has completed its processing in step 6 and continues the operation via a
with the DM_RESP_CONTINUE response code. At the point when the
function returns to the DM application (not explicitly shown, but it can be assumed to be a step 6a), the token
that referenced the access rights to the object is invalid. However, the DMAPI implementation cannot
immediately release the rights referenced by the token and grant them to someone else.
In step 7, the user process that caused the data
event to be generated is resumed by the Operating System, and
continues operation at the point at which the event was generated. Once the DMAPI implementation has
completed whatever event processing it deems necessary, and once it has acquired whatever locks it
needs to complete the rest of the write(2) operation, the access rights can be released. At this point,
DM application B can be allowed to obtain the DM_RIGHT_EXCL access right, in step 8.
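The hand-off in steps 4 through 8 can be sketched as a small state model. It is only an illustration, not DMAPI code: the class and method names are hypothetical, waiters are served in FIFO order as an assumption, and `operation_complete` stands in for the point at which the implementation has acquired the locks it needs and releases the rights.

```python
# Toy model only: names are illustrative, not DMAPI functions.
from collections import deque


class ExclusiveRight:
    """One exclusive right on an object, with a FIFO queue of waiters."""

    def __init__(self):
        self.holder = None
        self.waiters = deque()
        self.token_valid = False

    def request(self, app):
        if self.holder is None:
            self.holder, self.token_valid = app, True
            return True
        self.waiters.append(app)  # right unavailable: the caller waits
        return False

    def respond_continue(self):
        # The token is invalid once the response returns, but the right is
        # deliberately kept so that no other application can slip in.
        self.token_valid = False

    def operation_complete(self):
        # The resumed user operation is safe; the right can now move on.
        self.holder = self.waiters.popleft() if self.waiters else None
        self.token_valid = self.holder is not None


right = ExclusiveRight()
log = [right.request("A"), right.request("B")]  # steps 4 and 5
right.respond_continue()                        # step 6
held_during_write = right.holder                # step 7: B is still waiting
right.operation_complete()                      # step 8
```

Between steps 6 and 8 the right remains held on behalf of the event that application A serviced, which is what keeps application B from changing the file before the write(2) operation proceeds.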
DM applications are logical extensions of the file system.
When a DM application has completed
the servicing of an event, it should appear as though the
conditions that caused the event to be
generated no longer exist. From the standpoint of
the Operating System, it is as though the event
never occurred; whatever state that required the event
to be generated has been taken care of by the DM application.
In the example above, if DM application B were allowed to gain
exclusive access to the file, it could
possibly change the state of the file;
all the recently-completed work of DM application A would
then be void.
More importantly, the implementation of the DMAPI
would have no way to tell what state
the file is in, unless it monitored all the actions
of DM application B. It is also important to prevent the user
process from starvation. Therefore, the user process
should be allowed to continue its processing
after DM application A has completed the event servicing.
The following functions for receiving event messages and responding to synchronous messages are provided:
Some DM applications may be multi-threaded (or made up of multiple processes). To facilitate the
processing of events between related processes, the DMAPI provides a method to move an outstanding
event message from one session to another. The event message remains in the outstanding state, even
though it is now enqueued on a different session.
The following function is provided:
If a DM application knows that it will take some significant period of
time to process an event, the application can optionally
notify the DMAPI implementation. The implementation is free to use or ignore the information.
The following function is provided:
Notify the file system of a slow DM application operation.
When a destroy event occurs, a DM application may optionally receive one DM attribute value in the
event message by specifying to the DMAPI implementation which DM attribute name it wants to receive
at destroy time.
The following function is provided:
Pseudo events do not correspond to an event generated as a result of an operation in the operating
system, such as a write(2). They are created by the DM application for purposes of generating a
token or sending a message to a session. The actual message data is opaque to the DMAPI
implementation. For the format of the pseudo-event, see Pseudo Events
There is currently
only one type of pseudo event; the
As described in the Tokens
tokens are always associated with a synchronous event
message. To gain access to an object, a DM application must first create a message that contains the
context for a token. The required access right can then be obtained via
will create a synchronous event message of type
and enqueue it
on the indicated session. The message and its corresponding token are outstanding. From the standpoint
of the DMAPI, the message appears to have been delivered to a DM application, but
has not yet been responded to via
The message will continue to exist until
the DM application does a
with the token.
For purposes of recovery processing, intelligent DM applications can use the user-generated event
message mechanism to log their state during long and complicated operations. For example, if a DM
application requires exclusive access to a file, it first needs to create a synchronous message. It
puts together a user-level event message describing the operation, and then requests that a token be
generated and associated with this pseudo-event message. If the DM application aborts (via a bus error,
kill signal, etc.) before responding to the event, when it restarts, it can obtain the message and any
corresponding state. This can provide the application with valuable information about its state when it
User-created messages can also be used as a test mechanism, to ensure that communications between the
DMAPI implementation and a DM application are working correctly. Applications can use
to create a synchronous or asynchronous message and have it enqueued on any
specified session. The created message is also of type user, and contains the data specified by the user.
For synchronous messages, the function does not return until the message has been responded to.
Obviously, the process initiating the message via
must not also be responsible for
consuming the message via
or it will hang.
The following functions for creating a user level event message exist:
Generate a user pseudo-event message and return its token. The message is placed on the
session's outstanding event message queue.
Generate a user pseudo-event message and send it to the indicated session. The message is
placed on the session's undelivered message queue.
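The difference between the two queues can be illustrated with a toy session model. This is a sketch of the behavior described above, not DMAPI code: the class, its methods, and the message contents are hypothetical, and the blocking behavior of synchronous sends is not modeled.

```python
# Toy model only: names are illustrative, not DMAPI functions.
import itertools


class Session:
    def __init__(self):
        self.undelivered = []   # messages waiting to be received
        self.outstanding = {}   # token -> message delivered but not yet answered
        self._tokens = itertools.count(1)

    def create_user_event(self, data):
        # Goes straight to the outstanding queue and yields a token.
        token = next(self._tokens)
        self.outstanding[token] = data
        return token

    def send_message(self, data):
        # Waits on the undelivered queue until the session receives it.
        self.undelivered.append(data)

    def respond(self, token):
        # Responding is what finally removes an outstanding message.
        return self.outstanding.pop(token)


session = Session()
token = session.create_user_event(b"migrating file X: phase 2")
session.send_message(b"ping")
pending_before_response = dict(session.outstanding)
session.respond(token)
```

Until `respond` is called, the user event and its data stay on the outstanding queue, which is the property a restarted DM application relies on to recover its logged state.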
In order for a DM application to determine information about the underlying implementation of the
DMAPI, an interface exists to interrogate various implementation specific details. The function
is called on a per-file system basis.
Based on selected options in this function,
it will return information as listed in its man-page definition (see
Limited Backup and Restore Support
Many current vendor migration and backup applications require additional interfaces into the DMAPI in
order to fully support their functionality. To ease a vendor's transition to the DMAPI, a set of optional
DM interfaces may be provided. They consist of the following functions: