Systems Management: Data Storage Management (XDSM) API
Copyright © 1997 The Open Group
Interfaces
Many of the interfaces described in this chapter accept and return variable length data structures. For example, handles are variable length. Event messages are also variable length. For detailed information on accessing individual elements within a data structure, refer to the Data Structures definitions (see Data Structures).
Initialization
An application that needs to use the features and functionality of the DMAPI must first call an
initialization function. This allows implementations of the DMAPI to perform internal initialization
procedures before providing service to an application. The DMAPI specification allows undefined
behavior if applications do not use the initialization call. Another purpose of this function is to return a
DMAPI implementation specific version string which may be used to determine at run-time whether the
DM application is running on the correct implementation.
The following function exists for initializing the DMAPI:
- dm_init_service(): Perform implementation-defined initialization.
Handles
A handle is an opaque identifier for an entity manipulated by the DMAPI. There are three fundamental categories of handles: the global handle, file system handles, and object handles.
The global handle is a fixed constant of the implementation, and is used primarily in the dm_set_disp() call when setting event disposition for the mount event. There is exactly one global handle in any DMAPI implementation.
File system handles are used by many DMAPI functions to identify a file system. There is one file system handle per file system. They are persistent and unique over time (per host) to the extent that a given file system instance has a unique and persistent identity.
Object handles are the most common. These are the handles used to represent all types of file system objects. Object handles are persistent and unique over time (per host) to the extent that a given file system instance has a unique and persistent identity. They are governed by the following properties:
- A unique object handle shall exist for each object visible in the file system name space. Each object handle shall be unique within at least the context of a single host.
- A unique object handle shall exist for each file system object not visible in the name space, such as unlinked but still-open files.
- Object handles must meet the following specific requirements with regard to uniqueness:
  - Object handles shall remain valid across any number of system reboots as long as the corresponding file system object exists.
  - Object handles shall remain valid if the file object they identify is renamed within a file system.
  - Object handles shall remain valid regardless of any operations on the file object they identify. The only time an object handle becomes invalid is when the object it identifies ceases to exist.
  - Object handles may be reused.
Some interfaces can only operate on object handles representing a specific type of file system object. For example, dm_write_invis() can only be used to write to a regular file. When necessary to describe restricted forms of object handles, the terms file handle, directory handle, symlink handle, etc. are used in this document. The taxonomy of handle terminology is shown in the following diagram.
Figure: Taxonomy of Handles
Handles are opaque; the length of a handle is implementation defined. DM applications should make no assumptions about the length of a handle, as handles may have differing lengths even on the same file system in some DMAPI implementations. Therefore, functions that use handles in their interfaces specify two parameters: a void * that provides access to the actual handle, and a size_t that specifies the length of the handle. The DMAPI implementation allocates space for handles via the dm_path_to_handle(), dm_fd_to_handle(), and dm_path_to_fshandle() functions. When a DM application is finished with the handle, it should free the space via the dm_handle_free() function.
Most DMAPI functions take a handle as part of their interface. Many events also provide a handle as part of their event-specific data. To convert from path names and file descriptors to handles, a number of functions are provided, as described in the man-pages. They are outlined below:
- dm_path_to_handle(): Create a file handle from a path name.
- dm_fd_to_handle(): Create a file handle from a file descriptor.
- dm_path_to_fshandle(): Create a file system handle from a path name.
- dm_handle_cmp(): File handle comparison.
- dm_handle_free(): Free the storage allocated for a handle.
- dm_handle_is_valid(): Determine if a handle is valid.
- dm_handle_hash(): Hashes the contents of a handle.
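Because a handle is always passed as a pointer plus a length, any application code that stores or compares handles has to carry both values together. The following C sketch illustrates that convention with a hypothetical handle_ref type and two helper functions. It is not the implementation behind dm_handle_cmp() or dm_handle_hash(); the byte-wise ordering and the FNV-1a hash are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: an opaque handle is always a (pointer, length) pair. */
typedef struct {
    const void *hanp;  /* opaque handle bytes */
    size_t      hlen;  /* length in bytes; may differ from handle to handle */
} handle_ref;

/* Compare two handles without assuming equal lengths.
 * Returns 0 if identical, otherwise a negative or positive value. */
static int handle_ref_cmp(handle_ref a, handle_ref b)
{
    size_t n = a.hlen < b.hlen ? a.hlen : b.hlen;
    int c = (n > 0) ? memcmp(a.hanp, b.hanp, n) : 0;
    if (c != 0)
        return c;
    if (a.hlen == b.hlen)
        return 0;
    return a.hlen < b.hlen ? -1 : 1;
}

/* Illustrative FNV-1a hash over the handle bytes (an assumption; the
 * hash used by a real DMAPI implementation is not specified here). */
static uint32_t handle_ref_hash(handle_ref h)
{
    const unsigned char *p = h.hanp;
    uint32_t hash = 2166136261u;
    for (size_t i = 0; i < h.hlen; i++) {
        hash ^= p[i];
        hash *= 16777619u;
    }
    return hash;
}
```

In a real DM application the bytes would come from a call such as dm_path_to_handle() and would be released with dm_handle_free(); the sketch only shows why the length must travel with the pointer.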
Some legacy DM applications rely on the fact that the DMAPI object handles are built from the combination of file system ID, file inode, and generation number. Such applications may require the capability to decompose DMAPI handles into these components and to build handles from these components. The following optional interfaces are provided for this purpose:
- dm_make_handle(): Construct a DMAPI object handle.
- dm_handle_to_fsid(): Extract the file system ID from a handle.
- dm_handle_to_igen(): Extract the inode generation count from a handle.
- dm_handle_to_ino(): Extract the file inode number from a handle.
- dm_make_fshandle(): Construct a DMAPI file system handle.
Sessions
Sessions can be thought of as message queues. The implementation of the DMAPI enqueues messages on a session to make them available to a DM application. A DM application can also request the DMAPI implementation to enqueue an application defined message on a session. Sessions provide a mechanism for a DM application to receive events.
A unique ID is associated with each session. A session ID is of type dm_sessid_t and is used to identify the recipient of an event message. Session IDs are opaque to DM applications.
Sessions are also the cornerstone of the recovery mechanism.
Sessions are governed by the following restrictions:
- Sessions are not persistent across reboots.
- Sessions are unique for as long as the system is up; they are not unique across reboots.
- Sessions must be explicitly destroyed. If a process exits or otherwise aborts without calling dm_destroy_session(), then the behavior of the session is dictated by the constraints in the Sessions and Event Messages section, below.
- A session is not tied to a process. Any process that has successfully executed dm_init_service() can use any valid session ID.
- Session IDs should not be interpreted as file descriptors. The underlying implementation may use file descriptors, but the DM application should make no assumptions about the implementation.
Session Instantiation
A session must be created via dm_create_session() before a DM application can communicate with the DMAPI. When a session is created, it is possible to specify a previous instantiation of a session that will then be assumed (taken over), which is useful for recovery purposes. The dm_create_session() function is atomic; if the call succeeds, the DMAPI guarantees that all old messages that were enqueued on the old session are now part of the new session. When an existing session is assumed, the old session is invalid once the call returns.
To shut down and destroy a session, a DM application may have to perform a number of operations, such as ensuring that no more events are generated, responding to outstanding messages, and so forth. If a DM application attempts to destroy a session that has outstanding event messages still enqueued, an error is returned. It is assumed that dm_destroy_session() will only be called after the application has ensured that no more events will be generated on the session.
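The session life cycle described above (create, assume an old session, refuse destruction while messages remain) can be modeled in a few lines of C. The sketch below is a hypothetical in-memory model, not the DMAPI: the session type, the MAX_MSGS limit, and the three functions are invented for illustration and say nothing about how dm_create_session() or dm_destroy_session() are implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MSGS 8  /* arbitrary limit for the sketch */

/* Hypothetical model of a session: a queue of message identifiers. */
typedef struct {
    bool   valid;
    size_t count;
    int    msgs[MAX_MSGS];
} session;

/* Create a new session. If old is non-NULL and valid, the new session
 * assumes it: all queued messages move over and old becomes invalid. */
static void session_create(session *new_s, session *old)
{
    new_s->valid = true;
    new_s->count = 0;
    if (old != NULL && old->valid) {
        for (size_t i = 0; i < old->count; i++)
            new_s->msgs[new_s->count++] = old->msgs[i];
        old->count = 0;
        old->valid = false;
    }
}

/* Enqueue a message; returns false if the session is invalid or full. */
static bool session_enqueue(session *s, int msg)
{
    if (!s->valid || s->count == MAX_MSGS)
        return false;
    s->msgs[s->count++] = msg;
    return true;
}

/* Destroy a session; refused while messages are still enqueued. */
static bool session_destroy(session *s)
{
    if (!s->valid || s->count > 0)
        return false;
    s->valid = false;
    return true;
}
```

A recovery process in this model would call session_create() with the failed application's session as the second argument, then drain the inherited messages before destroying the new session.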
The following functions are provided for manipulating an instantiation of a session:
Sessions and Event Messages
At any time, a session may have synchronous event messages that are in one of two states:
- enqueued, undelivered
- delivered and awaiting a response from the DM application.
From the standpoint of the DMAPI implementation, synchronous event messages that are in the second
state (delivered and awaiting a response) are outstanding. Asynchronous messages do not require a
response from the DM application, and therefore will never be in the outstanding state.
In the Message States figure, there are three event messages on the session. Each event message is identified by a unique token. One synchronous event message has been delivered to a DM application, and therefore the session has an outstanding message. The event message continues to exist until some DM application responds to it. The two other event messages on the session are just enqueued and have not yet been delivered to a DM application.
Figure: Message States
As part of sending a synchronous event message, the implementation of the DMAPI may convey access
rights to one or more objects in the message. If a DM application fails (dies, hangs, or otherwise
malfunctions), a recovery process must determine the outstanding event messages and take care of the
associated events to prevent the system from hanging. Since tokens are tied to a session, and are always
associated with a synchronous event message, it is possible to obtain all outstanding event messages
simply by knowing all the tokens.
An active session is not needed to obtain the list of all valid sessions in the system. This allows a
recovery application to interrogate all sessions even in the unlikely event the system runs out of sessions.
Recovering after a DM application failure is very different from recovering from a system crash. The
requirements of each individual DM application will be unique with respect to recovery from a system
crash; it is beyond the scope of the DMAPI to provide all the tools for a DM application to recover itself
in this instance.
The following interfaces exist to manage session and event message recovery.
Tokens
Tokens are a reference to state associated with a synchronous event message. They are always associated
with one and only one synchronous event message. When responding to an event message, the same
token that was delivered with the message must be supplied. Tokens are the identifier that a DM
application must use to reference a synchronous event message; the DM application presents the token to
the DMAPI and in return, is provided with the state associated with the event message.
Like session IDs, tokens are opaque to DM applications. There is no security expressed or implied by the
possession or use of a token. If a DM application can "guess" the value of a token, then it can use it
(assuming that it can supply the appropriate session ID and has other system-dependent privileges).
Tokens have the following properties:
-
Tokens are not owned by any particular process.
-
The DMAPI does not mandate authentication or authorization of the process using the
token; if a process knows a token's value, it can use it.
-
Tokens are meaningful only within the session under which they were created unless a
session is assumed from a previous session.
Access Rights
There are two primary rights: DM_RIGHT_SHARED and DM_RIGHT_EXCL. The third access right,
DM_RIGHT_NULL, is not considered a primary access right, since it conveys no rights to an object.
Synchronous event messages contain access rights to one or more object handles. Some event messages contain multiple file handles. The event message contains access rights to all the files in the event message; the DM application must use dm_query_right() to determine what rights for the given file handles, if any, are present in the message.
If a DM application needs to obtain access rights for more than one handle, it can use the same token in repeated function calls to dm_request_right() and dm_release_right(). It is not necessary for a new message (and its corresponding token) to be created via dm_create_userevent() for each handle the DM application needs to acquire access rights to.
As already noted, tokens do not belong to any particular process. An application presents a token to the
DMAPI to reference and identify a specific access right. When a DM application is informally described
as "holding a token" or "obtaining an access right", a more precise description would be that an
outstanding access right exists, is encapsulated within a synchronous event message, is associated with a
specific session, and is identified by a specific token.
Many DMAPI functions require a token that references a specific access right to an object. In some
cases, it may be advantageous for a DM application not to have to go through the steps of explicitly
creating a token and acquiring the necessary access rights just to call a DMAPI function. Therefore,
many functions accept either a token that references the required rights, or the special value
DM_NO_TOKEN, that indicates the absence of a token.
If a DM application does not pass a token to a DMAPI function that normally requires a token as one of its parameters, then the function acquires the appropriate rights automatically on behalf of the DM application. In this case, the DM application must be willing to be blocked. The DM application may or may not be blocked interruptibly, depending on the implementation of the DMAPI; see the man-page definition for dm_request_right() for more information.
The DM application must use caution when availing itself of this optimization. If a DM application holds a token that references a right to an object, but fails to present it when calling a DMAPI function, then the application is in danger of deadlocking with itself. This is because the DMAPI function will not be able to acquire the necessary rights on behalf of the DM application since the application already holds a token referencing those rights. The DM application should also not use this method of acquiring access rights if it is receiving synchronous events via dm_get_events(). Since one of the synchronous event messages may contain a token that references an access right the DM application may be trying to obtain, the application will again deadlock with itself.
The existence of any outstanding DM_RIGHT_SHARED access rights for a file system object will block
all attempts from all processes performing the following operations:
- all data modification, such as via write(2)
- object destruction.
The existence of the DM_RIGHT_EXCL access right will block all attempts to perform any operation on the file system object, with the sole exception of the stat(2) family (stat, lstat, fstat, etc.).
The locking properties of access rights are summarized in the following table.
Access Right | Blocked Operations
---|---
DM_RIGHT_SHARED | data write, object destruction
DM_RIGHT_EXCL | all but stat(2)
Table: Access Right Properties for Files
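The table can be read as a simple predicate over an outstanding right and an attempted operation. The C sketch below encodes it that way, for a caller that does not present the token. The enumerations right_model and op_model and the function op_is_blocked() are hypothetical names invented for this illustration; the DMAPI itself only defines DM_RIGHT_NULL, DM_RIGHT_SHARED, and DM_RIGHT_EXCL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumerations for the sketch; they stand in for
 * DM_RIGHT_NULL, DM_RIGHT_SHARED and DM_RIGHT_EXCL. */
typedef enum { RIGHT_NULL, RIGHT_SHARED, RIGHT_EXCL } right_model;

/* Hypothetical classification of operations on a file system object. */
typedef enum { OP_STAT, OP_READ, OP_WRITE, OP_DESTROY } op_model;

/* Returns true if an operation issued without the token would block
 * while the given right is outstanding on the object. */
static bool op_is_blocked(right_model outstanding, op_model op)
{
    switch (outstanding) {
    case RIGHT_SHARED:
        /* shared right: data modification and destruction are blocked */
        return op == OP_WRITE || op == OP_DESTROY;
    case RIGHT_EXCL:
        /* exclusive right: everything except the stat(2) family */
        return op != OP_STAT;
    case RIGHT_NULL:
    default:
        /* DM_RIGHT_NULL conveys no rights, so nothing is blocked */
        return false;
    }
}
```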
Notice that the above descriptions do not say that other processes are blocked; they say that all processes are blocked. This is where the distinction that DM applications do not really "own" access rights comes into play.
The only way a DM application can distinguish itself from other processes that should be blocked is by knowing the dm_token_t value identifying the appropriate token, and passing it in with any operations that are to be performed on the file. It follows from this that once a DM application has "obtained" a DM_RIGHT_SHARED or DM_RIGHT_EXCL access right, either directly via a dm_request_right() call or indirectly via an event message, the DM application must be extremely cautious when performing operations on file system objects. Generally, it must restrict itself to using interfaces containing dm_token_t parameters.
For example, calling dm_request_right() and requesting DM_RIGHT_EXCL does not make a DM application the owner of the right; dm_request_right() merely creates the right and encapsulates it in the synchronous event message referenced by the token. Once that happens, all operations against the file system object will be blocked as described above, even if they come from the same process that called dm_request_right(). Only operations that are part of the DMAPI and contain dm_token_t arguments are safe for a DM application to call at this point, because those interfaces are the only way DM applications can distinguish themselves as "owning" the DM_RIGHT_EXCL right.
Upgrading Access Rights
When requesting access rights to an object via dm_request_right(), the requested right may not be immediately available. If the DM application has specified that it wants to block until the right becomes available, the DM application may or may not be blocked interruptibly. The implementation of the DMAPI will specify the semantics for interrupting blocked processes.
If a DM application holds a DM_RIGHT_SHARED access right, it can attempt to upgrade the right to a DM_RIGHT_EXCL in a non-blocking manner via dm_request_right(). If the DMAPI implementation cannot grant the request, however, the DM application will most likely have to release the DM_RIGHT_SHARED right, and request DM_RIGHT_EXCL access to the object via dm_request_right() in a blocking fashion.
A DM application may also request to upgrade a DM_RIGHT_SHARED access right to a DM_RIGHT_EXCL in a non-blocking manner via dm_upgrade_right() if the DMAPI implementation is able to upgrade the right without releasing the DM_RIGHT_SHARED access right.
The state of the object cannot change while the DM application is waiting for an exclusive right via dm_upgrade_right(). However, the state of the file may change if the request to upgrade is via dm_request_right(). To provide some indication that the file changed while the application was blocked, the DMAPI provides the notion of a change indicator that can be interrogated via dm_get_fileattr(). This change indicator is modified by any operation that modifies file data or metadata. The change indicator is not persistent and has no meaning across reboots. Its only purpose is to indicate to the DM application that the file may have changed since the last time the change indicator was interrogated.
The normal sequence of events for attempting a lock upgrade where the current shared lock must be
dropped would be as follows:
- Obtain current change indicator.
- Release shared right.
- Request exclusive right (blocking operation).
- Obtain new change indicator to see if the file has changed.
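The four steps above can be sketched as a small C model. Everything in it is hypothetical: file_model stands in for a file system object, the change field stands in for the change indicator returned by dm_get_fileattr(), and the in_window callback simulates whatever other processes might do while no right is held. It illustrates the ordering of the steps, not the real DMAPI calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { RIGHT_NONE, RIGHT_SHARED_HELD, RIGHT_EXCL_HELD } held_right;

/* Hypothetical in-memory stand-in for a file system object. */
typedef struct {
    unsigned long change;   /* models the change indicator */
    held_right    right;    /* right currently referenced by our token */
} file_model;

/* Optional callback that runs in the window where no right is held,
 * standing in for activity by other processes. */
typedef void (*window_fn)(file_model *);

/* Perform the release-and-reacquire upgrade.
 * Returns true if the file may have changed during the upgrade. */
static bool upgrade_by_release(file_model *f, window_fn in_window)
{
    unsigned long before = f->change;   /* 1. obtain change indicator */
    f->right = RIGHT_NONE;              /* 2. release shared right */
    if (in_window != NULL)
        in_window(f);                   /*    others may modify the file */
    f->right = RIGHT_EXCL_HELD;         /* 3. request exclusive right */
    return f->change != before;         /* 4. compare change indicators */
}

/* Example window activity: a write by another process. */
static void other_writer(file_model *f)
{
    f->change++;
}
```

When the function reports a possible change, the application would typically re-read whatever state it had cached under the shared right before proceeding.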
The following functions for manipulating access rights are provided:
Placing Holds on Objects
If a DM application needs to make sure an object does not go away after releasing all access rights to the object, dm_obj_ref_hold() may be called to obtain an object hold. The effect is to prevent the object from being flushed out for the duration of the hold, essentially making non-persistent data management attributes temporarily persistent. Responding to an event releases all holds associated with the event.
The following functions are for manipulating object holds:
- dm_obj_ref_hold(): Place a hold on a file system object.
- dm_obj_ref_rele(): Release a hold on a file system object.
- dm_obj_ref_query(): Query for a hold on a file system object.
Finding Extents and Punching Holes
Data Management applications often need to release the on-disk blocks of a file to free up space on a file system. Likewise, if a large but sparsely populated file is to be backed up efficiently, a DM application needs to know where the file has non-null data and where the file has holes. These operations may not be supported on all file system types; dm_get_config() can be used to determine if the underlying file system supports punching holes.
The DM application is responsible for maintaining accurate information about the location of any holes in the original file when a sparse file is made non-resident. It is assumed that the DM application will call dm_get_allocinfo() to determine where actual storage is located, and only perform dm_read_invis() operations on the portions of the file that contain data.
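The pattern of reading only the allocated portions of a sparse file can be sketched as a loop over an extent list. In the C model below, extent_model is a hypothetical stand-in for the dm_extent structure (whose real definition is in the Data Structures chapter), and the read_fn callback stands in for an invisible read of one range. The names and field layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { EXTENT_DATA, EXTENT_HOLE } extent_kind;

/* Hypothetical extent record used only by this sketch. */
typedef struct {
    extent_kind   kind;
    unsigned long offset;
    unsigned long length;
} extent_model;

/* Callback standing in for an invisible read of one data range. */
typedef void (*read_fn)(unsigned long offset, unsigned long length, void *ctx);

/* Visit only extents that contain data; returns total bytes visited. */
static unsigned long read_data_extents(const extent_model *ext, size_t n,
                                       read_fn read_range, void *ctx)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (ext[i].kind != EXTENT_DATA)
            continue;               /* skip holes: nothing to read */
        if (read_range != NULL)
            read_range(ext[i].offset, ext[i].length, ctx);
        total += ext[i].length;
    }
    return total;
}
```

For a file with 4096 bytes of data, an 8192-byte hole, and 1024 more bytes of data, the loop visits two ranges totalling 5120 bytes, and the application records the hole separately so that it can be recreated on restore.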
The following functions return information about a file in terms of a dm_extent structure, as defined in the Data Structures chapter (see Data Structures).
These functions, which do not affect any of a file's time stamps, are provided for managing the storage space for a file:
- dm_get_allocinfo(): Return the allocation information for the file specified by the handle.
- dm_probe_hole(): Interrogate the DMAPI implementation for the size and offset around the area in which the DM application wants to punch a hole.
- dm_punch_hole(): Logically write zeroes in the indicated region of the file identified by the handle, thereby allowing the DMAPI implementation to release media resources associated with that region. None of the file's time stamps are updated, but the file's DMAPI change indicator is updated.
Invisible Read and Write
Many data management applications must be able to access file data without altering the file's access,
modification, and change times, and without generating any events. The operations in this section do not
trigger events; they bypass the normal event delivery mechanism to prevent a DM application from
receiving events generated by itself.
The invisible write function by default writes data asynchronously. If a DM application requires that data
written to a file be flushed at certain times, it can either set a flag specifying that writes happen
synchronously or it can call a separate function to flush the file's contents to media.
The following functions, which do not affect any of a file's time stamps, are provided:
- dm_read_invis(): Do a read without updating any of the file's time stamps.
- dm_write_invis(): Do a write without updating any of the file's time stamps. The DMAPI change indicator is updated. This function can execute synchronously or asynchronously.
- dm_sync_by_handle(): Synchronize a file's in-memory state with that on the physical medium.
Managed Regions
Managed regions provide a mechanism for a data management application to control a specific region of
a file. Managed regions provide granularity finer than the entire file for data events such as read and
write. Their use is particularly important for very large files that may be larger than the actual amount of
available disk space.
A single managed region is represented by a dm_region structure. The set of managed regions for a file is a collection of these structures. See Data Structures for a definition of this structure.
The generation of events for a managed region is controlled by a flags field in the dm_region structure. The possible values for this field are a bitwise OR of one or more of the following:
- DM_REGION_READ: Generate a synchronous event for a read operation that overlaps this managed region.
- DM_REGION_WRITE: Generate a synchronous event for a write operation that overlaps this managed region.
- DM_REGION_TRUNCATE: Generate a synchronous event for a truncate operation that overlaps this managed region.
or the following value:
- DM_REGION_NOEVENT: Do not generate any events for this managed region.
The events defined above are the only synchronous data events that are defined for a managed region. Only one of the above events will be produced for a particular read/write/truncate operation, no matter how many managed regions the operation may overlap. The example in the figure below, Overlapping of Events across Managed Regions, shows a read operation that overlaps two managed regions that have read events set.
Figure: Overlapping of Events across Managed Regions
In the figure Overlapping of Events across Managed Regions, a read event is produced for Managed Region A. The arguments passed to the DM application in the event message have the offset and length of the read operation; it is up to the DM application to determine which managed regions the operation will overlap. Once the DM application responds to the event message, the DMAPI implementation allows the read to continue. As an example, if a DM application fills the managed region A above, but not B, and continues the operation, the behavior of the entire read operation is undefined.
Rationale:
Triggering one event per file operation eliminates the necessity of having the DMAPI implementation re-evaluate all managed regions involved in a given operation. Otherwise, the DMAPI implementation could be forced to generate multiple events per managed region for a single I/O operation.
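The one-event-per-operation rule can be illustrated with a short C model. The region_model type, the flag values, and the function names below are hypothetical and do not reproduce the dm_region structure or the DM_REGION_* constants; in this sketch every region has an explicit, non-zero size. The point is only that scanning stops at the first overlapping region whose flags match the operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the sketch (values are assumptions). */
enum { REGION_READ = 1, REGION_WRITE = 2, REGION_TRUNCATE = 4 };

/* Hypothetical managed region: a byte range plus event flags. */
typedef struct {
    unsigned long offset;
    unsigned long size;
    unsigned      flags;
} region_model;

/* True if the operation range [off, off+len) intersects the region. */
static bool region_overlaps(const region_model *r,
                            unsigned long off, unsigned long len)
{
    return len > 0 &&
           off < r->offset + r->size && r->offset < off + len;
}

/* Number of events generated for one operation: 0 or 1, never more. */
static int events_for_op(const region_model *regions, size_t n,
                         unsigned op_flag,
                         unsigned long off, unsigned long len)
{
    for (size_t i = 0; i < n; i++) {
        if ((regions[i].flags & op_flag) &&
            region_overlaps(&regions[i], off, len))
            return 1;   /* first match is enough; stop scanning */
    }
    return 0;
}
```

Because the event carries only the offset and length of the operation, a DM application in this model would run its own overlap check over its region list to decide which regions need to be made resident before it responds.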
To change the set of managed regions, the DM application must obtain DM_RIGHT_EXCL rights to the object. Since managed regions may or may not be persistent, the DM application must be prepared to expect a debut event and to use dm_set_region() to download the set of managed regions for a file.
Managed regions may be constrained by the following restrictions:
- Implementations may choose to support only one managed region per file, which may always be the entire file.
- Managed regions may not overlap. Each region is a distinct subset of the file.
- Only regular files may be partitioned into multiple managed regions.
A DM application can determine the properties of the DMAPI managed region implementation by consulting the dm_get_config() interface.
The following functions, which do not affect any of a file's time stamps, are provided for manipulating
the managed regions of a file:
- dm_get_region(): Return the set of managed regions for a file.
- dm_set_region(): Set the managed regions for a file. The DMAPI change indicator is updated.
File Attributes and Bulk Retrieval
Attributes need to be retrieved for a single file, a directory, or a whole file system. The attributes returned are defined by the dm_stat structure. There are a number of methods for obtaining these file attributes:
- The dm_get_fileattr() function obtains the attributes for a single file specified by the file's handle.
- The attributes and names for all files in a directory can be obtained through use of the dm_get_dirattrs() function.
- The basic attributes for all files in a file system can be obtained through use of the dm_get_bulkattr() function.
- The basic attributes, plus a named DM attribute, for all files in a file system can be obtained through the use of the dm_get_bulkall() function.
For the second, third, and fourth methods, the application either provides a buffer large enough to
contain all retrieved attributes or more commonly (particularly for the last option) the application makes
iterative calls through the interface. A file system must be mounted to have its attributes retrieved via
any of the above methods.
DM applications often need to set a file's metadata to specific values transparently. For example, a backup application might want to set a file's time stamps to their original value when the file is restored. Specific fields from the dm_stat structure are encapsulated in the dm_fileattr struct; this structure is used to set various metadata fields to specific values via dm_set_fileattr().
Before calling dm_get_bulkattr(), dm_get_dirattrs(), or dm_get_bulkall(), the DM application must initialize an opaque "cookie" which provides location information to the DMAPI. Each call of dm_get_bulkattr(), dm_get_dirattrs() or dm_get_bulkall() can use this cookie to determine location information from one call to the next.
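The iterative use of a location cookie can be sketched with a small C model. Here attrloc_model is a hypothetical cookie that simply stores an index into an array, and get_bulk() is an invented function that copies as many entries as fit into the caller's buffer. The return convention (count copied, plus a "more" flag) is an assumption made for the sketch and is not taken from the dm_get_bulkattr() man-page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location cookie: the sketch simply stores an index.
 * A real cookie is opaque and must not be interpreted. */
typedef struct { size_t next; } attrloc_model;

/* Initialize the cookie before the first bulk call. */
static void attrloc_init(attrloc_model *loc)
{
    loc->next = 0;
}

/* Copy up to cap entries into buf, advancing the cookie.
 * Returns the number copied; *more is 1 while entries remain. */
static size_t get_bulk(const int *all, size_t total, attrloc_model *loc,
                       int *buf, size_t cap, int *more)
{
    size_t n = 0;
    while (n < cap && loc->next < total)
        buf[n++] = all[loc->next++];
    *more = loc->next < total;
    return n;
}
```

An application with a small buffer would call get_bulk() in a loop, processing each batch, until the "more" flag is clear; the cookie carries the position between calls so the application never has to track it.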
The file's change indicator can also be retrieved using dm_get_fileattr(). This change indicator is modified by any operation that modifies file data or metadata. DM applications can use the change indicator to determine if a file may have changed state; if the indicator is the same between two calls, the file is guaranteed not to have changed. If the indicator is different, the file may (but not necessarily) have changed. This is especially useful for attempting lock upgrades, as described in Upgrading Access Rights.
The following functions, which do not update any of an object's time stamps, are provided for obtaining
bulk attributes:
- dm_get_bulkall(): Get the specified attributes in bulk for objects in the given file system with a specific DM attribute.
- dm_get_bulkattr(): Get the specified attributes in bulk for the given file system.
- dm_get_dirattrs(): Get the specified file attributes and names in bulk for the given directory.
- dm_init_attrloc(): Initialize the location cookie for successive dm_get_bulkattr() calls.
The following function, which does not update any of a file's time stamps, is provided for obtaining the
attributes of a single file:
The following function, which does not update any of the file's time stamps (other than those specified)
as a side effect, is provided for metadata modification:
Data Management Attributes
Support for persistent data management attributes is a DMAPI implementation option. Some DMAPI implementations may not support persistent opaque data management attributes, while others may not provide support for persistent non-opaque attributes such as event lists. DM applications should use the dm_get_config() function to determine what the implementation provides.
A persistent attribute is one which stays defined across reboots. A non-persistent attribute is one that
may disappear at any time without notice (typically during inode flush). For more information on how to
manage non-persistent attributes, refer to the debut event.
Non-opaque Data Management Attributes
There are two types of non-opaque attributes:
- Managed Regions: The DMAPI implementation may support persistence of managed regions. The dm_get_config() function returns the number of persistent managed regions supported.
- Event Bit Masks: Event bit masks encode which events are enabled for a particular file within a finite number of persistent bits.
Opaque Data Management Attributes
The DMAPI persistent opaque attribute mechanism provides a set of (name, value) pairs associated with
a file system object. The name is a fixed length 8 byte (defined as DM_ATTR_NAME_SIZE) opaque
value determined by the DM application and is interpreted as a byte sequence. Attribute names starting
with ASCII "_" (0x5F) are reserved for future common attribute labels. In order to prevent name clashes,
the first three bytes of the attribute name are currently assigned through a reservation process.
The prefix should identify the company whose DM product is using the attribute; for example, Cheyenne has "CYE" reserved.
To register a 3-byte prefix, send e-mail to xdsmreg@opengroup.org, identifying the company name and the requested name. Registered prefixes can be checked on the World-Wide Web at the following location:
http://www.opengroup.org/public/tech/sysman/xdsmreg.htm
The attribute value is variable length and also opaque. It is recommended that the values be stored in
network byte order to support the movement of media between architectures. These attributes are
persistent across reboots.
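The naming rules above (a fixed 8-byte name, a 3-byte vendor prefix, and "_" reserved as a first byte) can be checked mechanically. The C sketch below builds and compares names under those rules. The helper names, the local ATTR_NAME_SIZE constant (mirroring the DM_ATTR_NAME_SIZE value of 8 stated in the text), and the choice to zero-pad short names are assumptions made for illustration; they are not part of the DMAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ATTR_NAME_SIZE 8   /* mirrors DM_ATTR_NAME_SIZE as stated in the text */
#define PREFIX_SIZE    3   /* registered vendor prefix length */

/* Build a fixed-length name: 3-byte prefix + up to 5 bytes of suffix,
 * zero-padded (padding is an assumption of this sketch).
 * Returns false on invalid input. */
static bool make_attr_name(const char *prefix, const char *suffix,
                           unsigned char out[ATTR_NAME_SIZE])
{
    size_t slen = strlen(suffix);
    if (strlen(prefix) != PREFIX_SIZE || prefix[0] == '_')
        return false;   /* names starting with "_" are reserved */
    if (slen > ATTR_NAME_SIZE - PREFIX_SIZE)
        return false;   /* suffix does not fit in the remaining bytes */
    memset(out, 0, ATTR_NAME_SIZE);
    memcpy(out, prefix, PREFIX_SIZE);
    memcpy(out + PREFIX_SIZE, suffix, slen);
    return true;
}

/* Names are byte sequences, so compare all 8 bytes, not C strings. */
static bool attr_name_equal(const unsigned char a[ATTR_NAME_SIZE],
                            const unsigned char b[ATTR_NAME_SIZE])
{
    return memcmp(a, b, ATTR_NAME_SIZE) == 0;
}
```

Comparing with memcmp() over the full length matters because a name is an opaque byte sequence and need not be NUL-terminated.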
If the DM implementation supports opaque attributes, a limited number of attributes may be stored persistently with each file. Each attribute may store up to DM_CONFIG_MAX_ATTRIBUTE_SIZE bytes of data per file. The value of DM_CONFIG_MAX_ATTRIBUTE_SIZE is obtained via dm_get_config() and has a lower bound of 32 bytes. The total amount of space available for storage of all persistent attributes on a file system is bounded by DM_CONFIG_TOTAL_ATTRIBUTE_SPACE.
Associated with the file attributes is a per-file time stamp called dtime, which is updated when attributes are created, modified, or deleted, or when a new file inherits its attributes from the parent directory. The dtime time stamp may be the same as ctime as determined by the value returned from the dm_get_config() function with DM_CONFIG_DTIME_OVERLOADED. If dtime is not overloaded, then any operation that manipulates attributes does not modify the file's traditional time stamps (atime, mtime, ctime).
If DM_CONFIG_PERS_INHERIT_ATTRIBS (obtainable from dm_get_config()) is DM_TRUE, DM applications can mark persistent attributes as inheritable. If a directory has an attribute (such as lock_on_magnetic) that has been marked inheritable and a file is created in the directory, then the file would inherit the attribute. Attributes that are not marked inheritable are not copied.
DM applications mark an attribute inheritable on a per-file system basis and for specified file types. For
example, a DM application could mark the above attribute (lock_on_magnetic) inheritable for newly
created regular files only. Newly-created directories would not inherit the attribute.
Attribute inheritance is not persistent across reboots. If a DM application marked the lock_on_magnetic
attribute as inheritable and the system were then brought down, the attribute would no longer be
inheritable when the system came back up.
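The inheritance rules described above (per file system, per file type, and lost at reboot) can be summarized with a small in-memory model. Everything here is hypothetical: the table layout, the function names, and the file-type flags are illustrative only and are not the DMAPI interfaces.

```c
#include <assert.h>
#include <string.h>

#define ATTR_NAME_SIZE 8
#define MAX_INHERIT    16

/* Hypothetical file-type flags, usable as a bit mask. */
enum file_type { FT_REGULAR = 1, FT_DIRECTORY = 2 };

struct inherit_entry {
    unsigned char name[ATTR_NAME_SIZE];
    unsigned      type_mask;  /* file types that inherit this attribute */
};

/* Volatile, per-file-system inheritance table. */
struct fs_inherit {
    struct inherit_entry entry[MAX_INHERIT];
    int count;
};

/* Mark an attribute inheritable for the given file types.
 * Returns 0 on success, -1 if the table is full. */
int mark_inheritable(struct fs_inherit *fs, const unsigned char *name,
                     unsigned type_mask)
{
    int i;

    assert(fs != NULL && name != NULL);
    for (i = 0; i < fs->count; i++) {
        if (memcmp(fs->entry[i].name, name, ATTR_NAME_SIZE) == 0) {
            fs->entry[i].type_mask = type_mask;
            return 0;
        }
    }
    if (fs->count == MAX_INHERIT)
        return -1;
    memcpy(fs->entry[fs->count].name, name, ATTR_NAME_SIZE);
    fs->entry[fs->count].type_mask = type_mask;
    fs->count++;
    return 0;
}

/* Would a newly created object of type `new_type` inherit the attribute? */
int is_inherited(const struct fs_inherit *fs, const unsigned char *name,
                 enum file_type new_type)
{
    int i;

    for (i = 0; i < fs->count; i++)
        if (memcmp(fs->entry[i].name, name, ATTR_NAME_SIZE) == 0)
            return (fs->entry[i].type_mask & (unsigned)new_type) != 0;
    return 0;
}

/* Inheritance marks live only in memory, so a reboot clears them. */
void simulate_reboot(struct fs_inherit *fs)
{
    fs->count = 0;
}
```

After a restart, an application would have to mark its attributes inheritable again, which is why the state is modeled as volatile.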
The following functions are provided for attribute management:
The following functions are provided for managing inheritable attributes:
-
dm_set_inherit()
Mark an attribute as inheritable on a file system.
-
dm_clear_inherit()
Mark an attribute as no longer being inheritable.
-
dm_getall_inherit()
Get all the attributes that have been marked inheritable on a file system. This is especially
useful for application restart after a failure.
Events
The DMAPI provides DM applications with the ability to monitor and manage the data in a file system
without having to export all the file system semantics from kernel space to user space via the event
interface. Events are generated by a DMAPI implementation, and then the messages are enqueued on a
session for delivery to a DM application.
The intent of the DMAPI is to support a single product on
any single file system. The DMAPI does not
preclude different products from different vendors operating on the same file system, but it is not
recommended. Different products on different file systems are fully supported by the DMAPI with
regard to event delivery.
Therefore, the following event restrictions exist:
-
Multiple sessions cannot register disposition for the same event on the same object.
-
Event messages are targeted to and enqueued on sessions; there is no explicit targeting of
an event to a specific process.
-
The behavior of event delivery when no session has requested to receive a particular event
(that is, dm_set_disp() with the given event has not been executed) is DMAPI
implementation-specific. The DMAPI implementation must document the behavior of the
system and has one of these three choices:
-
block the process that caused the event to be fired
-
fail the operation
-
not fire the event and allow the process to proceed as if there is no event
disposition.
Certain events are optional in the DMAPI specification. It is recommended that, for each file system
it manages, a DM application initially call dm_get_config_events() to determine which events are
supported by the DMAPI implementation for that file system.
Setting Event Disposition
After creating a session, DM applications must register with the DMAPI to establish the disposition of
events for a file system (that is, what session the events will be sent to). The event list is the complete set of
all events, including managed region events, that the DM application is monitoring during the life
of the session. Since registration is on a per-session basis, this event list is not persistent across reboots.
It is not possible to register to receive events on anything other than the file system object.
Once a DM application has registered its event list and session with the DMAPI, it can begin receiving
event messages on a file system. Registration can be thought of as establishing the association between a
file system and a session, as it lets the DMAPI implementation know which session to send specific
event messages to.
The example shown in Disposition of Event Delivery illustrates the case where a DM application has
registered with the file system represented by "foo" for the read and write events. The event messages
are delivered to the application via session 42. The file bar has an event list of read, write, and
truncate that was previously set via dm_set_region().
Figure: Disposition of Event Delivery
In Disposition of Event Delivery, the read event is delivered to DM application 1, since its session is
the one registered for that specific event.
Multiple applications can register their session and event list for a file system. If two applications
attempt to register to receive the same event, the last application to register for the event will receive it;
prior registrations for the event are replaced.
Rationale:
If this were not the case, and replacement were done on an entire event list, not a per-event basis,
then it would not be possible to have more than one active session registered for a file system.
Having each event in the event list handled individually allows multiple applications to be active
on the same file system simultaneously, all handling different events.
Duplicate Event Registrations on a File System illustrates how Disposition of Event Delivery would
change if a second DM application registered for just the read events.
Figure: Duplicate Event Registrations on a File System
In Duplicate Event Registrations on a File System, read events are now sent to DM application 2, via
session 69, not DM application 1. write events will still be delivered to DM application 1.
Rationale:
The burden is on the system administrator to ensure that two different DM applications do not attempt
to control the same events on the same file system. In Figure 5, an alternative implementation of
dm_set_disp() would be to return an error saying that an <event, file system, session> binding
already exists. Another option would be to send a special event to DM application 1, informing it
that it will no longer be receiving read events. While these options could be implemented, it is
believed that the level of complexity is not warranted for this version of the DMAPI.
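The per-event replacement rule can be modeled as a table that maps each event to one session. This is a sketch under stated assumptions: the event subset, the bit-mask encoding, and the names disp_init() and disp_register() are hypothetical and are not the dm_set_disp() interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of events, used as table indexes. */
enum event { EV_READ, EV_WRITE, EV_TRUNCATE, EV_COUNT };

#define NO_SESSION (-1)

/* Disposition for one file system: the session that receives each event. */
struct disposition {
    int session_for[EV_COUNT];
};

void disp_init(struct disposition *d)
{
    int e;

    assert(d != NULL);
    for (e = 0; e < EV_COUNT; e++)
        d->session_for[e] = NO_SESSION;
}

/* Register `session` for every event whose bit is set in `event_mask`.
 * Only the named events are replaced; all other events keep the session
 * they already had. */
void disp_register(struct disposition *d, int session, unsigned event_mask)
{
    int e;

    assert(d != NULL);
    for (e = 0; e < EV_COUNT; e++)
        if (event_mask & (1u << e))
            d->session_for[e] = session;
}
```

Registering session 42 for read and write and then session 69 for read leaves write with session 42 and moves read to session 69, matching the two figures.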
The examples given above assume that the file system the DM application is monitoring is already
mounted. However, it is quite possible that a DM application wants to set itself up to monitor a file
system that is not yet mounted.
The "mount" Event
The restriction of only sending synchronous events to one session has special ramifications with regard
to the mount event. It is not the intent of the DMAPI to force a model of one "super-daemon" that
listens for mount events, and then forwards the event to the appropriate recipient. However, there is a
special bootstrap problem with regard to receiving the mount event before a file system handle is
available. To receive mount events, a DM application must use the global handle in the dm_set_disp()
function. The mount event will be sent serially to each session that has executed dm_set_disp().
The event is not broadcast to all sessions concurrently. The order in which the DMAPI implementation
sends the event to the sessions is not defined.
The mount event will be sent for all file systems that support the DMAPI. Specifying the event in the
dm_set_eventlist() function is not allowed, since the event is not persistent. When the mount event
is received, the DM application can determine if it is interested in the file system that is specified
in the event message. If a DM application is not interested in the file system, then it must respond to
the event via dm_respond_event() with a code of DM_RESP_DONTCARE. The first DM application that
responds to the event with DM_RESP_CONTINUE and an error code of zero prevents the event from being
sent to any of the remaining sessions. If any DM application returns an error [DM_RESP_ABORT], then
the mount event will not be sent to any other session.
Figure: Mount Event Propagation
In Mount Event Propagation, three DM applications have specified via dm_set_disp() that they want to
receive the mount event. The DMAPI implementation sends the mount event message to DM application A
in step 1. Application A is not interested in the event, so it responds to the event message with
DM_RESP_DONTCARE in step 2. The DMAPI implementation then sends the mount event message to DM
application B in step 3, which determines that it wants to monitor the file system. It responds to the
event message with a DM_RESP_CONTINUE in step 4, so the mount event is not sent to the remaining DM
application C.
If all of the DM applications receiving mount events return DM_RESP_DONTCARE, then the file
system mount proceeds normally.
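The serial propagation of the mount event can be sketched as a loop over the registered sessions. This is a model only: the callback type, the RESP_* constants (stand-ins for the DM_RESP_* codes), and the return-value convention are assumptions, and array order merely stands in for the undefined delivery order.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DM_RESP_DONTCARE, DM_RESP_CONTINUE, and DM_RESP_ABORT. */
enum response { RESP_DONTCARE, RESP_CONTINUE, RESP_ABORT };

/* Hypothetical callback: how one session answers a mount event. */
typedef enum response (*mount_handler)(const char *fs_name);

/* Offer the mount event to each session in turn, never concurrently.
 * *delivered receives the number of sessions that saw the event.
 * Returns the index of the session that claimed the file system,
 * -1 if every session answered DONTCARE (the mount proceeds normally),
 * or -2 if a session aborted the mount. */
int deliver_mount(const mount_handler *sessions, size_t n,
                  const char *fs_name, size_t *delivered)
{
    size_t i;

    assert(sessions != NULL && delivered != NULL);
    *delivered = 0;
    for (i = 0; i < n; i++) {
        enum response r = sessions[i](fs_name);
        (*delivered)++;
        if (r == RESP_CONTINUE)
            return (int)i;   /* remaining sessions never see the event */
        if (r == RESP_ABORT)
            return -2;       /* likewise, propagation stops here */
    }
    return -1;
}

/* Example sessions mirroring applications A, B, and C in the figure. */
enum response app_a(const char *fs) { (void)fs; return RESP_DONTCARE; }
enum response app_b(const char *fs) { (void)fs; return RESP_CONTINUE; }
enum response app_c(const char *fs) { (void)fs; return RESP_CONTINUE; }
```

With the three example sessions, application B claims the file system and application C is never called.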
For recovery processing, many DM applications will need the name of the file system device and the
directory that it was mounted at. This information is made available via the mount event. During
application restart, an application can get the same information via dm_get_mountinfo().
A DM application would determine all the file systems that were being monitored via dm_getall_disp(),
and then use dm_get_mountinfo() to obtain more information about the file systems.
The following functions are provided for manipulating the disposition of a session's events for a file
system:
-
dm_set_disp()
Set the disposition of a session's events on a file system.
-
dm_getall_disp()
Get the disposition of events for all file systems for a session.
-
dm_get_mountinfo()
Get the information that was delivered on a mount event for the indicated file system.
Setting Event Notification
DM applications can specify that they need to receive certain events on an object. Events will only be
generated for these objects, not for all objects in the file system (except for the debut event,
discussed specifically later in this section).
To set event notification on an object, the DM application must specify an event list for the object.
This object is specified via a handle. The handle can be either the file system handle when setting events
on a per file system basis, or a handle to a specific file system object. Executing dm_set_eventlist()
may or may not persistently store the event list with the object; it is dependent on the particular
implementation of the DMAPI. The persistence characteristic can be determined via the dm_get_config()
function.
The DM application must specify the entire list of events that is to be generated for the object. If an
event list already exists for the object, it is replaced by the new one specified in the
dm_set_eventlist() function.
If an event list was previously set for the entire file system, and a subsequent event list for an
object in that file system includes an event that was set for the file system (or vice versa), the
result is undefined.
All events, with the exception of the managed region events and the mount event, can be
specified in the dm_set_eventlist() function. If the object has multiple managed regions, then
dm_get_eventlist() returns the union of all managed region events, in addition to the other
events.
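The replacement and union rules above can be illustrated with bit masks. The sketch below is a model, not the DMAPI: the event bits, the struct, and the functions set_eventlist() and get_eventlist() are hypothetical, and rejecting managed region events with -1 is an assumption about how an implementation might report the error.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 4

/* Hypothetical event bits; the real event set type is implementation-defined. */
#define EV_READ       0x01u
#define EV_WRITE      0x02u
#define EV_TRUNCATE   0x04u
#define EV_ATTRIBUTE  0x08u
#define EV_DESTROY    0x10u
#define REGION_EVENTS (EV_READ | EV_WRITE | EV_TRUNCATE)

struct object_events {
    unsigned event_list;                  /* set through the event list call */
    unsigned region_events[MAX_REGIONS];  /* set per managed region */
    size_t   nregions;
};

/* Replace the object's whole event list. Managed region events cannot
 * be specified here, so they are rejected. */
int set_eventlist(struct object_events *o, unsigned events)
{
    assert(o != NULL);
    if (events & REGION_EVENTS)
        return -1;
    o->event_list = events;  /* full replacement, not a merge */
    return 0;
}

/* Report the event list together with the union of all region events. */
unsigned get_eventlist(const struct object_events *o)
{
    unsigned all;
    size_t i;

    assert(o != NULL);
    all = o->event_list;
    for (i = 0; i < o->nregions; i++)
        all |= o->region_events[i];
    return all;
}
```

Because each call replaces the list, an application that wants to add one event has to resend the complete set.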
When an event is generated by the file system, the DMAPI implementation uses the session to determine
the recipient. Since DM applications must register with the DMAPI via dm_set_disp() to specify the
event list and the session, the DMAPI can easily determine the target session for any given event.
Some implementations of the DMAPI may not provide any persistent storage, even for event
notification. For these "zero bit" implementations, the DMAPI provides a debut event before any
access is granted to the object. This debut event should be specified in the event list when the DM
application sets its event disposition. The debut event gives the DM application the ability to
download information (such as event lists and managed region information) that may be needed by the
DMAPI implementation. Most likely, when downloading a new event list for an object, the list will not
include the debut event, but only include events that require some action to be performed by the DM
application.
The debut event is the first indication given to a DM application that a primitive DMAPI
implementation is going to perform an operation on a file. The DM application can take this opportunity
to download all the necessary information for that particular file, or for other files as well. Alternately,
some DM applications may want to intercept the mount event to prime primitive DMAPI
implementations, rather than having to receive many debut events.
The following functions for managing event lists on file system objects are provided:
-
dm_set_eventlist()
Specify the events, with the exception of the managed region events, to be generated for an
object.
-
dm_get_eventlist()
Get the list of events to be generated for an object.
Receiving and Responding to Events
Pending events can be received one at a time or in bulk. For synchronous events, a response to each
event message is required. For all events, the only valid response is an indication of whether the
operation should be continued or aborted. If the operation is to be aborted, an error can also be specified
that will be returned to the user process in the form of an errno.
Event messages are variable length. This is because two of the primary fields of most event messages,
file handles and path names, are variable length. DM applications should use dm_get_config() to
determine the largest message size and size their buffers for calls to dm_get_events() accordingly.
For more information on accessing and manipulating variable length message buffers, refer to the
Data Structures definitions (see Data Structures).
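The general technique of packing and walking variable length messages in one buffer can be sketched as follows. The header layout, the 8-byte rounding, and the names append_msg() and walk_msgs() are hypothetical; the real message layout and accessors are those given in the Data Structures definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message header, followed directly by the payload. */
struct msg_hdr {
    uint32_t size;      /* bytes from this header to the next one */
    uint32_t type;
    uint32_t data_len;  /* payload bytes following the header */
};

/* Round up to a multiple of 8 so each header stays aligned. */
size_t round8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Append one message at offset *used.
 * Returns 0 on success, -1 if the buffer is too small. */
int append_msg(unsigned char *buf, size_t cap, size_t *used,
               uint32_t type, const void *data, uint32_t len)
{
    struct msg_hdr h;
    size_t total = round8(sizeof h + len);

    assert(buf != NULL && used != NULL);
    if (*used > cap || total > cap - *used)
        return -1;
    h.size = (uint32_t)total;
    h.type = type;
    h.data_len = len;
    memset(buf + *used, 0, total);
    memcpy(buf + *used, &h, sizeof h);
    if (len != 0)
        memcpy(buf + *used + sizeof h, data, len);
    *used += total;
    return 0;
}

/* Walk `rlen` bytes of messages, summing the payload sizes.
 * Returns the number of messages, or -1 if the buffer is malformed. */
int walk_msgs(const unsigned char *buf, size_t rlen, size_t *payload_bytes)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL && payload_bytes != NULL);
    *payload_bytes = 0;
    while (off < rlen) {
        struct msg_hdr h;

        if (rlen - off < sizeof h)
            return -1;
        memcpy(&h, buf + off, sizeof h);  /* no alignment assumptions */
        if (h.size < sizeof h || h.size > rlen - off ||
            h.data_len > h.size - sizeof h)
            return -1;
        *payload_bytes += h.data_len;
        count++;
        off += h.size;
    }
    return count;
}
```

A receiver that sizes its buffer from the largest message size reported by the implementation can always hold at least one message, which is the point of the recommendation above.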
The process that generated the event is blocked until the response is received by the DMAPI
implementation. The sleep may or may not be interruptible; the implementation of the DMAPI will need
to define the behavior for each synchronous event.
When a synchronous event message is generated, a token is part of the message. The token identifies the
event message, and may reference access rights that are conveyed as part of the event message. No
tokens are passed in asynchronous messages.
When a DM application responds to a data
event message, the token may reference access rights. If a DM
application allows the operation to continue with the DM_RESP_CONTINUE return code, then special
care must be taken by the implementation of the DMAPI to allow the operation that caused the event
generation to continue without another DM application changing the state of the file.
Consider the following example:
Figure: Event Generation with No Rights
In Event Generation with No Rights, the user process has initiated a write(2) operation in user space,
shown as step 1. When the application begins executing the Operating System code that performs the
operation in the kernel, it detects that it must generate a synchronous managed region write event.
The event message is enqueued on the session in step 2, and the user process is then blocked.
Figure: Requesting Access Rights after Event Generation
In Requesting Access Rights after Event Generation, the event message has been enqueued on the session
in step 2, and is delivered to DM application A via dm_get_events() in step 3. Since the event message
conveys no rights, DM application A must obtain access rights to the object. In this example, it
requires the DM_RIGHT_EXCL right, which it obtains in step 4. At the same time, DM application B
attempts to get exclusive access to the file in step 5. Since the access right is not available, DM
application B will wait.
Figure: Continuing an Event with Access Rights
In Continuing an Event with Access Rights, DM application A has completed its processing in step 6 and
continues the operation via a dm_respond_event() with the DM_RESP_CONTINUE response code. At the
point when the function returns to the DM application (not explicitly shown, but it can be assumed to
be a step 6a), the token that referenced the access rights to the object is invalid. However, the DMAPI
implementation cannot immediately release the rights referenced by the token and grant them to
someone else.
In step 7, the user process that caused the data
event to be generated is resumed by the Operating System, and
continues operation at the point at which the event was generated. Once the DMAPI implementation has
completed whatever event processing it deems necessary, and once it has acquired whatever locks it
needs to complete the rest of the write(2) operation, the access rights can be released. At this point,
DM application B can be allowed to obtain the DM_RIGHT_EXCL access right, in step 8.
Rationale:
DM applications are logical extensions of the file system. When a DM application has completed
the servicing of an event, it should appear as though the conditions that caused the event to be
generated no longer exist. From the standpoint of the Operating System, it is as though the event
never occurred; whatever state that required the event to be generated has been taken care of by the
DM application. In the example above, if DM application B were allowed to gain exclusive access to
the file, it could possibly change the state of the file; all the recently-completed work of DM
application A would then be void. More importantly, the implementation of the DMAPI would have no
way to tell what state the file is in, unless it monitored all the actions of DM application B. It is
also important to prevent the user process from starvation. Therefore, the user process should be
allowed to continue its processing after DM application A has completed the event servicing.
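The hand-over of the exclusive right in steps 4 through 8 can be modeled as a small state machine. The struct and functions below are hypothetical and only capture the ordering rule; a real implementation would block the waiting application rather than return 0.

```c
#include <assert.h>
#include <stddef.h>

#define NO_TOKEN 0

/* Hypothetical state of the exclusive right on one object. */
struct excl_right {
    int holder;        /* token that references the right, or NO_TOKEN */
    int op_in_flight;  /* nonzero while the resumed operation needs it */
};

/* Returns 1 if the right is granted, 0 if the caller would have to wait. */
int request_excl(struct excl_right *r, int token)
{
    assert(r != NULL && token != NO_TOKEN);
    if (r->holder != NO_TOKEN || r->op_in_flight)
        return 0;
    r->holder = token;
    return 1;
}

/* The holder responds with "continue": its token becomes invalid, but
 * the right is retained on behalf of the resumed user operation.
 * Returns 0 on success, -1 if `token` does not hold the right. */
int respond_continue(struct excl_right *r, int token)
{
    assert(r != NULL);
    if (token == NO_TOKEN || r->holder != token)
        return -1;
    r->holder = NO_TOKEN;
    r->op_in_flight = 1;
    return 0;
}

/* The implementation now holds its own locks for the rest of the
 * operation, so the right can finally be released. */
void operation_has_locks(struct excl_right *r)
{
    assert(r != NULL);
    r->op_in_flight = 0;
}
```

The op_in_flight flag is what keeps application B waiting between the response (step 6) and the point where the user operation is safely under way (step 7).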
The following functions for receiving event messages and responding to synchronous messages are
provided:
Some DM applications may be multi-threaded (or made up of multiple processes). To facilitate the
processing of events between related processes, the DMAPI provides a method to move an outstanding
event message from one session to another. The event message remains in the outstanding state, even
though it is now enqueued on a different session.
The following function is provided:
If a DM application knows that it will take some significant period of time to process an event, the
application can optionally notify the DMAPI implementation. The implementation is free to use or
ignore the information.
The following function is provided:
-
dm_pending()
Notify the file system of a slow DM application operation.
When a destroy event occurs, a DM application may optionally receive one DM attribute value in the
event message by specifying to the DMAPI implementation which DM attribute name it wants to receive
at destroy time.
The following function is provided:
Pseudo Events
Pseudo events do not correspond to an event generated as a result of an operation in the operating
system, such as a write(2). They are created by the DM application for purposes of generating a
token or sending a message to a session. The actual message data is opaque to the DMAPI
implementation. For the format of the pseudo-event, see Pseudo Events.
There is currently only one type of pseudo event: the user event.
As described in Tokens, tokens are always associated with a synchronous event message. To gain access
to an object, a DM application must first create a message that contains the context for a token. The
required access right can then be obtained via dm_request_right().
dm_create_userevent() will create a synchronous event message of type user and enqueue it
on the indicated session. The message and its corresponding token are outstanding. From the standpoint
of the DMAPI, the message appears to have been delivered to a DM application via dm_get_events(), but
has not yet been responded to via dm_respond_event(). The message will continue to exist until
the DM application does a dm_respond_event() with the token.
For purposes of recovery processing, intelligent DM applications can use the user-generated event
message mechanism to log their state during long and complicated operations. For example, if a DM
application requires exclusive access to a file, it first needs to create a synchronous message. It
puts together a user-level event message describing the operation, and then requests that a token be
generated and associated with this pseudo-event message. If the DM application aborts (via a bus error,
kill signal, etc.) before responding to the event, when it restarts, it can obtain the message and any
corresponding state. This can provide the application with valuable information about its state when it
aborted.
User-created messages can also be used as a test mechanism, to ensure that communications between the
DMAPI implementation and a DM application are working correctly. Applications can use dm_send_msg()
to create a synchronous or asynchronous message and have it enqueued on any specified session. The
created message is also of type user, and contains the data specified by the user. For synchronous
messages, the function does not return until the message has been responded to. Obviously, the process
initiating the message via dm_send_msg() must not also be responsible for consuming the message via
dm_get_events(), or it will hang.
The following functions for creating a user level event message exist:
-
dm_create_userevent()
Generate a user pseudo-event message and return its token. The message is placed on the
session's outstanding event message queue.
-
dm_send_msg()
Generate a user pseudo-event message and send it to the indicated session. The message is
placed on the session's undelivered message queue.
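The difference between the two functions is which queue the new message lands on. The model below uses hypothetical types and function names that mirror, but are not, the DMAPI calls. It also assumes that an asynchronous message disappears once delivered, and it does not model the blocking behavior of a synchronous send.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_MAX 8

struct msg {
    int token;        /* 0 for asynchronous messages, which carry no token */
    int synchronous;
};

struct session {
    struct msg undelivered[QUEUE_MAX];
    size_t     n_undelivered;
    struct msg outstanding[QUEUE_MAX];
    size_t     n_outstanding;
    int        last_token;
};

/* Like dm_create_userevent(): the message goes straight onto the
 * outstanding queue. Returns the new token, or -1 if the queue is full. */
int create_userevent(struct session *s)
{
    assert(s != NULL);
    if (s->n_outstanding == QUEUE_MAX)
        return -1;
    s->outstanding[s->n_outstanding].token = ++s->last_token;
    s->outstanding[s->n_outstanding].synchronous = 1;
    s->n_outstanding++;
    return s->last_token;
}

/* Like dm_send_msg(): the message waits on the undelivered queue. */
int send_msg(struct session *s, int synchronous)
{
    assert(s != NULL);
    if (s->n_undelivered == QUEUE_MAX)
        return -1;
    s->undelivered[s->n_undelivered].token = synchronous ? ++s->last_token : 0;
    s->undelivered[s->n_undelivered].synchronous = synchronous;
    s->n_undelivered++;
    return 0;
}

/* Like dm_get_events() for one message: deliver the oldest undelivered
 * message; a synchronous one becomes outstanding until responded to.
 * Returns its token (0 if asynchronous) or -1 if nothing can be delivered. */
int get_event(struct session *s)
{
    struct msg m;

    assert(s != NULL);
    if (s->n_undelivered == 0)
        return -1;
    m = s->undelivered[0];
    if (m.synchronous && s->n_outstanding == QUEUE_MAX)
        return -1;
    s->n_undelivered--;
    memmove(&s->undelivered[0], &s->undelivered[1],
            s->n_undelivered * sizeof s->undelivered[0]);
    if (m.synchronous)
        s->outstanding[s->n_outstanding++] = m;
    return m.token;
}

/* Like dm_respond_event(): the outstanding message ceases to exist. */
int respond_event(struct session *s, int token)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < s->n_outstanding; i++) {
        if (s->outstanding[i].token == token) {
            s->outstanding[i] = s->outstanding[--s->n_outstanding];
            return 0;
        }
    }
    return -1;
}
```

In this model a user event created with create_userevent() never appears on the undelivered queue, which is why it can serve as a recovery log entry that survives until it is explicitly responded to.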
Configuration Information
In order for a DM application to determine information about the underlying implementation of the
DMAPI, an interface exists to interrogate various implementation specific details. The function
dm_get_config() is called on a per-file system basis. Based on selected options in this function,
it will return information as listed in its man-page definition (see dm_get_config).
Limited Backup and Restore Support
Many current vendor migration and backup applications require additional interfaces into the DMAPI in
order to fully support their functionality. To ease a vendor's transition to the DMAPI, a set of optional
DM interfaces may be provided. They consist of the following functions: