
Systems Management: Data Storage Management (XDSM) API

Copyright © 1997 The Open Group

Interfaces

Many of the interfaces described in this chapter accept and return variable length data structures. For example, handles are variable length. Event messages are also variable length. For detailed information on accessing individual elements within a data structure, refer to the Data Structures definitions (see Data Structures).

Initialization

An application that needs to use the features and functionality of the DMAPI must first call an initialization function. This allows implementations of the DMAPI to perform internal initialization procedures before providing service to an application. The DMAPI specification allows undefined behavior if applications do not use the initialization call. Another purpose of this function is to return a DMAPI implementation specific version string which may be used to determine at run-time whether the DM application is running on the correct implementation.

The following function exists for initializing the DMAPI:

dm_init_service()
perform implementation-defined initialization.
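
The run-time check mentioned above can be sketched as a simple prefix comparison on the version string. This is only an illustration: the helper name version_matches and the sample strings used with it are hypothetical, and the string would in practice be the one returned by dm_init_service(), which is not called here so that the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the DMAPI): returns 1 when the version
 * string starts with the prefix the application expects, otherwise 0. */
int version_matches(const char *version, const char *expected_prefix)
{
    assert(expected_prefix != NULL);
    if (version == NULL)
        return 0; /* no string available: treat as a mismatch */
    return strncmp(version, expected_prefix, strlen(expected_prefix)) == 0;
}
```

A DM application might refuse to start, or fall back to a reduced feature set, when this check fails.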

Handles

A handle is an opaque identifier for an entity manipulated by the DMAPI. There are three fundamental categories of handles: the global handle, file system handles, and object handles.

The global handle is a fixed constant of the implementation, and is used primarily in the dm_set_disp() call when setting event disposition for the mount event. There is exactly one global handle in any DMAPI implementation.

File system handles are used by many DMAPI functions to identify a file system. There is one file system handle per file system. They are persistent and unique over time (per host) to the extent that a given file system instance has a unique and persistent identity.

Object handles are the most common. These are the handles used to represent all types of file system objects. Object handles are persistent and unique over time (per host) to the extent that a given file system instance has a unique and persistent identity. They are governed by the following properties:

Some interfaces can only operate on object handles representing a specific type of file system object. For example, dm_write_invis() can only be used to write to a regular file. When necessary to describe restricted forms of object handles, the terms file handle, directory handle, symlink handle, etc. are used in this document. The taxonomy of handle terminology is shown in the following diagram.

Figure: Taxonomy of Handles

Handles are opaque; the length of a handle is implementation defined. DM applications should make no assumptions about the length of a handle, because handles may have differing lengths even on the same file system in some DMAPI implementations. Therefore, functions that use handles in their interfaces specify two parameters: a void * that provides access to the actual handle, and a size_t that specifies the length of the handle. The DMAPI implementation allocates space for handles via the dm_path_to_handle(), dm_fd_to_handle(), and dm_path_to_fshandle() functions.

When a DM application is finished with the handle, it should free the space via the dm_handle_free() function.
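
Because a handle is always a (void *, size_t) pair of unknown length, application code that stores or compares handles has to carry the length along with the bytes. The following is a minimal sketch of that idea. The struct handle_ref type and the helpers handle_equal and handle_copy are hypothetical names invented for illustration and are not DMAPI interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of the two handle parameters described above. */
struct handle_ref {
    void  *hanp; /* opaque handle bytes */
    size_t hlen; /* length in bytes; never assume a fixed value */
};

/* Two handles match only if both the length and the contents match. */
int handle_equal(const struct handle_ref *a, const struct handle_ref *b)
{
    assert(a != NULL && b != NULL);
    if (a->hlen != b->hlen)
        return 0;
    return a->hlen == 0 || memcmp(a->hanp, b->hanp, a->hlen) == 0;
}

/* Make a private copy of a handle. Returns 0 on success, -1 on allocation
 * failure. The copy is owned by the caller and released with free(). */
int handle_copy(struct handle_ref *dst, const struct handle_ref *src)
{
    assert(dst != NULL && src != NULL);
    dst->hanp = NULL;
    dst->hlen = 0;
    if (src->hlen == 0)
        return 0;
    dst->hanp = malloc(src->hlen);
    if (dst->hanp == NULL)
        return -1;
    memcpy(dst->hanp, src->hanp, src->hlen);
    dst->hlen = src->hlen;
    return 0;
}
```

Note that a private copy made this way is ordinary heap memory; only space allocated by the DMAPI implementation itself is returned through dm_handle_free().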

Most DMAPI functions take a handle as part of their interface. Many events also provide a handle as part of their event-specific data. To convert from path names and file descriptors to handles, a number of functions are provided, as described in the man-pages. They are outlined below:

Some legacy DM applications rely on the fact that the DMAPI object handles are built from the combination of file system ID, file inode, and generation number. Such applications may require the capability to decompose DMAPI handles into these components and to build handles from these components.

The following optional interfaces are provided for this purpose:

Sessions

Sessions can be thought of as message queues. The implementation of the DMAPI enqueues messages on a session to make them available to a DM application. A DM application can also request the DMAPI implementation to enqueue an application defined message on a session. Sessions provide a mechanism for a DM application to receive events.

A unique ID is associated with each session. A session ID is of type dm_sessid_t and is used to identify the recipient of an event message. Session IDs are opaque to DM applications. Sessions are also the cornerstone of the recovery mechanism.

Sessions are governed by the following restrictions:

Session Instantiation

A session must be created via dm_create_session() before a DM application can communicate with the DMAPI. When a session is created, it is possible to specify a previous instantiation of a session that will then be assumed (taken over), which is useful for recovery purposes. The dm_create_session() function is atomic; if the call succeeds, the DMAPI guarantees that all old messages that were enqueued on the old session are now part of the new session. When an existing session is assumed, the old session is invalid when the call returns.

To shut down and destroy a session, a DM application may have to perform a number of operations, such as ensuring that no more events are generated, responding to outstanding messages, and so forth. If a DM application attempts to destroy a session that has outstanding event messages still enqueued, an error is returned. It is assumed that dm_destroy_session() will only be called after the application has ensured that no more events will be generated on the session.

The following functions are provided for manipulating an instantiation of a session:

Sessions and Event Messages

At any time, a session may have synchronous event messages that are in one of two states:

From the standpoint of the DMAPI implementation, synchronous event messages that are in the second state (delivered and awaiting a response) are outstanding. Asynchronous messages do not require a response from the DM application, and therefore will never be in the outstanding state.

In Message States, there are three event messages on the session. Each event message is identified by a unique token. One synchronous event message has been delivered to a DM application, and therefore the session has an outstanding message. The event message continues to exist until some DM application responds to it. The two other event messages on the session are just enqueued and have not yet been delivered to a DM application.

Figure: Message States
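
The message life cycle described above can be modeled in a few lines. The sketch below is a toy model, not the DMAPI: the types, the function names, and the extra MSG_GONE state (used to mark a message that no longer exists) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* MSG_GONE is a modeling convenience: the message no longer exists. */
enum msg_state { MSG_ENQUEUED, MSG_OUTSTANDING, MSG_GONE };

struct event_msg {
    unsigned int   token;       /* stand-in for the opaque token */
    int            synchronous; /* nonzero for synchronous events */
    enum msg_state state;
};

/* Deliver the oldest enqueued message; returns its index, or -1 if none. */
int deliver_next(struct event_msg *q, size_t n)
{
    size_t i;
    assert(q != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (q[i].state == MSG_ENQUEUED) {
            /* Only synchronous messages wait for a response. */
            q[i].state = q[i].synchronous ? MSG_OUTSTANDING : MSG_GONE;
            return (int)i;
        }
    }
    return -1;
}

/* Respond to an outstanding message by token; 0 on success, -1 otherwise. */
int respond(struct event_msg *q, size_t n, unsigned int token)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (q[i].state == MSG_OUTSTANDING && q[i].token == token) {
            q[i].state = MSG_GONE;
            return 0;
        }
    }
    return -1;
}

/* Count messages that were delivered and still await a response. */
size_t count_outstanding(const struct event_msg *q, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++)
        if (q[i].state == MSG_OUTSTANDING)
            count++;
    return count;
}
```

In this model a response with a token that has not yet been delivered is rejected, and an asynchronous message never becomes outstanding.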

As part of sending a synchronous event message, the implementation of the DMAPI may convey access rights to one or more objects in the message. If a DM application fails (dies, hangs, or otherwise malfunctions), a recovery process must determine the outstanding event messages and take care of the associated events to prevent the system from hanging. Since tokens are tied to a session, and are always associated with a synchronous event message, it is possible to obtain all outstanding event messages simply by knowing all the tokens.

An active session is not needed to obtain the list of all valid sessions in the system. This allows a recovery application to interrogate all sessions even in the unlikely event the system runs out of sessions.

Recovering after a DM application failure is very different from recovering from a system crash. The requirements of each individual DM application will be unique with respect to recovery from a system crash; it is beyond the scope of the DMAPI to provide all the tools for a DM application to recover itself in this instance.

The following interfaces exist to manage session and event message recovery.

Tokens

Tokens are a reference to state associated with a synchronous event message. They are always associated with one and only one synchronous event message. When responding to an event message, the same token that was delivered with the message must be supplied. Tokens are the identifier that a DM application must use to reference a synchronous event message; the DM application presents the token to the DMAPI and in return, is provided with the state associated with the event message.

Like session IDs, tokens are opaque to DM applications. There is no security expressed or implied by the possession or use of a token. If a DM application can "guess" the value of a token, then it can use it (assuming that it can supply the appropriate session ID and has other system-dependent privileges).

Tokens have the following properties:

Access Rights

There are two primary rights: DM_RIGHT_SHARED and DM_RIGHT_EXCL. The third access right, DM_RIGHT_NULL, is not considered a primary access right, since it conveys no rights to an object.

Synchronous event messages contain access rights to one or more object handles. Some event messages contain multiple file handles. The event message contains access rights to all the files in the event message; the DM application must use dm_query_right() to determine what rights for the given file handles, if any, are present in the message.

If a DM application needs to obtain access rights for more than one handle, it can use the same token in repeated function calls to dm_request_right() and dm_release_right(). It is not necessary for a new message (and its corresponding token) to be created via dm_create_userevent() for each handle the DM application needs to acquire access rights to.

As already noted, tokens do not belong to any particular process. An application presents a token to the DMAPI to reference and identify a specific access right. When a DM application is informally described as "holding a token" or "obtaining an access right", a more precise description would be that an outstanding access right exists, is encapsulated within a synchronous event message, is associated with a specific session, and is identified by a specific token.

Many DMAPI functions require a token that references a specific access right to an object. In some cases, it may be advantageous for a DM application not to have to go through the steps of explicitly creating a token and acquiring the necessary access rights just to call a DMAPI function. Therefore, many functions accept either a token that references the required rights, or the special value DM_NO_TOKEN, that indicates the absence of a token.

If a DM application does not pass a token to a DMAPI function that normally requires a token as one of its parameters, then the function acquires the appropriate rights automatically on behalf of the DM application. In this case, the DM application must be willing to be blocked. The DM application may or may not be blocked interruptibly, depending on the implementation of the DMAPI; see the man-page definition for dm_request_right() for more information.

The DM application must use caution when availing itself of this optimization. If a DM application holds a token that references a right to an object, but fails to present it when calling a DMAPI function, then the application is in danger of deadlocking with itself. This is because the DMAPI function will not be able to acquire the necessary rights on behalf of the DM application since the application already holds a token referencing those rights. The DM application should also not use this method of acquiring access rights if it is receiving synchronous events via dm_get_events(). Since one of the synchronous event messages may contain a token that references an access right the DM application may be trying to obtain, the application will again deadlock with itself.

The existence of any outstanding DM_RIGHT_SHARED access rights for a file system object will block all attempts from all processes performing the following operations:

The existence of the DM_RIGHT_EXCL access right will block all attempts to perform any operation on the file system object, with the sole exception of the stat(2) family (stat, lstat, fstat, etc.).

The locking properties of access rights are summarized in the following table.

Access Right       Blocked Operations
DM_RIGHT_SHARED    data write, object destruction
DM_RIGHT_EXCL      all but stat(2)

Table: Access Right Properties for Files
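
The table can be restated as a small predicate. The sketch below is a model only: the enumerations and the function op_is_blocked are hypothetical names, the operation categories are a simplification of the table rows, and the predicate describes an operation issued without presenting the token that references the right.

```c
#include <assert.h>

/* Hypothetical model names; they mirror, but are not, the DMAPI constants. */
enum sim_right { SIM_RIGHT_NULL, SIM_RIGHT_SHARED, SIM_RIGHT_EXCL };
enum sim_op    { SIM_OP_STAT, SIM_OP_DATA_READ, SIM_OP_DATA_WRITE,
                 SIM_OP_DESTROY };

/* Returns 1 if an operation issued without the token would block while a
 * right of the given kind is outstanding on the object, otherwise 0. */
int op_is_blocked(enum sim_right outstanding, enum sim_op op)
{
    assert(op >= SIM_OP_STAT && op <= SIM_OP_DESTROY);
    switch (outstanding) {
    case SIM_RIGHT_EXCL:
        return op != SIM_OP_STAT;            /* all but stat(2) */
    case SIM_RIGHT_SHARED:
        return op == SIM_OP_DATA_WRITE || op == SIM_OP_DESTROY;
    case SIM_RIGHT_NULL:
    default:
        return 0;                            /* conveys no rights */
    }
}
```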

Notice that the above descriptions do not say that other processes are blocked; they say that all processes are blocked. This is where the distinction that DM applications do not really "own" access rights comes into play.

The only way a DM application can distinguish itself from other processes that should be blocked is by knowing the dm_token_t value identifying the appropriate token, and passing it in with any operations that are to be performed on the file. It follows from this that once a DM application has "obtained" a DM_RIGHT_SHARED or DM_RIGHT_EXCL access right, either directly via a dm_request_right() call or indirectly via an event message, the DM application must be extremely cautious when performing operations on file system objects. Generally, it must restrict itself to using interfaces containing dm_token_t parameters.

For example, calling dm_request_right() and requesting DM_RIGHT_EXCL does not make a DM application the owner of the right; dm_request_right() merely creates the right and encapsulates it in the synchronous event message referenced by the token. Once that happens, all operations against the file system object will be blocked as described above, even if they come from the same process that called dm_request_right(). Only operations that are part of the DMAPI and contain dm_token_t arguments are safe for a DM application to call at this point, because those interfaces are the only way a DM application can distinguish itself as "owning" the DM_RIGHT_EXCL right.

Upgrading Access Rights

When requesting access rights to an object via dm_request_right(), the requested right may not be immediately available. If the DM application has specified that it wants to block until the right becomes available, the DM application may or may not be blocked interruptibly. The implementation of the DMAPI will specify the semantics for interrupting blocked processes.

If a DM application holds a DM_RIGHT_SHARED access right, it can attempt to upgrade the right to a DM_RIGHT_EXCL in a non-blocking manner via dm_request_right(). If the DMAPI implementation cannot grant the request, however, the DM application will most likely have to release the DM_RIGHT_SHARED right, and request DM_RIGHT_EXCL access to the object via dm_request_right() in a blocking fashion.

A DM application may also request to upgrade a DM_RIGHT_SHARED access right to a DM_RIGHT_EXCL in a non-blocking manner via dm_upgrade_right() if the DMAPI implementation is able to upgrade the right without releasing the DM_RIGHT_SHARED access right.

The state of the object cannot change while the DM application is waiting for an exclusive right via dm_upgrade_right(). However, the state of the file may change if the request to upgrade is via dm_request_right(). To provide some indication that the file changed while the application was blocked, the DMAPI provides the notion of a change indicator that can be interrogated via dm_get_fileattr(). This change indicator is modified by any operation that modifies file data or metadata. The change indicator is not persistent and has no meaning across reboots. Its only purpose is to indicate to the DM application that the file may have changed since the last time the change indicator was interrogated.

The normal sequence of events for attempting a lock upgrade where the current shared lock must be dropped would be as follows:

  1. Obtain current change indicator.

  2. Release shared right.

  3. Request exclusive right (blocking operation).

  4. Obtain new change indicator to see if the file has changed.
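
The four steps above can be sketched against a simulated object. Everything in this sketch is an assumption made for illustration: struct sim_object, its fields, and the functions sim_modify and upgrade_with_release are hypothetical, and no real locking or blocking takes place. A real DM application would use dm_get_fileattr(), dm_release_right(), and dm_request_right() for the corresponding steps.

```c
#include <assert.h>
#include <stddef.h>

enum sim_right { SIM_RIGHT_NULL, SIM_RIGHT_SHARED, SIM_RIGHT_EXCL };

/* Simulated file system object. */
struct sim_object {
    unsigned long  change; /* stand-in for the change indicator */
    enum sim_right right;  /* right currently outstanding */
};

/* Simulates some other process modifying the file. */
void sim_modify(struct sim_object *obj)
{
    obj->change++;
}

/* Upgrade by dropping the shared right first. The optional callback runs
 * while no right is held, modeling activity by other processes.
 * Returns 1 if the object may have changed in the meantime, else 0. */
int upgrade_with_release(struct sim_object *obj,
                         void (*while_unlocked)(struct sim_object *))
{
    unsigned long before;

    assert(obj != NULL && obj->right == SIM_RIGHT_SHARED);
    before = obj->change;           /* 1. obtain current change indicator */
    obj->right = SIM_RIGHT_NULL;    /* 2. release shared right */
    if (while_unlocked != NULL)
        while_unlocked(obj);        /*    window in which others may run */
    obj->right = SIM_RIGHT_EXCL;    /* 3. request exclusive right */
    return obj->change != before;   /* 4. compare change indicators */
}
```

When the function reports a possible change, the application has to revalidate whatever it learned about the file while it held the shared right.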

The following functions for manipulating access rights are provided:

Placing Holds on Objects

If a DM application needs to make sure an object does not go away after releasing all access rights to the object, dm_obj_ref_hold() may be called to obtain an object hold. The effect is to prevent the object from being flushed out for the duration of the hold, which essentially makes non-persistent data management attributes temporarily persistent. Responding to an event releases all holds associated with the event.

The following functions are for manipulating object holds:

Finding Extents and Punching Holes

Data Management applications often need to release the on-disk blocks of a file to free up space on a file system. Likewise, if a large, but sparsely populated file is to be backed up efficiently, a DM application needs to know where the file has non-null data and where the file has holes. These operations may not be supported on all file system types; dm_get_config() can be used to determine if the underlying file system supports punching holes.

The DM application is responsible for maintaining accurate information about the location of any holes in the original file when a sparse file is made non-resident. It is assumed that the DM application will call dm_get_allocinfo() to determine where actual storage is located, and only perform dm_read_invis() operations on the portions of the file that contain data.

The following functions return information about a file in terms of a dm_extent structure, as defined in the Data Structures chapter (see Data Structures). These functions, which do not affect any of a file's time stamps, are provided for managing the storage space for a file:

Invisible Read and Write

Many data management applications must be able to access file data without altering the file's access, modification, and change times, and without generating any events. The operations in this section do not trigger events; they bypass the normal event delivery mechanism to prevent a DM application from receiving events generated by itself.

The invisible write function by default writes data asynchronously. If a DM application requires that data written to a file be flushed at certain times, it can either set a flag specifying that writes happen synchronously or it can call a separate function to flush the file's contents to media.

The following functions, which do not affect any of a file's time stamps, are provided:

Managed Regions

Managed regions provide a mechanism for a data management application to control a specific region of a file. Managed regions provide granularity finer than the entire file for data events such as read and write. Their use is particularly important for very large files that may be larger than the actual amount of available disk space.

A single managed region is represented by a dm_region structure. The set of managed regions for a file is a collection of these structures. See Data Structures for a definition of this structure.

The generation of events for a managed region is controlled by a flags field in the dm_region structure. The possible values for this field are a bitwise OR of one or more of the following:

DM_REGION_READ

Generate a synchronous event for a read operation that overlaps this managed region.

DM_REGION_WRITE

Generate a synchronous event for a write operation that overlaps this managed region.

DM_REGION_TRUNCATE

Generate a synchronous event for a truncate operation that overlaps this managed region.

or the following value:

DM_REGION_NOEVENT

Do not generate any events for this managed region.

The events defined above are the only synchronous data events that are defined for a managed region. Only one of the above events will be produced for a particular read/write/truncate operation, no matter how many managed regions the operation may overlap.

The example in Overlapping of Events across Managed Regions below shows a read operation that overlaps two managed regions that have read events set.

Figure: Overlapping of Events across Managed Regions

In Overlapping of Events across Managed Regions, a read event is produced for Managed Region A. The arguments passed to the DM application in the event message have the offset and length of the read operation; it is up to the DM application to determine which managed regions the operation will overlap. Once the DM application responds to the event message, the DMAPI implementation allows the read to continue.
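
Since only the offset and length of the operation are delivered, the DM application has to work out the affected regions itself. The following sketch shows one way to do that for a set of regions held in memory. The struct sim_region type, the flag values, and both function names are hypothetical; the real layout is given by the dm_region definition in Data Structures. The sketch also treats a region of size zero as empty, which is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration only. */
#define SIM_REGION_READ  0x1u
#define SIM_REGION_WRITE 0x2u

/* Hypothetical in-memory description of one managed region. */
struct sim_region {
    unsigned long long offset;
    unsigned long long size;
    unsigned int       flags;
};

/* Overflow-safe test for whether [off, off+len) meets [roff, roff+rsize). */
int ranges_overlap(unsigned long long off, unsigned long long len,
                   unsigned long long roff, unsigned long long rsize)
{
    if (len == 0 || rsize == 0)
        return 0;
    if (off < roff)
        return roff - off < len;
    return off - roff < rsize;
}

/* Store the indices of regions that have event_flag set and overlap the
 * operation. Returns the number of indices written (at most out_cap). */
size_t overlapping_regions(const struct sim_region *r, size_t n,
                           unsigned long long off, unsigned long long len,
                           unsigned int event_flag,
                           size_t *out, size_t out_cap)
{
    size_t i, found = 0;
    assert(out != NULL || out_cap == 0);
    for (i = 0; i < n && found < out_cap; i++) {
        if ((r[i].flags & event_flag) != 0 &&
            ranges_overlap(off, len, r[i].offset, r[i].size))
            out[found++] = i;
    }
    return found;
}
```

With regions A and B adjacent and both marked for read events, a read that straddles their boundary yields both indices, which tells the application that both regions must be handled before it responds to the single event.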

As an example, if a DM application fills the managed region A above, but not B, and continues the operation, the behavior of the entire read operation is undefined.

Rationale:

Triggering one event per file operation eliminates the necessity of having the DMAPI implementation re-evaluate all managed regions involved in a given operation. Otherwise, the DMAPI implementation could be forced to generate multiple events per managed region for a single I/O operation.

To change the set of managed regions, the DM application must obtain DM_RIGHT_EXCL rights to the object. Since managed regions may or may not be persistent, the DM application must be prepared to expect a debut event and to use dm_set_region() to download the set of managed regions for a file.

Managed regions may be constrained by the following restrictions:

A DM application can determine the properties of the DMAPI managed region implementation by consulting the dm_get_config() interface.

The following functions, which do not affect any of a file's time stamps, are provided for manipulating the managed regions of a file:

File Attributes and Bulk Retrieval

Attributes need to be retrieved for a single file, a directory, or a whole file system. The attributes returned are defined by the dm_stat structure. There are a number of methods for obtaining these file attributes:

For the second, third, and fourth methods, the application either provides a buffer large enough to contain all retrieved attributes or more commonly (particularly for the last option) the application makes iterative calls through the interface. A file system must be mounted to have its attributes retrieved via any of the above methods.

DM applications often need to set a file's metadata to specific values transparently. For example, a backup application might want to set a file's time stamps to their original value when the file is restored. Specific fields from the dm_stat structure are encapsulated in the dm_fileattr struct; this structure is used to set various metadata fields to specific values via dm_set_fileattr().

Before calling dm_get_bulkattr(), dm_get_dirattrs(), and dm_get_bulkall(), the DM application must initialize an opaque "cookie" which provides location information to the DMAPI. Each call of dm_get_bulkattr(), dm_get_dirattrs() or dm_get_bulkall() can use this cookie to determine location information from one call to the next.
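
The cookie-driven iteration pattern can be illustrated with a simulated data source. This sketch does not call the DMAPI: sim_cookie, sim_get_bulk, and drain_all are hypothetical names, the "attributes" are plain integers, and the return convention used here (1 when more data remains, 0 when done, -1 on error) is an assumption of the model rather than a statement about the real functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location cookie; opaque to the caller in a real interface. */
typedef size_t sim_cookie;

/* Copy up to cap items starting at *loc and advance the cookie.
 * Returns 1 if more items remain, 0 if finished, -1 on bad arguments. */
int sim_get_bulk(const int *all, size_t total, sim_cookie *loc,
                 int *buf, size_t cap, size_t *nret)
{
    size_t n = 0;
    if (loc == NULL || buf == NULL || nret == NULL || cap == 0 ||
        *loc > total)
        return -1;
    while (n < cap && *loc < total)
        buf[n++] = all[(*loc)++];
    *nret = n;
    return *loc < total ? 1 : 0;
}

/* Drain the whole source through a small fixed-size buffer.
 * Returns the number of items copied into out. */
size_t drain_all(const int *all, size_t total, int *out, size_t out_cap)
{
    sim_cookie loc = 0; /* initialized once, before the first call */
    int buf[4];
    size_t copied = 0, n = 0, i;
    int rc;

    assert(out != NULL || out_cap == 0);
    do {
        rc = sim_get_bulk(all, total, &loc, buf, 4, &n);
        if (rc < 0)
            break;
        for (i = 0; i < n && copied < out_cap; i++)
            out[copied++] = buf[i];
    } while (rc == 1);
    return copied;
}
```

The point of the pattern is that the caller never interprets the cookie; it only initializes it once and hands it back unchanged on every subsequent call.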

The file's change indicator can also be retrieved using dm_get_fileattr(). This change indicator is modified by any operation that modifies file data or metadata. DM applications can use the change indicator to determine if a file may have changed state; if the indicator is the same between two calls, the file is guaranteed not to have changed. If the indicator is different, the file may, but need not, have changed. This is especially useful for attempting lock upgrades, as described in Upgrading Access Rights.

The following functions, which do not update any of an object's time stamps, are provided for obtaining bulk attributes:

The following function, which does not update any of a file's time stamps, is provided for obtaining the attributes of a single file:

The following function, which does not update any of the file's time stamps (other than those specified) as a side effect, is provided for metadata modification:

Data Management Attributes

Support for persistent data management attributes is a DMAPI implementation option. Some DMAPI implementations may not support persistent opaque data management attributes, while others may not provide support for persistent non-opaque attributes such as event lists. DM applications should use the dm_get_config() function to determine what the implementation provides.

A persistent attribute is one which stays defined across reboots. A non-persistent attribute is one that may disappear at any time without notice (typically during inode flush). For more information on how to manage non-persistent attributes, refer to the debut event.

Non-opaque Data Management Attributes

There are two types of non-opaque attributes:

Opaque Data Management Attributes

The DMAPI persistent opaque attribute mechanism provides a set of (name, value) pairs associated with a file system object. The name is a fixed length 8 byte (defined as DM_ATTR_NAME_SIZE) opaque value determined by the DM application and is interpreted as a byte sequence. Attribute names starting with ASCII "_" (0x5F) are reserved for future common attribute labels. In order to prevent name clashes, the first three bytes of the attribute name are currently assigned through a reservation process. The prefix should identify the company whose DM product is using the attribute, for example, Cheyenne has "CYE" reserved.
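
As an illustration of the naming rules, the sketch below assembles an 8-byte attribute name from a 3-byte prefix and a short suffix. The function make_attr_name and the two macros are hypothetical, and padding unused bytes with zeros is an assumption of this sketch; the specification only states that the name is an opaque fixed-length byte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_ATTR_NAME_SIZE   8 /* mirrors DM_ATTR_NAME_SIZE from the text */
#define SIM_ATTR_PREFIX_SIZE 3 /* length of the registered prefix */

/* Build a fixed-length name from prefix and suffix, zero padded.
 * Returns 0 on success, or -1 if the prefix is not exactly 3 bytes,
 * starts with the reserved '_' character, or the suffix does not fit. */
int make_attr_name(unsigned char name[SIM_ATTR_NAME_SIZE],
                   const char *prefix, const char *suffix)
{
    size_t plen, slen;

    assert(name != NULL && prefix != NULL && suffix != NULL);
    plen = strlen(prefix);
    slen = strlen(suffix);
    if (plen != SIM_ATTR_PREFIX_SIZE || prefix[0] == '_')
        return -1;
    if (slen > SIM_ATTR_NAME_SIZE - SIM_ATTR_PREFIX_SIZE)
        return -1;
    memset(name, 0, SIM_ATTR_NAME_SIZE);
    memcpy(name, prefix, plen);
    memcpy(name + plen, suffix, slen);
    return 0;
}
```

Because the result is a byte sequence and not a C string, it must always be compared with memcmp() over the full 8 bytes.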

To register a 3-byte prefix, send e-mail to xdsmreg@opengroup.org, identifying the company name and the requested name.

Registered prefixes can be checked on the World-Wide Web at the following location:



http://www.opengroup.org/public/tech/sysman/xdsmreg.htm

The attribute value is variable length and also opaque. It is recommended that the values be stored in network byte order to support the movement of media between architectures. These attributes are persistent across reboots.
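
The byte-order recommendation can be followed without any platform-specific calls by serializing integers one byte at a time, most significant byte first. The helper names put_be32 and get_be32 are hypothetical; this is one possible encoding for a 32-bit field inside an attribute value, not a format defined by the specification.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in network (big-endian) byte order. */
void put_be32(unsigned char out[4], uint32_t v)
{
    assert(out != 0);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read back a 32-bit value stored in network byte order. */
uint32_t get_be32(const unsigned char in[4])
{
    assert(in != 0);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

A value written this way reads back identically on hosts of either endianness, which is what allows media to move between architectures.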

If the DM implementation supports opaque attributes, a limited number of attributes may be stored persistently with each file. Each attribute may store up to DM_CONFIG_MAX_ATTRIBUTE_SIZE bytes of data per file. The value of DM_CONFIG_MAX_ATTRIBUTE_SIZE is obtained via dm_get_config() and has a lower bound of 32 bytes. The total amount of space available for storage of all persistent attributes on a file system is bounded by DM_CONFIG_TOTAL_ATTRIBUTE_SPACE.

Associated with the file attributes is a per-file time stamp called dtime, which is updated when attributes are created, modified, or deleted, or when a new file inherits its attributes from the parent directory. The dtime time stamp may be the same as ctime as determined by the value returned from the dm_get_config() function with DM_CONFIG_DTIME_OVERLOADED. If dtime is not overloaded, then any operation that manipulates attributes does not modify the file's traditional time stamps (atime, mtime, ctime).

If DM_CONFIG_PERS_INHERIT_ATTRIBS (obtainable from dm_get_config()) is DM_TRUE, DM applications can mark persistent attributes as inheritable. If a directory has an attribute (such as lock_on_magnetic) that has been marked inheritable and a file is created in the directory, then the file would inherit the attribute. Attributes that are not marked inheritable are not copied.

DM applications mark an attribute inheritable on a per-file system basis and for specified file types. For example, a DM application could mark the above attribute (lock_on_magnetic) inheritable for newly created regular files only. Newly-created directories would not inherit the attribute.

Attribute inheritance is not persistent across reboots. If a DM application marked the lock_on_magnetic attribute as inheritable and the system were then brought down, the attribute would no longer be inheritable when the system came back up.

The following functions are provided for attribute management:

The following functions are provided for managing inheritable attributes:

Events

The DMAPI provides DM applications with the ability to monitor and manage the data in a file system without having to export all the file system semantics from kernel space to user space via the event interface. Events are generated by a DMAPI implementation, and then the messages are enqueued on a session for delivery to a DM application.

The intent of the DMAPI is to support a single product on any single file system. The DMAPI does not preclude different products from different vendors operating on the same file system, but it is not recommended. Different products on different file systems are fully supported by the DMAPI with regard to event delivery.

Therefore, the following event restrictions exist:

Certain events are optional in the DMAPI specification. It is recommended that, for each file system being managed by a DM application, the application initially call dm_get_config_events() to determine which events are supported by the DMAPI implementation for that file system.

Setting Event Disposition

After creating a session, DM applications must register with the DMAPI to establish the disposition of events for a file system (that is, what session the events will be sent to). The event list is the complete set of all events, including managed region events, that the DM application is monitoring during the life of the session. Since registration is on a per-session basis, this event list is not persistent across reboots. It is not possible to register to receive events on anything other than the file system object.

Once a DM application has registered its event list and session with the DMAPI, it can begin receiving event messages on a file system. Registration can be thought of as establishing the association between a file system and a session, as it lets the DMAPI implementation know which session to send specific event messages to.

The example shown in Disposition of Event Delivery illustrates the case where a DM application has registered with the file system represented by "foo" for the read and write events. The event messages are delivered to the application via session 42. The file bar has an event list of read, write, and truncate that was previously set via dm_set_region().


Figure: Disposition of Event Delivery

In Disposition of Event Delivery, the read event is delivered to DM application 1, since that is the session for that specific event.

Multiple applications can register their session and event list for a file system. If two applications attempt to register to receive the same event, the last application to register for the event will receive it; prior registrations for the event are replaced.

Rationale:

If this were not the case, and replacement were done on an entire event list, not a per-event basis, then it would not be possible to have more than one active session registered for a file system. Having each event in the event list handled individually allows multiple applications to be active on the same file system simultaneously, all handling different events.

Duplicate Event Registrations on a File System illustrates how Disposition of Event Delivery would change if a second DM application registered for just the read events.

Figure: Duplicate Event Registrations on a File System

In Duplicate Event Registrations on a File System, read events are now sent to DM application 2, via session 69, not DM application 1. Write events will still be delivered to DM application 1.
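
The per-event replacement rule can be modeled as a table with one session slot per event. The sketch below is a toy model of a single file system's dispositions: the enumeration, struct disp_table, and the three functions are hypothetical names, and session ID 0 is used as a "no session" marker purely for the purposes of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event identifiers for the model. */
enum sim_event { SIM_EV_READ, SIM_EV_WRITE, SIM_EV_TRUNCATE, SIM_EV_COUNT };

#define SIM_NO_SESSION 0u

/* One slot per event: the session that currently receives it. */
struct disp_table {
    unsigned int session[SIM_EV_COUNT];
};

void disp_init(struct disp_table *t)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < SIM_EV_COUNT; i++)
        t->session[i] = SIM_NO_SESSION;
}

/* Register a session for every event whose bit is set in event_mask.
 * Only those slots are replaced; other registrations stay in place. */
void disp_set(struct disp_table *t, unsigned int sid, unsigned int event_mask)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < SIM_EV_COUNT; i++)
        if (event_mask & (1u << i))
            t->session[i] = sid;
}

/* Session that would receive the given event, or SIM_NO_SESSION. */
unsigned int disp_lookup(const struct disp_table *t, enum sim_event ev)
{
    assert(t != NULL && ev < SIM_EV_COUNT);
    return t->session[ev];
}
```

Registering session 42 for read and write and then session 69 for read alone leaves write events with session 42, which matches the outcome described for the two figures.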

Rationale:

The burden is on the system administrator to ensure that two different DM applications do not attempt to control the same events on the same file system. In the Duplicate Event Registrations on a File System example, an alternative implementation of dm_set_disp() would be to return an error saying that an <event, file system, session> binding already exists. Another option would be to send a special event to DM application 1, informing it that it no longer will be receiving read events. While these options could be implemented, it is believed that the level of complexity is not warranted for this version of the DMAPI.

The examples given above assume that the file system the DM application is monitoring is already mounted. However, it is quite possible that a DM application wants to set itself up to monitor a file system that is not yet mounted.

The "mount" Event

The restriction of only sending synchronous events to one session has special ramifications with regard to the mount event. It is not the intent of the DMAPI to force a model of one "super-daemon" that listens for mount events, and then forwards the event to the appropriate recipient. However, there is a special bootstrap problem with regard to receiving the mount event before a file system handle is available. To receive mount events, a DM application must use the global handle in the dm_set_disp() function. The mount event will be sent serially to each session that has executed dm_set_disp(). The event is not broadcast to all sessions concurrently. The order in which the DMAPI implementation sends the event to the sessions is not defined.

The mount event will be sent for all file systems that support the DMAPI. Specifying the event in the dm_set_eventlist() function is not allowed, since the event is not persistent. When the mount event is received, the DM application can determine if it is interested in the file system that is specified in the event message. If a DM application is not interested in the file system, then it must respond to the event via dm_respond_event() with a code of DM_RESP_DONTCARE. The first DM application that responds to the event with DM_RESP_CONTINUE and an error code of zero prevents the event from being sent to any of the remaining sessions. If any DM application returns an error [DM_RESP_ABORT], then the mount event will not be sent to any other session.


Figure: Mount Event Propagation

In Mount Event Propagation, three DM applications have specified via dm_set_disp() that they want to receive the mount event. The DMAPI implementation sends the mount event message to DM application A in step 1, which is not interested in the event, so it responds to the event message with DM_RESP_DONTCARE in step 2. The DMAPI implementation then sends the mount event message to DM application B in step 3, which determines that it wants to monitor the file system. It responds to the event message with a DM_RESP_CONTINUE in step 4, so the mount event is not sent to the remaining DM application C.

If all of the DM applications receiving mount events return DM_RESP_DONTCARE, then the file system mount proceeds normally.
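The serial delivery described above can be sketched as a loop that offers the event to one session at a time and stops at the first decisive answer. This is a model under stated assumptions: the RESP_* constants stand in for the DM_RESP_* codes, the handler callbacks and struct mount_outcome are hypothetical, and array order is used only because the specification leaves the delivery order undefined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the response codes named in the text. */
enum model_resp { RESP_DONTCARE, RESP_CONTINUE, RESP_ABORT };

/* Result of offering one mount event to the registered sessions. */
struct mount_outcome {
    int claimed_by;    /* index of the session that answered CONTINUE, or -1 */
    int aborted;       /* nonzero if a session answered ABORT */
    size_t delivered;  /* number of sessions that saw the event */
};

/* How a session answers a mount event for the named file system. */
typedef enum model_resp (*mount_handler)(const char *fs_name);

/* Two sample handlers: one never interested, one always interested. */
static enum model_resp handler_dontcare(const char *fs_name)
{
    (void)fs_name;
    return RESP_DONTCARE;
}

static enum model_resp handler_continue(const char *fs_name)
{
    (void)fs_name;
    return RESP_CONTINUE;
}

/* Offer the event to each session in turn, never concurrently. */
static struct mount_outcome model_deliver_mount(const mount_handler *sessions,
                                                size_t n, const char *fs_name)
{
    struct mount_outcome out = { -1, 0, 0 };
    assert(sessions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        enum model_resp r = sessions[i](fs_name);
        out.delivered++;
        if (r == RESP_CONTINUE) { out.claimed_by = (int)i; break; }
        if (r == RESP_ABORT)    { out.aborted = 1; break; }
        /* RESP_DONTCARE: fall through and try the next session. */
    }
    return out;
}
```

With the handlers ordered as in the figure (not interested, interested, not interested), the second session claims the event and the third never sees it. If every handler answers RESP_DONTCARE, claimed_by stays at -1 and aborted stays at zero, which corresponds to the mount proceeding normally.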

For recovery processing, many DM applications will need the name of the file system device and the directory that it was mounted at. This information is made available via the mount event. During application restart, an application can get the same information via dm_get_mountinfo(). A DM application would determine all the file systems that were being monitored via dm_getall_disp(), and then use dm_get_mountinfo() to obtain more information about the file systems.

The following functions are provided for manipulating the disposition of a session's events for a file system:

Setting Event Notification

DM applications can specify that they need to receive certain events on an object. Events will only be generated for these objects, not for all objects in the file system (except for the debut event, discussed specifically later in this section). To set event notification on an object, the DM application must specify an event list for the object. This object is specified via a handle. The handle can be either the file system handle when setting events on a per file system basis, or a handle to a specific file system object. Executing dm_set_eventlist() may or may not persistently store the event list with the object; it is dependent on the particular implementation of the DMAPI. The persistence characteristic can be determined via the dm_get_config() function.

The DM application must specify the entire list of events that is to be generated for the object. If an event list already exists for the object, it is replaced by the new one specified in the dm_set_eventlist() function. If an event list was previously set for the entire file system, and a subsequent event list for an object in that file system includes an event that was set for the file system (or vice versa), the result is undefined. All events, with the exception of the managed region events and the mount event, can be specified in the dm_set_eventlist() function. If the object has multiple managed regions, then dm_get_eventlist() returns the union of all managed region events, in addition to the other events.
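In contrast to the per-event replacement used for dispositions, an event list is replaced as a whole. The following sketch models that with a bitmask. All names here (model_eventset, the MEV_* bits, model_set_eventlist) are assumptions for illustration and do not reproduce the real DMAPI event set type or the dm_set_eventlist() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event set: one bit per event. */
typedef uint32_t model_eventset;

#define MEV_MOUNT    ((model_eventset)1 << 0)
#define MEV_MR_READ  ((model_eventset)1 << 1)  /* a managed region event */
#define MEV_DEBUT    ((model_eventset)1 << 2)
#define MEV_DESTROY  ((model_eventset)1 << 3)

/* Events that may not appear in an event list, per the text above. */
#define MEV_NOT_SETTABLE (MEV_MOUNT | MEV_MR_READ)

struct model_object {
    model_eventset events;
};

/* Returns 0 on success, or -1 if `list` names an event that cannot be set.
 * On failure the existing list is left unchanged. */
static int model_set_eventlist(struct model_object *obj, model_eventset list)
{
    assert(obj != NULL);
    if (list & MEV_NOT_SETTABLE)
        return -1;
    obj->events = list;   /* whole-list replacement, not a merge */
    return 0;
}
```

Setting debut and destroy and then setting destroy alone leaves only destroy enabled, so an application that wants to add one event has to pass the complete list again.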

When an event is generated by the file system, the DMAPI implementation uses the session to determine the recipient. Since DM applications must register with the DMAPI via dm_set_disp() to specify the event list and the session, the DMAPI can easily determine the target session for any given event.

Some implementations of the DMAPI may not provide any persistent storage, even for event notification. For these "zero bit" implementations, the DMAPI provides a debut event before any access is granted to the object. This debut event should be specified in the event list when the DM application sets its event disposition. The debut event gives the DM application the ability to download information (such as event lists and managed region information) that may be needed by the DMAPI implementation. Most likely, when downloading a new event list for an object, the list will not include the debut event, but only include events that require some action to be performed by the DM application.

The debut event is the first indication given to a DM application that a primitive DMAPI implementation is going to perform an operation on a file. The DM application can take this opportunity to download all the necessary information for that particular file, or for other files as well. Alternatively, some DM applications may want to intercept the mount event to prime primitive DMAPI implementations, rather than having to receive many debut events.

The following functions are provided for managing event lists on file system objects:

Receiving and Responding to Events

Pending events can be received one at a time or in bulk. For synchronous events, a response to each event message is required. For all events, the only valid response is an indication of whether the operation should be continued or aborted. If the operation is to be aborted, an error can also be specified that will be returned to the user process in the form of an errno.

Event messages are variable length. This is because two of the primary fields of most event messages, file handles and path names, are variable length. DM applications should use dm_get_config() to determine the largest message size, and size the buffers they pass to dm_get_events() accordingly. For more information on accessing and manipulating variable length message buffers, see the Data Structures definitions in Data Structures.
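Because messages are variable length, a buffer returned in bulk has to be walked record by record rather than indexed as an array. The sketch below shows one common way to do that, using a header that stores the distance to the next record. The header layout and the names struct model_msg_hdr and model_walk_messages are assumptions for illustration; the real message format is the one given in the Data Structures definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header that precedes each variable-length message. */
struct model_msg_hdr {
    uint32_t next_offset;  /* bytes from this header to the next, 0 if last */
    uint32_t data_len;     /* bytes of payload that follow this header */
};

/* Walk the messages in buf[0..len). Returns the number of messages and
 * stores the total payload size, or returns -1 if the buffer is malformed. */
static int model_walk_messages(const unsigned char *buf, size_t len,
                               size_t *payload_total)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL && payload_total != NULL);
    *payload_total = 0;
    if (len == 0)
        return 0;

    for (;;) {
        struct model_msg_hdr h;

        if (len - off < sizeof h)
            return -1;                      /* truncated header */
        memcpy(&h, buf + off, sizeof h);    /* avoids unaligned access */
        if (h.data_len > len - off - sizeof h)
            return -1;                      /* payload runs past the buffer */

        *payload_total += h.data_len;
        count++;

        if (h.next_offset == 0)
            break;                          /* last message */
        if (h.next_offset < sizeof h + h.data_len ||
            h.next_offset > len - off)
            return -1;                      /* link points inside or outside */
        off += h.next_offset;
    }
    return count;
}
```

Every step is bounds-checked against the number of bytes actually returned, so a short read is reported as an error instead of being read past.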

The process that generated the event is blocked until the response is received by the DMAPI implementation. The sleep may or may not be interruptible; the implementation of the DMAPI will need to define the behavior for each synchronous event.

When a synchronous event message is generated, a token is part of the message. The token identifies the event message, and may reference access rights that are conveyed as part of the event message. No tokens are passed in asynchronous messages.

When a DM application responds to a data event message, the token may reference access rights. If a DM application allows the operation to continue with the DM_RESP_CONTINUE return code, then special care must be taken by the implementation of the DMAPI to allow the operation that caused the event generation to continue without another DM application changing the state of the file.

Consider the following example:

Figure: Event Generation with No Rights

In Event Generation with No Rights, the user process has initiated a write(2) operation in user space, shown as step 1. When the process begins executing the Operating System code that performs the operation in the kernel, that code detects that it must generate a synchronous managed region write event. The event message is enqueued on the session in step 2, and the user process is then blocked awaiting the response.

Figure: Requesting Access Rights after Event Generation

In Requesting Access Rights after Event Generation, the event message has been enqueued on the session in step 2, and is delivered to DM application A via dm_get_events() in step 3. Since the event message conveys no rights, DM application A must obtain access rights to the object. In this example, it requires the DM_RIGHT_EXCL right, which it obtains in step 4. At the same time, DM application B attempts to get exclusive access to the file in step 5. Since the access right is not available, DM application B will wait.

Figure: Continuing an Event with Access Rights

In Continuing an Event with Access Rights, DM application A has completed its processing in step 6 and continues the operation via a dm_respond_event() with the DM_RESP_CONTINUE response code. At the point when the function returns to the DM application (not explicitly shown, but it can be assumed to be a step 6a), the token that referenced the access rights to the object is invalid. However, the DMAPI implementation cannot immediately release the rights referenced by the token and grant them to someone else.

In step 7, the user process that caused the data event to be generated is resumed by the Operating System, and continues operation at the point at which the event was generated. Once the DMAPI implementation has completed whatever event processing it deems necessary, and once it has acquired whatever locks it needs to complete the rest of the write(2) operation, the access rights can be released. At this point, DM application B can be allowed to obtain the DM_RIGHT_EXCL access right, in step 8.

Rationale:

DM applications are logical extensions of the file system. When a DM application has completed the servicing of an event, it should appear as though the conditions that caused the event to be generated no longer exist. From the standpoint of the Operating System, it is as though the event never occurred; whatever state that required the event to be generated has been taken care of by the DM application.

In the example above, if DM application B were allowed to gain exclusive access to the file, it could possibly change the state of the file; all the recently-completed work of DM application A would then be void. More importantly, the implementation of the DMAPI would have no way to tell what state the file is in, unless it monitored all the actions of DM application B. It is also important to prevent the user process from starvation. Therefore, the user process should be allowed to continue its processing after DM application A has completed the event servicing.
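The hand-off in steps 4 through 8 can be sketched as a three-state model of the exclusive right. This is an illustrative assumption about how an implementation might track the right, not a description of any particular one; the state names and the functions model_request_excl, model_respond_continue and model_operation_done are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states of the exclusive right on one file. */
enum model_right_state {
    RIGHT_FREE,                /* nobody holds the right */
    RIGHT_HELD_BY_TOKEN,       /* a DM application holds it via a token */
    RIGHT_HELD_FOR_OPERATION   /* token is invalid, user operation pending */
};

struct model_file_right {
    enum model_right_state state;
    int waiters;               /* requesters blocked waiting for the right */
};

/* Steps 4 and 5: returns 1 if granted, 0 if the caller must wait. */
static int model_request_excl(struct model_file_right *f)
{
    assert(f != NULL);
    if (f->state == RIGHT_FREE) {
        f->state = RIGHT_HELD_BY_TOKEN;
        return 1;
    }
    f->waiters++;
    return 0;
}

/* Step 6: the holder responds with "continue". The token becomes invalid,
 * but the right is retained on behalf of the resumed user operation. */
static void model_respond_continue(struct model_file_right *f)
{
    assert(f != NULL && f->state == RIGHT_HELD_BY_TOKEN);
    f->state = RIGHT_HELD_FOR_OPERATION;
}

/* Steps 7 and 8: the user operation has what it needs. The right is
 * released and, if anyone is waiting, granted to one waiter.
 * Returns 1 if a waiter was granted the right, 0 otherwise. */
static int model_operation_done(struct model_file_right *f)
{
    assert(f != NULL && f->state == RIGHT_HELD_FOR_OPERATION);
    if (f->waiters > 0) {
        f->waiters--;
        f->state = RIGHT_HELD_BY_TOKEN;
        return 1;
    }
    f->state = RIGHT_FREE;
    return 0;
}
```

The intermediate state is the point of the sketch: between the response and the completion of the user operation, the right belongs to neither DM application, so DM application B cannot change the file before the write(2) proceeds.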

The following functions for receiving event messages and responding to synchronous messages are provided:

Some DM applications may be multi-threaded (or made up of multiple processes). To facilitate the processing of events between related processes, the DMAPI provides a method to move an outstanding event message from one session to another. The event message remains in the outstanding state, even though it is now enqueued on a different session.

The following function is provided:

If a DM application knows that it will take some significant period of time to process an event, the application can optionally notify the DMAPI implementation. The implementation is free to use or ignore the information.

The following function is provided:

When a destroy event occurs, a DM application may optionally receive one DM attribute value in the event message by specifying to the DMAPI implementation which DM attribute name it wants to receive at destroy time.

The following function is provided:

Pseudo Events

Pseudo events do not correspond to an event generated as a result of an operation in the operating system, such as a write(2). They are created by the DM application for purposes of generating a token or sending a message to a session. The actual message data is opaque to the DMAPI implementation. For the format of the pseudo-event, see Pseudo Events. There is currently only one type of pseudo event: the user event.

As described in Tokens, tokens are always associated with a synchronous event message. To gain access to an object, a DM application must first create a message that contains the context for a token. The required access right can then be obtained via dm_request_right(). dm_create_userevent() will create a synchronous event message of type user and enqueue it on the indicated session. The message and its corresponding token are outstanding. From the standpoint of the DMAPI, the message appears to have been delivered to a DM application via dm_get_events(), but has not yet been responded to via dm_respond_event(). The message will continue to exist until the DM application does a dm_respond_event() with the token.

For purposes of recovery processing, intelligent DM applications can use the user-generated event message mechanism to log their state during long and complicated operations. For example, if a DM application requires exclusive access to a file, it first needs to create a synchronous message. It puts together a user-level event message describing the operation, and then requests that a token be generated and associated with this pseudo-event message. If the DM application aborts (via a bus error, kill signal, etc.) before responding to the event, when it restarts, it can obtain the message and any corresponding state. This can provide the application with valuable information about its state when it aborted.
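The recovery use of user events can be sketched as a session that keeps each user message outstanding until it is responded to. This is a self-contained model only: struct model_session, the fixed slot limits, and the functions model_create_userevent, model_find_outstanding and model_respond are hypothetical and do not reproduce the DMAPI signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_OUTSTANDING 4   /* assumed limits for this sketch */
#define MODEL_MAX_DATA 32

struct model_usermsg {
    int token;                    /* 0 means the slot is free */
    size_t len;
    unsigned char data[MODEL_MAX_DATA];
};

struct model_session {
    struct model_usermsg slots[MODEL_MAX_OUTSTANDING];
    int last_token;
};

/* Create an outstanding user message. Returns its token, or -1 on failure. */
static int model_create_userevent(struct model_session *s,
                                  const void *data, size_t len)
{
    assert(s != NULL && (data != NULL || len == 0));
    if (len > MODEL_MAX_DATA)
        return -1;
    for (size_t i = 0; i < MODEL_MAX_OUTSTANDING; i++) {
        if (s->slots[i].token == 0) {
            s->slots[i].token = ++s->last_token;
            s->slots[i].len = len;
            if (len > 0)
                memcpy(s->slots[i].data, data, len);
            return s->slots[i].token;
        }
    }
    return -1;                    /* no free slot */
}

/* After a restart: look up a message that was never responded to. */
static const struct model_usermsg *
model_find_outstanding(const struct model_session *s, int token)
{
    assert(s != NULL);
    if (token <= 0)
        return NULL;
    for (size_t i = 0; i < MODEL_MAX_OUTSTANDING; i++)
        if (s->slots[i].token == token)
            return &s->slots[i];
    return NULL;
}

/* Respond to the message, which ends its existence. Returns 0 or -1. */
static int model_respond(struct model_session *s, int token)
{
    assert(s != NULL);
    if (token <= 0)
        return -1;
    for (size_t i = 0; i < MODEL_MAX_OUTSTANDING; i++) {
        if (s->slots[i].token == token) {
            s->slots[i].token = 0;
            return 0;
        }
    }
    return -1;
}
```

An application would write a short description of the operation into the message before starting the work, and respond only once the work is finished. A message that is still outstanding at restart therefore indicates an operation that did not complete.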

User-created messages can also be used as a test mechanism, to ensure that communications between the DMAPI implementation and a DM application are working correctly. Applications can use dm_send_msg() to create a synchronous or asynchronous message and have it enqueued on any specified session. The created message is also of type user, and contains the data specified by the user. For synchronous messages, the function does not return until the message has been responded to. Obviously, the process initiating the message via dm_send_msg() must not also be responsible for consuming the message via dm_get_events(), or it will hang.

The following functions for creating a user level event message exist:

Configuration Information

In order for a DM application to determine information about the underlying implementation of the DMAPI, an interface exists to interrogate various implementation specific details. The function dm_get_config() is called on a per-file system basis. Based on selected options in this function, it will return information as listed in its man-page definition (see dm_get_config).

Limited Backup and Restore Support

Many current vendor migration and backup applications require additional interfaces into the DMAPI in order to fully support their functionality. To ease a vendor's transition to the DMAPI, a set of optional DM interfaces may be provided. They consist of the following functions: