Contents
Stakeholders and Concerns
Modeling the View
Data Intensive Versus Information Intensive
Achieving Interoperability
Software Tiers
Uses of a Data Access Tier
Distribution
Conclusion
Building a software-intensive system is both expensive and time consuming. Because of this, it is necessary to establish guidelines to help minimize the effort required and the risks involved. This is the purpose of the Software Engineering View, which should be developed for the software engineers who are going to develop the system.
Major concerns for these stakeholders are:
Development Approach
Software Modularity and Re-Use
Portability
Migration and Interoperability
There are many lifecycle models defined for software development (waterfall, prototyping, etc.). A consideration for the architect is how best to feed architectural decisions into the lifecycle model that is going to be used for development of the system.
As a piece of software grows in size, so the complexity and interdependencies between different parts of the code increase. Reliability will fall dramatically unless this complexity can be brought under control.
Modularity is a concept by which a piece of software is grouped into a number of distinct and logically cohesive sub-units, presenting services to the outside world through a well defined interface. Generally speaking, the components of a module will share access to common data, and the interface will provide controlled access to this data. Using modularity, it becomes possible to build a software application incrementally on a reliable base of pre-tested code.
A further benefit of a well defined modular system is that the modules defined within it may be re-used in the same or on other projects, cutting development time dramatically by reducing both development and testing effort.
In recent years, the development of object-oriented programming languages has greatly increased programming language support for module development and code re-use. Such languages allow the developer to define 'classes' (a unit of modularity) of objects that behave in a controlled and well-defined manner. Techniques such as inheritance - which enables parts of an existing interface to an object to be changed - enhance the potential for re-usability by allowing pre-defined classes to be tailored or extended when the services they offer do not quite meet the requirements of the developer.
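As an illustration, the following sketch (in Java, with invented class names) shows a small module whose internal data is reachable only through its public interface, and a subclass that uses inheritance to tailor the behaviour of a pre-defined class:

    // Illustrative only: a small module whose internal data is reachable
    // only through its public interface, plus a subclass that tailors the
    // behaviour of the pre-defined class via inheritance.
    // The names (TaskQueue, PriorityTaskQueue) are hypothetical.
    import java.util.ArrayList;
    import java.util.List;

    class TaskQueue {
        // The internal list is hidden; clients see only the public methods.
        protected final List<String> tasks = new ArrayList<>();

        public void add(String task) {
            tasks.add(task);
        }

        public String next() {
            return tasks.isEmpty() ? null : tasks.remove(0);
        }
    }

    // Inheritance: the existing interface is reused, but one operation is
    // overridden so that urgent tasks are served first.
    class PriorityTaskQueue extends TaskQueue {
        @Override
        public void add(String task) {
            if (task.startsWith("URGENT")) {
                tasks.add(0, task);   // jump the queue
            } else {
                tasks.add(task);
            }
        }
    }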
If modularity and software re-use are likely to be key objectives of new software developments, consideration must be given to whether the component parts of any proposed architecture may facilitate or prohibit the desired level of modularity in the appropriate areas.
Software portability, the ability to take a piece of software written in one environment and make it run in another, is important in many projects, especially product developments. It requires that all software and hardware aspects of a chosen technical architecture (not just the newly developed application) be available on the new platform. It will, therefore, be necessary to ensure that the component parts of any chosen architecture are available across all the appropriate target platforms.
Interoperability is always required between the component parts of a new architecture. It may also, however, be required between a new architecture and parts of an existing legacy system - for example during the staggered replacement of an old system. Interoperability between the new and old architectures may, therefore, be a factor in architectural choice.
The general architecture of a "software system" can be modeled with ADML entities (components, ports, connectors, and roles).
A commonly used alternative is the Unified Modeling Language, UML.
This View considers two general categories of software systems. First, there are those systems that require only a user interface to a database, requiring little or no business logic built into the software. These systems can be called Data Intensive. Second, there are those systems that require users to manipulate information that might be distributed across multiple databases, and to do this manipulation according to a predefined business logic. These systems can be called Information Intensive.
Data intensive systems can be built with reasonable ease through the use of 4GL tools. In these systems, the business logic is in the mind of the user, i.e., the user understands the rules for manipulating the data and uses those rules while doing his work.
Information intensive systems are different. Information is defined as meaningful data, i.e., data in a context that includes business logic. Information is different from data. Data is the tokens that are stored in databases or other data stores. Information is multiple tokens of data combined to convey a message. For example, 3 is data, but 3 widgets is information. Typically, information reflects a model. Information intensive systems also tend to require information from other systems, and, if this path of information passing is automated, usually some mediation is required to convert the format of incoming information into a format that can be locally used. Because of this, information intensive systems tend to be more complex than others, and require the most effort to build, integrate, and maintain.
This view is concerned primarily with information intensive systems. In addition to managing information, though, systems should also be as flexible as possible. This has a number of benefits. It allows the system to be used in different environments; for example, the same system should be usable with different sources of data, even if the new data store has a different configuration. Similarly, it might make sense to use the same functionality with users who need a different user interface. Information systems should therefore be built so that they can be reconfigured with different data stores or different user interfaces. If a system is built to allow this, the enterprise can reuse parts (or components) of one system in another.
Interoperability can only be achieved when information is passed, not when data is passed. Most information systems today get information both from their own data stores and other information systems. In some cases the web of connectivity between information systems is quite extensive. The United States Air Force, for example, has a concept known as A5 Interoperability. This means that the required data is available Anytime, Anywhere, by Anyone, who is Authorized, in Any way. This requires that many information systems are architecturally linked and provide information to each other.
There must be some kind of physical connectivity between the systems. This might be a local area network, it might be a wide area network, or, in some cases, it might simply be the passing of a disk or CD between systems.[3] Assuming a network connects the systems, there must be agreement on the protocols used. This enables the transfer of bits.
When the bits are assembled at the receiving system, they must be placed in the context that the receiving system needs. In other words, both the source and destination systems must agree on an information model. The source system uses this model to convert its information into data to be passed, and the destination system uses this same model to convert the received data into information it can use.
This usually requires an agreement between the architects and designers of the two systems. In the past, this agreement was often documented in the form of an Interface Control Document (ICD). The ICD defines the exact syntax and semantics that the sending system will use so that the receiving system will know what to do when the data arrives. The biggest problem with ICDs is that they tend to be unique solutions between two systems. If a given system must share information with n other systems, there is the potential need for n² ICDs. This extremely tight integration prohibits flexibility and the ability of a system to adapt to a changing environment. Maintaining all these ICDs is also a challenge.
New technology such as the eXtensible Markup Language (XML) holds the promise of making data self-describing. Use of such technologies, once they become reliable and well documented, might eliminate the need for an ICD. Further, Commercial Off The Shelf (COTS) products would be available to parse and manipulate the XML data, eliminating the need to develop these products in-house. It should also ease the burden of maintaining all the interfaces.
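As a rough illustration, the sketch below uses the standard Java DOM parser (a COTS component) to read a small self-describing XML fragment; the element names are invented:

    // Minimal sketch: reading self-describing XML with a standard (COTS) parser.
    // The element names (<order>, <quantity>) are invented for illustration.
    import java.io.ByteArrayInputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class SelfDescribingData {
        public static void main(String[] args) throws Exception {
            String xml = "<order><item>widget</item><quantity>3</quantity></order>";
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes()));
            // The tags themselves carry the semantics; no separate ICD is needed
            // to know that "3" is a quantity of widgets.
            String quantity = doc.getElementsByTagName("quantity")
                                 .item(0).getTextContent();
            System.out.println("Quantity: " + quantity);
        }
    }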
Another approach is to build mediators between the systems. A mediator would use metadata sent with the data to understand its syntax and semantics and to convert it into a format usable by the receiving system. Mediators do, however, require that well-formed metadata be sent, which adds to the complexity of the interface.
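The following hypothetical sketch shows the idea: metadata accompanying the data describes how the sender's field names map onto the names the receiving system expects (all names are illustrative):

    // Hypothetical mediator sketch: metadata sent with the data describes how
    // the sender's field names map onto the names the receiving system uses.
    // All field names are illustrative.
    import java.util.HashMap;
    import java.util.Map;

    public class Mediator {
        // Metadata: sender field name -> receiver field name.
        private final Map<String, String> fieldMapping;

        public Mediator(Map<String, String> fieldMapping) {
            this.fieldMapping = fieldMapping;
        }

        public Map<String, Object> translate(Map<String, Object> incoming) {
            Map<String, Object> result = new HashMap<>();
            for (Map.Entry<String, Object> entry : incoming.entrySet()) {
                // Fields without a mapping are passed through unchanged.
                String localName = fieldMapping.getOrDefault(entry.getKey(), entry.getKey());
                result.put(localName, entry.getValue());
            }
            return result;
        }
    }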
Typically, software architectures are either 2-tier or 3-tier.[4] Each tier typically presents at least one capability.
In a 2-tier architecture, the user interface and business logic are tightly coupled while the data is kept independent. This gives the advantage of allowing the data to reside on a dedicated data server. It also allows the data to be independently maintained. The tight coupling of the user interface and business logic assures that they will work well together for this problem in this domain. However, it also dramatically increases maintainability risks while reducing flexibility and opportunities for reuse.
A 3-tier approach adds a tier that separates the business logic from the user interface. This in principle allows the business logic to be used with different user interfaces as well as with different data stores. With respect to the use of different user interfaces, users might want the same user interface but using different COTS presentation servers, for example, Java Virtual Machine (JVM) or Common Desktop Environment (CDE)[5]. Similarly, if the business logic is to be used with different data stores, then each data store must use the same data model[6] (data standardization), or a mediation tier must be added above the data store (data encapsulation).
To achieve maximum flexibility, software should utilize a 5-tier scheme which extends the 3-tier paradigm (see Figure 1). The scheme is intended to provide strong separation of the three major functional areas of the architecture. Since there are client and server aspects of both the user interface and the data store, the scheme has 5 tiers.[7]
The presentation tier is typically COTS-based. The presentation interface might be an X-server, Win32, etc. There should be a separate tier for the user interface client. This client establishes the look and feel of the interface; the server (presentation tier) actually performs the tasks by manipulating the display. The user interface client hides the presentation server from the application business logic.
The application business logic, e.g., a scheduling engine, should be a separate tier. This tier is called the application logic and functions as a server for the user interface client. It interfaces to the user interface typically through callbacks. The application logic tier also functions as a client to the data access tier.
If an application needs to be used with multiple databases that have different schemas, then a separate tier is needed for data access. This client would access the data stores using the appropriate COTS interface[8] and then convert the raw data into an abstract data type representing parts of the information model. The interface into this object network would then provide a generalized data access interface (DAI) which would hide the storage details of the data from any application that uses it.
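The sketch below illustrates how these tier boundaries might appear in code; all interface and class names are hypothetical, and the point is only that the application logic tier is written as a client of an abstract data access interface and a server to the user interface client:

    // Illustrative sketch of the tier boundaries as Java interfaces.
    // All names are hypothetical; the point is that each tier sees only the
    // interface of the tier below it, never its implementation.
    import java.util.List;

    // Data access tier: hides storage details behind an abstract query.
    interface DataAccess {
        List<String> eventsBetween(String startDate, String endDate);
    }

    // Application logic tier: a server to the user interface client and a
    // client of the data access tier.
    interface SchedulingLogic {
        List<String> scheduleFor(String day);
    }

    class SchedulingEngine implements SchedulingLogic {
        private final DataAccess dai;

        SchedulingEngine(DataAccess dai) {
            this.dai = dai;   // injected, so any DAI implementation will do
        }

        public List<String> scheduleFor(String day) {
            // The business logic works only in terms of the abstract interface;
            // it has no knowledge of tables, joins, or SQL.
            return dai.eventsBetween(day, day);
        }
    }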
Each tier in this scheme can have zero or more components. The organization of the components within a tier is flexible and can reflect a number of different architectures based on need. For example, there might be many different components in the application logic tier (scheduling, accounting, inventory control, etc.), and the relationship between them can reflect whatever architecture makes sense, but none of them should be a client to the presentation server.
This clean separation of user interface, business logic, and information will result in maximum flexibility and componentized software that lends itself to product line development practices. For example, the same functionality could be built once and yet be usable by different presentation servers (e.g., on PCs or UNIX boxes), displayed with different looks and feels depending on user needs, and usable with multiple legacy databases. Moreover, this flexibility should not require massive rewrites to the software whenever a change is needed.
The data access tier provides a standardized view of certain classes of data, and as such functions as a server to one or more application logic tiers. If implemented correctly, there would be no need for application code to know about the implementation details of the data. The application code would only have to know about an interface that presents a level of abstraction higher than the data. This interface is called the Data Access Interface (DAI).
For example, should a scheduling engine need to know what events are scheduled between two dates, that query should not require knowledge of tables and joins in a relational database. Moreover, the DAI could provide standardized access techniques for the data. For example, the DAI could provide a Publish and Subscribe (P&S) interface whereby systems which require access to data stores could register an interest in certain types of data, perhaps under certain conditions, and the DAI would provide the required data when those conditions occur.
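A hypothetical sketch of such a P&S surface on the DAI (method names and signatures invented for illustration) might look like this:

    // Hypothetical Publish and Subscribe (P&S) surface of a DAI.
    // Method names and signatures are invented for illustration only.
    import java.util.Map;
    import java.util.function.Consumer;

    interface DataAccessInterface {
        // A consumer registers an interest in a type of data, optionally
        // qualified by a condition; the DAI calls back when matching data appears.
        void subscribe(String dataType, String condition,
                       Consumer<Map<String, Object>> callback);

        // A producer (for example the Direct Data Access layer) announces new or
        // changed data to the registry.
        void publish(String dataType, Map<String, Object> data);
    }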
One means to instantiate a data access component is with three layers, as is shown in Figure 3. This is not the only means to build a DAI, but is presented as a possibility.
Whereas the Direct Data Access layer contains the implementation details of one or more specific data stores, the Object Network and the Information Distribution layer require no such knowledge. Instead, the upper two layers reflect the need to standardize the interface for a particular domain. The Direct Data Access layer spans the gap between the Data Access tier and the Data Store tier, and therefore has knowledge of the implementation details of the data. SQL statements, either embedded or via a standard such as DRDA or ODBC, are located here.
The Object Network layer is the instantiation in software of the information model. As such, it is an efficient means to show the relationships that hold between pieces of data. The translation of data accesses to objects in the network would be the role of the Direct Data Access layer.
Within the Information Distribution layer lies the interface to the outside world. This interface typically uses a data bus to distribute the data (see below)[9]. It could also contain various information-related services, for example, a P&S registry and publication service or an interface to a security server for data access control[10]. The Information Distribution layer might also be used to distribute applications or applets required to process distributed information. Objects in the object network would point to the applications or applets, allowing easy access to required processing code.
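As a rough sketch of this layering (table, column, and class names all hypothetical): the Direct Data Access layer is the only place that contains SQL, the Object Network layer holds objects instantiating the information model, and the Information Distribution layer exposes those objects to callers:

    // Rough sketch of the three DAI layers; table, column, and class names are
    // all hypothetical.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;

    // Object Network layer: in-memory objects instantiating the information model.
    class Event {
        final String name;
        final String date;
        Event(String name, String date) { this.name = name; this.date = date; }
    }

    // Direct Data Access layer: the only place with knowledge of tables and SQL.
    class DirectDataAccess {
        List<Event> loadEvents(String jdbcUrl) throws Exception {
            List<Event> events = new ArrayList<>();
            try (Connection c = DriverManager.getConnection(jdbcUrl);
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT name, event_date FROM events")) {
                while (rs.next()) {
                    events.add(new Event(rs.getString("name"), rs.getString("event_date")));
                }
            }
            return events;
        }
    }

    // Information Distribution layer: the interface to the outside world.
    class InformationDistribution {
        private final DirectDataAccess directAccess = new DirectDataAccess();

        public List<Event> eventsFor(String jdbcUrl) throws Exception {
            // Callers receive Event objects and never see the tables behind them.
            return directAccess.loadEvents(jdbcUrl);
        }
    }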
The DAI enables a very flexible architecture. Multiple raw capabilities can access the same or different data stores, all through the same DAI. Each DAI might be implemented in many ways, according to the specific needs of the raw capabilities using it. Figure 4 illustrates a number of possibilities, including multiple different DAIs in different domains accessing the same database, a single DAI accessing multiple databases, and multiple instantiations of the same DAI accessing the same database.
It is not always clear that a DAI is needed, and it appears to require additional work during all phases of development. However, should a database ever be redesigned, or if an application is to be reused and there is no control over how the new data is implemented, using a DAI saves time in the long run.
The International Organization for Standardization's Reference Model for Open Distributed Processing (RM-ODP)[11] offers a meta-standard that is intended to allow more specific standards to emerge. This Reference Model defines a set of distribution transparencies that are applicable to the Software Engineering View.
Transparency | Definition
Access | Masks differences in data representation and invocation mechanisms to enable interworking between objects. This transparency solves many of the problems of interworking between heterogeneous systems, and will generally be provided by default.
Failure | Masks from an object the failure and possible recovery of other objects (or itself) to enable fault tolerance. When this transparency is provided, the designer can work in an idealized world in which the corresponding class of failures does not occur.
Location | Masks the use of information about location in space when identifying and binding to interfaces. This transparency provides a logical view of naming, independent of actual physical location.
Migration | Masks from an object the ability of a system to change the location of that object. Migration is often used to achieve load balancing and reduce latency.
Relocation | Masks relocation of an interface from other interfaces bound to it. Relocation allows system operation to continue even when migration or replacement of some objects creates temporary inconsistencies in the view seen by their users.
Replication | Masks the use of a group of mutually behaviorally compatible objects to support an interface. Replication is often used to enhance performance and availability.
Transaction | Masks coordination of activities amongst a configuration of objects to achieve consistency.
Table 1: RM-ODP Distribution Transparencies
The Infrastructure Bus represents the middleware that establishes the client/server relationship. This commercial software is like a backplane onto which one can plug capabilities. A system should adhere to a commercial implementation of a middleware standard, to ensure that capabilities using different commercial implementations of the standard can interoperate. If more than one commercial standard is used (e.g., COM and CORBA), then the system should allow for interoperability between implementations of these standards via the use of commercial bridging software.[12] Wherever practical, the interfaces should be specified in the appropriate Interface Definition Language (IDL). Viewed in this way, every interface in the 5-tier scheme represents an opportunity for distribution.
Clients can interact with servers via the Infrastructure Bus. In this interaction, the actual network transport (TCP/IP, HTTP, etc.), the platform/vendor of the server, and the operating system of the server are all transparent.
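A minimal sketch of this transparency, with hypothetical names: the client below is written purely against an interface, and whether the object behind it is local or a middleware-generated stub that forwards calls across the network is invisible to the client code:

    // Minimal sketch, with hypothetical names: the client is written purely
    // against an interface. Whether the object behind it is local or a stub
    // generated by the middleware that forwards calls across the network is
    // invisible to this code.
    interface InventoryService {
        int unitsInStock(String partNumber);
    }

    class ReorderClient {
        private final InventoryService inventory;

        // The service reference is supplied by the infrastructure bus (for
        // example, obtained from a naming service); the client never constructs
        // a concrete implementation itself.
        ReorderClient(InventoryService inventory) {
            this.inventory = inventory;
        }

        boolean needsReorder(String partNumber) {
            return inventory.unitsInStock(partNumber) < 10;
        }
    }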
Figure 5: Notional Distribution Model
The Software Engineering View gives guidance on how to structure software in a very flexible manner. By following these guidelines, the resulting software will be componentized. This enables the reuse of components in different environments. Moreover, through the use of an infrastructure bus and clean interfaces, the resulting software will be location independent, enabling its distribution across a network.
[1] Some of the material in this document is from The Command and Control System Target Architecture (C2STA) which was developed by the Electronic Systems Center (ESC) of the United States Air Force between 1997 and 2000.
[2] The word interoperate implies that one processing system performs an operation on behalf of, or at the behest of, another processing system. In practice the request is a complete sentence containing a verb (operation) and one or more nouns (identities of resources - where the resources can be information, data, physical devices, etc.). Interoperability comes from shared functionality.
[3] At usable Ethernet speeds (typically about 4 Mbit/s), it takes about 33 minutes to transfer a 1 GB file. Today, many databases are considerably larger than 1 GB, and the fastest way to transfer these extremely large databases might well be to put them on CDs and send them by an overnight courier.
[4] These are different from 2 and 3 tiered system architectures in which the middle tier is usually middleware. In the approach being presented here, middleware is seen as an enabler for the software components to interact with each other. See the section below on the Infrastructure Bus for more details.
[5] This allows for the same user interface to be run on PCs, workstations, and mainframes, for example.
[6] If, for example, SQL statements are to be embedded in the business logic.
[7] Note that typical layered architectures require each layer to be a client of the layer below it and a server to the layer above it. The scheme presented here is not compliant with this description and therefore we have used the word tier instead of layer.
[8] The interface to the data store might utilize embedded SQL. A more flexible way would be to use the Distributed Relational Database Architecture (DRDA) or ODBC since either of these standards would enable an application to access different DBMS in a location-independent manner using the same SQL statements.
[9] Although it could use other mechanisms. For example, the DAI could be built as a shared library to be linked with the application logic at compile time.
[10] The security server itself would use a 5-tier architecture. The security application logic tier would interface with the DAI of other systems to provide data access control.
[12] For example, many people believe that the user interface should be built on COM while the data access tiers should be built on CORBA.
Copyright © The Open Group, 1998, 2000