This chapter describes the Data Architecture part of Phase C.
The objectives of the Data Architecture part of Phase C are to:
When an enterprise has chosen to undertake large-scale architectural transformation, it is important to understand and address data management issues. A structured and comprehensive approach to data management enables the effective use of data to capitalize on its competitive advantages.
When an existing application is replaced, there will be a critical need to migrate data (master, transactional, and reference) to the new application. The Data Architecture should identify data migration requirements and also indicate the level of transformation, weeding, and cleansing that will be required to present data in a format that meets the requirements and constraints of the target application. The objective is for the target application to have quality data when it is populated. Another key consideration is to ensure that an enterprise-wide common data definition is established to support the transformation.
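As an illustration only, the three migration activities named above (weeding, cleansing, and transformation) can be sketched as a small pipeline. The record layout, the field names (`cust_no`, `closed_on`, `customer_id`, `display_name`), and the rule that closed accounts are not migrated are all assumptions made for this example; they are not taken from the standard.

```python
from datetime import date

# Hypothetical source records from the application being replaced.
SOURCE_CUSTOMERS = [
    {"cust_no": " 001 ", "name": "acme ltd", "closed_on": None},
    {"cust_no": "002", "name": "Old Co", "closed_on": date(2009, 1, 31)},
    {"cust_no": "003", "name": "", "closed_on": None},
]


def weed(records):
    """Drop records that should not be migrated (assumed rule: closed accounts)."""
    return [r for r in records if r["closed_on"] is None]


def cleanse(records):
    """Trim whitespace and skip records that have no usable name."""
    cleaned = []
    for r in records:
        name = r["name"].strip()
        if not name:
            continue  # a real migration would route this to a data-quality review
        cleaned.append({**r, "cust_no": r["cust_no"].strip(), "name": name})
    return cleaned


def transform(records):
    """Map source fields onto the (assumed) format of the target application."""
    return [
        {"customer_id": int(r["cust_no"]), "display_name": r["name"].title()}
        for r in records
    ]


migrated = transform(cleanse(weed(SOURCE_CUSTOMERS)))
print(migrated)
```

The proportion of records dropped or altered at each stage gives a rough indicator of the level of effort that the migration requirements should record.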
Data governance considerations ensure that the enterprise has the necessary dimensions in place to enable the transformation, as follows:
As part of this phase, the architecture team will need to consider what relevant Data Architecture resources are available in the organization's Architecture Repository (see Part V, 41. Architecture Repository), in particular, generic data models relevant to the organization's industry "vertical" sector. For example:
This section defines the inputs to Phase C (Data Architecture).
The level of detail addressed in Phase C will depend on the scope and goals of the overall architecture effort.
New data building blocks being introduced as part of this effort will need to be defined in detail during Phase C. Existing data building blocks to be carried over and supported in the target environment may already have been adequately defined in previous architectural work; but, if not, they too will need to be defined in Phase C.
The order of the steps in this phase (see below) as well as the time at which they are formally started and completed should be adapted to the situation at hand in accordance with the established architecture governance. In particular, determine whether in this situation it is appropriate to conduct Baseline Description or Target Architecture development first, as described in Part III, 19. Applying Iteration to the ADM.
All activities that have been initiated in these steps must be closed during the Finalize the Data Architecture step (see 10.4.8 Finalize the Data Architecture). The documentation generated from these steps must be formally published in the Create Architecture Definition Document step (see 10.4.9 Create Architecture Definition Document).
The steps in Phase C (Data Architecture) are as follows:
Review and validate (or generate, if necessary) the set of data principles. These will normally form part of an overarching set of architecture principles. Guidelines for developing and applying principles, and a sample set of data principles, are given in Part III, 23. Architecture Principles.
Select relevant Data Architecture resources (reference models, patterns, etc.) on the basis of the business drivers, stakeholders, concerns, and Business Architecture.
Select relevant Data Architecture viewpoints (for example, stakeholders of the data - regulatory bodies, users, generators, subjects, auditors, etc.; various time dimensions - real-time, reporting period, event-driven, etc.; locations; business processes); i.e., those that will enable the architect to demonstrate how the stakeholder concerns are being addressed in the Data Architecture.
Identify appropriate tools and techniques (including forms) to be used for data capture, modeling, and analysis, in association with the selected viewpoints. Depending on the degree of sophistication warranted, these may comprise simple documents or spreadsheets, or more sophisticated modeling tools and techniques such as data management models, data models, etc. Examples of data modeling techniques are:
For each viewpoint, select the models needed to support the specific view required, using the selected tool or method.
Ensure that all stakeholder concerns are covered. If they are not, create new models to address concerns not covered, or augment existing models (see above).
The recommended process for developing a Data Architecture is as follows:
The organization's data inventory is captured as a catalog within the Architecture Repository. Catalogs are hierarchical in nature and capture a decomposition of a metamodel entity and also decompositions across related model entities (e.g., logical data component -> physical data component -> data entity).
Catalogs form the raw material for development of matrices and diagrams and also act as a key resource for portfolio management of business and IT capability.
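The hierarchical decomposition described above can be sketched as a small data structure that is flattened into catalog rows. This is a minimal sketch: the class names mirror the metamodel entity names in the text, but the attributes and the example component names ("Customer Data", "CRM Database") are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataEntity:
    name: str


@dataclass
class PhysicalDataComponent:
    name: str
    entities: List[DataEntity] = field(default_factory=list)


@dataclass
class LogicalDataComponent:
    name: str
    physical: List[PhysicalDataComponent] = field(default_factory=list)


def catalog_rows(components: List[LogicalDataComponent]) -> List[Tuple[str, str, str]]:
    """Flatten the hierarchy into (logical, physical, entity) catalog rows."""
    return [
        (logical.name, phys.name, entity.name)
        for logical in components
        for phys in logical.physical
        for entity in phys.entities
    ]


# Hypothetical catalog content.
customer_data = LogicalDataComponent(
    "Customer Data",
    [PhysicalDataComponent("CRM Database", [DataEntity("Customer"), DataEntity("Contact")])],
)
rows = catalog_rows([customer_data])
print(rows)
```

Flat rows of this kind are a convenient starting point for the matrices and diagrams derived from the catalog.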
During the Business Architecture phase, a Business Service/Information diagram was created showing the key data entities required by the main business services. This is a prerequisite to successful Data Architecture activities.
Using the traceability from application to business function to data entity inherent in the content framework, it is possible to create an inventory of the data that needs to be in place to support the Architecture Vision.
Once the data requirements are consolidated in a single location, it is possible to refine the data inventory to achieve semantic consistency and to remove gaps and overlaps.
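A minimal sketch of this consolidation follows. It walks the application -> business function -> data entity traceability and merges entity names through a synonym map to improve semantic consistency. All application, function, and entity names, and the synonym map itself, are hypothetical.

```python
from collections import defaultdict

# Hypothetical traceability: application -> business functions it supports.
APP_TO_FUNCTIONS = {
    "CRM": ["Sales", "Customer Service"],
    "Billing": ["Invoicing"],
}

# Hypothetical traceability: business function -> data entities it needs.
FUNCTION_TO_ENTITIES = {
    "Sales": ["Customer", "Order"],
    "Customer Service": ["Client", "Case"],
    "Invoicing": ["Customer", "Invoice"],
}

# Assumed synonym map: "Client" and "Customer" denote the same entity.
CANONICAL = {"Client": "Customer"}


def build_inventory(app_to_functions, function_to_entities, canonical):
    """Return {canonical entity name: sorted list of applications that need it}."""
    inventory = defaultdict(set)
    for app, functions in app_to_functions.items():
        for fn in functions:
            for entity in function_to_entities.get(fn, []):
                inventory[canonical.get(entity, entity)].add(app)
    return {entity: sorted(apps) for entity, apps in sorted(inventory.items())}


inventory = build_inventory(APP_TO_FUNCTIONS, FUNCTION_TO_ENTITIES, CANONICAL)
for entity, apps in inventory.items():
    print(entity, apps)
```

Entities needed by more than one application (here "Customer") are natural candidates for a common, enterprise-wide definition.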
The following catalogs should be considered for development within a Data Architecture:
The structure of catalogs is based on the attributes of metamodel entities, as defined in Part IV, 34. Content Metamodel.
Matrices show the core relationships between related model entities.
Matrices form the raw material for development of diagrams and also act as a key resource for impact assessment.
At this stage, an entity-to-applications matrix could be produced to validate this mapping. How data is created, maintained, transformed, and passed to or used by other applications will now start to be understood. Obvious gaps, such as entities that never seem to be created by an application or data that is created but never used, need to be noted for later gap analysis.
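The two kinds of gap mentioned above can be detected mechanically once the matrix exists. The sketch below assumes the matrix is annotated with CRUD letters (Create, Read, Update, Delete) per entity and application. That annotation style, and every entity and application name, are assumptions for this example.

```python
# Hypothetical entity-to-applications matrix: entity -> {application: CRUD letters}.
MATRIX = {
    "Customer": {"CRM": "CRUD", "Billing": "R"},
    "Invoice": {"Billing": "CU"},
    "Price List": {"CRM": "R"},
}


def _any_app_does(apps, operation):
    """True if at least one application performs the given CRUD operation."""
    return any(operation in ops for ops in apps.values())


def never_created(matrix):
    """Entities that no application creates."""
    return sorted(e for e, apps in matrix.items() if not _any_app_does(apps, "C"))


def created_but_never_used(matrix):
    """Entities that some application creates but no application reads."""
    return sorted(
        e
        for e, apps in matrix.items()
        if _any_app_does(apps, "C") and not _any_app_does(apps, "R")
    )


print("Never created:", never_created(MATRIX))
print("Created but never used:", created_but_never_used(MATRIX))
```

Findings such as these are recorded as candidate gaps rather than resolved here, since the cause may be an incomplete matrix rather than a real architectural gap.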
The rationalized data inventory can be used to update and refine the architectural diagrams of how data relates to other aspects of the architecture.
Once these updates have been made, it may be appropriate to drop into a short iteration of Application Architecture to resolve the changes identified.
The following matrices should be considered for development within a Data Architecture:
The structure of matrices is based on the attributes of metamodel entities, as defined in Part IV, 34. Content Metamodel.
Diagrams present the Data Architecture information from a set of different perspectives (viewpoints) according to the requirements of the stakeholders.
Once the data entities have been refined, a diagram of the relationships between entities and their attributes can be produced.
It is important to note at this stage that information may be a mixture of enterprise-level data (from system service providers and package vendor information) and local-level data held in personal databases and spreadsheets.
The level of detail modeled needs to be carefully assessed. Some physical system data models will exist down to a very detailed level; others will only have core entities modeled. Not all data models will have been kept up-to-date as applications were modified and extended over time. It is important to achieve a balance in the level of detail provided; reproducing existing detailed physical data schemas and presenting only high-level process maps and data requirements are the two extremes.
The following diagrams should be considered for development within a Data Architecture:
Once the Data Architecture catalogs, matrices, and diagrams have been developed, architecture modeling is completed by formalizing the data-focused requirements for implementing the Target Architecture.
These requirements may:
Within this step, the architect should identify requirements that should be met by the architecture (see 17.2.2 Requirements Development).
Develop a Baseline Description of the existing Data Architecture, to the extent necessary to support the Target Data Architecture. The scope and level of detail to be defined will depend on the extent to which existing data elements are likely to be carried over into the Target Data Architecture, and on whether architectural descriptions exist, as described in 10.2 Approach. To the extent possible, identify the relevant Data Architecture building blocks, drawing on the Architecture Repository (see Part V, 41. Architecture Repository).
Where new architecture models need to be developed to satisfy stakeholder concerns, use the models identified within Step 1 as a guideline for creating new architecture content to describe the Baseline Architecture.
Develop a Target Description for the Data Architecture, to the extent necessary to support the Architecture Vision and Target Business Architecture. The scope and level of detail to be defined will depend on the relevance of the data elements to attaining the Target Architecture, and on whether architectural descriptions exist. To the extent possible, identify the relevant Data Architecture building blocks, drawing on the Architecture Repository (see Part V, 41. Architecture Repository).
Where new architecture models need to be developed to satisfy stakeholder concerns, use the models identified within Step 1 as a guideline for creating new architecture content to describe the Target Architecture.
Verify the architecture models for internal consistency and accuracy:
Identify gaps between the baseline and target, using the Gap Analysis technique as described in Part III, 27. Gap Analysis.
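As a simplified sketch of the comparison involved, and not a substitute for the technique described in Part III, the baseline and target sets of data building blocks can be compared to separate what is carried over, what disappears, and what is new. The building block names below are hypothetical.

```python
def gap_analysis(baseline, target):
    """Classify building blocks by comparing the baseline and target sets."""
    baseline, target = set(baseline), set(target)
    return {
        "carried_over": sorted(baseline & target),
        "eliminated": sorted(baseline - target),  # review: intentional or omitted?
        "new": sorted(target - baseline),         # gaps to be filled
    }


# Hypothetical data building blocks.
BASELINE = ["Customer Master", "Legacy Order Store", "Product Catalog"]
TARGET = ["Customer Master", "Product Catalog", "Unified Order Store"]

gaps = gap_analysis(BASELINE, TARGET)
for category, blocks in gaps.items():
    print(category, blocks)
```

Each item listed under "eliminated" needs review to confirm that its removal is intentional; items under "new" are the gaps to be addressed by development or procurement.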
Following creation of a Baseline Architecture, Target Architecture, and gap analysis, a data roadmap is required to prioritize activities over the coming phases.
This initial Data Architecture roadmap will be used as raw material to support more detailed definition of a consolidated, cross-discipline roadmap within the Opportunities & Solutions phase.
Once the Data Architecture is finalized, it is necessary to understand any wider impacts or implications.
At this stage, other architecture artifacts in the Architecture Landscape should be examined to identify:
Check the original motivation for the architecture project and the Statement of Architecture Work against the proposed Data Architecture. Conduct an impact analysis to identify any areas where the Business and Application Architectures (e.g., business practices) may need to change to cater for changes in the Data Architecture (for example, changes to forms or procedures, applications, or database systems).
If the impact is significant, this may warrant the Business and Application Architectures being revisited.
Identify any areas where the Application Architecture (if generated at this point) may need to change to cater for changes in the Data Architecture (or to identify constraints on the Application Architecture about to be designed).
If the impact is significant, it may be appropriate to drop into a short iteration of the Application Architecture at this point.
Identify any constraints on the Technology Architecture about to be designed, refining the proposed Data Architecture only if necessary.
Document rationale for building block decisions in the Architecture Definition Document.
Prepare Data Architecture sections of the Architecture Definition Document, comprising some or all of:
The outputs of Phase C (Data Architecture) may include, but are not restricted to:
The outputs may include some or all of the following:
Downloads of TOGAF®, an Open Group Standard, are available under license from the TOGAF information web site. The license is free to any organization wishing to use the TOGAF standard entirely for internal purposes (for example, to develop an information system architecture for use within that organization). A book is also available (in hardcopy and pdf) from The Open Group Bookstore as document G116.