The Open Group Agile Architecture Framework Draft Standard

The Open Group Snapshot

NOTICE

Snapshot documents are draft standards, which provide a mechanism for The Open Group to disseminate information on its current direction and thinking to an interested audience, in advance of formal publication, with a view to soliciting feedback and comment.

A Snapshot document represents the interim results of an activity to develop a standard. Although at the time of publication The Open Group intends to progress the activity towards publication of a Preliminary Standard or full Standard, The Open Group is a consensus organization, and makes no commitment regarding publication. Similarly, a Snapshot document does not represent any commitment by any member of The Open Group to make any specific products or services available.

This Snapshot document is intended to make public the direction and thinking about the path we are taking in the development of The Open Group Agile Architecture Framework Standard. We invite your feedback and guidance. To provide feedback on this Snapshot document, please send comments by email to ogspecs-snapshot-feedback@opengroup.org no later than January 15, 2020.

This Snapshot document is valid through January 15, 2020 only.

For information on joining The Open Group, please visit www.opengroup.org/getinvolved/becomeamember.

The Open Group hereby authorizes you to use this document for any purpose, PROVIDED THAT any copy of this document, or any part thereof, which you make shall retain all copyright and other proprietary notices contained herein.

This document may contain other proprietary notices and copyright information.

Nothing contained herein shall be construed as conferring by implication, estoppel, or otherwise any license or right under any patent or trademark of The Open Group or any third party. Except as expressly provided above, nothing contained herein shall be construed as conferring any license or right under any copyright of The Open Group.

Note that any product, process, or technology in this document may be the subject of other intellectual property rights reserved by The Open Group, and may not be licensed hereunder.

This document is provided “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you.

Any publication of The Open Group may include technical inaccuracies or typographical errors. Changes may be periodically made to these publications; these changes will be incorporated in new editions of these publications. The Open Group may make improvements and/or changes in the products and/or the programs described in these publications at any time without notice.

Should any viewer of this document respond with information including feedback data, such as questions, comments, suggestions, or the like regarding the content of this document, such information shall be deemed to be non-confidential and The Open Group shall have no obligation of any kind with respect to such information and shall be free to reproduce, use, disclose, and distribute the information to others without limitation. Further, The Open Group shall be free to use any ideas, concepts, know-how, or techniques contained in such information for any purpose whatsoever including but not limited to developing, manufacturing, and marketing products incorporating such information.

If you did not obtain this copy through The Open Group, it may not be the latest version. For your convenience, the latest version of this publication may be downloaded at www.opengroup.org/library.

Document Number: S192

Published by The Open Group, July 2019.


Preface

The Open Group

The Open Group is a global consortium that enables the achievement of business objectives through technology standards. Our diverse membership of more than 700 organizations includes customers, systems and solutions suppliers, tools vendors, integrators, academics, and consultants across multiple industries.

The mission of The Open Group is to drive the creation of Boundaryless Information Flow™ achieved by:

Working with customers to capture, understand, and address current and emerging requirements, establish policies, and share best practices
Working with suppliers, consortia, and standards bodies to develop consensus and facilitate interoperability, to evolve and integrate specifications and open source technologies
Offering a comprehensive set of services to enhance the operational efficiency of consortia
Developing and operating the industry’s premier certification service and encouraging procurement of certified products

Further information on The Open Group is available at www.opengroup.org.

The Open Group publishes a wide range of technical documentation, most of which is focused on development of Standards and Guides, but which also includes white papers, technical studies, certification and testing documentation, and business titles. Full details and a catalog are available at www.opengroup.org/library.

This Document

This document is a Snapshot of what is intended to become The Open Group Agile Architecture Framework™ Standard, also known as the O-AAF™ Standard. It is being developed by The Open Group.

This document follows a modular structure and is organized in four parts:

Part 1: Agile Architecture Fundamentals gives an overview of this document and introduces the key concepts
Part 2: Playbooks provides guidelines to solve an Agile Architecture problem
Part 3: Architecture Patterns describes solution types to solve problem types
Part 4: Methods develops a "meta methodology" discourse on relevant methods

This first Snapshot document contains a complete initial release of Part 1: Agile Architecture Fundamentals. The other parts are still incomplete and will be completed in the next version.

The target audience for this document includes:

Agilists who need to understand the importance of architecture when shifting toward an Agile at scale model and who want to learn architecture skills
Enterprise Architects who want to stay relevant in an Agile at scale world and who need to learn new architecture skills for the digital age
Business managers and executives who need to learn the importance of the architecture discipline and who need to influence architecture decisions

Trademarks

ArchiMate®, DirecNet®, Making Standards Work®, Open O® logo, Open O and Check® Certification logo, OpenPegasus®, Platform 3.0®, The Open Group®, TOGAF®, UNIX®, UNIXWARE®, and the Open Brand X® logo are registered trademarks and Boundaryless Information Flow™, Build with Integrity Buy with Confidence™, Dependability Through Assuredness™, Digital Practitioner Body of Knowledge™, DPBoK™, EMMM™, FACE™, the FACE™ logo, IT4IT™, the IT4IT™ logo, O-DEF™, O-HERA™, O-PAS™, Open FAIR™, Open Platform 3.0™, Open Process Automation™, Open Subsurface Data Universe™, Open Trusted Technology Provider™, O-SDU™, Sensor Integration Simplified™, SOSA™, and the SOSA™ logo are trademarks of The Open Group.

Airbnb™ is a trademark of Airbnb, Inc.

Amazon™, Amazon Prime™, and Prime Now™ are trademarks of Amazon.com.

CMMI® and PCMM® are registered trademarks of CMMI Institute LLC, USA.

COBIT® is a registered trademark of ISACA and the IT Governance Institute.

eBay® is a registered trademark of eBay, Inc.

Etsy® is a registered trademark of Etsy, Inc. in the US and/or other countries.

Facebook® is a registered trademark of Facebook, Inc.

Ford™ is a trademark of Ford Motor Company.

General Electric® is a registered trademark of General Electric Company.

Google® is a registered trademark of Google LLC.

ISACA® is a registered trademark of the Information Systems Audit and Control Association.

Java® is a registered trademark of Oracle and/or its affiliates.

MITRE® is a registered trademark of The MITRE Corporation.

MQSeries® is a registered trademark of IBM in the United States.

Netflix® is a registered trademark of Netflix, Inc.

PepsiCo® is a registered trademark of PepsiCo, Inc.

Spotify™ is a trademark of Spotify AB.

Toyota® is a registered trademark of Toyota Motor Company.

Uber™ is a trademark of Uber Technologies, Inc.

Walmart® is a registered trademark of Walmart.

All other brands, company, and product names are used for identification purposes only and may be trademarks that are the sole property of their respective owners.

Acknowledgments

The Open Group gratefully acknowledges the contribution of the following people in the development of this document:

Miguel de Andrade
Paddy Fagan
Jérémie Grodziski
Peter Haviland
Frédéric Le
Jean-Pierre Le Cam
Antoine Lonjon
Eamonn Moriarty

Referenced Documents

The following documents are referenced in this Snapshot.

(Please note that the links below are good at the time of writing but cannot be guaranteed for the future.)

Normative References

This document does not contain any normative references at the time of publication. These may be added in a future release.

Informative References

[Agile Manifesto] Manifesto for Agile Software Development, 2001: https://agilemanifesto.org/
[Bain 2014] Winning Operating Models that Convert Strategy to Results, Marcia Blenko, Eric Garton, Ludovica Mottura, Bain & Company, December 2014: https://www.bain.com/insights/winning-operating-models-that-convert-strategy-to-results/, retrieved June 6, 2019
[Baiyere 2017] Designing for Digital – Lessons from Spotify™, Abayomi Baiyere, Jeanne W. Ross, Ina M. Sebastian, Research Briefing, MIT Sloan CISR, December 2017
[Ballé 2019] Lean is a Product-Driven Strategy, Michael Ballé, April 2019, retrieved July 4, 2019: https://www.lean.org/LeanPost/Posting.cfm?LeanPostId=1024
[Chheda 2017] Putting Customer Experience at the Heart of Next-generation Operating Models, Shital Chheda, Ewan Duncan, Stefan Roggenhofer, Digital McKinsey, 2017
[Christensen 2016] Know Your Customers’ “Jobs-to-be-done”, Clayton M. Christensen, Taddy Hall, Karen Dillon, David S. Duncan, Harvard Business Review, September 2016 Issue
[COBIT 5] ISACA®: http://www.isaca.org/COBIT/Pages/default.aspx
[Crawley 2016] Systems Architecture, Edward Crawley, Bruce Cameron, Daniel Selva, Global Edition, Pearson Education Limited, 2016
[DoC 2007] US Department of Commerce, Concept Overview – ACMM: http://www.aprocessgroup.com/atpl/togaf9.arch_capab/guidances/concepts/overview_acmm_62D9B651.html, retrieved July 3, 2019
[Erder 2016] Continuous Architecture: Sustainable Architecture in an Agile and Cloud-Centric World, Murat Erder, Pierre Pureur, Elsevier, 2016
[Evans 2003] Domain-Driven Design: Tackling Complexity in the Heart of Software, Eric Evans, Addison-Wesley Professional, 2003
[Evans 2013] Getting Started with DDD when Surrounded by Legacy Systems, Eric Evans, 2013, retrieved April 24, 2019: http://domainlanguage.com/wp-content/uploads/2016/04/GettingStartedWithDDDWhenSurroundedByLegacySystemsV1.pdf
[Ford 2017] Building Evolutionary Architectures, Neal Ford, Rebecca Parsons, Patrick Kua, O’Reilly, 2017
[Forsgren 2018] Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations, Forsgren, Humble, Kim, Trade Select, 2018
[Fowler 2004] StranglerFigApplication, Martin Fowler, June 2004: https://www.martinfowler.com/bliki/StranglerApplication.html
[Fowler 2013] Continuous Delivery, Martin Fowler, May 2013: https://martinfowler.com/bliki/ContinuousDelivery.html
[Fowler 2014] Sacrificial Architecture, Martin Fowler, 2014: https://martinfowler.com/bliki/SacrificialArchitecture.html
[Fowler 2015] Making Architecture Matter – Martin Fowler Keynote, youtube.com, posted by O’Reilly Media, July 23, 2015: https://www.youtube.com/watch?v=DngAZyWMGR0
[Fowler 2019] Refactoring: Improving the Design of Existing Code, Martin Fowler, Addison-Wesley, 2019
[George 2004] Conquering Complexity in your Business: How Walmart®, Toyota®, and Other Top Companies are Breaking through the Ceiling on Profits and Growth, Michael L. George, Stephen A. Wilson, McGraw Hill, 2004
[Gof 1994] Design Patterns: Elements of Reusable Object-Oriented Software, Vlissides, Helm, Gamma, Johnson, Addison-Wesley, 1994
[Hammer 1990] Reengineering Work: Don’t Automate, Obliterate, Michael Hammer, Harvard Business Review, July-August 1990 Issue
[Hammer 1993] Re-engineering the Corporation: A Manifesto for Business Revolution, Michael Hammer, James A. Champy, 1993
[Harrington 1991] Business Process Improvement, H. James Harrington, McGraw-Hill, 1991
[Hayler 2006] Six Sigma for Financial Services, Rowland Hayler, Michael D. Nichols, McGraw-Hill, 2006
[HBR 2013] IT Governance is Killing Innovation, Andrew Horne, Brian Foster, Harvard Business Review, 2013: https://hbr.org/2013/08/it-governance-is-killing-innov
[HBR 2017] How Spotify™ Balances Employee Autonomy and Accountability, Michael Mankins, Eric Garton, Harvard Business Review, 2017: https://hbr.org/2017/02/how-spotify-balances-employee-autonomy-and-accountability
[Hellman 2018] Delivering Customer Outcomes versus Selling Products: The GE Digital Case, Karl Hellman, Frank M. Grillo, The Marketing Journal, June 2018: http://www.marketingjournal.org/delivering-customer-outcomes-versus-selling-products-the-ge-digital-case-study-by-frank-m-grillo-and-karl-hellman/, retrieved July 2, 2018
[Hodgson 2017] Feature Toggles (aka Feature Flags), Hodgson, martinfowler.com, posted by Pete Hodgson, October 2017
[Holland 2014] Complexity: A Very Short Introduction, John H. Holland, Oxford University Press, 2014
[Holland 2012] Signals and Boundaries: Building Blocks for Complex Adaptive Systems, John H. Holland, The MIT Press, 2012
[Humble 2010] Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation, Humble, Farley, Addison-Wesley, 2010
[ISO/IEC 38500] Information Technology – Governance of IT for the Organization: https://www.iso.org/standard/62816.html
[Johnson 2008] Reinventing your Business Model, Mark W. Johnson, Clayton M. Christensen, Henning Kagermann, Harvard Business Review, December 2008
[Kane 2019] How Digital Leadership Is(n’t) Different, MIT Sloan Management Review, Gerald C. Kane, Anh Nguyen Phillips, Jonathan Copulsky, Garth Andrus, Spring 2019 Issue, March 12, 2019
[Kersten 2018] Project to Product: How to Survive and Thrive in the Age of Digital Disruption with the Flow Framework, Mik Kersten, IT Revolution, 2018
[Kesler 2008] How Coke’s CEO Aligned Strategy and People to Recharge Growth: An Interview with Neville Isdell, G. Kesler, People & Strategy, 31(2), 18-21, 2008
[Kim 2013] The Phoenix Project: A Novel about IT, DevOps, and Helping your Business Win, Kim, Behr, IT Revolution Press, 2013
[Kim 2016] The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations, Kim, Debois, Willis, Trade Select, 2016
[Lambin 2000] Market-driven Management: Strategic and Operational Marketing, Jean-Jacques Lambin, Macmillan, 2000
[Lancelott 2017] Operating Model Canvas: Aligning Operations and Organization with Strategy, Mark Lancelott, Mikel Gutierrez, Andrew Campbell, Van Haren Publishing, 2017
[Leffingwell 2011] Agile Software Requirements, Dean Leffingwell, Pearson Education, 2011
[Luna 2014] State of the Art of Agile Governance: A Systematic Review, Alexandre J.H. de O.Luna, Philippe Kruchten, Marcello L.G. do E. Pedrosa, Humberto R. de Almeida Neto, Hermano P. de Moura, International Journal of Computer Science & Information Technology (IJCSIT) Vol 6, No 5, October 2014: https://arxiv.org/ftp/arxiv/papers/1411/1411.1922.pdf
[McKinsey 2013] Mastering the Building Blocks of Strategy, Chris Bradley, Angus Dawson, Antoine Montard, October 2013: https://www.mckinsey.com/business-functions/strategy-and-corporate-finance/our-insights/mastering-the-building-blocks-of-strategy
[McKinsey 2016] An Operating Model for Company-wide Agile Development, Santiago Comella-Dorda, Swati Lohiya, Gerard Speksnijder, May 2016: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/an-operating-model-for-company-wide-agile-development
[Merriam-Webster] https://www.merriam-webster.com/
[MITRE] Systems Engineering Guide, The MITRE® Corporation: https://www.mitre.org/publications/systems-engineering-guide/se-lifecycle-building-blocks/system-architecture/architectural-patterns
[Morgan 2019] Designing the Future: How Ford™, Toyota®, and Other World-Class Organizations Use Lean Product Development to Drive Innovation and Transform their Business, James M. Morgan, Jeffrey K. Liker, the Lean Enterprise Institute, McGraw-Hill Education, 2019
[Nygard 2015] Documenting Architecture Decisions, Michael Nygard Blog, 2015: http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions
[Oosterwal 2010] The Lean Machine: How Harley-Davidson Drove Top-Line Growth and Profitability with Revolutionary Lean Product Development, Dantar P. Oosterwal, AMACOM, 2010
[Osterwalder 2010] Business Model Generation: A Handbook for Visionaries, Game Changers, and Challengers, Alexander Osterwalder, Yves Pigneur, Wiley, 2010
[Parker 2016] Platform Revolution: How Networked Markets are Transforming the Economy – and How to Make them Work for You, Geoffrey G. Parker, Marshall W. Van Alstyne, Sangeet Paul Choudary, W.W. Norton & Company, 2016
[Parnas 1972] On the Criteria to be Used in Decomposing Systems into Modules, D.L. Parnas, Carnegie-Mellon University, 1972
[Patton 2014] User Story Mapping, Jeff Patton, O’Reilly Media, Inc., 2014
[Porter 2004] Competitive Advantage: Creating and Sustaining Superior Performance, Michael E. Porter, Free Press, 2004
[PWC 2013] The Financial Conduct Authority (FCA) and the Focus on Product Governance, PWC, Autumn 2013: https://pwc.blogs.com/files/the-financial-conduct-authority-fca-and-the-focus-on-product-governance.pdf
[Ries 2011] The Lean Startup: How Constant Innovation Creates Radically Successful Businesses, Eric Ries, Portfolio Penguin, 2011
[Rigby 2018] Agile at Scale, Darrell K. Rigby et al., Harvard Business Review (http://hbr.org), May-June 2018 Issue
[Ross 2018] Goodbye Structure; Hello Accountability, Jeanne W. Ross, MIT Sloan Management Review, June 27, 2018
[Ross 2018] Tech Republic interview retrieved June 21, 2018: https://www.techrepublic.com/article/how-to-create-a-vision-for-digital-transformation-at-your-company/
[Ross 2018] Let Your Digital Strategy Emerge, Jeanne Ross, MIT Sloan Management Review, October 2018
[Ross 2019] Designed for Digital: How to Architect your Business for Sustained Success, Jeanne W. Ross, Cynthia M. Beath, Martin Mocker, MIT Press, 2019
[Rossman 2019] Think Like Amazon™: 50 1/2 Ideas to Become a Digital Leader, John Rossman, McGraw-Hill, 2019
[Rozanski 2005] Software Systems Architecture: Working with Stakeholders using Viewpoints and Perspectives, Rozanski, Woods, Addison-Wesley, 2005
[SEI 1993] Capability Maturity Model for Software, Version 1.1, Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, Charles V. Weber, SEI Technical Report, CMU/SEI-93-TR-024, ESC-TR-93-177, February 1993
[SEI 1995] People Capability Maturity Model, Mark C. Paulk, Bill Curtis, Mary Beth Chrissis, Charles V. Weber, CMU SEI-95-MM-02, September 1995
[SEI 2018] https://www.sei.cmu.edu/news-events/news/article.cfm?assetid=528380, retrieved July 3, 2019
[Senge 1994] The Fifth Discipline Fieldbook: Strategies and Tools for Building a Learning Organization, Peter M. Senge, Crown Business, 1994
[Shoup 2014] From the Monolith to Micro-services, slideshare.net, posted by Randy Shoup, October 2014: https://www.slideshare.net/RandyShoup/monoliths-migrations-and-microservices
[Simon 2018] Liquid Software: How to Achieve Trusted Continuous Updates in the DevOps World, Simon, Landman, Sadogursky, JFrog, 2018
[Sobek 1999] Toyota®'s Principles of Set-Based Concurrent Engineering, Durward K. Sobek II, Allen C. Ward, Jeffrey K. Liker, MIT Sloan Management Review, January 15, 1999
[Stanford 2010] An Introduction to Design Thinking – Process Guide, Institute of Design, Stanford
[Stevenson 2004] An Agile Approach to a Legacy System, Chris Stevenson, Andy Pols, 2004, retrieved April 23, 2019: http://cdn.pols.co.uk/papers/agile-approach-to-legacy-systems.pdf
[TOGAF 2018] The TOGAF® Standard, Version 9.2, a standard of The Open Group (C192), published by The Open Group, April 2018; refer to: http://www.opengroup.org/togaf
[Ton 2014] The Good Jobs Strategy: How the Smartest Companies Invest in Employees to Lower Costs and Boost Profits, Zeynep Ton, Amazon, 2014
[Vaughn 2013] Implementing Domain-Driven Design, Vaughn Vernon, Addison-Wesley Professional, 2013
[Ward 2014] Lean Product and Process Development, Second Edition, Allen C. Ward, Durward K. Sobek II, Lean Enterprise Institute, Inc., 2014
[Watson 2005] Design and Execution of a Collaborative Business Strategy, Journal For Quality & Participation, 2005
[Wharton 2018] Introduction to Marketing: https://www.coursera.org/learn/wharton-marketing
[Wind 2016] Beyond Advertising: Creating Value through All Customer Touchpoints, Yoram Jerry Wind, Catharine Findiesen Hays, John Wiley & Sons, 2016
[Womack 2007] The Machine that Changed the World, James P. Womack, Daniel T. Jones, Daniel Roos, Simon & Schuster, 2007

1. Introduction

1.1. Objective

This Snapshot document is a draft of The Open Group Agile Architecture Framework™ Standard. The objective of this document is to cover both the Digital Transformation of the enterprise and its Agile Transformation.

This Snapshot document is intended to make public the direction and thinking about the path we are taking in the development of The Open Group Agile Architecture Framework Standard. We invite your feedback and guidance. To provide feedback on this document, please send comments by email to ogspecs-snapshot-feedback@opengroup.org no later than January 15, 2020.

1.2. Overview

This Snapshot documents a proposal for a standard that covers both the Digital Transformation of the enterprise and its Agile Transformation. The scope of the Snapshot covers key concepts and includes the topics below:

Continuous architecture in an Agile world
Designing business models
Discovering and analyzing customer insights
Architecting digital platforms
Developing digital offerings
Architecting an adaptive operating model
Designing an accountability framework
Architecting the enterprise’s Digital Transformation
Defining a Minimum Viable Architecture (MVA)
Leveraging event-driven architecture to design modular systems and modernize legacy systems

Additional "how-to" architecture material will be developed in the next version.

1.3. Conformance

This is a Snapshot, not an approved standard. Do not specify or claim conformance to it.

1.4. Normative References

The following standards contain provisions which, through references in this standard, constitute provisions of the O-AAF Standard. At the time of publication, the editions indicated were valid.

1.5. Terminology

For the purposes of this document, the following terminology definitions apply:

Can: Describes a possible feature or behavior available to the user or application.
May: Describes a feature or behavior that is optional. To avoid ambiguity, the opposite of “may” is expressed as “need not”, instead of “may not”.
Shall: Describes a feature or behavior that is a requirement. To avoid ambiguity, do not use “must” as an alternative to “shall”.
Shall not: Describes a feature or behavior that is an absolute prohibition.
Should: Describes a feature or behavior that is recommended but not required.
Will: Same meaning as “shall”; “shall” is the preferred term.

1.6. Future Directions

The Open Group intends to progress this activity towards publication of a standard of The Open Group.

2. Definitions

For the purposes of this document, the following terms and definitions apply. Merriam-Webster’s Collegiate Dictionary should be referenced for terms not defined in this section.

Allowable Lead Time

Time available between starting a product development initiative or process and finishing it in order to satisfy customers.

Architectural Runway

Ability to implement new features without excessive refactoring (Source: Leffingwell 2011)
Consists of the existing code, components, and technical infrastructure needed to implement near-term features without excessive redesign and delay (Source: Scaled Agile, Inc. https://www.scaledagile.com/)

Architecture – system engineering context

The embodiment of concept, and the allocation of physical/informational function (process) to elements of form (objects) and definition of structural interfaces among the objects.

Catchball

A dialog between senior managers and project teams about the resources and time both available and needed to achieve the targets.

Note: Once the major goals are set, planning should become a top-down and bottom-up process involving a dialog. This dialog is often called catchball (or nemawashi) as ideas are tossed back and forth like a ball. (Source: https://www.lean.org/lexicon/strategy-deployment)

Continuous Architecture

An architecture with no end state and that is designed to evolve to support the evolving needs of the digital enterprise.

Customer Journey

Series of interactions between a customer and a company that occur as the customer pursues a specific goal. The journey may not conform to the company’s intentions. (Source: https://www.forrester.com/Customer-Journey)

Design Thinking

A methodology for creative problem solving that begins with understanding unmet customer needs. (Source: https://dschool.stanford.edu/resources/getting-started-with-design-thinking and https://executive-ed.mit.edu/mastering-design-thinking)

Digital Platform

Software system composed of application and infrastructure components that can be rapidly reconfigured using DevOps and Cloud Native Computing.

Epic – classical Agile

Large user story that cannot be delivered as defined within a single iteration or is large enough that it can be split into smaller user stories. There is no standard form to represent epics. Some teams use the familiar user story formats ("as a, I want, so that" or "in order to, as a, I want") while other teams represent them with a short phrase. (Source: http://www.agilealliance.org)

Epic – scaled Agile

Highest-level expression of a customer need. Development initiatives that are intended to deliver the value of an investment theme and are identified, prioritized, estimated, and maintained in the portfolio backlog. (Source: Leffingwell 2011)

Evolutionary Architecture

An architecture that supports guided, incremental change across multiple dimensions. (Source: Ford 2017)

Evolvability

A meta-non-functional requirement that aims to prevent other architecture requirements, in particular the non-functional ones, from degrading over time.

Job-to-be-done

What the customer hopes to accomplish. “Job” is shorthand for what an individual really seeks to accomplish in a given circumstance. (Source: Christensen 2016)

Lean Value Stream

All of the actions, both value-creating and non-value-creating, required to bring a product from concept to launch (also known as the development value stream) and from order to delivery (also known as the operational value stream). These include actions to process information from the customer and actions to transform the product on its way to the customer. (Source: Lean Enterprise Institute https://www.lean.org/)

Lead Time

Time between the initiation and completion of a process.

Modularization

Design decisions which must be made before the work on independent modules can begin. Every module is characterized by its knowledge of a design decision which it hides from all others. Its interface or definition is chosen to reveal as little as possible about its inner workings. (Source: Parnas 1972)
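
As a minimal illustration of this definition (a sketch written for this section, not taken from Parnas 1972), the Python fragment below shows one module hiding a single design decision, namely how lines of text are stored, behind a small interface. The names LineStorage and longest_line are hypothetical.

```python
class LineStorage:
    """Module that hides one design decision: how lines are stored."""

    def __init__(self):
        # Hidden decision: an in-memory list. Replacing it with a file or a
        # database would not require any change in client modules.
        self._lines = []

    def add(self, line):
        """Store a line and return its index."""
        self._lines.append(line)
        return len(self._lines) - 1

    def get(self, index):
        """Return the line stored at the given index."""
        return self._lines[index]

    def count(self):
        """Return the number of stored lines."""
        return len(self._lines)


def longest_line(storage):
    """Client module: relies only on add/get/count, never on the list."""
    lines = (storage.get(i) for i in range(storage.count()))
    return max(lines, key=len, default="")
```

Because longest_line uses only the interface, work on the storage module and on its clients can proceed independently once that interface is agreed.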

Persona

Fictional character which is created based upon research in order to represent the different user types that might use your service, product, site, or brand in a similar way. (Source: https://www.interaction-design.org/literature/article/personas-why-and-how-you-should-use-them)

Platform Business Model

Business model that is based on the two-sided market theory.

Process

Any activity or group of activities that takes an input, adds value to it, and provides an output to an internal or external customer. There is no product and/or service without a process. Likewise, there is no process without a product or a service. (Source: Harrington 1991)

Product

Something a value stream produces. A product has a lifecycle which is comprised of a product and process development value stream and a production value stream. Broadly speaking, a product can refer to a product or a service. A service will be referred to as a product if its delivery is industrialized or repeatable.

Product-centricity

Shift from temporary organizational structures – projects – to permanent ones. A product-centric organization is composed of cross-functional Agile teams which are responsible for developing products or services, and also operating or running them. The DevOps principle "you build it, you run it" is core to product-centricity.

Refactoring

The process of changing a software system in a way that does not alter the external behavior of the code yet improves its internal structure. It is a disciplined way to clean up code that minimizes the chances of introducing bugs. (Source: Fowler 2019)
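
The following Python sketch illustrates the idea on a deliberately small, hypothetical example (the function names and the invoice calculation are assumptions made for illustration, not taken from Fowler 2019). The second version extracts a helper and uses clearer names, while returning the same result for the same input.

```python
def invoice_total_before(items, tax_rate):
    """Original version: correct, but the intent is hard to read."""
    t = 0
    for i in range(len(items)):
        t = t + items[i][0] * items[i][1]
    t = t + t * tax_rate
    return t


def net_amount(items):
    """Extracted helper: sum of price * quantity over all line items."""
    return sum(price * quantity for price, quantity in items)


def invoice_total_after(items, tax_rate):
    """Refactored version: same arithmetic, clearer structure."""
    net = net_amount(items)
    return net + net * tax_rate


# Each item is a (price, quantity) pair; values are illustrative only.
items = [(10, 2), (5, 3)]
assert invoice_total_before(items, 0.2) == invoice_total_after(items, 0.2)
```

The closing assertion plays the role of a regression test: it checks that external behavior is unchanged, which is what distinguishes refactoring from a rewrite.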

Service – business context

Activity performed on behalf of a customer that results in a desired outcome.

Service – software context

A function or activity that produces an output or an outcome. Well-designed services are self-contained and provide APIs that shield their consumers from their implementation details.
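
A minimal Python sketch of this definition follows. It assumes a hypothetical exchange-rate service; the names ExchangeRateService, InMemoryRates, and convert, as well as the rate value, are illustrative only. The consumer is written against the API contract and does not know how rates are obtained.

```python
from typing import Protocol


class ExchangeRateService(Protocol):
    """API contract: the only thing a consumer is allowed to rely on."""

    def rate(self, base: str, quote: str) -> float:
        ...


class InMemoryRates:
    """One possible implementation; a remote or cached one could replace it."""

    def __init__(self, table):
        # Implementation detail hidden from consumers: a plain dictionary.
        self._table = dict(table)

    def rate(self, base, quote):
        return self._table[(base, quote)]


def convert(amount, base, quote, service: ExchangeRateService) -> float:
    """Consumer: uses the service through its API only."""
    return amount * service.rate(base, quote)


# Illustrative rate, not real market data.
rates = InMemoryRates({("EUR", "USD"): 1.5})
print(convert(100, "EUR", "USD", rates))
```

Swapping InMemoryRates for another implementation with the same rate method leaves convert untouched, which is the shielding effect the definition describes.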

Story

Conversations about working together to arrive at a best solution to a problem understood by both parties. A simple story template follows:

As a [type of user]
I want to [do something]
So that I can [get some benefit]

(Source: Patton 2014)

Story Map

Used for breaking down big stories as they are told. (Source: Patton 2014)

System

Perceived whole whose elements "hang together" because they continually affect each other over time and operate toward a common purpose. (Source: Senge 1994)

Part 1: Agile Architecture Fundamentals

The digital enterprise is shaped by people who work in the context of an enterprise’s organization and culture. People working within organizational entities formulate and implement strategy, design business and operating models, and develop and run products and services.

The more Agile the enterprise, the faster the learning cycles. Fast learning cycles translate into shorter time-to-market and higher quality. Traditional "command and control" organizations get in the way by slowing down learning cycles.

The Agile Transformation of the enterprise and its culture is becoming a prerequisite of effective Digital Transformation. Digital leaders and their teams need to steer the Digital/Agile transformation. Figure 1, “Architecting the Dual Digital/Agile Transformation”, inspired by an Escher painting, shows the recursive nature of this dual transformation.

Figure 1. Architecting the Dual Digital/Agile Transformation

This document covers both the Digital Transformation of the enterprise and its Agile Transformation.

Digital is about defining a business strategy that is inspired by the capabilities of digital technologies [Ross 2018]. Digital technologies are game-changing in helping to solve customer problems in ways that were not possible before.

The scope of this document is the enterprise as a whole and not just the information system that supports it. Architecting the enterprise brings together all knowledge domains required to innovate business and operating models and create new digital offerings.

Cross-functional product teams break down organizational silos by bringing together marketing, software engineering, sales, business operations, compliance, risk, or IT operations. Each of the disciplines comes with its established body of knowledge and specialized language. The same word used in different contexts is likely to have different meanings.

The O-AAF approach takes the view that it is counter-productive and reductionist to impose a unified "architecturally-driven" language on all stakeholders. The challenge is to bring together each domain of the enterprise with its own body of knowledge:

Marketing is bringing customer-centricity into the field along with new disciplines such as design thinking
IT is bringing flexible and adaptive new software technologies and has popularized new Agile ways of working
Operations is looking to leverage capabilities provided by automation and software platforms to develop operational excellence
Executives are looking for innovative business models that generate profitable revenue growth
Compliance departments need to ensure that privacy and security regulations are applied throughout the organization

The role of architecture is to provide an integrative view of these different disciplines. Integrative doesn’t mean unified. There will not be one method to rule them all. This document recognizes the value of concepts and tools brought by each discipline; for example, design thinking, job-to-be-done, event-driven or domain-driven design, etc. It provides a modular framework that architects and practitioners can use to help shape Digital Transformation endeavors. It includes:

A systems thinking view of architecture formulation that combines intentional and emergent design
Modularity and loose-coupling to bring agility to the organization and its software systems
An integrating dictionary to bridge the concepts and vocabulary of each discipline
An outside-in framework that starts from clients' pains and expected gains to develop digital offerings that generate profitable growth

We expect the materials in this framework to evolve at different frequencies but continuously as the state-of-the-art in each of these disciplines evolves and matures.

The value of an open standard in this area is that the practical experience and expertise of many organizations and practitioners, across industries and domains, can be distilled into a continuously updated, open standard as published by The Open Group.

This means that organizations do not have to carry the burden of integrating these disciplines alone, but can work with peers and competitors to deliver that integration, as is the ambition of this framework.

3. A Dual Transformation

There are two major drivers of Digital/Agile Transformation, as shown in Figure 2, “The O-AAF Big Picture”:

Customer experience which drives Digital Transformation
The project-to-product shift which epitomizes Agile Transformation

The enterprise’s purpose states the strategic intent. The Merriam-Webster dictionary [Merriam-Webster] defines purpose as: "something set up as an object or end to be attained : intention".

Figure 2. The O-AAF Big Picture

3.1. Strategy and Governance

The Why? questions start a strategy formulation process composed of several building blocks [McKinsey 2013]:

Frame: what are the right questions?
Diagnose: where and why do we make money?
Forecast: what futures do we need to plan for?
Search: what are the potential pathways to winning?
Choose: what is our integrated strategy?
Commit: how do we drive change?
Evolve: how do we adapt and learn?

Strategy evolves from big up-front design toward a learning process that involves all organizational levels. The strategic process itself must be adaptive and capable of responding to changes in the business system. The people who are charged with executing the strategic plan should participate in the planning process itself [Watson 2005].

The Japanese have coined the term catchball which refers to a strategic planning approach that incorporates structured group dialog at all levels of the organization. The key benefits of the catchball process are to drive alignment among all silos and hierarchical levels, and to help shape the business as a coherent whole aligned on its core objectives.

This view of strategy formulation and deployment goes hand-in-hand with a flatter and cross-functional organization. It also requires the enterprise’s governance model to evolve. Agile teams are granted more autonomy but not at the expense of effective alignment. The catchball process provides an alignment mechanism that is compatible with autonomy because it works best in a culture that is not "command and control".

The "acid" test of a successful business strategy is the increased capability of the enterprise to develop offerings that customers are willing to pay for and that deliver the right experience.

3.2. Autonomous Teams

In his book “Think like Amazon™” [Rossman 2019], John Rossman writes that digital is about two things: "speed and agility – externally to your customers and market and internally within your organization".

Team autonomy is a prerequisite to speed and agility. Why? If your teams have to perform too much coordination work, it will slow them down and add rigidity.

John Rossman claims that speed is: "about moving in one direction very efficiently, very precisely. Operational excellence at scale is the business equivalent to speed".

Operational excellence implies effective coordination between autonomous teams. Freedom needs to be combined with responsibility. Accountability relations that link teams bring predictability. For example, at Amazon you are expected to identify and tenaciously manage every potential business-derailing dependency you have. Peer pressure is more powerful than pressure from managers who do not have the bandwidth to micro-manage inter-team dependencies.

3.3. Customer Experience

In today’s digital world, the quality of customer experience is becoming a critical success factor. Consumers and B2B customers have expectations that are shaped by the Internet giants. For example, the level of service provided by Amazon Prime™ is becoming the new normal of the retail industry. Because most Amazon competitors do not have the resources and scale to provide a similar service, the logistic capability becomes a decisive competitive advantage.

This has forced large retailers such as Walmart® to raise their investment level to develop new digital capabilities on par with those of Amazon. Recently a startup has developed a service to provide state-of-the-art logistic services to smaller retailers which have neither the skills nor the scale to compete with Amazon’s logistic capabilities.

Amazon, with the purchase of Whole Foods, adds stores to its distribution strategy. The future of retail may be "click and mortar" executed in the context of an omnichannel business model.

Customer experience starts with the discovery of customer needs. Just asking clients what they need is not going to help create a superior customer experience. Discovering the hidden or untold needs of clients is key to success. The analysis of the customer’s job-to-be-done or the mapping of the customer journey are two examples of practices on which digital enterprises are relying. The rapid adoption of design thinking is helping enterprises better understand the spoken and unspoken needs of their clients.

Design thinking is a human-centered approach that positions empathy as the centerpiece of the design discipline. It provides a process that helps design better products, services, or processes. The design thinking process has five steps: Empathize, Define, Ideate, Prototype, and Test, which are not intended to be applied in a linear manner.

Once the problem space is well understood, the enterprise can architect a product and service system that will satisfy its clients. Though the product concept is central to Agile, the Agile literature gives few if any product definitions.

3.4. Product-Centricity

The terms "product" and "service" have different meanings depending on their context of use. The common meanings of product are:

(1) Something produced
(2) Something (such as a service) that is marketed or sold as a commodity (Source: [Merriam-Webster])

This document defines a product as something a value stream produces. A product has a lifecycle which comprises a product and process development value stream and a production value stream. Broadly speaking, a product can refer to a product or a service. A service will be referred to as a product if its delivery is industrialized or repeatable.

In an Agile context, product-centricity refers to the shift from temporary organizational structures – projects – to permanent ones. A product-centric organization is composed of cross-functional Agile teams which are responsible for developing products or services and operating or running them. The DevOps principle "you build it, you run it" is core to product-centricity.

When digging into the subtleties of the various product and service meanings a few key ideas emerge:

Some marketing experts claim customers hire services and products to get specific jobs done
The service-dominant logic viewpoint defines service as the application of competencies for the benefit of another entity: SD logic claims that value is always co-created through interactions and views all economies as service economies
The servitization of products refers to industries that use their products to sell "outcome as a service" rather than a one-off sale
Highly personalized services are leveraging technology to productize a few stages of the serving value chain to improve operational efficiency

Using the word "offering" helps talk about product, service, or a combination in a generic way. This avoids some of the semantic problems linked to the variety of meanings carried by the product and service terminology.

The following definition borrowed from the Lean body of knowledge summarizes how the ideas above can be combined to guide the design of digital offerings: "A product is an object, material or digital, that allows its owner to solve a problem autonomously. There are always limits to that of course, as a product will need consumables, services, and maybe even training to use it more efficiently. But the ideal product is one that I use effortlessly, grasping intuitively how it works without needing to talk to anyone else to make it work or maintain it. I’m autonomous." [Ballé 2019]

Agile shifts from project-oriented management to product-oriented management. A true Agile team does not deliver a project but a product! Product-centricity drives the Agile organization: "Moving from project-oriented management to product-oriented management is critical for connecting software delivery to the business" [Kersten 2018]. The product-based focus helps create stable Agile teams that develop an end-to-end perspective and are staffed with stable resources.

The Agile enterprise architects its product and service system in a modular way to minimize inter-team dependencies. The lower the dependencies, the faster products are delivered to the market. Capability modeling and Domain-Driven Design (DDD) help guide the modular decomposition of the enterprise and its software systems.

The Inverse Conway Manoeuvre suggests structuring Agile teams so that they mirror the structure of the intended system architecture. When the teams' architecture mirrors the software system’s architecture, it reinforces the development of an end-to-end perspective that improves effectiveness and efficiency.

3.5. Solution Space

The term "digital offerings" denotes solutions that bundle products and services that help customers do their jobs. Time-to-market as well as quality are improved when the enterprise creates digital platforms that are made of reusable software components. The development of innovative business models helps enterprises differentiate their value proposition. Competing on business model is more effective than just competing on products.

The clear distinction between problem space and solution space facilitates innovation because it opens the space of possible solutions. More than one solution can solve a problem that has been well formulated.

4. What is Architecture?

Note: This section borrows from the systems thinking body of knowledge [Crawley 2016] to provide actionable guidelines to help Agile practitioners design and architect more modular systems.

This document adopts a systems thinking perspective on architecture because:

Systems thinking can be applied to any field; for example, biology, engineering, software, or human organizations
It provides a comprehensive body of knowledge that helps understand and model complexity

The digital enterprise requires bridging fields such as marketing, finance, software engineering, or operations into a coherent whole. Systems thinking provides the glue that helps integrate such diverse fields into an actionable framework.

Digital enterprises' adoption of Agile at scale results in the creation of a great number of autonomous teams. When several hundred Agile teams run in parallel at a fast pace, it may be hard to cope with the resulting complexity.

System theory provides key insights and methods to understand the emergent properties of complex systems. It has developed models that can help the practitioner steer their evolution.

4.1. Complex Systems

A complex system has many elements or entities that are highly interrelated, interconnected, or interwoven [Crawley 2016].

John Holland [Holland 2014] characterizes the behavior of complex systems by:

Self-organization
Chaotic behavior, when small changes in initial conditions produce large changes later
Adaptive behavior, when interacting agents modify their strategies in diverse ways as experience accumulates

When interaction patterns emerge, they influence the behavior of interacting agents and therefore impact the system itself. A complex system has emergent properties that can be neither understood nor predicted by just breaking it down into its elements. This lack of predictability makes steering the evolution of complex systems difficult. A complex system can evolve toward self-organization or chaos should it fail to maintain some kind of invariant state in the face of perturbations.

An Agile at scale enterprise that runs hundreds of Agile teams in parallel is a complex system. A fine line must be walked between self-organization and chaos. A research paper from MIT Sloan describes how Spotify™ balances autonomy with alignment [Baiyere 2017]. Spotify avoids chaos without slowing down innovation. The definition of shared missions, which are key strategic objectives, helps align Agile teams.

John Holland suggests steering complex systems by modifying their signal/boundary hierarchies. The components of complex systems are bounded sub-systems or agents that adapt or learn when they interact. Defining the boundary of sub-systems and their rules of interaction has a major influence on the evolution of a system. For example, establishing a taxonomy of Agile teams and clarifying their dependencies impact coordination and cooperation. Similarly, Domain-Driven Design (DDD) uses the bounded context concept to decompose complex software systems in a modular way [Evans 2003].

4.2. A System Engineering Definition of Architecture

Ed Crawley defines architecture as: "the embodiment of concept, and the allocation of physical/informational functions (processes) to elements of form (objects) and definition of structural interfaces among the objects".

It consists of:

Function
Related by concept
To form

Figure 3. System Engineering View of Architecture

Form is what a system is; it is the physical or informational embodiment that exists or has the potential to exist. Form has shape, configuration, arrangement, or layout. Over some period of time, form keeps its identity though it can change state. Form is not function, but form is necessary to deliver function. For example, in a software system, form defines the sub-systems and entities that compose it. In an enterprise, form defines the organization and the products that the organization produces.

Function is what a system does; it is the activities, operations, and transformations that cause, create, or contribute to performance. For example, the function of an Agile team (e.g., a tribe) could be to develop a payment product, and the function of sharding is to distribute data over a network. Emergence occurs in the functional domain.

A concept is a product or system vision, idea, or mental image which maps function to form. In the sharding example, the distributed computing concept of master-slave replication defines how to distribute data while keeping it consistent.

Upstream Influences

The design of an architecture is influenced by many factors such as the enterprise’s business strategy, compliance requirements, the needs of clients and users, or the behavior of competitors. When these upstream influences are not dealt with, ambiguity is likely to jeopardize the design of the new product or system. One of the key architecture missions is to help reduce this ambiguity.

A well-architected system delivers value to its stakeholders within the limits imposed by regulators, competition, and technology.

Value and Architecture

As defined by L.D. Miles from General Electric®, value is the lowest price you must pay to provide a reliable function or service. Value is the ratio of function to cost.

The value of a product can be increased by maintaining its functions and reducing its cost or keeping the cost constant and increasing the functionality of the product.

Utility is defined by the client or user and is driven by the function delivered
Cost is driven by the design of the form
The relationship of function to form drives value

A good architecture delivers expected benefits at competitive cost.
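The ratio of function to cost described above can be illustrated with a small worked example. This is only a sketch: the `value` helper, the unitless "function score", and all of the figures are hypothetical and are not taken from Miles or from this document.

```python
def value(function_score: float, cost: float) -> float:
    """Return value as the ratio of function to cost.

    function_score is a hypothetical, unitless measure of the function
    delivered; cost is expressed in any consistent currency unit.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    return function_score / cost


# Illustrative figures only.
baseline = value(function_score=80, cost=40)

# Same function at a lower cost (a change in the design of the form).
cheaper_form = value(function_score=80, cost=32)

# More function at the same cost.
richer_function = value(function_score=100, cost=40)

if __name__ == "__main__":
    print(baseline, cheaper_form, richer_function)
```

Both routes raise the ratio relative to the baseline, which matches the two ways of increasing value named in the text.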

Downstream Influences

Downstream influences extend the architecture’s scope to the full product or system lifecycle. For example, product architecture decisions ought to factor in manufacturing costs or ease-of-maintenance of products or systems. When architecting a product, a product line strategy can influence architecture decisions.

4.3. Relationship to the TOGAF® Definition

ISO/IEC/IEEE 42010:2011 (https://www.iso.org/standard/50508.html) defines "architecture" as: "The fundamental concepts or properties of a system in its environment embodied in its elements, relationships, and in the principles of its design and evolution."

The emphasis is on form elements and relationships, and the definition does not give much guidance on function and concept.

The TOGAF standard embraces the ISO definition, but does not strictly adhere to it, adding the definition below:

"The structure of components, their inter-relationships, and the principles and guidelines governing their design and evolution over time."

The emphasis on form is expressed through a different vocabulary: the "structure of components and relationships" versus "elements and relationships".

The system engineering definition of architecture used in this document is fully compatible with the ISO and TOGAF definitions.

4.4. Evolution of Products and Systems versus Architecture Evolution

In an Agile world, products and systems evolve incrementally through rapid learning cycles. How stable an architecture remains during these iterations can translate into two questions:

How to design architectures that are resilient to product and system evolution?
How to evolve an architecture incrementally?

Chapter 5, Continuous Architectural Refactoring, will describe how to design architectures that can evolve gracefully. It will explain the role refactoring plays in achieving evolvability as an architecture quality.

A point to note when answering the first question is: what is the right balance between anticipating foreseeable change and over-engineering? Not all change can or should be anticipated.

There are cases when it is wise to make a product or system resilient to change. For example, using a strategy pattern that enables selecting a pricing algorithm at run time instead of implementing it at design time is useful when designing a trading system.
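The strategy pattern mentioned above might be sketched as follows. This is a minimal illustration, not a trading system design: the names `flat_pricing`, `volume_discount_pricing`, `QuoteEngine`, and `engine_from_config`, as well as the 10% discount rule, are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A pricing strategy maps (unit price, quantity) to a quoted total.
PricingStrategy = Callable[[float, int], float]


def flat_pricing(unit_price: float, quantity: int) -> float:
    """Quote the unit price multiplied by the quantity."""
    return unit_price * quantity


def volume_discount_pricing(unit_price: float, quantity: int) -> float:
    """Hypothetical rule: 10% off orders of 100 units or more."""
    discount_pct = 10 if quantity >= 100 else 0
    return unit_price * quantity * (100 - discount_pct) / 100


# Registry that lets configuration, rather than code, pick the algorithm.
PRICING_STRATEGIES: Dict[str, PricingStrategy] = {
    "flat": flat_pricing,
    "volume_discount": volume_discount_pricing,
}


@dataclass
class QuoteEngine:
    """Delegates pricing to whichever strategy it currently holds."""

    strategy: PricingStrategy

    def quote(self, unit_price: float, quantity: int) -> float:
        return self.strategy(unit_price, quantity)


def engine_from_config(strategy_name: str) -> QuoteEngine:
    """Select the pricing algorithm at run time from a configuration value."""
    return QuoteEngine(PRICING_STRATEGIES[strategy_name])


if __name__ == "__main__":
    engine = engine_from_config("volume_discount")
    print(engine.quote(10.0, 100))
    # The algorithm can also be swapped on a live engine.
    engine.strategy = flat_pricing
    print(engine.quote(10.0, 100))
```

Because `QuoteEngine` depends only on the `PricingStrategy` signature, a new algorithm can be added by registering one more function, without changing the engine itself.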

The second question raises the issue of the reversibility of architecture decisions. Hard-to-reverse architecture decisions make incremental architecture change more difficult. Chapter 9, Minimum Viable Architecture, will cover this topic in more depth.

5. Continuous Architectural Refactoring

5.1. Introduction

In this section we discuss the topic of Continuous Architectural Refactoring, a topic of increasing importance in today’s Agile, DevOps-oriented software landscape.

However, to properly discuss the concept, we must first adequately define it, and every word in the term is important. We will briefly discuss each in turn (but not in order).

5.1.1. Refactoring

Our discussion will center around the concept of taking a system architecture and changing its structure over time. We may have many reasons for doing so: design debt or "cruft" which has inevitably accumulated; changes to our understanding of the important non-functional requirements; remedying suboptimal architectural decisions; changes to the environment; project pivots, etc. Whatever the reason, sometimes we need to change fundamental aspects of how our system is put together.

Before we continue, a note on the choice of the word "refactoring". Martin Fowler [Fowler 2019] would likely describe this topic as "Architectural Restructuring"; he uses the term "refactoring" to describe small, almost impact-free changes to the codebase of a system to improve its design. We decorate the term with the word "architectural" to make it obvious that we are describing larger-scale, structural system changes.

All of which leads us to the next question – what do we actually mean by "architecture" in this context?

5.1.2. Architectural

There may be as many definitions of "architecture" as there are software architects to define it. Ralph Johnson of the Gang of Four [GoF 1994] defined software architecture as: "the important stuff (whatever that is)". This deceptively obvious statement calls out the need for an architect to identify, analyze, and prioritize the non-functional requirements of a system. In this definition, the architecture could be viewed as a plan to implement these non-functional requirements. Ford [Ford 2017] gives a comprehensive list of such requirement types, or "-ilities". The TOGAF standard [TOGAF 2018] provides a more concrete description of architecture, namely: "the structure of components, their interrelationships, and the principles and guidelines governing their design and evolution over time".

This "evolvability" – the ability for architecture to be changed or evolved over time – is becoming critical. There are many reasons for this: the increasingly fast pace of the industry; adoption of Agile approaches at scale; the cloud-first nature of much new development; the failure of expensive, high-profile long-running projects, etc. System evolution has always been an important concept in architectural frameworks. Rozanski [Rozanski 2005] had an "evolution" perspective; the TOGAF standard has the concept of "Change Management". There is an increasing reluctance to worry up-front about five-year architecture plans or massive up-front architectural efforts, which is requiring organizations to consider building in "ease-of-change". This viewpoint is in harmony with that of Martin Fowler [Fowler 2015], who calls out that software architectures must address technical characteristics that are both important and hard to change.

5.1.3. Continuous

The industry has over the past few years revisited the "hard to change later" problem in a new light. Instead of looking at individual requirements from the perspective of how they will evolve in a system, what if "evolvability" was baked into the architecture as a first-class concept? Evolutionary architectures, as described by Ford [Ford 2017], have no end state. They are designed to evolve with an ever-changing software development ecosystem, and include built-in protections around important architectural characteristics – adding time and change as first-class architectural elements. Indeed, Ford [Ford 2017] describes such an architecture as one that "supports guided, incremental change across multiple dimensions". And it is this incremental nature of change that lets us change our software architecture in a continuous manner, planning for such change from the outset and including in our backlog items which reflect our desired architectural evolution.

5.2. Organization of this Section

The remainder of this section will discuss the key considerations in planning for continuous architectural refactoring of a software system; it answers the question of how we can set ourselves up to be in a position to continuously evolve our architectures in response to changing requirements, architectural debt, and other headwinds. We detail these under three headings:

Understanding and Guiding the Architecture
Creating the Right Technical Environment
Creating the Right Non-Technical Environment

Each sub-section covers a different aspect of the necessary prerequisites for continuous architectural refactoring. Taken together they offer a complete view of the enablers for successful continuous architectural refactoring.

5.3. Understanding and Guiding the Architecture

Before we can decide upon what technical and organizational mechanisms can be put in place to facilitate continuous refactoring, we must first understand the conditions under which we are operating. Once we have identified the business and technical constraints relevant to our system, we can then put in place structures that will allow us to evolve within those constraints. Fitness functions will allow us to actually test that our architecture is fit-for-purpose. Guardrails will contribute to the guidance referred to in the definition of architecture in Ford [Ford 2017], keeping our development teams from going astray in their system designs.

5.3.1. Constraints

Every organization operates under a range of constraints; they constrain the valid choices that can be made by a business in achieving its aims. They come in many flavors, including financial, cultural, technical, resource-related, regulatory, political, and time-based. The very nature of the word "constraint" implies a limiting, constricting force which will choke us of productivity and creativity, and it is human nature to try to dismiss them or rail against them. However, constraints need not be negative forces; they force us to describe our current reality, and provide guidance as to how that reality should shape our efforts. Individual constraints may be long-lived; others may be eliminated through effort; but to ignore any of them is folly.

Inevitably, some of these constraints will manifest in software as architectural constraints. Technical constraints may mandate an infrastructural topology (e.g., "organization A only deploys on Infrastructure as a Service (IaaS) vendor B’s offerings"), an architectural style of development (e.g., "organization C is a model-driven development software house"), or an integration requirement (e.g., "financial transactions are always handled by system D"). Financial and resource constraints can shape software development team members and their skill sets, in addition to imposing hardware and software limitations. Time-based constraints may manifest as software release cadences, which will influence development architectural choices. Regulatory constraints can have big impacts on development practices, deployment topology, and even whether development teams are allowed to continuously deploy into production.

When embarking upon a journey of continuous architectural refactoring, identification and documentation of such constraints is vital. As Rozanski [Rozanski 2005] sagely notes, one of the first jobs for an architect is to: "come up with the limits and constraints within which you will work and to ratify these with your stakeholders".

5.3.2. Fitness Functions

A frequent complaint about the discipline of software architecture is that it is all too easy for teams to regard it as an academic, rather abstract endeavor. Even with relatively mature development teams, whose architectural descriptions accurately describe how the system will implement the most important non-functional requirements, it has been difficult to demonstrate that the system actually does so. Even worse, as the nature and importance of these requirements change over time, it is easy for the architectural descriptions to lag behind, with the effect being that we no longer have a shared understanding of how the system will meet its non-functional requirements. If we do not know how to test that our architecture is meeting its goals, then how can we ever have confidence in it?

As an antidote to such problems, Ford [Ford 2017] introduces us to the deceptively simple concept of fitness functions. Fitness functions objectively assess whether the system is actually meeting its identified non-functional requirements; each fitness function tests a specific system characteristic.

For example, we could have a fitness function that measures the performance of a specific API call. Does the API complete in under one second at the 90th percentile? This question is far from abstract; it is an embodiment of a non-functional requirement that is testable. If evaluation of the fitness function fails, then this aspect of our system is failing a key non-functional requirement. This is not open to opinion or subjectivity; the results speak for themselves. To take the example further, imagine that one of our proposed architectural refactorings was to implement database replication to meet availability requirements. If we implemented this, and the "API performance" fitness function subsequently failed, then we know early in the development cycle that our architecture is no longer fit-for-purpose in this respect, and we can address the problem or pivot.
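The "API performance" fitness function in this example could be sketched along the following lines. This is an illustration under stated assumptions, not a method prescribed by Ford: the function names are hypothetical, the percentile uses the nearest-rank method (one of several common definitions), and the latency samples are invented; in practice they would come from a load test or from production telemetry.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank that covers pct percent of the samples.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def api_latency_fitness(
    latencies_s: Sequence[float],
    threshold_s: float = 1.0,
    pct: float = 90.0,
) -> bool:
    """Pass if the chosen percentile of latency is under the threshold."""
    return percentile(latencies_s, pct) < threshold_s


if __name__ == "__main__":
    # Hypothetical measurements, in seconds, before and after a refactoring.
    before_refactoring = [0.2] * 9 + [1.5]
    after_refactoring = [0.2] * 8 + [1.5] * 2
    print(api_latency_fitness(before_refactoring))
    print(api_latency_fitness(after_refactoring))
```

Run as an automated test in the build pipeline, a check of this kind would signal a regression as soon as a refactoring pushed the 90th percentile past the one-second limit.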

It follows, therefore, that fitness functions are key enablers of our goal to continuously restructure our architecture. They allow us to ensure that those system characteristics which need to remain constant over time actually do so. They both reduce our fear of breaking something inadvertently and give us the ability to show our stakeholders that we haven’t done so. They represent a physical, tangible manifestation of our constraints and architectural goals.

5.3.3. Guardrails

Another mechanism that organizations use to bake evolvability into their system architectures is the concept of architectural guardrails. As with their real-world roadside equivalents, software guardrails are designed to keep people from straying onto dangerous territory.

In real terms, guardrails represent a lightweight governance structure. They document how an organization typically "does" things – and how, by implication, development teams are expected to "do" similar things. For example, a guardrail may document not just the specific availability requirements for a new service, but also how the organization goes about meeting such requirements. Guardrails are typically used in combination with an external oversight team – be this an architecture board, guild, or program office. The message from such oversight teams is usually simple: if you stick to the guardrails, you don’t need to justify your architectural choices – we will just approve them. However, in those situations where you cannot abide by a guardrail, we need to discuss it. If your reasoning is sound, then we may well agree with you and modify our guardrails, but we reserve the right to tell you to change your approach if there is no good reason not to abide by the guardrails.

The key to their power is that they are not mandates. They do not impose absolute bans on teams taking different approaches; rather they encourage creativity and collaboration, and encourage the evolution of the governance structure itself.
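The review flow described above can be sketched as a simple check. This is a hypothetical illustration: the guardrail topics, the accepted options, and the review function are all invented for the sketch and are not part of this framework. Note that a deviation leads to a discussion, not to an automatic rejection.

```python
# Hypothetical guardrails: for each architectural topic, the options the
# organization typically uses. Topics and options are illustrative only.
GUARDRAILS = {
    "data_store": {"relational"},
    "messaging": {"queue", "event_stream"},
}


def review(proposal, guardrails=GUARDRAILS):
    """Return ("approved", []) when every choice stays within the guardrails,
    otherwise ("discuss", [topics that deviate]). Topics without a guardrail
    are left to the team."""
    deviations = sorted(
        topic
        for topic, choice in proposal.items()
        if topic in guardrails and choice not in guardrails[topic]
    )
    if deviations:
        return ("discuss", deviations)
    return ("approved", [])


within = {"data_store": "relational", "messaging": "queue", "ui": "web"}
outside = {"data_store": "graph", "messaging": "queue"}

print(review(within))   # ('approved', [])
print(review(outside))  # ('discuss', ['data_store'])
```

If the discussion shows that the deviation is well reasoned, the guardrail data itself would be updated, which mirrors the evolution of the governance structure described above.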

5.4. Creating the Right Technical Environment

Successful continuous architectural refactoring needs the development team to be empowered to iteratively make architectural changes. There are a number of key technical enablers for this which are discussed here:

Continuous delivery
Componentization

In addition, Agile development practices are a key enabler for continuous architectural refactoring. As described in Chapter 6, Architecting the Digital Enterprise, there are a number of practices which are promoted by Agile working. These practices allow continuous architectural refactoring to be successfully implemented; in particular the rapid iteration and experimentation, which allows architectural evolution to be readily incorporated into ongoing development activities.

5.4.1. Continuous Delivery

For some years now, the concept of continuous delivery has been key to a solid foundation for software development. Fowler simply defines it as: "a software development discipline where you build software in such a way that the software can be released to production at any time" [Fowler 2013]. To do this, he says, you need to continuously integrate your developed software, build it into executables, and test it via automated testing. Ideally, such testing is executed in an environment which is as close as possible to a production environment.

A seminal work on the topic, Humble [Humble 2010], converted many software teams to the advantages of an Agile manifestation of configuration management, automated build, and continuous deployment. More recently, Forsgren [Forsgren 2018] has statistically illustrated the advantages of continuous delivery – there is now no question that its adoption will help teams deploy on-demand, get continuous actionable feedback, and achieve one of the main principles of the Agile Manifesto [Agile Manifesto]: to "promote sustainable development". It is, moreover, difficult to achieve scalable continuous architectural refactoring without it.

Continuous integration and continuous delivery are important elements to support continuous architectural refactoring. Continuous integration and continuous delivery are often considered as a single concept, and in many cases are linked by a single implementation. However, this is not a requirement and for flexibility they will be discussed separately here.

Continuous integration is about developers' work being merged into a single branch frequently. Some source control tooling makes this the default, but irrespective of the technology choice it is possible to implement continuous integration with a combination of development practices and build process. One of the most important elements of continuous integration is the integration of automated testing into the build process, so that there is confidence in the quality of the code on the main branch at all times. The key benefit in terms of architectural refactoring is the removal of "long-running" branches, which militate against architectural change because they extend the window of potential impact of a change until all branches have merged. In practice, this can make it so cumbersome for developers to manage the impact of architectural change that it prevents the change from happening.

Continuous delivery is about being able to release at any time, which can be realized as releasing on every commit. It is important to note that in organizations with compliance, regulatory, or other mandatory checkpoints, continuous delivery may not be about a release to production being fully automated. Rather, the aim of continuous delivery should be that as each change is integrated it should be possible to release that version, and in particular that the entire team is confident that it is releasable. The key benefit in terms of architectural refactoring is in empowering the developers to make architectural changes, knowing that the combination of continuous integration and continuous delivery will guarantee that the change is non-breaking in terms of functionality and deployment.

It is possible, and in many cases desirable, to evolve to have a continuous integration/delivery pipeline, rather than trying to take one step to a fully automated process. The key to this is to understand the required steps in the process, and work to automate them one at a time. It is also important to look at the larger environment and make the decision to find the right solution for your organization, even if that means that some manual checkpoints remain.
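A minimal sketch of such an incrementally automated pipeline is shown below. The step names, the Step type, and the two helper functions are hypothetical; the sketch only illustrates the idea of listing the required steps explicitly, automating them one at a time, and accepting that some manual checkpoints may remain.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    check: Callable[[], bool]  # returns True when the step succeeds
    automated: bool = True     # False marks a remaining manual checkpoint


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the steps in order and stop at the first failure.
    Returns (releasable, names of the steps that completed)."""
    completed = []
    for step in steps:
        if not step.check():
            return False, completed
        completed.append(step.name)
    return True, completed


def automated_share(steps: List[Step]) -> float:
    """Fraction of steps that are automated: a simple progress measure."""
    return sum(1 for s in steps if s.automated) / len(steps)


# Hypothetical pipeline: every check is stubbed to succeed.
pipeline = [
    Step("build", lambda: True),
    Step("automated tests", lambda: True),
    Step("deploy to staging", lambda: True),
    Step("compliance sign-off", lambda: True, automated=False),
]

releasable, completed = run_pipeline(pipeline)
print(releasable, automated_share(pipeline))  # True 0.75
```

Tracking the automated share over time gives the team a simple way to see the pipeline evolving one step at a time.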

Finally, it is key here to take the advice of Humble [Humble 2010]: "in software, when something is painful, the way to reduce the pain is to do it more frequently, not less". Building towards a continuous integration/delivery pipeline is hard, which makes it all the more important to do: if you don’t, the effort of delivering manually will be all the more limiting to your evolution.

5.4.1.1. Feature Toggles

Feature toggles (or feature flags) are an important mechanism in creating an environment to allow continuous architectural refactoring. They allow features to be developed and included on the main branch (see Section 5.4.1, “Continuous Delivery”), but without exposing them to end users. This gives the development team options to factor their work solely based on their needs.

In addition, as described by Kim [Kim 2016] the key enablers arising from the use of feature toggles are the ability to:

Roll back easily
Gracefully degrade performance
Increase our resilience through a Service-Oriented Architecture (SOA)

Hodgson [Hodgson 2017] details the different types of feature toggle that exist. Some toggles enable A/B testing (where several possible solutions are trialed simultaneously, but to different users), some enable gradual rollouts of new functionality (such as Canary testing), but of particular note to our discussion on continuous architectural refactoring is the "release toggle". Such toggles allow untested or incomplete refactorings and restructurings to be released into a production environment, safe in the knowledge that such code paths will never be accessed.
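A release toggle of the kind described above can be sketched as follows. The FeatureToggles class, the toggle name, and the profile functions are hypothetical and kept in memory for brevity; a real system would more likely read toggle state from configuration or a toggle service.

```python
class FeatureToggles:
    """In-memory toggle store. Unknown toggles default to off, so code
    behind a new toggle stays dormant until it is explicitly enabled."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        return self._flags.get(name, False)

    def set(self, name, enabled):
        self._flags[name] = enabled


def load_profile_legacy(user_id):
    return {"id": user_id, "source": "legacy"}


def load_profile_refactored(user_id):
    # Incomplete refactoring: shipped to production but not yet reachable.
    return {"id": user_id, "source": "refactored"}


def load_profile(user_id, toggles):
    """Route to the refactored path only when the release toggle is on."""
    if toggles.is_enabled("refactored-profile-store"):
        return load_profile_refactored(user_id)
    return load_profile_legacy(user_id)


toggles = FeatureToggles()
print(load_profile(7, toggles)["source"])  # legacy

toggles.set("refactored-profile-store", True)
print(load_profile(7, toggles)["source"])  # refactored

toggles.set("refactored-profile-store", False)  # easy rollback
print(load_profile(7, toggles)["source"])  # legacy
```

Because the rollback is a change of toggle state and not a redeployment, the team can release the refactored path early and switch back quickly if it misbehaves.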

5.4.2. Componentization

The structure of your architecture can play a key role in militating against continuous architectural refactoring. A monolithic architecture, while not inherently bad, can, as an organization expands or as the need for flexibility increases, become a key constraint. As Kim [Kim 2016] observes: "… most DevOps organizations were hobbled by tightly-coupled, monolithic architectures that – while extremely successful at helping them achieve product/market fit – put them at risk of organizational failure once they had to operate at scale …".

The key therefore is to evolve your architecture to have sufficient componentization to support your organizational evolution on an ongoing basis. The strangler pattern, described in Chapter 15, Strangler Pattern, can be key in this kind of evolution by creating the space for the implementation to evolve behind an unchanging API.

This can be achieved as a staged process moving from a monolithic architecture to a layered architecture, and on to micro-services, as described by Shoup [Shoup 2014].
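The idea of letting the implementation evolve behind an unchanging API can be sketched as a small routing facade. This is a simplified illustration with hypothetical class and operation names, and not necessarily the formulation used in the Strangler Pattern chapter: callers always use the same facade, while operations are moved from the legacy implementation to the replacement one at a time.

```python
class LegacyOrders:
    """Stand-in for the existing monolithic implementation."""

    def create(self, item):
        return f"legacy:create:{item}"

    def cancel(self, item):
        return f"legacy:cancel:{item}"


class NewOrders:
    """Stand-in for the componentized replacement."""

    def create(self, item):
        return f"new:create:{item}"

    def cancel(self, item):
        return f"new:cancel:{item}"


class OrdersFacade:
    """Unchanging API: callers never know which implementation served them."""

    def __init__(self, legacy, replacement):
        self._legacy = legacy
        self._replacement = replacement
        self._migrated = set()

    def migrate(self, operation):
        """Mark one operation as served by the replacement from now on."""
        self._migrated.add(operation)

    def call(self, operation, *args):
        target = self._replacement if operation in self._migrated else self._legacy
        return getattr(target, operation)(*args)


orders = OrdersFacade(LegacyOrders(), NewOrders())
print(orders.call("create", "book"))  # legacy:create:book

orders.migrate("create")              # one stage of the evolution
print(orders.call("create", "book"))  # new:create:book
print(orders.call("cancel", "book"))  # legacy:cancel:book
```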

5.5. Creating the Right Non-Technical Environment

Technical mechanisms such as continuous delivery and feature toggles are powerful enablers of continuous architectural refactoring, but they are certainly not the only ones. For example, what if you didn’t have the buy-in of senior management to do any refactoring? (Hint: architectural refactoring gets continuously prioritized behind functional evolution.) Even if you have such buy-in, to paraphrase the definition of architecture in Ford [Ford 2017], continuous refactoring needs to be guided and incremental. The guidance comes in the form of an architectural roadmap, a best-guess hypothesis of how the architecture needs to evolve. Finally, organizations need to balance the tensions between these forces; sometimes we should refactor; sometimes we should build new functionality.

Before we continue, it is worth noting that development team structure is also a key enabler for continuous architectural refactoring, in particular the Inverse Conway Manoeuvre. This technique has been described separately in Chapter 3, A Dual Transformation.

5.5.1. Justifying Ongoing Investment in Architectural Refactoring

A frequent frustration amongst software developers is the perception that their management team only values things that can be sold. To management, they believe, architectural refactoring is wasted money, occupying development teams for months at a time without a single additional thing that can be sold being produced. And for that matter, why does it take so long for them to add a feature? (Possible answer: that would be because the architecture has not been refactored in years.)

Management teams have businesses to run, and they have a point. Customers do not typically hand over money for architectural refactorings, no matter how elegant they are, and without shiny new things to sell, there may be no money to continue to employ the development teams who want to do the refactoring.

As such, this issue has two aspects: firstly, development teams need to learn how to justify such investment; secondly, such non-functional investment will always have to be balanced with functional requirements.

It is worth at this point returning to the Fowler [Fowler 2019] distinction between code refactoring and architectural restructuring. Fowler, like the present authors, would be strongly of the opinion that code refactoring requires no justification; rather it is part of a developer’s "day job". This does not mean that we have to take on a massive code restructuring exercise for a legacy codebase; on the contrary, there may be no reason whatsoever to restructure the code for a stable legacy project. However, that said, developers should refactor their code when the opportunity arises. Such activity constitutes a "Type 2" decision as documented in Chapter 9, Minimum Viable Architecture.

Architectural refactoring (restructuring), however, often requires explicit investment because the required effort is significant. In such cases, it is incumbent on development teams and architects to "sell" the refactoring in monetary, time, or customer success terms. For example, "if we perform refactoring A, the build for Product B will be reduced by seven minutes, resulting in us being able to deploy C times more frequently per day"; or, "implementing refactoring D will directly address key customer E’s escalated pain-point; their annual subscription and support fee is $12 million". Note, however, that claims that "refactoring F will make us G% more productive" should be avoided as software productivity is notoriously difficult to measure.

5.5.2. Developing an Architectural Roadmap

In the authors' experience, an architectural roadmap needs to meet several key criteria to achieve continuous architectural refactoring:

Vision: a target end state is key to assessing individual changes as moving towards the target state
Step-wise: a number of intermediate states need to be described between the "as is" and "to be" architectures with the benefits and challenges of each state documented
Flexible: the target and intermediate states may evolve as the understanding of the architecture and the constraints themselves evolve
Open: a successful architecture is rarely defined by a committee, but the process and documentation of the architectural roadmap needs to be available to the whole team, and everyone must feel empowered to comment/question

In order to create the space for the Agile implementation, it is also important that the roadmap remains high-level. There is a tension here between the need to keep the project within its constraints, while giving the team the space and support to make Agile decisions as they are implementing the architectural roadmap. Beyond the roadmap and in particular the vision of a target architecture, guardrails (see above) are key to supporting and enabling emergent architecture, while allowing the overall architecture to remain effective and meet all of its identified requirements.

In particular, our suggested aim is to create an environment where the risk of architectural change can be removed by the supporting conditions, allowing the team the freedom to make architectural changes, knowing that the process and culture will support them. To quote from Kim [Kim 2013]: "Furthermore, there is hypothesis-driven culture, requiring everyone to be a scientist, taking no assumption for granted, and doing nothing without measuring." The measuring of the impact of architectural change was discussed in Section 5.3.2, “Fitness Functions”.

5.5.3. Progressive Transformation (Experience)

Delivering continuous architectural refactoring is more than the sum of the pieces already described in this section. It also needs a pragmatic approach from the entire team as to what is "good enough" at every point to allow the product to evolve (in the right direction) and keep the business moving forward. Simon [Simon 2018] describes this as "liquid software", allowing the product (and its architecture) to evolve as needed, while also having an environment that ensures it continues to meet all the requirements placed on it.

In the authors' experience this can also have a varying focus over time; sometimes the business needs to "win" and the focus shifts to business features at the expense of architectural evolution. But it is critical that the environment for architectural evolution persists, so that if and when the focus shifts back to architectural concerns, the option to continue to evolve the architecture remains open.

6. Architecting the Digital Enterprise

Introduction

Value delivered to customers as well as operational efficiency are core concerns of any enterprise. While this is not new, Digital Transformation reinforces the importance of these concerns. It also drives change in key areas:

Clients' experience expectations are shaped by Internet giants; for example, Amazon Prime where you can follow your delivery in real time using your smartphone – the digital enterprise is experience-driven
Delivering a superior client experience impacts the enterprise end-to-end; for example, operating model weaknesses are likely to result in client dissatisfaction and pain
Digital is about creating and delivering innovative products or services that meet clients’ explicit and implicit needs
Methods such as design thinking or the analysis of the “job-to-be-done” help invent innovative products and services
Fast learning cycles to quickly experiment with new products and services thanks to Lean Startup MVPs combined with DevOps’s rapid continuous deployment
New technology-enabled business models to gain competitive advantage and disrupt established industries

The authors of an article on Digital Transformation have surveyed more than 20,000 business executives, managers, and analysts [Kane 2019]. They conclude that leaders facing the challenges of digital disruption need to possess three distinctive skills:

Transformative vision and forward-looking perspective
Adaptability
Digital literacy

We believe only an outside-in perspective is suited to the development of a forward-looking perspective. Digital leaders must discover the evolving needs and fears of their customers. To find sources of innovation inspiration, digital leaders borrow design and marketing methods, in particular design thinking and jobs-to-be-done analysis.

The first level of innovation materializes into digital offerings that can meet underserved customer needs. The next innovation level is about inventing disruptive business models that provide sustainable competitive advantages and can sometimes disrupt industries.

New digital offerings and business models may require the enterprise to develop new capabilities. Before committing too many investment dollars, digital leaders validate their market strategy by experimenting with Minimum Viable Products (MVPs) with customers. A few learning cycles may be required before committing significant investments in new offerings or business models. This requires adaptability.

Enterprises that pre-date the age of software see a growing portion of their spend shifting to technology as their market success is increasingly determined by software. However, the productivity of software delivery of the clear majority of enterprises falls woefully behind that of tech giants [Kersten 2018]. Therefore, digital literacy of enterprise leaders is critically important to help them hire the right technology people and recognize when they receive bad technology advice.

Figure 4, “Architecting the Digital Enterprise”, shows a digital architecture that is developed concurrently and follows two key principles: outcome-driven and modular.

Figure 4. Architecting the Digital Enterprise

Outcome-Driven

Looking at products with an outside-in perspective requires a shift from outputs to outcomes. An output is what is created at the end of a process. Outputs tell the story of what you produced or your organization’s activities. A very large company in the car industry recently said they were looking at product as "how it is used by a customer rather than how it is created/delivered". Does this mean that such firms are no longer car producers? Absolutely not. This illustrates that the customer job perspective helps discover different and sometimes innovative ways to satisfy customer needs.

Outcomes are the effects produced by using an enterprise’s products and services. As stated by Karl Hellman: "outcomes are the benefits your customers receive from using your stuff <…> This requires a true understanding of customers’ needs — their challenges, issues, constraints, priorities — by walking in their shoes and in their neighborhoods" [Hellman 2018].

Defining desired outcomes is about:

Describing the outcomes you want to achieve: why is your customer using, or why would they want to use, your product?
Looking at these outcomes from the customer’s perspective
Associating quantitative measures with these outcomes (e.g., % of clients demonstrating new behavior, % of clients coming back into treatment)

Because there is more than one way to deliver desired outcomes, more than one product can deliver them. This opens the range of possible solutions. A product owner should assess the qualities of each candidate product. She should ensure desired outcomes are linked to the product’s outputs or activities. In other words, she needs to be confident that the operating model supporting her product can reasonably deliver the customer’s desired outcomes.

Modular

Modularity is about decomposing a system into parts that are loosely-coupled. In this section the term system refers to any type of system from human or social to technical. Chapter 4, What is Architecture? provides a comprehensive description of systems thinking.

The main benefits of modularization are:

Enable parallel work to dramatically shorten capability or product development lead times
Changes in one part of the system have limited impact on other parts of the system which makes it more resilient to change
Failures of one part of the system are less likely to propagate to other parts of the system

For example, cloud-native computing promotes an architecture style that decomposes software systems into services that have well-defined boundaries. Changes in one service have a limited impact on other services, and failures are easier to isolate, which makes the system more resilient.

6.1. Strategic Marketing

Jean-Jacques Lambin [Lambin 2000] states: "the role of strategic marketing is to lead the firm towards attractive economic opportunities; that is, opportunities that are adapted to its resources and know-how and offer a potential for growth and profitability".

Strategic marketing is about:

Discovering what your customers want and how competition (if any) provides it
Segmenting your customers
Evaluating the attractiveness of each segment
Assessing if you have the capabilities to provide superior value
Deciding which segments to target
Positioning your brand to meet the needs of targeted segments

"A positioning statement defines the value proposition of products to the target: … the point of difference (reason to buy) and the point of parity (point of reference)" [Wharton 2018].

Strategic marketing provides context to help drive digital strategy. The enterprise can choose to develop new capabilities to serve targeted client segments. Traditional strategic marketing is mostly developed in a top-down manner.

6.2. Business Model

"Business model" is one of the great digital buzzwords. A good business model tells a good story. Describing an existing business model is easy and many templates can help you structure a story line; for example, the Business Model Canvas [Osterwalder 2010] or "Reinventing your Business Model" [Johnson 2008].

The difficult part is changing an existing business model or creating a brand new one that works. Following a generic template is no guarantee of success.

Successful business models start with an innovative value proposition that has been field tested. The Amazon Flywheel illustrates how systems thinking can help design successful business models. "Simply put, a flywheel is a self-reinforcing loop or systems diagram driven by key objectives or initiatives" [Rossman 2019].

John Rossman’s Idea 25 states: "Study and analyze either your industry or the situation you are trying to improve using systems thinking. Once you have an idea or hypothesis on how to achieve your goal, create a simple version of your system, often called a “flywheel”, to assist in testing your strategy and then in communicating your logic and plan to others".

Digital enterprises can pursue both differentiation and low cost. This creates a leap in value for both the enterprise and customers. The Amazon Prime Now™ service epitomizes this. Customers can purchase a large variety of products at a competitive price and get a free two-hour delivery at their home (https://primenow.amazon.com).

Product variety is increased through a platform that allows third-party vendors to sell and deliver their products leveraging Amazon’s portal and logistics capabilities. The Amazon business model is characterized by:

A two-sided market platform business model with customers on one side of the market and third-party vendors on the other
A logistics capability which translates into a superior experience allowing the customer to track delivery in real time using any device

The Amazon example illustrates how firms can compete leveraging difficult-to-replicate capabilities which are enabled by a digital platform that enables an adaptive operating model.

6.3. Customer Insights

Though classical marketing research provides a set of quantitative and qualitative tools to analyze customer needs and test product ideas, it is no guarantee of success. Therefore, the design discipline is gaining traction to gather customer insights which complement traditional marketing analysis.

Design thinking postulates that to create meaningful innovations, the enterprise needs to know its customers and care about their lives [Stanford 2010].

The concept of job-to-be-done is a key tool, developed by Clayton Christensen. It helps better understand customer needs by wrapping offered products and services in a usage context.

"After decades of watching great companies fail, we’ve come to the conclusion that the focus on correlation – and on knowing more and more about customers – is taking firms in the wrong direction. What they really need to home in on is the progress that the customer is trying to make in a given circumstance – what the customer hopes to accomplish. This is what we’ve come to call the job-to-be-done."

New market segmentation methods emerge from analyzing jobs-to-be-done. Customers can now be classified based on the outcomes they expect. Unlike abstract client segments, personas help to better understand who the people targeted by the enterprise are, and what their pains and expected gains are.

The four remaining steps of design thinking help the enterprise formulate a better problem statement (define), generate a broad range of ideas (ideate), and prototype and test candidate solutions.

Validated customer insights provide key inputs to help define innovative value propositions.

6.4. Customer Journey

Customer journey mapping helps to systematically put the customer at the heart of the enterprise’s digital strategy. It offers a new way of seeing the enterprise’s markets and strategy from an outside-in rather than an inside-out perspective. It helps teams and management "walk in the shoes" of their customers. It helps correct strategy myopia because it shows a much broader picture of customer needs than classical marketing addresses.

Customer journey maps come in many shapes and forms. For example, customer maps can follow a timeline positioning activities or phases chronologically.

Alternatively, they can be represented as:

A network structure that shows a web of interrelationships between aspects of an experience
A spatially organized map that shows where interactions take place

In addition to describing the customer’s functional job, a customer journey map captures the feelings of customers during moments of truth. It can also capture how a customer believes she is perceived socially.

6.5. Digital Platform

Internet giants are dominating competition by leveraging the power of the platform. A platform leverages technology to connect participants to create or exchange value. For example, Airbnb™ connects travelers with local hosts who earn extra income, or Uber™ connects interested passengers with drivers who use their own car to provide transportation services.

Platform-based business models are based on the two-sided markets theory developed by the Economics Nobel prizewinner Jean Tirole. When platform-based businesses enter markets dominated by "pipelines", they enjoy a competitive advantage. Why? Because pipelines rely on inefficient gatekeepers to manage the flow of value, whereas platforms promote self-service and direct interactions between participants.

A platform can scale and grow more rapidly and efficiently because the traditional gatekeeper is replaced by signals provided by market participants through a platform that acts as a mediator. Platforms stimulate growth because they expose new supply and unlock new demand. They also use big/fast data and analytics capabilities to create community feedback loops [Parker 2016].

Technology is a key enabler of platform-based business models because high levels of automation and self-service capabilities are required to succeed. For example, when Amazon decided to develop a third-party sellers' market, one of the key requirements was: "A third-party seller, in the middle of the night without talking to anyone, would be able to register, list an item, fulfill an order, and delight a customer as though Amazon the retailer had received the order" [Rossman 2019].

This document makes a clear distinction between the platform business model which is based on the two-sided market theory and the digital platform which is a technology enabler that allows enterprises to achieve economic gains by reusing or redeploying assets across families of products.

A digital platform is a software system composed of application and infrastructure components that can be rapidly reconfigured using:

DevOps to dramatically reduce the "requirement to deploy" lead time which is key to reducing digital offerings' time-to-market
Cloud Native Computing to bring agility, scalability, and resilience to the operating model

The definition in this document is compatible with Jeanne Ross' definition from the MIT CISR which defines a Digital Platform as: "a repository of business, data, and infrastructure components used to rapidly configure digital offerings" [Ross 2019].

Most established enterprises have monolithic legacy systems that get in the way of creating effective and efficient digital platforms. To compete with Internet giants or more nimble competitors, these enterprises must develop refactoring and modernization strategies.

6.6. Digital Offering

True Digital Transformation means allowing the rapid creation and market testing of new digital offerings inspired by customer insights and powered by a digital platform to deliver differentiated outcomes to customers. "A digital offering is the confluence of a customer solution and a great experience" says Jeanne Ross [Ross 2018].

The most advanced enterprises use a Lean product development approach to create digital offerings [Oosterval 2010] [Morgan 2019].

The Lean Startup best practices are helping to market test digital offerings before enterprises invest too much in them [Ries 2011].

6.7. Toward an Adaptive Operating Model

Delivering superior customer experience cannot be achieved by improving the front-end alone. If the back-end is the source of rigidity or defects, it will translate into poor customer experience. The customer journey should be driven by the customer and not by rigid back-end processes.

Figure 5, “Revised Service Blueprint”, inspired by service blueprinting, helps bridge customer journeys with required capabilities. The example describes a simplified loan origination journey. Variants of this journey can be created to account for a different channel mix.

The top part of the diagram borrows from a journey map. The bottom part below the line-of-visibility describes the capabilities that are required to support the story map. Capabilities are implemented by services that are described in functional terms and/or specified using APIs and/or business events.

Figure 5. Revised Service Blueprint

Architecting a set of modular and reusable services will allow for rapid reconfiguration of customer journeys. System architecting techniques combined with Domain-Driven Design (DDD) can help design loosely-coupled services that will be easier to assemble into unanticipated composite services. An adaptive operating model takes advantage of modularity and composability to gracefully adapt to changing customer experience requirements.
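The reconfiguration of customer journeys from modular services can be sketched as follows. The service names, the fixed credit score, and the approval threshold are hypothetical placeholders chosen for this sketch of a simplified loan origination journey; they are not taken from Figure 5.

```python
def verify_identity(application):
    return {**application, "identity_verified": True}


def score_credit(application):
    # Fixed placeholder score; a real service would call a scoring capability.
    return {**application, "credit_score": 700}


def make_offer(application):
    approved = (
        application.get("identity_verified", False)
        and application.get("credit_score", 0) >= 650
    )
    return {**application, "offer_made": approved}


# Each capability is exposed as an independent service with the same shape:
# it takes the application state and returns an updated copy.
SERVICES = {
    "verify_identity": verify_identity,
    "score_credit": score_credit,
    "make_offer": make_offer,
}


def run_journey(steps, application, services=SERVICES):
    """Assemble a journey by composing services in the given order."""
    state = dict(application)
    for step in steps:
        state = services[step](state)
    return state


# Two journey variants built from the same services.
full_journey = ["verify_identity", "score_credit", "make_offer"]
prequalification_journey = ["score_credit"]

print(run_journey(full_journey, {"customer": "c-1"}))
print(run_journey(prequalification_journey, {"customer": "c-1"}))
```

Because each service depends only on the application state it is given, a new journey variant is a new list of steps and requires no change to the services themselves.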

6.8. Accountability

In a paper published in the Sloan Management Review, Jeanne Ross suggests that enterprises "initiate change by assigning accountabilities for specific business outcomes to small teams or individual problem owners" [Ross 2018].

In many enterprises this change impacts the formal organizational structure. For example, several banks are adopting the Spotify model to re-organize the IT function and some such as ING or ANZ are also re-organizing the business the same way.

Changing the organizational structure is not enough. It is key to change the culture, the ways of working, and the management system. Accountability is a powerful alignment mechanism when teams trust each other. Trust requires visibility and predictability which are not the hallmark of "command and control" organizations.

6.9. Set-Based Concurrent Engineering

In the Big Design Up-front (BDUF) style, architecture teams often pick fundamental concepts based on incomplete or old knowledge. The subsequent design iterations:

Look at one solution at a time and change it only when problems arise
Disregard new customer or technical knowledge that would challenge early design decisions

The resulting solution is likely to require significant rework and will be suboptimal because good concepts are eliminated too early.

In contrast, Set-Based Concurrent Engineering (SBCE) [Ward 2014]:

Simultaneously explores multiple solutions
Aggressively attacks those solutions with rapid, low-cost analysis and tests, progressively eliminating weak solutions
Uses the analysis and test results to define the limits of the possible
Converges on a solution only after it has been proven

"Taking time up-front to explore and document feasible solutions from design and manufacturing perspectives leads to tremendous gains in efficiency and product integration later in the process and for subsequent development cycles" [Sobek 1999].

In our context concurrent engineering brings agility and facilitates innovation, for example:

Customer insights can influence strategic marketing decisions
Business models can evolve because of experience design or product experimentation
The enterprise can evolve from a product distribution model toward the development of a two-sided market
Digital platforms can evolve as more products reuse them
Operating models evolve alongside the enterprise Agile Transformation as organizations become flatter and cross-functional
The accountability framework strengthens as the enterprise’s culture evolves toward increased team autonomy

7. Architecting the Agile Transformation

The Agile Transformation of the enterprise covers three areas:

Adopting new ways of working
Deploying new management systems
Changing the organizational structure

The new ways of working promote the practices below:

Rapid iteration and experimentation which promotes continuous learning
Fact-based decision-making
Information sharing and transparency
Cohesive cross-functional teams coached by "servant" leaders
Performance orientation driven by peer pressure

The management systems evolve to promote a mix of freedom balanced by clear accountable roles. Freedom is required to empower teams to rapidly make decisions closer to the field. Accountability in an Agile organization is not about controlling people; it is about a two-way exchange where you agree to deliver something to another person.

In an Agile organization an employee is accountable to her peers, her manager, and her clients. Managers are accountable to their teams, the board of directors, and society. The management system cascades goals at all levels of the organization and promotes a constructive dialog to help set up accountability relationships between employees and managers. The reward system recognizes individual performance while promoting collaboration.

The organizational structure is flattened. Autonomous cross-functional teams often named "feature teams" or "squads" are formed. Cross-functional roles emerge to help construct robust communities of practice often named "chapters" or "guilds". Resource allocation is flexible and driven by the evolution of demand or activity level.

The left part of Figure 6, “Agile Transformation” represents the three transformation dimensions we have introduced, plus one which is the enterprise’s culture. Culture evolution results from changes in the three dimensions. For culture change to take hold, people have to experience success operating in the new Agile organization.

Figure 6. Agile Transformation

The middle part of the figure lists a few important questions that the enterprise needs to address. For example, the waterfall scenario is likely to create intermediary stages that are suboptimal. New ways of working may conflict with the existing management system. The enterprise which deploys a new management system on top of an existing organizational structure will have to redeploy it at a later time.

Because of the interdependencies that link transformation dimensions, it is tempting to conduct change on the three dimensions in parallel. This leads to either a big bang scenario or an incremental deployment. In the case of an incremental deployment, the challenge is to define the scope of each increment.

7.1. Incremental Agile Transformation

Figure 7, “Transformation Increment” shows that each transformation increment should cover the entire hierarchical line. Why? Because if you change the way one level of the organization operates while the rest of the hierarchical line continues to operate the old way, the enterprise runs the risk of subjecting its employees and managers to a double bind; i.e., confronting them with two irreconcilable demands or a choice between two undesirable courses of action. For example, re-prioritize your backlog but stick to established budgets and product plans.

Figure 7. Transformation Increment

Incremental transformation deployment is easier when two conditions are met:

Dependencies that link organizational units are minimal
The business and IT organizational structures are well aligned

When those conditions are not met, it is better to change the organizational structure before deploying new ways of working and the new management system. The new organizational structure reflects an architectural intent which is either implicit or explicit.

7.2. Architecting the Organization

We will illustrate the need to re-architect the organization with the example of an IT organization that evolved toward more agility.

The legacy IT organization was structured by process/function (front-office, middle-office, finance, risk) rather than by product families (plain vanilla swaps, equity derivative, commodities, etc.). Two powerful central functions would control operational teams:

Budgeting and financial control to rein in costs
Architecture to ensure the coherence of the information system

Pools of software development resources were organized by technologies; for example, server-side Java® development or mobile front-end development. Software development projects would on-board required resources for the duration of the project and release them when done. Last but not least, IT operations and support was separate from application development.

Though the specialization of the legacy organization was supposed to help the IT organization capture economies of skills and scale, several problems surfaced.

The level of inter-team dependency was high because of the multiplication of organizational entities that had to intervene on a project. Time-to-market increased due to the high level of inter-silo coordination required; for example, between development and IT operations teams. Alignment between business and IT was complicated by the lack of a simple mapping between organizational structures.

The IT re-organization was inspired by the Spotify model. Small cross-functional teams named "squads" replaced projects. The size of a squad does not exceed 10 to 15 people. Unlike a project, a squad is a stable team composed of dedicated resources that develops an end-to-end perspective. Squads are product-centric, meaning they develop, deploy, and operate a service. Squads adopt a DevOps culture which translates into the mantra "you build it, you run it".

Squads are grouped into tribes which are not bigger, on average, than 150 people. In order to maintain and develop functional or technical competencies, chapters and guilds are created. For example, a chapter can bring together mobile developers or NoSQL DBMS experts.

As the number of Agile teams grows with a few hundred squads running in parallel, it is important to define a taxonomy of Agile teams that clearly defines the scope of each one and minimizes inter-team dependencies. True Enterprise Architecture thinking is required to discover an effective way to decompose the organization and to draw boundaries that minimize inter-team dependencies.
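The effect of a boundary choice on inter-team dependencies can be illustrated with a minimal Python sketch. It is not part of the O-AAF; the service names, team names, and dependency list are hypothetical, and counting cross-team links is only one simple proxy for coordination cost.

```python
def cross_team_dependencies(owner, dependencies):
    """Count dependencies whose two ends are owned by different teams."""
    return sum(1 for source, target in dependencies if owner[source] != owner[target])


# Hypothetical "service A depends on service B" links.
dependencies = [
    ("checkout", "payment"),
    ("checkout", "catalog"),
    ("payment", "ledger"),
    ("catalog", "search"),
]

# Hypothetical decomposition by technology pool.
by_technology = {
    "checkout": "web", "catalog": "web",
    "payment": "backend", "ledger": "backend", "search": "backend",
}

# Hypothetical decomposition by product.
by_product = {
    "checkout": "purchase", "payment": "purchase", "ledger": "purchase",
    "catalog": "discovery", "search": "discovery",
}

technology_links = cross_team_dependencies(by_technology, dependencies)
product_links = cross_team_dependencies(by_product, dependencies)
```

In this made-up example the product-oriented boundaries leave one cross-team dependency, against two for the technology-oriented ones. A real assessment would also weigh the strength and frequency of each dependency.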

Figure 8, “Agile Teams' Taxonomy” represents a simplified taxonomy.

Figure 8. Agile Teams' Taxonomy

The primary goal of an Agile teams' taxonomy is to minimize redundancy and duplication [Rigby 2018]. Because an Agile teams' taxonomy may be different from the formal P&L structure of the enterprise, it is necessary to map it to the division, business unit, and P&L structure of the enterprise.

8. Agile Architecture Maturity Model

Starting in 1986, the Software Engineering Institute (SEI) developed a process maturity framework, the goal of which was to improve software processes. Harvesting experience gained over several years, the SEI published the Capability Maturity Model (CMM) for Software [SEI 1993].

The scope of this initial model was defined by two words: process and software. As the Capability Maturity Model gained in influence, the underlying approach was applied to other domains. For example, in 1995 the SEI developed a People Capability Maturity Model (PCMM®) [SEI 1995]. More recently, the SEI made public another maturity model, the Smart Grid Maturity Model (SGMM) [SEI 2018].

Other parties inspired by these maturity models created their own to be applied to different domains. For example, the US Department of Commerce developed a model for Enterprise Architecture [DoC 2007].

The original SEI CMM for software became CMMI®. The latest version, CMMI V2.0, is managed by the CMMI Institute which is an ISACA® Enterprise.

We propose reviewing maturity levels as defined by CMMI V2.0 to inspire the definition of the maturity levels in this document. Not all maturity dimensions are process-related, therefore we borrow the CMMI V2.0 "Practice Area" terminology which is more general than "Process Area". In the next sections we will define the maturity levels and corresponding practice areas.

8.1. Maturity Levels

Figure 9, “Maturity Levels” lists the CMMI V2.0 maturity levels. We like the idea of differentiating organization-wide practices from project-level practices.

Figure 9. Maturity Levels

Because Agile at scale shifts from project to product and organization-wide practices are not limited to standards, we need to define specific maturity levels.

Figure 10, “O-AAF Maturity Levels” introduces the maturity levels we propose.

Figure 10. O-AAF Maturity Levels

For each maturity level, the enterprise should develop specific architecture practices.

8.2. Essential Architecture Practices

Figure 11, “Architecture Practices” describes the essential architecture practices enterprises need to implement for each level.

Figure 11. Architecture Practices

We will now identify the practice areas that need to be analyzed for each maturity level.

8.3. Practice Areas

Figure 12, “Practice Areas” defines six practice areas. For each area and each level, the matrix defines assessment criteria. Maturity levels in the table start at level 2 because levels 0 and 1 are out of scope of this document.

Figure 12. Practice Areas

The practice areas table above can be used to assess the maturity level of an Agile enterprise. In the next version of this document, we will develop a maturity assessment method which will be described in a playbook.

Part 2: Playbooks

This section contains the O-AAF playbooks. The Merriam-Webster dictionary [Merriam-Webster] defines a playbook as: "a stock of usual tactics or methods".

The O-AAF playbooks provide guidelines to solve a particular Agile Architecture problem. For example, how to adapt governance or how to handle legacy systems when developing a digital platform.

Playbooks help modularize the O-AAF approach.

This part of the document is composed of a set of playbooks that architects can activate to meet the specific objectives and context of an enterprise. Each playbook is self-contained though it describes prerequisites that can constrain the order in which playbooks are activated.

9. Minimum Viable Architecture

Note	This section explores the Minimum Viable Architecture (MVA) concept and proposes guidelines to determine the optimum timing and sequencing of architecture decisions.

Context

In his book The Lean Startup [Ries 2011], Eric Ries coined the term Minimum Viable Product (MVP) defined as: "that version of the product that enables a full turn of the Build-Measure-Learn loop with a minimum amount of effort and the least amount of development time".

An MVP needs to be placed in front of customers to discover and analyze their reactions. Unlike a prototype whose quality is assessed by engineers and designers, the purpose of an MVP is to assess whether or not a product meets customers' expectations and whether they would pay for it. When the experiment fails, meaning that customers are unlikely to buy, the product owner can pivot to a revised product concept or give up and stop development. The MVP helps save money because it minimizes the time and investment required to experimentally verify the product concept.

The term MVP is becoming very popular and is often used to mean something different; for example, justifying the development of a poor-quality prototype. Jumping on the MVP bandwagon, some agilists coined the term Minimum Viable Architecture (MVA).

The MVA concept means different things to different people; for example:

The minimum architecture work that is required to create an MVP
The "architecture that enables the delivery of the core product features to be deployed in a given phase of a project and satisfied known requirements" [Erder 2016]
Just enough or good enough architecture as opposed to big up-front design or heavy investment in "plumbing"
Architecture that is built in small increments over a period

To say the least, the MVA concept lacks clarity and we need to reframe the way the problem is defined.

9.1. Reframing the Problem

Chapter 4, What is Architecture? explores what distinguishes architecture from design. Not all design decisions are architecture decisions. Architecture decisions need to be organized around the identification of high-impact decisions. High-impact decisions are defined as those which are likely to have a significant effect on quality metrics or on other decisions. There is a need for guidelines on how best to organize the architecture decision space.

Wording such as building architecture or architecture runway may give the impression that architecture means infrastructure or platform. Infrastructures and platforms implement architecture models but should not be confused with architecture, which is about the fundamental concepts and properties of a system. Therefore, we suggest distinguishing two types of decisions: true architecture decisions, and infrastructure sourcing and provisioning decisions.

Taking the example of an MVP, we suggest distinguishing architecture decisions that pre-condition its development, from investment decisions to fund its development and production environments.

An enterprise can be modeled as a complex system. A large enterprise may be composed of many divisions and departments; it can operate in many regions of the world and it can market a large number of products and services. When Agile at scale is deployed, the minimum architecture is the one required to define an Agile team’s taxonomy as described in Chapter 7, Architecting the Agile Transformation.

In an Agile culture where team autonomy is valued, architecture is the result of a problem-solving process that starts from an intentional architecture vision which is challenged, amended, and completed by Agile teams.

What is the minimum definition of this intentional architecture vision? How should the dialog between the owner of the architecture vision and Agile teams be conducted? It depends on the context. The heuristics described in the next section can help architects answer these questions.

9.2. Heuristics for Structuring Architecture Decisions

This document proposes a set of heuristics to structure and time the architecture decisions, which are contingent on the context in which they are made. The heuristics below can be combined.

Focus on Type 1 Decisions

Jeff Bezos, CEO of Amazon, distinguishes two types of decision: type 1 and type 2. Type 1 decisions are consequential and hard or impossible to reverse; they are one-way doors. Type 2 decisions are changeable and reversible; they are two-way doors. "If you’ve made a suboptimal type 2 decision, you don’t have to live with the consequences for that long. Type 2 decisions can and should be made quickly by high judgment individuals or small groups." Type 2 decisions can be made by autonomous Agile teams, while type 1 decisions require architecture thinking. Reversing type 1 decisions can create a lot of wasteful rework. This is why the right timing of type 1 decisions is critical.

Delay Architecture Decisions

High-impact decisions, mostly type 1, should be delayed until the probability that they would be questioned down the line is low enough. The timing of type 1 decisions is a double-edged sword. Made too late, the resulting ambiguity is likely to significantly slow down design activities. Made too early, they are more likely to be reversed, which adds significant delay and wasteful rework.

Lean product and process development promotes Set-Based Concurrent Engineering (SBCE) which helps to optimize when architecture decisions are to be made. Unlike traditional methods which lock design decisions too early or Agile methods which leave design decisions open too long, SBCE allows the final architecture design to emerge from teams' learning [Ward 2014].

Conventional development works in the following way:

Requirements are communicated to the product development team
A brainstorming effort produces a number of possible concepts
The product leadership team often picks a fundamental concept, sometimes even before the project begins
The product team details the concept through the identification of sub-systems
Each sub-team specifies and tests its sub-system in relative isolation
Integration testing of the whole system occurs late in the process

Architecture and design decision incompatibilities are often identified too late in the process, which is a major cause of rework. The solution space is constrained too early in the process which can result in suboptimal architecture decisions.

In contrast, SBCE works like this:

The product team breaks the system recursively into sub-systems until the right level of granularity is reached
It identifies a broad target for the system and each sub-system
Multiple concepts for the system and each sub-system are created
The team filters these concepts by thorough evaluation, eliminating concepts that don’t fit with each other
Failure information goes into a trade-off curve knowledge base that guides architecture design
As they filter, there is rapid convergence toward a solution that is often more elegant and innovative than the one conventional point-based development would have produced
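The filtering step above can be illustrated with a minimal Python sketch. It is not taken from [Ward 2014]; the sub-system names, the candidate concepts, and the list of incompatible pairs are hypothetical, chosen only to show how a set of alternatives shrinks once concepts that do not fit with each other are eliminated.

```python
from itertools import combinations, product

# Hypothetical candidate concepts for each sub-system.
concepts = {
    "storage": ["rdbms", "document_store", "graph_store"],
    "api": ["rest", "event_stream"],
    "hosting": ["on_premise", "cloud"],
}

# Hypothetical evaluation results: pairs of concepts found not to fit together.
incompatible = {
    frozenset({"graph_store", "rest"}),
    frozenset({"graph_store", "event_stream"}),
    frozenset({"event_stream", "on_premise"}),
}


def fits(combination, incompatible):
    """True when no pair of concepts in the combination is known to clash."""
    return not any(frozenset(pair) in incompatible for pair in combinations(combination, 2))


def workable_combinations(concepts, incompatible):
    """All system-level combinations that survive the compatibility filter."""
    names = list(concepts)
    candidates = product(*(concepts[name] for name in names))
    return [dict(zip(names, c)) for c in candidates if fits(c, incompatible)]


def surviving_concepts(concepts, incompatible):
    """Per sub-system, the concepts that still appear in at least one workable combination."""
    workable = workable_combinations(concepts, incompatible)
    return {name: sorted({w[name] for w in workable}) for name in concepts}


workable = workable_combinations(concepts, incompatible)
remaining = surviving_concepts(concepts, incompatible)
```

With these made-up inputs, 6 of the 12 combinations survive and `graph_store` drops out of the storage set, while both hosting options stay open. In practice each round of analysis and testing would add entries to the knowledge base and narrow the sets further.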

"The set of design alternatives shrinks because they are eliminated, and as trade-off curves are developed, the remaining alternatives are developed to increasing levels of fidelity. Simultaneously, target ranges narrow in corresponding stages, converging on values that provide the best set of trade-offs."

Add Evolvability as a Key Non-functional Requirement

The authors of Building Evolutionary Architectures [Ford 2017] describe an evolutionary architecture as supporting "guided, incremental change across multiple dimensions". The key idea is to enable the incremental development of the product or system while preserving non-functional requirements such as scalability, elasticity, or resilience. As the product or system evolves during Agile iterations, its architecture qualities should not degrade over time.

The methods used to improve evolvability depend on the type of architecture. For example, loose-coupling and separation of concerns help architect software systems to evolve more gracefully. Let us illustrate this with clean architectures. Clean architectures such as the hexagonal one isolate the domain logic (the core) from non-core domain concerns such as inputs and outputs or persistence mechanisms. The code that implements the domain logic is protected from changes that could impact, for example, persistence mechanisms. The initial version of a piece of software could use an RDBMS and be ported later to a NoSQL DBMS with minimum rework.
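The isolation of domain logic from persistence can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `OrderRepository`, `OrderService`, and the two in-memory adapters are hypothetical, and the adapters merely stand in for an RDBMS and a NoSQL DBMS.

```python
from typing import Protocol


class OrderRepository(Protocol):
    """Port: what the domain core needs from any persistence mechanism."""

    def save(self, order_id: str, amount: float) -> None: ...

    def total(self) -> float: ...


class OrderService:
    """Domain core: depends only on the port, never on a concrete store."""

    def __init__(self, repository: OrderRepository) -> None:
        self._repository = repository

    def place_order(self, order_id: str, amount: float) -> float:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._repository.save(order_id, amount)
        return self._repository.total()


class TableRepository:
    """Adapter standing in for an RDBMS: rows kept as a list of tuples."""

    def __init__(self) -> None:
        self._rows = []

    def save(self, order_id: str, amount: float) -> None:
        self._rows.append((order_id, amount))

    def total(self) -> float:
        return sum(amount for _, amount in self._rows)


class DocumentRepository:
    """Adapter standing in for a NoSQL store: documents keyed by identifier."""

    def __init__(self) -> None:
        self._documents = {}

    def save(self, order_id: str, amount: float) -> None:
        self._documents[order_id] = {"amount": amount}

    def total(self) -> float:
        return sum(doc["amount"] for doc in self._documents.values())


# Swapping the adapter requires no change to OrderService.
first = OrderService(TableRepository()).place_order("o-1", 10.0)
second = OrderService(DocumentRepository()).place_order("o-1", 10.0)
```

Because `OrderService` only knows the port, moving from one adapter to the other is confined to the line that constructs the service.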

Transforming type 1 decisions into type 2 decisions by making them easier to reverse also contributes to evolvability.

Sacrificial Architecture

Martin Fowler coined the term "sacrificial architecture" [Fowler 2014] to designate situations where a team deliberately chooses to throw away a codebase. Martin Fowler lists a few examples of companies who have done it, such as eBay® or Google®.

When the goal is to get rapid market feedback experimenting with an MVP, a sacrificial architecture is an option to consider as it would not be worth spending too much time designing an architecture that would have to change should the product owner decide to pivot.

Figure 13, “MVA – Forces and Heuristics” shows the relationships that link heuristics as well as the forces that influence architecture decisions and their timing.

Figure 13. MVA – Forces and Heuristics

Type 1 decisions can be delayed until the last possible moment, transformed into type 2 decisions by making the architecture more evolvable, or avoided with the creation of a sacrificial architecture.

Many forces influence the structure and timing of the architecture decision space:

The known unknowns
- The product development process is about creating verified and validated knowledge on customers, technology, and integration issues. The more unknowns, the higher the risk that type 1 decisions will be reversed, resulting in rework, higher costs, and delays.
The value of speed adjusted by the risk appetite
- It helps to determine the last responsible moment to make an architecture decision. The Lean SBCE strategy optimizes the speed/risk equation.
Other contextual forces need to be factored in; in particular, the organizational culture, the volatility of customer needs, and the stability of purpose

When architecture decisions are made, it is important to document the motivations behind them. A minimum architecture documentation should be composed of a collection of Architecture Decision Records (ADRs) [Nygard 2011]. Each ADR describes a set of forces and a single decision in response to those forces. A simple command line tool such as https://github.com/npryce/adr-tools provides a lightweight way of documenting an architecture.
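As a minimal sketch of what such a record can look like, the Python below renders one ADR as a short text file. The section names (Status, Context, Decision, Consequences) follow the template commonly associated with [Nygard 2011]; the `DecisionRecord` class and the example content are hypothetical and are not part of adr-tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    number: int
    title: str
    status: str        # for example: proposed, accepted, superseded
    context: str       # the forces at play
    decision: str      # the single decision made in response to those forces
    consequences: str  # what becomes easier or harder as a result

    def to_markdown(self) -> str:
        """Render the record as a small Markdown document."""
        sections = [
            ("Status", self.status),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
        ]
        body = "\n\n".join(f"## {name}\n\n{text}" for name, text in sections)
        return f"# {self.number}. {self.title}\n\n{body}\n"


# Hypothetical example record.
record = DecisionRecord(
    number=1,
    title="Keep persistence behind a port",
    status="Accepted",
    context="The choice of DBMS may change after the MVP experiment.",
    decision="Domain code accesses storage only through a repository interface.",
    consequences="Changing the DBMS is cheaper; each store needs its own adapter.",
)
text = record.to_markdown()
```

Keeping one decision per record makes it easy to supersede a single decision later without rewriting the rest of the documentation.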

10. Adaptive Operating Model

Understanding the enterprise’s process architecture is the first step toward analyzing its operating model.

In their book "Six Sigma for Financial Services" Rowland Hayler and Michael D. Nichols describe process architecture as "understand our organization’s end-to-end processes and how they fit together to maximize value" [Hayler 2006].

Most organizations describe their process architecture using some hierarchical modeling scheme. For example, the enterprise is decomposed into major process areas which are decomposed into process groups that are composed of processes.

This playbook will define the process concept and compare it to the Lean value stream. It will also show how the rise in complexity of the digital enterprise requires going beyond process architecture toward rethinking operating models. It will introduce the concept of the adaptive operating model which can gracefully evolve while it preserves and improves both effectiveness and efficiency.

10.1. What is a Process?

Michael Hammer, who is the author of a seminal HBR article [Hammer 1990], started the Business Process Re-engineering (BPR) movement in 1990. The key messages are:

Organize around outcomes, not tasks
Take advantage of the many options IT offers for reorganizing work
Apply a set of process redesign patterns to reinvent the process

The BPR movement enjoyed rapid growth for a few years before it fell out of favor following a number of large-scale BPR failures.

Hammer and Champy define process as: "a collection of activities that takes one or more kinds of input and creates an output that is of value to the customer" [Hammer 1993].

At about the same time, James Harrington [Harrington 1991] created a Business Process Improvement (BPI) method that defined the foundations of modern business process management practices.

James Harrington defines a process as: "any activity or group of activities that takes an input, adds value to it, and provides an output to an internal or external customer … There is no product and/or service without a process. Likewise, there is no process without a product or a service."

The three major objectives of BPI are:

Making processes effective – producing the desired results
Making processes efficient – minimizing the resources used
Making processes adaptable – being able to adapt to changing customer or business needs

A new role is defined – "process owner" – who is accountable for how well the process performs.

The method defines key process improvement concepts that are very similar to equivalent Lean concepts. Because the text was written at about the same time as The Machine that Changed the World [Womack 2007], we assume that the authors (Harrington and Womack) either shared the same sources or invented the same concepts at the same time.

James Harrington recommends classifying activities into two categories:

Real Value-Added (RVA) activities that are required from a customer perspective to provide the output the customer is expecting
Non-Value-Added (NVA) activities that do not contribute to meeting customer requirements, and could be eliminated without degrading the product or service functionality

NVA are often activities that exist because the process is inadequately designed, or the process is not functioning as designed.

BPI aims at reducing the process cycle time. It proposes a set of heuristics to compress cycle time. The equivalent Lean concept is the reduction of lead time.
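The link between the RVA/NVA classification and cycle time can be made concrete with a small Python sketch. The activities and durations below are hypothetical, and the share of cycle time spent on RVA activities is used here only as a simple indicator; it is not presented as a metric defined in [Harrington 1991].

```python
# Hypothetical process: (activity, category, duration in hours).
activities = [
    ("receive application", "RVA", 1.0),
    ("wait for manual routing", "NVA", 16.0),
    ("assess credit risk", "RVA", 2.0),
    ("re-key data into a second system", "NVA", 1.0),
]


def cycle_time(activities):
    """Total elapsed time of the process, value-adding or not."""
    return sum(hours for _, _, hours in activities)


def value_added_share(activities):
    """Fraction of the cycle time spent on RVA activities."""
    value_added = sum(hours for _, category, hours in activities if category == "RVA")
    return value_added / cycle_time(activities)


def without_nva(activities):
    """The process as it would look if every NVA activity were eliminated."""
    return [a for a in activities if a[1] != "NVA"]


current = cycle_time(activities)
share = value_added_share(activities)
target = cycle_time(without_nva(activities))
```

In this made-up case the cycle time is 20 hours, of which only 3 hours add value from the customer's perspective, so most of the potential compression lies in the waiting and re-keying steps.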

BPI recognizes the need for big picture improvement when making incremental process improvement does not bring the desired result. The big picture technique requires stepping out of today’s processes and defines what perfect processes would be without the constraints of the present organization, processes, and/or technologies.

Process architecture is especially relevant to help understand when and how big picture improvement is required because it helps connect the operating model level to the individual process level. That is why it is useful in the context of Digital Transformation because operating models are likely to change.

Last but not least, the author explains why feedback systems are very important. He recommends directing feedback to the individuals performing the tasks, so that they quickly understand their impact on quality, and giving them the responsibility to take immediate action. This last point is likely to have been influenced by the Toyota® practice of the andon cord.

10.2. Comparing Lean Value Stream and Process

The Lean Enterprise Institute (LEI) defines value stream as: "All of the actions, both value-creating and non-value-creating, required to bring a product from concept to launch (also known as the development value stream) and from order to delivery (also known as the operational value stream)."

The Lean value stream definition is similar to James Harrington’s process definition. So why has Lean settled for a different terminology? James Womack, a co-founder of LEI, explains that the word "process" has different meanings (for example, process is often confused with procedure); therefore, using value stream instead of process helps to remove ambiguity.

James Harrington defines the process owner role, while Lean defines the value stream manager or leader role.

The Lean definition brings an important distinction: development value streams to develop products and processes versus operational value streams.

10.3. Sources of Process Architecture Complexity

The two main sources of complexity are product and service variety and the multiplication of touchpoints.

Product and Service Variety

Zeynep Ton observes: "Higher product variety and more promotions, in particular, increase costs all throughout the supply chain … More product variety and promotions also increase the likelihood of errors and operational problems in the stores." [Ton 2014].

Michael George recommends eliminating complexity the customer will not pay for and exploiting the complexity customers will pay for. He also recommends minimizing the costs of complexity offered [George 2004]. His approach includes analyzing core processes, identifying product families, and creating complexity value stream maps.

The author lists several levers to help manage complexity:

Remove process waste
Exploit commonality through modularization
Exploit design reuse
Design with the lifecycle in mind
Ensure standardization of tasks
Use core process analysis to verify IT supports strategic processes
Use IT to deliver variety at low cost
Fight back against IT complexity by modularizing your systems

Multiplication of Touchpoints

In their book Beyond Advertising, the authors recommend thinking about brand ecosystems defined as "the brand’s multiple touchpoints and how they interact with each other, from a digital out-of-home experience to a tablet, from mobile to the store" [Wind 2016].

The authors observe that all interactions with a brand, from the first time you become aware that it exists to every touchpoint that you encounter along the way in your daily life, have an impact: "From the customer perspective, touchpoints with a brand or product are not differentiated: it is the seamless experience that matters."

Figure 14, “Touchpoints” illustrates the variety of touchpoints that a brand has to orchestrate to deliver a positive customer experience.

Figure 14. Touchpoints

The authors predict that in the future: "Touchpoints will continue to multiply as we enter an era where every object has the potential to become connected and interactive."

This evolution impacts the enterprise as a whole: "New structures and processes will need to allow for agility and reaping the benefits both from decentralization and, when needed, the power that leveraging through centralization facilitates."

New operating models are emerging to support the ability to create real-time, personalized experiences.

10.4. Toward an Adaptive Operating Model

The exploding number of combinations of product/service-to-touchpoint poses significant challenges to basic process design and operating model architecture.

Bain positions an enterprise’s operating model as a bridge connecting strategy and execution [Bain 2014]. Operating model design starts from a clear formulation of the enterprise’s value proposition. According to the Operating Model Canvas [Lancelott 2017], an operating model has six components:

Value delivery chain(s): the work that needs to be done to deliver the value proposition
Organization: the people who do the work and how they are organized
Location: where the people will be located and the assets they need to help them
Information: what information systems the people need to help them
Suppliers: the suppliers who support the work
Management system: the management system used to run the organization

An adaptive operating model is characterized by:

A modular and flatter organization
A loosely-coupled information system

The accountability framework balances the autonomy that agility requires with effective alignment mechanisms.

The modular nature of the adaptive operating model allows for a "plug-and-play" reconfiguration in response to evolving customer feedback.

10.5. Linking Journeys to the Operating Model

Providing a seamless experience at every touchpoint requires linking customer experience to the underlying operational processes.

The starting point is the discovery of what creates value across a given journey from the customer’s point of view. By analyzing customer journeys, enterprises can pinpoint the operational improvements that will have the biggest effect on customer experience [Chheda 2017]. Once the desired operational improvements have been identified, the enterprise can implement them by activating four levers:

Lean: to streamline processes, eliminate waste, and foster a culture of continuous improvement
Digitization: the process of using technology to automate and improve journeys directly
Advanced analytics: leveraging the power of machine learning to discover insights and make recommendations
Intelligent process automation: improving process by reconfiguring and automating work

This document includes a revised service blueprint modeling technique for analyzing customer journeys end-to-end. Figure 15, “Revised Service Blueprint Template” introduces the corresponding template.

Figure 15. Revised Service Blueprint Template

For each stage of the journey, the diagram specifies the channel that supports the interaction with the user. A user story describes the set of interactions that occurs at that touchpoint.

For each of the user stories an analysis of required capabilities is performed. It starts with a functional description and can go as far as specifying the APIs that encapsulate corresponding business services.

An analysis of existing applications helps identify which of the required capabilities are missing or incomplete. This gap analysis feeds an Agile requirement backlog. Backlog prioritization may result in customer journey changes.

The candidate services that would implement the required capabilities should be architected in a modular manner. This would facilitate the creation of new composite services. Reusable services can be aggregated into digital platforms that facilitate reuse and accelerate the customer journey’s reconfiguration.

11. Event Storming Workshop

"People exploring a domain with collaborative brainstorming techniques focusing on domain events"

This section presents the event storming workshop style, what it is, and its benefits.

11.1. Summary: Why? How? Who?

The goal of an event storming workshop is to explore a domain with several people collaboratively.

To do so, participants write domain events on sticky notes and place them along a timeline on a wall that serves as an unlimited modeling surface.

The workshop brings together three kinds of people: people with questions, people with answers, and a facilitator.

Figure 16. Event Storming Modeling Surface Example

11.2. Domain Event

"Something that happened in the past that the domain experts care about" (Eric Evans)

By convention, an event is written on an orange sticky note.

The event is named with a past participle because of its simple semantics and notation. For example, order paid, product sent, etc.

Events are used because they are easy to grasp and relevant for domain experts.

11.3. Event Storming Principles

Event storming workshops focus on domain events structured along a timeline.

The goal of the event storming workshop is to maximize the learning of all the participants. At the end of the workshop, the participants should have a shared knowledge of the domain subject of the workshop.

The physical set up is important: the surface of the wall represents an unlimited modeling surface and everyone should stand up to increase people’s engagement in the workshop. There must be a continuous effort of every participant to maximize the engagement. The collaborative approach involves people with questions, people with answers, and a facilitator.

Markers and sticky notes should be available in different colors and in sufficient quantity.

There should be no limitation of the scope under investigation and the model should be continuously refined through low-fidelity incremental notation.

11.4. Types of Event Storming Session

Event storming can have different goals, but we can mainly distinguish these two common ones:

Big-picture event storming
- Embrace the whole complexity
- Maximize learning
Design-level event storming
- Assumption: people have a shared understanding of the underlying domain
- Focus is on implementing software features that are solving a specific problem

11.5. Event Storming Notation and Concepts

The concepts used during an event storming session, apart from the event concept, are set voluntarily by the participants whenever they feel they need one. However, the facilitator can suggest the following "usual" concepts:

Command (usually with a blue sticky note)
- Represents an intention sent to a system for making something
- Result is either success or failure
- Named with a verb and nominal group
Aggregate or entity (usually with a yellow sticky note)
- Cluster of domain objects treated as a single unit
- One of its objects is the root, ensuring the integrity of the whole aggregate by enforcing its invariant policy and transactional boundaries
Reaction (usually with a purple sticky note)
- Domain logic triggered in response to an event
Hotspot (usually with a red sticky note)
- Issue or something important that we must care about
Policy or rule
- The flow of events and reactions together
Persona
- Person triggering a command
Read model/query
- Request asking for data in a specific model

11.6. Benefits

Opportunities to learn and facilitate a structured conversation about the domain; this workshop is the best and most efficient way for all participants to have shared knowledge of a domain
Uncover assumptions about how people think the system works; allows you to identify misunderstandings and missing concepts
Shared understanding of the problem and potential solutions
Highly visual, tactile representation of business concepts and how they relate
Allow participants to quickly try out multiple domain models so they can see where those concepts work and where they break down
Focus on concrete examples, not abstract ideas

11.7. Event Storming Workshop Facilitation Techniques

Facilitator hangs the first sticky note
Ask questions
- Where does the money come from?
- What are the targets? How will we know we have reached them?
- Is there something missing here? Why is there a gap?
- For whom is this event important? (end user, stakeholder, etc.)
Visualize alternatives
- It is too early to decide; let divergence happen
Reverse narrative
- Start from the end – what needs to happen before so that this event can happen too?
Interrupt long discussions
- Visualize every opinion and ask every party if they feel their opinion is accurately represented
Timebox
- Use the pomodoro technique (25 minutes); after each pomodoro, ask what is going well and what isn’t – move on even if the model is incomplete
Constantly move sticky notes to create room for hotspots
Hang red sticky notes when you feel there is an issue
At the end, take a photo
- You can throw the model away and start again with different people

12. Event-Driven Architecture

This section discusses the concepts and benefits of event-driven architecture.

As its name implies, event-driven architecture is centered around the concept of an "event"; that is, whenever something changes, an event is issued to notify the interested consumers of that change. The event is a powerful concept to build an architecture around because events are immutable and decouple publishers from consumers, and because they are a great way to design and build domain logic. The following sections detail the concepts and benefits of event-driven architecture, and then dive into the practical details of implementing such an architecture.

12.1. Concepts of Command/Query/Event

Before diving into the event-driven architecture style, we define the Command, Query, and Event concepts and how they relate to time and state management:

A command represents the intention of a system’s user regarding what the system will do that will change its state
A query asks a system for its current state as data in a specific model
An event represents a fact about the domain from the past; in particular, on every state change that the system performs, it will publish an event denoting that state mutation

Figure 17, “Concepts of Command/Query/Event and their Relation to Time” illustrates these concepts related to time.

Figure 17. Concepts of Command/Query/Event and their Relation to Time

Command

“A command is a request made to do something.”

A command represents the intention of a system’s user regarding what the system will do to change its state.

The result of a command can be either success or failure; in both cases, the result is an event
In case of success, state change(s) must have occurred somewhere (otherwise nothing happened)
Commands should be named with a verb in the present tense or infinitive, and a nominal group coming from the domain (entity or aggregate type)

Query

"A query is a request asking to retrieve some data about the current state of a system."

A query asks a system for its current state as data with a specific model.

Query never changes the state of the system (it is safe)
Query processing is often synchronous
The query contains fields with some value to match for or an identifier
Query can result in success or failure (not found) and long results can be paginated
Queries can be named with “get” something (with identifier as arguments) or “find” something (with values to match as arguments) describing the characteristics of data we want to retrieve

Event

"Something happened that domain experts care about." (Eric Evans)

An event represents a fact about the domain from the past; particularly, on every state change performed by the system, it will publish an event denoting that state mutation.

Event characteristics:

Events are primarily raised on every state transition that acknowledges the new fact as data in our system
- Events can also represent every interaction with the system, even when there is no state transition, as the interaction or the failure can itself be valuable (for instance, the failure of a hotel booking command because of no vacancy can be an opportunity to propose something else to the customer)
Events reference the command or query identifier that triggered them
Events can be ignored, but can’t be retracted or deleted; only a new event can invalidate a previous one
Events should be named with a past participle
There are internal and external events:
- Internal events: the ones raised and controlled in our bounded context (see [bounded-context])
- External events: the ones from other upstream bounded contexts to which we subscribed

Events are published whenever there is an interaction with the system through a command (triggering or not a state transition; if not, the failure is also an event) or a query (no state transition, but the interaction is interesting in itself, such as for analytics purposes).
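As a minimal sketch of the three concepts above, commands, queries, and events can be written as plain data maps, with a handler that turns a command into an event. The hotel-booking names, keys, and identifiers below are hypothetical and are not part of this document; only the naming conventions (verb for commands, "find" for queries, past participle for events) and the event's reference to its triggering command come from the text.

```clojure
;; Hypothetical hotel-booking example; all names and keys are illustrative.

;; A command: verb + nominal group, expressing an intention to change state.
(def book-room-command
  {:id      "cmd-1"
   :type    :booking/book-room
   :payload {:hotel "h-42" :nights 2}})

;; A query: asks for the current state and never changes it.
(def find-rooms-query
  {:id      "qry-1"
   :type    :booking/find-rooms
   :payload {:hotel "h-42"}})

(defn handle-book-room
  "Decides the outcome of a book-room command against the current state.
  Returns an event in both cases: success and failure are both facts."
  [state command]
  (let [hotel     (get-in command [:payload :hotel])
        vacancies (get-in state [:vacancies hotel] 0)]
    {:id        (str "evt-for-" (:id command))
     ;; Events are named with a past participle.
     :type      (if (pos? vacancies)
                  :booking/room-booked
                  :booking/room-booking-rejected)
     ;; The event references the command that triggered it.
     :reference (:id command)
     :payload   (:payload command)}))
```

Calling `(handle-book-room {:vacancies {"h-42" 3}} book-room-command)` yields a `:booking/room-booked` event, while an empty state yields `:booking/room-booking-rejected`, which a consumer could use to propose an alternative to the customer.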

12.2. Benefits of Event-Driven Architecture

Better Handling of State, Concurrency, and Time

Basically, commands and queries represent the intentions of the end users regarding the system:

A command represents the user asking the system to do something – commands are not safe, as they mutate state
A query asks for the current state of the system – queries are safe, as they do not mutate any data

This distinction relates to state and time management, and it expresses what the user wants the system to do. Once the state has mutated, the system publishes an event notifying the outside world that something has happened.

The Real World is Event-driven!

The world is event-driven: the present is very difficult to grasp, and we can only clearly separate past and future. The past is the only thing we can – almost – be sure of, and the natural way to describe a result that occurred in the past, or that should occur with the future system, is to use an event. It is as simple as "this happened". The event storming workshop format is one of the most efficient and popular ways of grasping a domain for people involved in software design. (See Chapter 11, Event Storming Workshop for a description of the technique.)

Loose-coupling & Strong Autonomy

Event mechanisms loosen the coupling between the event publisher and the subscribers: the publisher knows neither who its subscribers are nor how many there are. This focus on events also enforces a better division of concerns and responsibilities, as too many events or overly coarse-grained events can indicate a design issue.

Focus on Behavior and Changeability

Commands and events force software designers to think about the system’s behavior rather than focusing too much on its structure.

Software designers and developers should focus on separating their domain logic between:

Contextual data: retrieving data for that command to execute
Decision logic: whether this command is actually able to operate given the current contextual state (that includes any data coming from external systems, such as market data as well as time)
State mutation logic: once the contextual data is retrieved and decisions are made, the domain logic can issue the state mutations – whether internal or external
State mutation execution: this is where transactional mechanisms come into play, either handled automatically for a single data source or distributed using a Saga pattern (see Section 12.7, “Ensuring Global Consistency with Saga Patterns”) and compensating transactions
Command execution result: the command execution result, be it a success or a failure, is expressed as an event published for private or public consumption

This separation of different concerns leads to greater changeability of software systems.
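The separation listed above can be sketched as one small function per concern, composed by a command handler. This is only an illustration under stated assumptions: the account domain, the in-memory `db` map, and all function and keyword names are hypothetical, and the mutation execution step is a pure map update standing in for a real transaction.

```clojure
;; Hypothetical account domain; an in-memory map stands in for the data source.

(defn load-context
  "Contextual data: retrieve what the command needs in order to execute."
  [db command]
  {:balance (get-in db [:accounts (:account command) :balance] 0)})

(defn decide
  "Decision logic: can the command operate given the contextual state?"
  [context command]
  (if (>= (:balance context) (:amount command))
    :accepted
    :rejected))

(defn mutations
  "State mutation logic: describe the changes as data, without applying them."
  [decision command]
  (if (= :accepted decision)
    [[:update-balance (:account command) (- (:amount command))]]
    []))

(defn apply-mutations
  "State mutation execution: a pure update here; a real system would use a
  transaction or a Saga at this step."
  [db muts]
  (reduce (fn [db [_ account delta]]
            (update-in db [:accounts account :balance] (fnil + 0) delta))
          db
          muts))

(defn handle-withdraw
  "Runs the steps in order; returns the new state and the result event."
  [db command]
  (let [context  (load-context db command)
        decision (decide context command)
        db'      (apply-mutations db (mutations decision command))]
    {:db    db'
     ;; Command execution result: success or failure, expressed as an event.
     :event {:type      (if (= :accepted decision)
                          :account/money-withdrawn
                          :account/withdrawal-rejected)
             :reference (:id command)}}))
```

Because each concern is a separate function, the decision rule or the way mutations are executed can change without touching the other steps.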

Better Operability with Events: Stability, Scalability, Resilience, and Observability

Commands can be used to better represent the user’s intention and fit well with deterministic domain logic that benefits from consensus algorithms such as the Raft Consensus Algorithm (see https://raft.github.io/) to distribute the execution of business logic on several machines for better resiliency and scalability along with some sharding logic (see http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html).

Good operability needs good observability of the running systems and this is reached by strong logging practices. Good logs are actually events in disguise; replaying a system behavior through logs is actually following the flow of technical and domain events that exhibit what the system has done.

12.3. Event Sourcing

The main idea of event sourcing is to store all the "events" that represent stimuli asking for a state change of the system, and then to be able to reconstruct the system’s end state by applying the domain logic to all of these events in order.

The event store becomes the source of truth and the system’s end state is the result of applying all these events. Note that adopting event-driven architecture does not imply adopting event sourcing.
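A minimal sketch of this idea, assuming a hypothetical shopping-cart domain: the event store is an ordered vector of events, and the end state is obtained by folding the domain logic over it. The event types and the `apply-event` and `replay` names are illustrative only.

```clojure
;; Hypothetical shopping-cart events; a vector stands in for the event store.

(defn apply-event
  "Domain logic for a single event: returns the next state."
  [state {:keys [type payload]}]
  (case type
    :cart/item-added   (update state :items (fnil conj []) (:item payload))
    :cart/item-removed (update state :items
                               (fn [items]
                                 (vec (remove #{(:item payload)} items))))
    state))   ; events this logic does not know about are ignored

(defn replay
  "Rebuilds the end state by applying every stored event in order."
  [events]
  (reduce apply-event {} events))

(def event-store
  [{:type :cart/item-added   :payload {:item "apple"}}
   {:type :cart/item-added   :payload {:item "pear"}}
   {:type :cart/item-removed :payload {:item "apple"}}])

(replay event-store)
;; => {:items ["pear"]}
```

Replaying a prefix of the store, for example `(replay (take 2 event-store))`, gives the state as it was at that point in time.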

12.4. Command Query Responsibility Segregation (CQRS)

CQRS is an architecture style that advises us to use a different data model and storage for command (asking for a state change, aka a "write") and query (asking for the current state, aka a "read").

The main motivation for using these dedicated models is to simplify, and gain better performance for, interactions with the system that are unbalanced (read-intensive/write-scarce or write-intensive/read-scarce interactions). Although CQRS simplifies each model in itself, keeping all the models synchronized and up-to-date also brings some complexity. CQRS is sometimes implemented without proven requirements, which can lead to over-engineering.
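The following sketch illustrates the separation under simplifying assumptions: two atoms stand in for the dedicated write and read storage, and the read model is updated synchronously, whereas a real system would often synchronize it asynchronously. The order domain and all names are hypothetical.

```clojure
;; Write side: commands append events to a log (an atom stands in for storage).
(def write-log (atom []))

;; Read side: a denormalized view optimized for one specific query.
(def orders-per-customer (atom {}))

(defn project!
  "Keeps the read model in sync with an event coming from the write side."
  [event]
  (when (= :order/placed (:type event))
    (swap! orders-per-customer update (:customer event) (fnil inc 0))))

(defn place-order!
  "Command side: records the event, then synchronizes the read model."
  [customer order-id]
  (let [event {:type :order/placed :customer customer :order-id order-id}]
    (swap! write-log conj event)
    (project! event)
    event))

(defn order-count
  "Query side: reads only from the read model, never from the write log."
  [customer]
  (get @orders-per-customer customer 0))
```

The synchronization step (`project!` here) is where the additional complexity mentioned above appears once the two sides live in different stores.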

12.5. Command, Query, and Event Metadata

Every command, query, and event should carry the same metadata, which essentially tells us who emitted it, when, where, and why it is relevant. Each artifact should also be uniquely identified. This metadata can be the following:

Identifier: an identifier, such as a UUID or a URN (https://tools.ietf.org/html/rfc8141)
Type: an identifier of the type of this artifact; it should be properly namespaced to be unique among several systems
Emitted-by: identifier of the person or system emitting the command/query/event
Emitted-at: a timestamp in UTC timezone of when the command/query/event was emitted by the source
Source: the source system that emitted the artifact (in case of a distributed system, it can be the particular machine/virtual machine that emitted that artifact)
Various keys: for partitioning the artifacts by one or several values (typically, whether the command is issued for a particular user organization of the system, etc.)
Reference: for an event, the command or query that triggered that particular event
Content-type: the content type of the payload
Payload: a payload as a data structure containing everything relevant to the purpose of the artifact

The Cloud Native Computing Foundation issued a specification describing event data in a common way (see https://cloudevents.io/).
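As an illustration of the metadata list above, a single constructor can wrap any payload in a common envelope. This is a sketch, not a prescribed format: the `envelope` function, the key names, and the default content type are assumptions made for the example, and it does not follow the CloudEvents attribute names.

```clojure
(import '(java.util UUID)
        '(java.time Instant))

(defn envelope
  "Wraps a payload with the metadata shared by commands, queries, and events.
  Key names are hypothetical; :reference is only present when provided."
  [{:keys [type emitted-by source reference content-type payload]}]
  (cond-> {:id           (str (UUID/randomUUID))   ; unique identifier
           :type         type                      ; namespaced type
           :emitted-by   emitted-by
           :emitted-at   (str (Instant/now))       ; ISO-8601 timestamp in UTC
           :source       source
           :content-type (or content-type "application/edn")
           :payload      payload}
    reference (assoc :reference reference)))

;; Usage: an event referencing the command that triggered it.
(def panier-created
  (envelope {:type       :panier/created
             :emitted-by "user-1"
             :source     "node-1"
             :reference  "cmd-1"
             :payload    {:name "weekly basket"}}))
```

Centralizing the envelope in one function keeps the metadata homogeneous across all artifacts, which simplifies partitioning, tracing, and metrics later on.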

12.6. System Consuming Other Systems' Events

Events originate from a bounded context that must be explicitly defined; they can then be consumed by other bounded contexts. The traditional DDD context map and the various strategic patterns of bounded context integration are very useful tools for mapping the landscape.

We also advocate translating each event that comes from another bounded context into a command of the context that consumes it, to denote explicitly the intention behind the event’s consumption.
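A minimal sketch of such a translation, assuming a hypothetical shipping context that consumes events from an upstream payment context. The event and command types and the `external-event->command` name are illustrative only.

```clojure
(defn external-event->command
  "Translates an event from an upstream bounded context into a command of
  the consuming context. Returns nil for events this context ignores."
  [event]
  (case (:type event)
    :payment/order-paid
    {:type      :shipping/prepare-shipment
     :reference (:id event)     ; keeps the link to the external event
     :payload   {:order-id (get-in event [:payload :order-id])}}
    nil))

(external-event->command
 {:id "evt-7" :type :payment/order-paid :payload {:order-id "o-1"}})
;; => {:type :shipping/prepare-shipment
;;     :reference "evt-7"
;;     :payload {:order-id "o-1"}}
```

The command name makes explicit why the consuming context cares about the external event, and the rest of that context only ever deals with its own commands.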

12.7. Ensuring Global Consistency with Saga Patterns

To ensure global consistency across multiple systems, we need mechanisms to "rollback" or compensate the effect of applying a command in order to bring the systems’ global state back to a consistent one (a consistent state does not mean the state that existed before applying all those commands; a consistent state is one where all the sub-systems are consistent with their data).

As an example, think about the way your bank "cancels" a contentious movement on your account; the bank doesn’t remove the contentious movement, instead it issues a new movement compensating the effect of the bad one. For instance, given a contentious debit movement of $100, the bank issues a credit movement of $100 to get back to a consistent balance even if the movements list of the account now exhibits two movements cancelling each other.
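The bank example can be sketched as follows. The movement structure, the identifiers, and the initial credit of $500 are assumptions added for illustration; only the contentious debit of $100 and its compensating credit come from the text.

```clojure
(defn balance
  "Sums the amounts of all movements (credits positive, debits negative)."
  [movements]
  (reduce + 0 (map :amount movements)))

(defn compensate
  "Builds the inverse movement; the original is kept, never deleted."
  [movement]
  {:id          (str (:id movement) "-compensation")
   :amount      (- (:amount movement))
   :compensates (:id movement)})

(def movements
  [{:id "m1" :amount 500}      ; hypothetical initial credit
   {:id "m2" :amount -100}])   ; the contentious debit of $100

(def corrected
  (conj movements (compensate (second movements))))

(balance movements)   ;; => 400
(balance corrected)   ;; => 500, with three movements in the list
```

The balance is consistent again, while the list of movements still records both the contentious debit and the credit that cancels it.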

As a first step, we need to identify the inverse of each command, which will cancel or "compensate" the effect of the former one. The Saga patterns describe the structure and behavior needed to attain such a consistency goal: if one command fails, the other services issue new commands that compensate the former ones, thereby "rolling back" the whole distributed transaction. Two types of the Saga pattern exist: orchestration and choreography.

12.7.1. Saga Pattern: Choreography

In the choreography Saga pattern, each service produces and listens to other services' events and decides whether an action should be taken.

Benefits:
- Simple and easy to understand and build
- All services participating are loosely-coupled as they don’t have direct knowledge of each other; a good fit if the transaction has four or five steps
Drawbacks:
- Can quickly become confusing if extra steps are added to the transaction as it is difficult to track which services listen to which events
- Risk of adding cyclic dependency between services as they have to subscribe to one another’s events

12.7.2. Saga Pattern: Orchestration

In the orchestration Saga pattern, a "coordinator" service is responsible for centralizing the Saga pattern’s decision-making and sequencing business logic.

Benefits:
- Avoid cyclic dependencies between services
- Centralize the orchestration of the distributed transaction
- Reduce participants' complexity as they only need to execute commands and reply to them
- Easier to implement and test; rollback is easier
Drawbacks:
- Risk of concentrating too much logic in the orchestration
- Increases the infrastructure complexity as there is one extra service
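A minimal in-process sketch of the orchestration variant follows. It assumes that each step is a map holding an `:action` and a `:compensation` function over a shared state, and that an action signals failure by throwing; in a real system the coordinator would send commands to remote services instead. The `run-saga` name and the step structure are hypothetical.

```clojure
(defn run-saga
  "Coordinator: executes the steps in order. If an action throws, the
  compensations of the steps that already succeeded run in reverse order."
  [state steps]
  (loop [state state, remaining steps, done ()]
    (if-let [step (first remaining)]
      (let [result (try
                     {:ok ((:action step) state)}
                     (catch Exception e {:error e}))]
        (if (contains? result :ok)
          ;; conj on a list prepends, so `done` is already in reverse order
          (recur (:ok result) (rest remaining) (conj done step))
          {:status :compensated
           :state  (reduce (fn [s step] ((:compensation step) s)) state done)}))
      {:status :completed :state state})))

;; Hypothetical two-step transaction in which the second step fails.
(def steps
  [{:action       #(update % :log conj :stock-reserved)
    :compensation #(update % :log conj :stock-released)}
   {:action       (fn [_] (throw (ex-info "payment refused" {})))
    :compensation identity}])

(run-saga {:log []} steps)
;; => {:status :compensated, :state {:log [:stock-reserved :stock-released]}}
```

All sequencing and compensation decisions sit in `run-saga`, which illustrates both the benefit (participants stay simple) and the drawback (logic concentrates in the orchestrator) listed above.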

12.8. Software Design and Implementation

Command, Query, and Event as the Interface of the Software System

Figure 18. Command/Query/Event as the Interface of the Software System

Command/Query/Event Declaration

Command, Query, and Events are represented using a data record structure; they should not have any operations associated with them, contrary to a value type, for instance. Each one should have an identifier, even if they are immutable, to reference them easily.

Command/Query/Events are respectively declared with the "defcommand", "defquery", and "defevent" macros. The first argument is a keyword for naming the concept, and then a keyword of a spec describing the payload of that command, query, or event. For example:

(s/def ::panier (s/keys :req-un [::name ::org-ref ::created-by ::created-at ::updated-by ::updated-at ::notices]))
(defcommand ::create-panier-command ::panier)
(defevent ::panier-created ::panier)

Hexagonal Architecture

Hexagonal architecture decouples the application, domain, and infrastructure logic of the considered system. It does so by using interfaces (ports) located in the domain part and implementations (adapters) located in the application and infrastructure parts that are wired into the domain. The event bus abstraction belongs to the infrastructure part of the architecture; several implementations (distributed or not) can then fulfill the infrastructure requirements, including forwarding events to the browser with Server-Sent Events (SSE).

Hexagonal architecture allows effective decoupling of application, domain, and infrastructure concerns in a back-end system. In our opinion, hexagonal architecture is closely related to event-driven architecture, as it effectively decouples event publication and domain logic from the implementation details of the event infrastructure, which can be quite tedious.

Figure 19. Hexagonal Architecture with Events

Bus Abstraction

The event bus has two sides: publishing events and subscribing to events. These two operations are exposed on an interface that can be implemented with various means, hence being a Service Provider Interface in a hexagonal architecture.

Therefore, each bus implements the following protocol:

(defprotocol EventBus
  "A simple pub/sub interface for publishing events indexed by a key k (the key value must be extracted from the event and how this is done is up to the implementation; this ensures homogeneity of the key extraction from the event)"
  (subscribe [_ f]
             [_ k f] "Subscribe to events, optionally filtered by the key value k; the function f is called back with each event")
  (publish! [_ event] "Publish the event to the bus"))

Each event needs to be classified depending on a value it holds; the exact value extracted from the event needs to be parameterized depending on the bus, but each bus should do it in a homogeneous way. So, the bus creation needs to have a key extraction function as an argument; usually it will be either the organization that is concerned by this event – the ":org-ref" – or the event type ":event-type" to which every event belongs.
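As a sketch of how a bus can be created with a key extraction function, the following in-memory implementation satisfies the protocol. The protocol is repeated (with shortened docstrings) only so that the example is self-contained; the `InMemoryBus` record, the `in-memory-bus` constructor, and the `:bus/all` sentinel key are hypothetical and do not describe the actual implementations.

```clojure
;; Repeated from the text so that the sketch is self-contained.
(defprotocol EventBus
  (subscribe [_ f]
             [_ k f] "Subscribe f to all events, or only to those with key k")
  (publish! [_ event] "Publish the event to the bus"))

(defrecord InMemoryBus [key-fn subscribers]
  EventBus
  (subscribe [_ f]
    ;; :bus/all is a hypothetical sentinel for unfiltered subscriptions
    (swap! subscribers update :bus/all (fnil conj []) f))
  (subscribe [_ k f]
    (swap! subscribers update k (fnil conj []) f))
  (publish! [_ event]
    ;; The key is extracted the same way for every event on this bus.
    (doseq [f (concat (get @subscribers :bus/all)
                      (get @subscribers (key-fn event)))]
      (f event))))

(defn in-memory-bus
  "Creates a bus; key-fn extracts the classification key from each event."
  [key-fn]
  (->InMemoryBus key-fn (atom {})))

;; Usage: a bus keyed by organization, with one filtered subscriber.
(def bus (in-memory-bus :org-ref))
(def received (atom []))
(subscribe bus "org-1" #(swap! received conj %))
(publish! bus {:org-ref "org-1" :event-type :panier/created})
(publish! bus {:org-ref "org-2" :event-type :panier/created})
;; @received now holds only the event of "org-1"
```

Passing `:event-type` instead of `:org-ref` to `in-memory-bus` would classify the same events by type without changing any subscriber or publisher code.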

Trigger Reaction to Events for Subsequent Domain Logics

Some events can trigger other domain logic (e.g., a search query triggers adding this query to the user search history). Events here allow the decoupling of the business logic between the publishing context and the reacting context.
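The search history example can be sketched as a reaction registered against an event type. The registry, the event shape, and all names below are hypothetical; an atom stands in for the reacting context's storage.

```clojure
;; State owned by the reacting context (an atom stands in for its storage).
(def search-history (atom {}))

(defn record-search!
  "Reaction: when a search was performed, add the query text to the
  history of the user who emitted it."
  [event]
  (swap! search-history update (:emitted-by event)
         (fnil conj []) (get-in event [:payload :text])))

;; Reactions are registered per event type; the publishing context does not
;; know which reactions exist.
(def reactions
  {:search/query-performed [record-search!]})

(defn dispatch!
  "Runs every reaction registered for the type of the event."
  [event]
  (doseq [react (get reactions (:type event))]
    (react event)))

(dispatch! {:type       :search/query-performed
            :emitted-by "user-1"
            :payload    {:text "agile architecture"}})
;; @search-history => {"user-1" ["agile architecture"]}
```

Adding a new reaction only requires a new entry in the registry; the code that performs the search and publishes the event is left untouched.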

Event Publication

Domain logic that sits in back ends publishes events on every API interaction from the clients. These events are published through the EventBus interface with various implementations detailed in the following sections.

Kafka Messaging System

Kafka is used as the distributed messaging system that collects and distributes events between sub-systems.

Kafka Subscription

Back-end Bus Systems

Multicast Event Bus

Kafka Event Bus

In-memory Event Bus

Metrics Exposition with Prometheus through Event Subscription

Metrics are a numeric representation of data measured over intervals of time. Metrics can be used to derive knowledge of the behavior of a system over intervals of time, both for the present and to anticipate the future. Counting events with Prometheus counters (see https://prometheus.io/) is a great way to observe the system’s behavior over time. Metrics have:

Name
Timestamp
Labels (coming from the event metadata itself)
Value (can be as simple as +1 for a counter that denotes each published event)

Back ends expose technical and domain metrics through Prometheus endpoints for monitoring and alerting purposes. Business events are a strong proxy for technical incidents; metrics derived from domain events should therefore be the main values to monitor over time.

The biggest advantage of metrics-based monitoring over logs is that, unlike log generation and storage, metrics transfer and storage has a constant overhead. Metrics are also better suited to trigger alerts over aggregation of events.
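The principle of counting events by label can be sketched without any client library. The example below deliberately does not use the Prometheus client; an atom holding a map stands in for Prometheus counters, and the metric name, the label keys, and the function names are hypothetical.

```clojure
;; Stand-in for Prometheus counters: a map from [metric-name labels] to a count.
(def counters (atom {}))

(defn count-event!
  "Event subscriber: adds 1 to the counter identified by the metric name and
  by labels taken from the event metadata."
  [event]
  (let [labels {:type (:type event) :source (:source event)}]
    (swap! counters update ["events_published_total" labels] (fnil inc 0))))

(defn counter-value
  "Reads the current value of a counter (0 if it was never incremented)."
  [metric-name labels]
  (get @counters [metric-name labels] 0))

(count-event! {:type :panier/created :source "node-1"})
(count-event! {:type :panier/created :source "node-1"})
(count-event! {:type :panier/deleted :source "node-1"})

(counter-value "events_published_total"
               {:type :panier/created :source "node-1"})
;; => 2
```

Whatever the number of events published, the storage holds one number per combination of labels, which illustrates the constant overhead of metrics compared with logs.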

Events Push to Connected Clients/Users with SSE Subscription

Once a user connects to the application, the browser app needs to collect all the events related to the organization to which this connected user belongs.

13. Agile Governance

This section discusses agile governance in the context of Enterprise Architecture and the business enterprise.

"Our highest priority is to satisfy the customer through early and continuous delivery … Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.“ [Agile Manifesto]

13.1. What is Governance?

At different levels of an organization, stakeholders and customers need to align on what is being delivered, and ascertain that goals are being achieved and policies adhered to. This alignment takes place amid continuous change driven by both internal and external demand – competitors, regulation, innovation, operations. A certain level of control and management is required.

The TOGAF standard defines governance as: “the ability to engage the involvement and support of all parties with an interest in or responsibility to the endeavor with the objective of ensuring that the corporate interests are served, and the objectives achieved” [TOGAF 2018].

[COBIT 5]^[1], in Principle 5, makes an important distinction between the management activities/processes and governance itself.

Governance ensures that enterprise objectives are achieved by evaluating stakeholder needs, conditions, and options; setting direction through prioritization and decision-making; and monitoring performance, compliance, and progress against agreed direction and objectives.

Management plans, builds, runs, and monitors activities in alignment with the direction set by the governance body to achieve the enterprise objectives.

Another standard, [ISO/IEC 38500], applies to the governance of management processes (and decisions) relating to the information and communication services used by an organization. The standard provides a structure for governance of IT to assist those at the highest level of organizations to understand and fulfill their legal, regulatory, and ethical obligations regarding their organization’s use of IT.

ISO/IEC 38500 is driven from the top down; IT departments need to make sure that they are ready for the new demands the board will pose (e.g., performance measurements, clear governance mechanisms). One criticism of both COBIT and ISO/IEC 38500 is that these frameworks are reactive and don’t adapt to quick change well; an example is change outside budgetary cycles, and the time it takes to enact these changes.

Agile contradicts neither the COBIT nor the ISO/IEC 38500 definitions and guidance; in fact, it emphasizes the interaction of individuals and collaboration, and ensures that the stakeholder objectives/goals are achieved. Often in non-Agile organizations, ownership is fluid between business and IT, whereas in Agile organizations the business stakeholder (product owner) "owns" the value chain. The risk of unclear ownership is reduced in Agile IT governance frameworks, as they focus on recognizing the right person as the suitable owner of the portfolio/program/project at the right level from inception to delivery.

13.2. Why Governance?

"To efficiently create chaos, Langdon realized, requires some order." (Dan Brown)

There is a need for procedures and policies to be in place to reconcile what can sometimes be described as conflicts of interest, agreements, responsibilities, and rewards, and to update stakeholders on "contracts" that have been made.

Critics of Agile say that Agile methods allow teams to work in an unstructured way, which prevents clear lines of accountability and discourages documentation. Proponents of Agile delivery, by contrast, argue that the methods rely on information on the current status of the project being visible to the whole business, instead of central processes for command and control. They also believe that, because the methods are designed to be self-assuring, there is proper governance and accountability built into Agile practices. Control processes, in Agile, are more collaborative and are run continuously by the business owner of the product or service. Meanwhile, critics of the status quo argue that “IT governance is killing innovation” [HBR 2013]; governance does not allow us to respond quickly enough to new demand.

Figure 21. A Balancing Act [Luna 2014]

Both camps agree that communicating on the status, progress, and budget and satisfying regulatory and auditing requirements are required.

Governance in Agile is a balancing act; on one hand autonomy, and on the other accountability to a “contract”, freedom to innovate versus following proven routines and policies [HBR 2017].

In the same vein, with Isdell’s “freedom within a framework” [Kesler 2008], governance became the means of corralling an accepted level of chaos – a way to engage the natural tension between many new global initiatives and the need for geographic General Managers to get more aggressive about finding local solutions to brand, product, and revenue gaps.

13.3. Design and Product-Led Enterprises

Increasingly businesses, in response to competitive pressures and new Agile competitors, are adopting a product-led approach to their enterprises which prompts their customers to use their products and share them with their networks. These products and networks run on "platforms" that have a significant viral effect (network effect: https://en.wikipedia.org/wiki/Network_effect).

What has this to do with Agile governance? Agile teams, and in consequence Enterprise Architecture, co-create and deliver customer-centric products (not projects). The implication is the need for cross-functional teams (business and technology) that are product-led. This poses a huge challenge to "legacy" data in siloed organizations that need to reinvent their own organizations and operating models. Other aspects to consider are metrics, performance, and incentives; how to recognize success or failure?

An Agile governance structure needs to recognize these new ways of working and to measure value.

The following sections assume you are on a journey to a customer-centric product-led organization where "Agile" is the way of working [McKinsey 2016].

"Product-led" Agile Enterprise Architecture Governance

"At PepsiCo®, we are leveraging design to create meaningful and relevant brand experiences for our customers any time they interact with our portfolio of products." (https://hbr.org/2015/08/pepsicos-chief-design-officer-on-creating-an-organization-where-design-can-thrive)

A product or service is delivered to a customer via a series of activities; the chain of activities that delivers a valuable product or service to a customer is usually referred to as a "value chain". The concept was first described by Michael Porter [Porter 2004].

An Agile Architecture governance structure could look like the following diagram.

Figure 22. Agile Governance – Organization Viewpoint

Key Areas

Within an enterprise, governance structures exist that need to adhere to internal policies and external regulatory frameworks. Corporate and IT governance are such "bodies" that with support of frameworks (like COBIT and ISO/IEC 38500) help enterprises to manage such adherence. In a product-led organization, customer focus is essential and has increased regulatory attention [PWC 2013]. Architecture is represented by the organization’s Chief Architect or delegates on each of the governance boards as required.

The Enterprise Technology Board is a decision and policy-making group charged by the CIO/CTO with driving and influencing the Enterprise Architecture of the organization aligned with its strategic goals and objectives. Architecture should neither impose nor define a model; it should be the outcome of collaboration within the enterprise – IT and business. As part of the Centers of Enablement and in conjunction with the business and product owners, architecture is defined and enabled via innovation hackathons. These hackathons play a major part in enabling technology innovation and collaboration, both within the enterprise and in partnership with external stakeholders such as vendors. Co-creation is an important feature of these enablement centers.

Within the product value chain, the product owner, with the support of the product architect, prioritizes features considering customer needs and existing technology roadmaps ("near-term roadmapping"). The impact of changes as a result of constant refinement is governed within the Product Architecture Board.

Teams make architectural design decisions. This decentralized decision-making focuses on architecture and design quality and ensures adherence to enterprise principles; it also ensures that these decisions are future-proof (business and technology vitality) and adhere to the solution Definition of Done (DoD) (testable, scalable, accepted by the product manager, regulatory/compliance standards met).

13.4. Public Links to Agile Governance Use-Cases

UK Audit Office: Governance for Agile Delivery: Examples from the Private Sector July 2012: https://www.nao.org.uk/wp-content/uploads/2012/07/governance_agile_delivery.pdf
Business Agility Institute: Agile Governance: Not an Oxymoron by Bala Bulusu, February 2019: https://www.youtube.com/watch?v=bgtOP5ArwIU&feature=youtu.be

14. Legacy Integration and Modernization

In most enterprises, legacy systems have grown in size and complexity to a point where they become an impediment to developing new digital capabilities. This playbook proposes guidelines to:

Connect new digital capabilities to legacy systems
Modernize legacy applications

14.1. Progressive Journey from Monolithic to Modular

New digital applications or platforms need to integrate with legacy systems. For example, new digital banking applications are likely to need the services provided by legacy core systems or Enterprise Resource Planning (ERP).

Effective integration mechanisms are required because pure greenfield or "big bang" approaches are not realistic. However, such integration mechanisms are not easy to architect, implement, and operate. The main challenges are:

To maintain data consistency when data duplication will occur by construction
To architect production-ready systems that are safe, secure, compliant, and that can scale
To cope with the variety of data models that are inherited from a variety of legacy systems developed in different countries using different technologies
To bridge the old software architecture practices with an entirely different way of thinking and reasoning about software architecture

The ambition of this playbook is to provide a roadmap to help progressively architect the overall system into a loosely-coupled one. Figure 23, “Monolithic to Modular Journey” illustrates this journey.

Figure 23. Monolithic to Modular Journey

The steps toward modularization are described below:

Create RESTful API domain extensions to legacy systems. Use mediation to translate legacy data types into API data types (Anti-Corruption Layer).
Decouple front-end from back-end development by jointly defining APIs that cater to the needs of front-end developers while being “implementable” by back-end developers (e.g., Reactive REST architectures, GraphQL technology developed by Facebook®, etc.).
Start modularizing the monolith by formalizing boundaries between sub-domains. Mix request-response style APIs with asynchronous message passing (events) to connect them.
Further modularize by creating microservices aligned with context boundaries (see Chapter 16, Domain-Driven Design Strategic Patterns). Move further toward event-orientation by implementing the Event Sourcing pattern.
Move from ACID to BASE (Basically Available, Soft State, Eventual consistency).

In order to facilitate steps 1 and 2, instead of relying on a single API with a flat set of endpoints, Etsy® (see https://www.etsy.com/developers/documentation/getting_started/api_basics) created a two-layer API using meta-endpoints. Similar to a pattern used by Netflix® and eBay’s ql.io, each of Etsy’s meta-endpoints aggregates several other endpoints. This enables server-side composition of low-level, general-purpose resources into device or view-specific resources.
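
The idea of a meta-endpoint can be sketched as follows. This is a minimal illustration only, not the API of any of the companies mentioned: the function names, fields, and sample values are all hypothetical, and the low-level endpoints are plain functions standing in for real HTTP calls.

```python
from typing import Any, Dict, List


# Hypothetical low-level, general-purpose endpoints (stand-ins for HTTP calls).
def get_listing(listing_id: int) -> Dict[str, Any]:
    return {"id": listing_id, "title": "Handmade mug", "shop_id": 7}


def get_shop(shop_id: int) -> Dict[str, Any]:
    return {"id": shop_id, "name": "Clay Corner"}


def get_listing_images(listing_id: int) -> List[str]:
    return [f"/img/{listing_id}/1.jpg"]


def mobile_listing_view(listing_id: int) -> Dict[str, Any]:
    """Hypothetical meta-endpoint: composes several low-level resources
    on the server into one view-specific resource for a mobile client."""
    listing = get_listing(listing_id)
    shop = get_shop(listing["shop_id"])
    images = get_listing_images(listing_id)
    return {
        "title": listing["title"],
        "shop_name": shop["name"],
        "thumbnail": images[0] if images else None,
    }


view = mobile_listing_view(42)
```

The client makes one call and receives only what its view needs; the low-level endpoints remain general-purpose and reusable by other meta-endpoints.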

14.2. API Limitations

Since the heyday of SOA, the idea of creating an API layer above legacy systems has been seen by many as the magic bullet that can solve the majority of legacy integration problems.

An API is an application software intermediary that enables a software program to interact with other software. In the context of trading, an API can, for example, enable your software to connect with a broker to obtain real-time pricing data or place trades.

The quality of an API is constrained by the quality of the software that publishes it. A good API shields the internal complexity of the software that provides it and presents data in a usable way. Because legacy software is often complex and monolithic, it is a challenge to design APIs that are not prone to abstraction leak. Abstraction leak refers to a situation where an API consumer has to understand unnecessary implementation details to use it. When it happens it creates unintended coupling that puts the system’s agility in jeopardy.

Designing APIs requires business domain knowledge. Technologies such as API managers provide little if any help solving the hard problems which are:

How to design useful information architecture
How to design loosely-coupled systems that pre-condition "non-leaky" APIs

Designing APIs that external developers will love is key to creating the kind of digital ecosystems that characterize many digital business models. Designing the information architecture that will guide API definition does not happen in a vacuum. Best practices show that it is driven by business goals, persona modeling, and task analysis.

14.3. Read versus Write

Legacy applications that are systems of record hold the authoritative version of data; a payment application that makes a credit transfer is one example. Primary data is generated by a system of record when it processes a business event; it cannot be derived from other data.

When integrating a new application with a legacy one that manages primary data, it is important to distinguish when:

The new application only needs to read or query primary data
It processes a business event that will result in the creation or modification of that primary data

In the latter case, it is safer to let the legacy application modify the primary data because it implements the business rules that ensure the modification will be performed in the right way. The new application could become the new system of record if it implemented equivalent business rules. This is often impractical because legacy applications tend to be poorly documented.

When a legacy application exposes a "write" API, it may increase the transactional load to a level that is not sustainable. For example, the third-party securities lending system of a global custodian could not include custody system calls within the boundary of a distributed transaction. In addition, developing APIs on top of some old technologies such as IMS/DC may present specific technical challenges.

Using the synchronous API style, in particular when write-APIs are in scope, decreases the ability of the system to stay responsive in the face of a failure (lower resilience). It also decreases its ability to respond in a timely manner because it has to wait for the response of a legacy application’s API (less responsive).

14.4. Asynchronous Message Passing and the Saga Pattern

When the synchronous API style has drawbacks that exceed its benefits, the alternative is to use asynchronous message passing which leads to a new kind of architecture design that is characterized by:

Controlled data replication that ensures consistency across redundant data sources
Event-driven logic where noteworthy state changes are broadcast to interested software components that can respond to them
Eventual consistency that only guarantees that all replicas will eventually become consistent
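
These characteristics can be illustrated with a minimal in-memory sketch. It assumes a hypothetical append-only log of state changes that replicas apply at their own pace; the `Replica` class and the sample log are illustrative and not part of this document.

```python
from typing import Dict, List, Tuple

# Hypothetical state-change event: (account, new_balance).
Event = Tuple[str, int]


class Replica:
    """A redundant data source that applies logged events at its own pace."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, log: List[Event], max_events: int) -> None:
        # Apply at most max_events entries, in log order.
        for account, balance in log[self.applied:self.applied + max_events]:
            self.data[account] = balance
            self.applied += 1


log: List[Event] = [("A", 100), ("B", 50), ("A", 80)]
fast, slow = Replica(), Replica()

fast.catch_up(log, max_events=3)
slow.catch_up(log, max_events=1)
diverged = fast.data != slow.data   # replicas may disagree temporarily

slow.catch_up(log, max_events=10)
converged = fast.data == slow.data  # they agree once all events are applied
```

The only guarantee is convergence once every replica has applied the same events in the same order; readers of the slow replica can observe stale data in the meantime.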

Asynchronous message-passing still supports business transactions, but does so using Sagas. The Saga pattern describes how to implement business transactions without two-phase commit, which does not scale well, in particular in cloud-native systems.

The business transaction is divided into multiple steps or activities. The Saga pattern has the responsibility to either get the overall business transaction completed or to leave the system in a known termination state. So, in case of errors a business rollback procedure is applied which occurs by calling compensation steps or activities in reverse order. This pattern is not new, though it was not named Saga. For example, the idea of compensating transactions has been used in the past to process payments or securities handling business transactions.

Unlike two-phase commit transactions that are handled automatically by the database or the middleware, the Saga pattern may require dedicated business logic to handle the consequences of business events. For example, if the securities that have been loaned (third-party lending) are sold, the securities lending system will replace them with other available securities. If the securities are no longer available, the system has to inform the trader and assist her in resolving the issue.
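
The compensation mechanism described above can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: the `SagaStep` and `run_saga` names and the three sample steps are hypothetical, and a production Saga would also need persistence, retries, and idempotent steps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]
    compensation: Callable[[], None]


def run_saga(steps: List[SagaStep]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                done.compensation()
            return False  # known termination state: everything undone
    return True


log: List[str] = []


def fail_settlement() -> None:
    raise RuntimeError("settlement rejected")


steps = [
    SagaStep("reserve", lambda: log.append("reserve securities"),
             lambda: log.append("release securities")),
    SagaStep("collateral", lambda: log.append("book collateral"),
             lambda: log.append("return collateral")),
    SagaStep("settle", fail_settlement,
             lambda: log.append("cancel settlement")),
]
succeeded = run_saga(steps)
```

Because the third step fails, the two completed steps are compensated in reverse order; the failed step itself is not compensated because it never completed.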

14.5. Entities, Business Events, and Values

Business objects that have an identity which remains the same throughout their life are modelled as "entity objects". Clients, trades, and securities are examples of entity objects which experience state changes during their life. It is important to identify entity objects in a way that spans the life of a system and can extend beyond it. This is key to successful integration with legacy systems because entity objects are likely to be replicated.

The example of When Issued (WI) transactions illustrates that identity management is more than specifying tables’ primary keys. For example, a treasury bond can be purchased or sold when it has been authorized but not yet issued. A dummy CUSIP number is created to identify the security before CUSIP Global Services (CGS) creates the official one. The WI trade is conditional and settlement can occur only when the security has been listed and has been attributed with an official CUSIP number.
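
One way to keep identity stable while external identifiers change is sketched below. It assumes the entity carries its own internal identifier (a UUID here) and treats CUSIP numbers as attributes that can be added later; the class names and the identifier values are placeholders, not real CUSIPs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Security:
    name: str
    # Internal identity that never changes during the entity's life.
    entity_id: uuid.UUID = field(default_factory=uuid.uuid4)
    provisional_cusip: Optional[str] = None
    official_cusip: Optional[str] = None

    @property
    def can_settle(self) -> bool:
        # Settlement requires the official identifier.
        return self.official_cusip is not None


class SecurityRegistry:
    def __init__(self) -> None:
        self._by_id: Dict[uuid.UUID, Security] = {}
        self._by_cusip: Dict[str, uuid.UUID] = {}

    def register_when_issued(self, name: str, provisional_cusip: str) -> Security:
        security = Security(name=name, provisional_cusip=provisional_cusip)
        self._by_id[security.entity_id] = security
        self._by_cusip[provisional_cusip] = security.entity_id
        return security

    def assign_official_cusip(self, entity_id: uuid.UUID, official_cusip: str) -> None:
        security = self._by_id[entity_id]
        security.official_cusip = official_cusip
        self._by_cusip[official_cusip] = entity_id  # the old alias still resolves

    def find(self, cusip: str) -> Security:
        return self._by_id[self._by_cusip[cusip]]


registry = SecurityRegistry()
bond = registry.register_when_issued("Treasury bond", "TEMP00001")
settleable_before = bond.can_settle
registry.assign_official_cusip(bond.entity_id, "OFFICIAL1")
```

Trades booked against the provisional identifier and trades booked against the official one resolve to the same entity, which is what replication across systems relies on.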

Some changes of an entity object state can have business consequences. For example, in the case that the WI security is not listed or admitted to trading, all transactions effected during that period are declared void by the exchange. This state change is a business event that triggers downstream actions to undo the WI trades. That is why asynchronous communication using IBM MQSeries® is not sufficient. Communicating events (state changes) is better suited to express dynamic business rules. Just sending messages that represent the state of an entity at a point in time does not inform the listener that a business event happened.
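
The difference between a state message and a business event can be sketched as follows. All type names, fields, and status values are hypothetical and chosen only to mirror the WI example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SecuritySnapshot:
    """State message: the entity at a point in time. A listener would have to
    compare snapshots to guess what happened."""
    security_id: str
    status: str


@dataclass(frozen=True)
class ListingDenied:
    """Business event: states explicitly what happened."""
    security_id: str


@dataclass
class WhenIssuedTrade:
    trade_id: str
    security_id: str
    status: str = "CONDITIONAL"


def on_listing_denied(event: ListingDenied, trades: List[WhenIssuedTrade]) -> List[str]:
    """Downstream action: void every conditional WI trade on the security."""
    voided = []
    for trade in trades:
        if trade.security_id == event.security_id and trade.status == "CONDITIONAL":
            trade.status = "VOID"
            voided.append(trade.trade_id)
    return voided


trades = [
    WhenIssuedTrade("T1", "SEC-1"),
    WhenIssuedTrade("T2", "SEC-2"),
    WhenIssuedTrade("T3", "SEC-1"),
]
voided_ids = on_listing_denied(ListingDenied("SEC-1"), trades)
```

The handler is keyed on the event type, so the business rule ("void WI trades when listing is denied") is visible in the code rather than inferred from successive snapshots.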

An entity object has attributes that describe its state. A state change of an entity object modifies the value of some of its attributes. Depending on the context, some attributes may not be relevant. For example, the front office does not need some of the data that will be required at settlement time.

Master data is about managing in a consistent manner the identifiers and key attributes of core entity objects of the bank such as Party, Security, or Account. Referential data can be fully managed by new digital applications when they implement all the business rules required to modify primary data. When this is not the case, derived master data can be replicated in new digital systems and accessed in read-only mode: updates being still handled by "legacy" systems of record.

Too many legacy systems are polluted by inflated master data structures that aggregate all the data that describe entities regardless of their context and usage. This tends to increase the overall system’s complexity and promote high coupling. When integrating new digital applications with legacy ones, it is preferable to protect new software code from unnecessary complexity that can pollute it.

14.6. Anti-Corruption Layer and Mediation Logic

The illusion of the "all encompassing" enterprise data model needs to be replaced by a modular way of architecting models. Modeling should be driven by use-cases specific to each context. For example, trading is different from clearing and settlement though some entities will span more than one context. When interfacing a new digital cloud-native application with a legacy one, an anti-corruption layer can be created to shield the new code from legacy complexity. The anti-corruption layer can mediate or translate old data structures into cleaner ones that meet digital application needs.

When designing the resource model of a RESTful API or an event data structure, business domain concepts should prevail over constraints imposed by legacy data structures. Finding the right balance between clean versus unclean design cannot be solved by technology alone; it requires business domain expertise.
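
A minimal sketch of such mediation logic is shown below. The legacy record layout (padded strings, prices stored in hundredths, dates as `YYYYMMDD`) and all field names are assumptions made for illustration; a real anti-corruption layer would be driven by the actual legacy schema and by domain experts.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Dict


@dataclass(frozen=True)
class Trade:
    """Clean model expressed in business domain terms."""
    trade_id: str
    instrument: str
    quantity: int
    price: Decimal
    trade_date: date


def translate_legacy_trade(record: Dict[str, str]) -> Trade:
    """Anti-corruption layer: keeps legacy encodings out of the new code."""
    return Trade(
        trade_id=record["TRD_NO"].strip(),
        instrument=record["SEC_CD"].strip(),
        quantity=int(record["QTY"]),
        price=Decimal(record["PRC_X100"]) / 100,  # assumed: stored in hundredths
        trade_date=datetime.strptime(record["TRD_DT"], "%Y%m%d").date(),
    )


# Hypothetical legacy record with fixed-width, zero-padded fields.
legacy_record = {
    "TRD_NO": "000123  ",
    "SEC_CD": "XYZ ",
    "QTY": "0000500",
    "PRC_X100": "0010150",
    "TRD_DT": "20190930",
}
trade = translate_legacy_trade(legacy_record)
```

Only the translation function knows about the legacy field names and encodings; the rest of the new application works with `Trade` alone.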

14.7. Combining Integration Patterns

The simplified diagram in Figure 24, “Legacy Integration Strategies” represents how to combine integration patterns.

Figure 24. Legacy Integration Strategies

APIs can open legacy applications’ functionality to the outside.

An entity can be replicated in more than one system as long as its identity is preserved and data lineage properly managed. REST resources map to business entities. Derived data is formatted to meet the needs of the data consumer and cached to minimize network traffic.

Legacy applications publish events that model entity state changes that are of interest to digital cloud-native applications. Sagas whose scope spans digital and legacy help maintain the overall system’s (eventual) consistency.

The diagram also shows that the legacy application manages Entity D, which could be Instrument. If the Instrument’s price changes, the legacy application can publish a business event that represents that change. A new digital application that consumes the price change event can, for example, trigger a state change of limits to the order it manages. The digital application emits a new event that ascertains that the order (Entity E3) has been released for execution. This type of causality chain can be expressed in a graph that can drive Saga logic.
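
The causality chain described above can be sketched with a small in-memory event bus. This is an illustration under simplifying assumptions: delivery is synchronous, whereas a real system would use asynchronous messaging, and the event types, the `LimitOrder` class, and the release rule are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class PriceChanged:
    instrument: str
    price: float


@dataclass(frozen=True)
class OrderReleased:
    order_id: str
    instrument: str


class EventBus:
    def __init__(self) -> None:
        self._handlers: DefaultDict[type, List[Callable]] = defaultdict(list)
        self.history: List[object] = []  # records events in publication order

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        self.history.append(event)
        for handler in self._handlers[type(event)]:
            handler(event)


@dataclass
class LimitOrder:
    order_id: str
    instrument: str
    limit_price: float
    released: bool = False


bus = EventBus()
order = LimitOrder("E3", "Instrument-D", limit_price=100.0)


def on_price_changed(event: PriceChanged) -> None:
    # Assumed rule: release a buy order once the price is at or below its limit.
    if (event.instrument == order.instrument and not order.released
            and event.price <= order.limit_price):
        order.released = True
        bus.publish(OrderReleased(order.order_id, order.instrument))


bus.subscribe(PriceChanged, on_price_changed)
bus.publish(PriceChanged("Instrument-D", 101.0))  # legacy side: no effect
bus.publish(PriceChanged("Instrument-D", 99.5))   # triggers the release
```

The recorded history makes the chain explicit: a price change event published by the legacy side is followed by an order released event emitted by the digital application.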

We believe that architecture models that only represent a system statically are incomplete. They need to be completed by models that represent the system’s behavior. The scalability, resilience, and responsiveness of a system cannot be verified in the absence of dynamic modeling.

14.8. Applying the Strangler Pattern to Decommission Legacy Systems

Monolithic legacy systems often combine what would be several bounded contexts in a well-modularized system. The approach consists of identifying these candidate bounded contexts and starting to develop new application components at the edge of the legacy system. For more details on the strangler pattern, see Chapter 15, Strangler Pattern.

Figure 25, “Strangling the Legacy Monolith” illustrates the process.

Figure 25. Strangling the Legacy Monolith

A new application component bc1 is developed using a cloud-native application style (e.g., microservices). The application component bc1 is interfaced with the legacy system using patterns described in Figure 24, “Legacy Integration Strategies” (in particular the anti-corruption pattern). The corresponding functions and features are removed from the legacy system.

The process repeats itself to develop other bounded contexts (bc2 to bci) at the edge of the legacy systems. When the last bounded context (bcn) is developed, the legacy system can be decommissioned. It has been strangled!
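
The routing side of this process can be sketched as a façade placed in front of the legacy system. The class, the capability names, and the handlers below are hypothetical; the sketch only shows how traffic shifts as bounded contexts are migrated.

```python
from typing import Callable, Dict, Iterable


def legacy_system(capability: str, payload: str) -> str:
    return f"legacy handled {capability}: {payload}"


class StranglerFacade:
    """Routes each capability to its new bounded context once migrated,
    and to the legacy system otherwise."""

    def __init__(self, legacy: Callable[[str, str], str],
                 capabilities: Iterable[str]) -> None:
        self._legacy = legacy
        self._remaining_in_legacy = set(capabilities)
        self._migrated: Dict[str, Callable[[str], str]] = {}

    def migrate(self, capability: str, handler: Callable[[str], str]) -> None:
        self._migrated[capability] = handler
        self._remaining_in_legacy.discard(capability)

    def handle(self, capability: str, payload: str) -> str:
        handler = self._migrated.get(capability)
        if handler is not None:
            return handler(payload)
        return self._legacy(capability, payload)

    @property
    def legacy_can_be_decommissioned(self) -> bool:
        return not self._remaining_in_legacy


facade = StranglerFacade(legacy_system, ["pricing", "orders"])
facade.migrate("pricing", lambda p: f"bc1 handled pricing: {p}")
after_first = (facade.handle("pricing", "q1"), facade.handle("orders", "o1"))
ready_after_first = facade.legacy_can_be_decommissioned

facade.migrate("orders", lambda p: f"bc2 handled orders: {p}")
ready_after_last = facade.legacy_can_be_decommissioned
```

Callers keep using the same entry point throughout, so each migration step is invisible to them.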

Some recommend a more aggressive approach; they claim that you should starve your monolith to death (see https://read.acloud.guru/if-you-cant-strangle-the-monolith-starve-it-to-death-fcc824d3c82). Because decommissioning legacy systems requires significant investment, attention should be given to the economic side of the equation. It is good to starve your monolith rapidly if the business case proves positive.

Conclusion

The approach we have described is key to better integrating digital applications with legacy ones and to ultimately decommissioning the latter. However, the reader may think that the learning curve is too steep and that the enterprise is not up to the challenge.

The obvious alternative would be to develop new digital capabilities with the same old architecture models mastered by the IT organization. This is not such a good idea because:

Classical distributed computing models scale vertically up to the limit of "big iron" computers’ power
The enterprise would lose the elasticity, scalability, and cost benefits of cloud-native computing
The enterprise would be at a competitive disadvantage vis-à-vis market players who master this class of technology

Part 3: Architecture Patterns

This section contains a set of architecture patterns that can be reused by Agile architects to define an architecture "to-be" state or a solution.

MITRE® defines an architecture pattern as: "a method of arranging blocks of functionality to address a need. Patterns can be used at the software, system, or enterprise levels. Good pattern expressions tell you how to use them, and when, why, and what trade-offs to make in doing so. Patterns can be characterized according to the type of solution they are addressing (e.g., structural or behavioral)." [MITRE]

The TOGAF framework has not yet integrated architecture patterns but has published a template to describe them [TOGAF 2018, Chapter 28].

15. Strangler Pattern

This chapter introduces the "strangler" architecture pattern.

Martin Fowler described how to create a new system around the edges of the old, letting it grow slowly over several years until the old system is "strangled" [Fowler 2004].

Chris Stevenson published a paper that describes how his team rewrote a legacy application by creating new features using the pattern described by Martin Fowler [Stevenson 2004].

Eric Evans wrote a document that describes how to get started with Domain-Driven Design (DDD) when surrounded by legacy systems [Evans 2013]. It describes four strategies to progressively modularize a monolithic legacy system by applying DDD.

16. Domain-Driven Design Strategic Patterns

This chapter introduces the DDD strategic patterns.

The term "Domain-Driven Design" (DDD) was coined by Eric Evans in his book Domain-Driven Design: Tackling Complexity in the Heart of Software [Evans 2003].

Domain-Driven Design (DDD) offers strategic building blocks for analyzing and structuring the problem space and the solution space.

16.1. Problem Space

The problem space holds what the enterprise does – its business capabilities – to keep it running and able to operate. A business capability is a specific function or ability the enterprise possesses in order to achieve its goals.

The problem space describes several things:

The usages of the customers and employees of the enterprise
The words used by the people and their meanings – the domain language is the language used by people as it is, so it can be messy and organic
The requirements and constraints of the business
The people who operate the business

The problem space holds the domain within which the enterprise operates and represents the world as we perceive it; it describes the Business Architecture.

The domain is the set of concepts that, through use-cases, allows people in the enterprise to solve problems.

Sub-domains

A domain can be decomposed into sub-domains which typically reflect some organizational structure. Sub-domain boundaries are determined in part by communication structures within an organization. The sub-domains are stable; they change only for strategic reasons and are independent of software.

Example of an E-commerce System

An e-commerce system consists of sub-systems such as a product catalog, an inventory system, a purchasing system, and an accounting system. They are sub-systems in that the system as a whole is partitioned into them. The system is partitioned in this specific way because the resulting sub-systems form cohesive units of functionality.

How to Identify Sub-domains

Domain knowledge is key to decomposing a domain into sub-domains that have a high level of internal cohesion and minimum dependencies with other sub-domains. Conducting event storming workshops is a great way to accelerate the acquisition of domain knowledge and explore domain decomposition scenarios. The event storming workshop technique is introduced in Chapter 11, Event Storming Workshop.

Distillation

The enterprise operates with several sub-domains. Depending on its business, some are generic (such as accounting or HR), some are supporting, and some are core, meaning the current strategy directly relies on them to attain its goals. Not all parts of a large system will be well designed.

The core domain is the domain that directly contributes to the current enterprise’s strategy.

Strategy is defined as: "Strategy describes the organization’s objectives according to the environment and the available resources, then the resources allocation in order to create value for the clients along with profits for the organization and its employees."

16.2. Solution Space

The business capabilities are almost the same for different enterprises involved in the same business, but their implementations – the solution space – will differ. While sub-domains delimit the applicability of domains, bounded contexts delimit the applicability of domain models. As such, the bounded context is within the solution space.

Bounded context is the solution as we design it. It describes the software architecture and is used to manage the complexity, and is therefore linked to the business.

Bounded context means different models of the same thing (e.g., book, customer, etc.). Bounded context is represented by models and software that implement those models. This is where we find patterns and heuristics.

Domain Model and Ubiquitous Language

A language structured around the domain model and used by all team members to connect all the activities of the team with the software.

— Eric Evans
Domain-Driven Design: Tackling Complexity in the Heart of Software

The ubiquitous language is a deliberate language designed to be unambiguous and on which all stakeholders agreed. This language is found in every artifact manipulated by the stakeholders (UI, database, source code, documents, etc.). The concepts conveyed by the domain model are the primary means of communication; these words should be used in speech and every written artifact. If an idea cannot be expressed using this set of concepts, the designers should iterate once again and extend the model, and they should look for and remove ambiguities and inconsistencies. The domain model is the backbone of the ubiquitous language.

Bounded Context

An operational definition of where a particular model is well-defined and applicable. Typically a sub-system, or the work owned by a particular team.

— Eric Evans
Domain-Driven Design: Tackling Complexity in the Heart of Software

A bounded context delimits the applicability of a particular model so that team members have a clear and shared understanding of what has to be consistent and how it relates to other contexts. Bounded contexts are not modules.

Bounded contexts separate concerns and decrease complexity. A bounded context is the boundary for the meaning of a model. A bounded context creates autonomy, hence allowing a dedicated team for each. Bounded contexts simplify the architecture by separating concerns.
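
The notion of different models of the same thing can be sketched as follows. The two contexts, their attributes, and the sample values are hypothetical; the enclosing classes merely stand in for separately owned code bases so that the example fits in one listing.

```python
from dataclasses import dataclass


class SalesContext:
    """Stand-in for the code base owned by a sales team."""

    @dataclass
    class Customer:
        customer_id: str
        credit_limit: float

        def can_order(self, amount: float) -> bool:
            return amount <= self.credit_limit


class ShippingContext:
    """Stand-in for the code base owned by a shipping team."""

    @dataclass
    class Customer:
        customer_id: str
        delivery_address: str

        def label(self) -> str:
            return f"{self.customer_id}\n{self.delivery_address}"


# The same real-world customer, modeled differently in each context.
sales_customer = SalesContext.Customer("C-42", credit_limit=1000.0)
shipping_customer = ShippingContext.Customer("C-42", "1 Example Street")
```

Each model carries only what its context needs, and the shared identifier is what allows the two contexts to refer to the same customer.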

How to Identify a Bounded Context?

Conflicts of naming suggest different contexts.

Context Map

A context map describes the flow of models between contexts and provides an overview of the systems landscape. A context map helps to identify governance issues between applications and teams. It helps us to see how teams communicate and their "power" relationships. With a context map we get a clear view on where and how bad models propagate through IS landscapes.

You can use the metaphor of a river flowing to describe the relations between two bounded contexts: if you are upstream and pollute the river, the downstream people will be impacted – not the opposite.

An upstream-downstream relationship is a relationship between two bounded contexts in which the upstream group’s actions affect the downstream group, but the actions of the downstream group do not affect the upstream group. It is not about the direction of the data flow, but about the flow of the models.

We can categorize the context map patterns in three categories:

Upstream patterns: Open Host Service and Event Publisher
Midway patterns: Shared Kernel, Published Language, Separate Ways, Partnership
Downstream patterns: Customer/Supplier, Conformist, Anti-corruption Layer

Upstream Patterns

Figure 26. Domain-Driven Design Context Map Upstream Patterns

Open Host Service

Define a protocol that gives access to your sub-system as a set of services. Open the protocol so that all who need to integrate with you can use it. Enhance and expand the protocol to handle new integration requirements, except when a single team has idiosyncratic needs. Then, use a one-off translator to augment the protocol for that special case so that the shared protocol can stay simple and coherent.

— Vaughn Vernon
Implementing Domain-Driven Design

Event Publisher

A domain event is something that happens in the domain and that is important to domain experts. An upstream context publishes all its domain events through a messaging system (preferably an asynchronous one); downstream contexts can subscribe to the events that are relevant to them, conform to or transform those events in their own models (following an Anti-corruption Layer), and react accordingly.

Midway Patterns

Figure 27. Domain-Driven Design Context Map Midway Patterns

Shared Kernel

Designate some subset of the domain model that the two teams agree to share. Of course this includes, along with this subset of the model, the subset of code or of the database design associated with that part of the model. This explicitly shared stuff has special status, and shouldn’t be changed without consultation with the other team.

— Eric Evans
Domain-Driven Design: Tackling Complexity in the Heart of Software

Published Language

The translation between the models of two bounded contexts requires a common language. Use a well-documented shared language that can express the necessary domain information as a common medium of communication, translating as necessary into and out of that language. Published Language is often combined with Open Host Service.

— Eric Evans
Domain-Driven Design: Tackling Complexity in the Heart of Software

Separate Ways

If two sets of functionality have no significant relationship, they can be completely cut loose from each other. Integration is always expensive, and sometimes the benefit is small. Declare a bounded context to have no connection to the others at all, enabling developers to find simple, specialized solutions within this small scope.

— Vaughn Vernon
Implementing Domain-Driven Design

Partnership

Where development failure in either of two contexts would result in delivery failure for both, forge a partnership between the teams in charge of the two contexts. Institute a process for coordinated planning of development and joint management of integration. The teams must cooperate on the evolution of their interfaces to accommodate the development needs of both systems. Interdependent features should be scheduled so that they are completed for the same release.

— Vaughn Vernon
Implementing Domain-Driven Design

Downstream Patterns

Figure 28. Domain-Driven Design Context Map Downstream Patterns

Customer/Supplier

When two teams are in an upstream-downstream relationship, where the upstream team may succeed independently of the fate of the downstream team, the needs of the downstream team come to be addressed in a variety of ways with a wide range of consequences. Downstream priorities factor into upstream planning. Negotiate and budget tasks for downstream requirements so that everyone understands the commitment and schedule.

— Vaughn Vernon
Implementing Domain-Driven Design

The freewheeling development of the upstream team can be cramped if the downstream team has veto power over changes, or if procedures for requesting changes are too cumbersome. The upstream team may even be inhibited and worried about breaking the downstream system. Meanwhile, the downstream team can be helpless, at the mercy of upstream priorities.

— Eric Evans
Domain-Driven Design: Tackling Complexity in the Heart of Software

Conformist

When two development teams have an upstream/downstream relationship in which the upstream has no motivation to provide for the downstream team’s needs, the downstream team is helpless. Altruism may motivate upstream developers to make promises, but they are unlikely to be fulfilled. Belief in those good intentions leads the downstream team to make plans based on features that will never be available. The downstream project will be delayed until the team ultimately learns to live with what it is given. An interface tailored to the needs of the downstream team is not on the cards.

— Eric Evans
Domain-Driven Design: Tackling Complexity in the Heart of Software

The downstream team eliminates the complexity of translation between bounded contexts by slavishly adhering to the model of the upstream team.

— Vaughn Vernon
Implementing Domain-Driven Design

Anti-corruption Layer

Translation layers can be simple, even elegant, when bridging well-designed bounded contexts with cooperative teams. But when control or communication is not adequate to pull off a shared kernel, partner, or customer-supplier relationship, translation becomes more complex. The translation layer takes on a more defensive tone. As a downstream client, create an isolating layer to provide your system with functionality of the upstream system in terms of your own domain model. This layer talks to the other system through its existing interface, requiring little or no modification to the other system. Internally, the layer translates in one or both directions as necessary between the two models.

— Vaughn Vernon
Implementing Domain-Driven Design

Mapping the Context Map Patterns

We can organize the context map patterns along two axes: Control and Communication.

Figure 29. Mapping Context Map Patterns

Part 4: Methods

This section aims to develop a "meta methodology" discourse on existing methods of interest, not to provide an exhaustive description of these methods.

A discussion of the relevance and applicability of the methods contained in this section precedes the listing of references and resources of interest to the reader.

When needed, specific method knowledge can be incorporated into the O-AAF playbooks or pattern sections. For example, Domain-Driven Design (DDD) is described in Part 4: Methods and the DDD strategic patterns are incorporated into Part 3: Architecture Patterns because they are key to the O-AAF Standard.

This Snapshot document does not include method references. However, relevant method knowledge has been incorporated in the chapters that are included in this document.

Appendices

Appendix A: Abbreviations

ACL: Access Control List
ADM: Architecture Development Method
ADR: Architecture Decision Record
API: Application Program Interface
BASE: Basically Available, Soft State, Eventual consistency
BDUF: Big Design Up-front
BPI: Business Process Improvement
BPR: Business Process Re-engineering
CGS: CUSIP Global Services
CMM: Capability Maturity Model
CMMI: Capability Maturity Model Integration
CQRS: Command Query Responsibility Segregation
CUSIP: Committee on Uniform Security Identification Procedures
DBMS: Database Management System
DDD: Domain-Driven Design
DoD: Definition of Done
ERP: Enterprise Resource Planning
FCA: Financial Conduct Authority
IaaS: Infrastructure as a Service
IMS/DC: Information Management System/Data Communications
ISACA: IS Audit and Control Association
LEI: Lean Enterprise Institute
MVA: Minimum Viable Architecture
MVP: Minimum Viable Product
NoSQL: Not only SQL
NVA: Non-Value Activities
O-AAF: The Open Group Agile Architecture Framework
PCMM: People Capability Maturity Model
P&L: Profit and Loss
RDBMS: Relational Database Management System
REST: Representational State Transfer
RVA: Real Value-Added
SBCE: Set-Based Concurrent Engineering
SEI: Software Engineering Institute
SOA: Service-Oriented Architecture
SGMM: Smart Grid Maturity Model
SSE: Server-Sent Events
UUID: Universally Unique Identifier
URN: Uniform Resource Name
WI: When Issued

1. COBIT® provides an implementable "set of controls over IT and organizes them around a logical framework of IT-related processes and enablers."