10. Governance, Risk, Security, and Compliance

10.1. Introduction

Operating at scale requires a different mindset. When you were starting out, the horizon seemed bounded only by your imagination, will, and talent. At enterprise scale, it’s a different world. You find yourself constrained by indifferent forces and hostile adversaries, some of them competing fairly, and others seeking to attack by any means. Whether or not you are a for-profit, publicly traded company, you are now large enough that audits are required; you also likely have directors of some nature. Like it or not, the concept of “controls” has entered your awareness.

As a team of teams, you needed to understand resource management, finance, the basics of multiple product management and coordination, and cross-functional processes. Now that you are an enterprise, you also need to consider questions of corporate governance. Your stakeholders have become more numerous, and their demands have multiplied, so you have followed the well-established practice of forming a governing body.

Security threats increase proportionally to the company’s size. The talent and persistence of these adversaries are remarkable. Other challenging players are, on paper, “on the same side,” but auditors are never to be taken for granted. Why are they investigating IT systems? What are their motivations and responsibilities? Finally, what laws and regulations are relevant to IT?

Important
As with other chapters in the later part of this book, we will to some degree introduce this topic “on its own terms.” We will then add context and critique in subsequent sections.

More than any other chapter, the location of this chapter (and especially its Security subsection) in Section 4 draws attention. Again, any topic in any chapter may be a matter of concern at any stage in an organization’s evolution.

You’ve been doing security since your company started. Otherwise, you would not have gotten this big. But now, you have a Chief Information Security Officer, formal risk management processes, a standing director-level security steering committee, auditors, and compliance specialists. That kind of formalization does not usually happen until an organization grows to a certain size.

We needed the content in Section 3 to get this far. We had to understand our structure, how we were organizing our strategic investments, and how we were engaging in operational activities. In particular, it’s difficult for an organization to govern itself without some ability to define and execute processes, as processes often support governance controls and security protocols.

This chapter covers “Governance, Risk, Security, and Compliance” because there are clear relationships between these concerns. They have important dimensions of independence as well. It is interesting that Shon Harris' popular Guide to the CISSP starts its discussion of security with a chapter titled “Information Security Governance and Risk Management.” Governance leads to a concern for risk, and security specializes in certain important classes of risk. Security requires grounding in governance and risk management.

Compliance is related but, again, distinct: it is the concern for adherence to laws and regulations and, secondarily, to internal policy.

10.1.1. Chapter 10 outline

  • Governance

  • Enablers

  • Risk management

  • Compliance

  • Assurance and audit

  • Security

  • Digital Governance

10.1.2. Chapter 10 learning objectives

  • Define governance versus management

  • Describe key objectives of governance according to major frameworks

  • Define risk management and its components

  • Describe and distinguish assurance and audit, and describe their importance to digital operations

  • Discuss digital security concerns and practices

  • Identify common regulatory compliance issues

  • Describe how governance is retaining its core concerns while evolving in light of digital transformation

  • Describe automation techniques relevant to supporting governance objectives throughout the digital delivery pipeline

10.2. Governance

10.2.1. What is governance?

The system by which organizations are directed and controlled.
— Cadbury Report
The CObIT 5 framework makes a clear distinction between governance and management. These two disciplines encompass different types of activities, require different organisational structures and serve different purposes . . . In most enterprises, governance is the responsibility of the board of directors under the leadership of the chairperson [while] management is the responsibility of the executive management under the leadership of the CEO.
— CObIT 5 Framework
ISACA

To talk about governing digital or IT capabilities, we must talk about governance in general. Governance is a challenging and often misunderstood concept. First and foremost, it must be distinguished from “management.” This is not always easy but remains essential.

A governance example

Here is a simple explanation of governance:

Suppose you own a small retail store. For years, you were the primary operator. You might have hired an occasional cashier, but that person had limited authority; they had the keys to the store and cash register, but not the safe combination, nor was their name on the bank account. They did not talk to your suppliers. They received an hourly wage, and you gave them direct and ongoing supervision. [1] In this case, you were a manager. Governance was not part of the relationship.

two men in front of a store
Figure 169. Someone to “mind the store”

Now, you wish to go on an extended vacation — perhaps a cruise around the world, or a trek in the Himalayas. You need someone who can count the cash and deposit it, and place orders with and pay your suppliers. You need to hire a professional manager (see Someone to “mind the store” [2]).

They will likely draw a salary, perhaps some percentage of your proceeds, and you will not supervise them in detail as you did the cashier. Instead, you will set overall guidance and expectations for the results they produce. How do you do this? And perhaps even more importantly, how do you trust this person?

Now, you need governance.

As we see in the above quote, one of the most firmly reinforced concepts in the CObIT guidance (more on this and ISACA in the next section) is the need to distinguish governance from management. Governance is by definition a board-level concern. Management is the CEO’s concern. In this distinction, we can still see the shop owner and his or her delegate.

Important
There is too often a tendency to lump all of “management” in with governance. Sometimes it may be said that the VP of Sales, or Human Resources, “governs” their function, for example. While tempting to executives who want to elevate their status, this is not the intent of the term, as we will detail below.

Some theory of governance

In political science and economics, the need for governance is seen as an example of the principal-agent problem [87]. Our shopkeeper example illustrates this. The hired manager is the “agent,” acting on behalf of the shop owner, who is the “principal.”

In principal-agent theory, the agent may have different interests than the principal. The agent also has much more information (think of the manager running the shop day-to-day, versus the owner off climbing mountains). The agent is in a position to do economic harm to the principal; to shirk duty, to steal, to take kickbacks from suppliers. Mitigating such conflicts of interest is a part of governance.

In larger organizations (such as you are now), it’s not just a simple matter of one clear owner vesting power in one clear agent. The corporation may be publicly owned, or in the case of a non-profit, it may be seeking to represent a diffuse set of interests (e.g., environmental issues). In such cases, a group of individuals (directors) is formed, often termed a “board,” with ultimate authority to speak for the organization.

The principal-agent problem can be seen at a smaller scale within the organization. Any manager encounters it to some degree, in specifying activities or outcomes for subordinates. But this does not mean that the manager is doing “governance,” as governance is by definition an organization-level concern.

The fundamental purpose of boards of directors and similar bodies is to take the side of the principal. This is easier said than done; boards can become overly close to an organization’s senior management — the senior managers are real people, while the “principal” may be an amorphous, distant body of shareholders and/or stakeholders.

Because governance is the principal’s concern, and because the directors represent the principal, governance, including IT governance, is a board-level concern.

There are various principles of corporate governance we will not go into here, such as shareholder rights, stakeholder interests, transparency, and so forth. References on these topics are included in the chapter conclusion (COSO and ISACA are good places to start). However, as we turn to our focus on digital and IT-related governance, there are a few final insights from the principal-agent theory that are helpful to understanding governance. Consider:

the heart of principal-agent theory is the trade-off between (a) the cost of measuring behavior and (b) the cost of measuring outcomes and transferring risk to the agent. [87]

What does this mean? Suppose the shopkeeper tells the manager, “I will pay you a salary of $50,000 while I am gone, assuming you can show me you have faithfully executed your daily duties.”

The daily duties are specified in a number of checklists, and the manager is expected to fill these out daily and weekly, and for certain tasks, provide evidence they were performed (e.g., bank deposit slips, checks written to pay bills, photos of cleaning performed). That is a behavior-driven approach to governance. The manager need not worry if business falls off; they will get their money. The owner has a higher level of uncertainty; the manager might falsify records, or engage in poor customer service so that business is driven away. A fundamental conflict of interest is present; the owner wants their business sustained, while the manager just wants to put in the minimum effort to collect the $50,000. When agent responsibilities can be well specified in this manner, it is said they are highly programmable.

Now, consider the alternative. Instead of this very scripted set of expectations, the shopkeeper might tell the manager, “I will pay you 50% of the shop’s gross earnings, whatever they may be. I’ll leave you to follow my processes however you see fit. I expect no customer or vendor complaints when I get back.”

In this case, the manager’s behavior is more aligned with the owner’s goals. If they serve customers well, they will likely earn more. There are any number of hard-to-specify behaviors (less programmable) that might be highly beneficial.

For example, suppose the store manager learns of an upcoming street festival, a new one that the owner did not know of or plan for. If the agent is managed in terms of their behavior, they may do nothing; it’s just extra work. If they are measured in terms of their outcomes, however, they may well make the extra effort to order merchandise desirable to the street fair participants, and perhaps hire a temporary cashier to staff an outdoor booth, as this will boost store revenue and therefore their pay.
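
The incentive difference between the two contracts can be sketched in a few lines. This is a minimal illustration, not a model from the cited literature: the $50,000 salary and the 50% share come from the example above, while the baseline gross earnings and the festival boost are hypothetical figures chosen only to make the arithmetic visible.

```python
FIXED_SALARY = 50_000    # behavior-based contract, from the example above
REVENUE_SHARE = 0.50     # outcome-based contract, from the example above

def behavior_based_pay(gross_earnings: float) -> float:
    """The manager is paid the same salary whatever the shop earns."""
    return FIXED_SALARY

def outcome_based_pay(gross_earnings: float) -> float:
    """The manager is paid a fixed share of whatever the shop earns."""
    return REVENUE_SHARE * gross_earnings

def festival_incentive(pay_function, baseline_gross: float, festival_boost: float) -> float:
    """Extra pay the manager receives for capturing the festival revenue."""
    return pay_function(baseline_gross + festival_boost) - pay_function(baseline_gross)

# Hypothetical figures, for illustration only.
baseline_gross = 100_000
festival_boost = 8_000

extra_if_behavior = festival_incentive(behavior_based_pay, baseline_gross, festival_boost)
extra_if_outcome = festival_incentive(outcome_based_pay, baseline_gross, festival_boost)
```

Under these assumed figures the behavior-based manager gains nothing from the extra effort, while the outcome-based manager keeps half of the additional earnings.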

(Note that we have considered similar themes in our discussion of Agile and contract management, in terms of risk sharing.)

In general, it may seem that an outcome-based relationship would always be preferable. There is, however, an important downside. It transfers risk to the agent (e.g., the manager). And because the agent is assuming more risk, they will (in a fair market) demand more compensation. The owner may find themselves paying $60,000 for the manager’s services, for the same level of sales, because the manager also had to “price in” the possibility of poor sales and the risk that they would only make $35,000.
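
One way to see why a risk-bearing agent asks for more is to compare the average pay of a risky package with the guaranteed amount the agent would value equally (its "certainty equivalent"). The sketch below is an illustration under stated assumptions, not a calculation from the text: the probabilities, the gross earnings in each scenario, and the square-root utility function (a common textbook stand-in for risk aversion) are all hypothetical. Only the 50% share and the $35,000 and $60,000 figures echo the example above.

```python
import math

REVENUE_SHARE = 0.50  # from the example above

# (probability, gross earnings) pairs; hypothetical figures chosen so that
# poor sales pay the manager $35,000 and average pay is about $60,000.
SCENARIOS = [(0.2, 70_000), (0.8, 132_500)]

def utility(pay: float) -> float:
    """Assumed risk-averse utility: each extra dollar matters a little less."""
    return math.sqrt(pay)

def expected_pay(scenarios) -> float:
    """Probability-weighted average pay under the outcome-based contract."""
    return sum(p * REVENUE_SHARE * gross for p, gross in scenarios)

def certainty_equivalent(scenarios) -> float:
    """Guaranteed pay the manager would value the same as the risky package."""
    expected_utility = sum(p * utility(REVENUE_SHARE * gross) for p, gross in scenarios)
    return expected_utility ** 2  # inverse of the square-root utility

average_pay = expected_pay(SCENARIOS)
guaranteed_equivalent = certainty_equivalent(SCENARIOS)
risk_premium = average_pay - guaranteed_equivalent
```

Because the assumed utility is concave, the guaranteed equivalent is lower than the average pay; the gap is the premium the owner pays for transferring risk. Its size depends entirely on the assumed utility function and scenarios.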

Finally, there is a way to align interests around outcomes without going fully to performance-based pay. If the manager for cultural reasons sees their interests as aligned, this may mitigate the principal-agent problem. In our example, suppose the store is in a small, tight-knit community with a strong sense of civic pride and familial ties.

Even if the manager is being managed in terms of their behavior, their cultural ties to the community or clan may lead them to see their interests as well aligned with those of the principal. As noted in [87], “Clan control implies goal congruence between people and, therefore, the reduced need to monitor behavior or outcomes. Motivation issues disappear.” We have discussed this kind of motivation in Chapter 7, especially in our discussion of control culture and insights drawn from the military.

COSO and control
Internal control is a process, effected by an entity’s board of directors, management, and other personnel, designed to provide reasonable assurance regarding the achievement of objectives relating to operations, reporting, and compliance.
— Committee of Sponsoring Organizations of the Treadway Commission
Internal Control — Integrated Framework

An important discussion of governance is found in the statements of COSO on the general topic of “control.”

Control is a term with broader and narrower meanings in the context of governance. In the area of risk management, “controls” are specific approaches to mitigating risk. However, “control” is also used by COSO in a more general sense to clarify governance.

What is COSO?

The Committee of Sponsoring Organizations of the Treadway Commission (COSO) has a non-intuitive name, especially given its global influence.

COSO is a “private sector initiative,” funded by:

  • American Institute of Certified Public Accountants (AICPA),

  • American Accounting Association (AAA),

  • Financial Executives International (FEI),

  • Institute of Internal Auditors (IIA),

  • Institute of Management Accountants (IMA).

It was founded in 1985 to support the National Commission on Fraudulent Financial Reporting and has published various reports and guidance mostly concerned with the topic of internal control.

Control activities, according to COSO, are

the actions established through policies and procedures that help ensure that management’s directives to mitigate risks to the achievement of objectives are carried out. Control activities are performed at all levels of the entity, at various stages within business processes, and over the technology environment. They may be preventive or detective in nature and may encompass a range of manual and automated activities such as authorizations and approvals, verifications, reconciliations, and business performance reviews.

…Ongoing evaluations, built into business processes at different levels of the entity, provide timely information. Separate evaluations, conducted periodically, will vary in scope and frequency depending on assessment of risks, effectiveness of ongoing evaluations, and other management considerations. Findings are evaluated against criteria established by regulators, recognized standard-setting bodies or management and the board of directors, and deficiencies are communicated to management and the board of directors as appropriate. [73]
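
The distinction between preventive and detective controls can be made concrete with the shopkeeper example from earlier in this chapter. The following is a minimal sketch under stated assumptions, not anything specified by COSO: the approval limit, the record layouts, the supplier name, and the function names are all hypothetical.

```python
from dataclasses import dataclass

APPROVAL_LIMIT = 500.00  # hypothetical: larger payments need the owner's sign-off

@dataclass
class Payment:
    supplier: str
    amount: float
    approved_by_owner: bool = False

def authorize_payment(payment: Payment) -> bool:
    """Preventive control: block an unapproved large payment before it is made."""
    if payment.amount > APPROVAL_LIMIT and not payment.approved_by_owner:
        return False
    return True

def reconcile_deposits(register_totals: dict, bank_deposits: dict, tolerance: float = 0.01) -> list:
    """Detective control: after the fact, list the days on which the cash
    counted at the register does not match what reached the bank."""
    discrepancies = []
    for day, counted in register_totals.items():
        deposited = bank_deposits.get(day, 0.0)
        if abs(counted - deposited) > tolerance:
            discrepancies.append(day)
    return discrepancies

# Hypothetical week: Tuesday's deposit is short and Wednesday's is missing.
register = {"Mon": 820.00, "Tue": 640.50, "Wed": 910.25}
bank = {"Mon": 820.00, "Tue": 600.50}
flagged_days = reconcile_deposits(register, bank)
```

The first function stops a problem from occurring; the second only reveals it afterward, which is why the two kinds of control are usually combined.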

Systems theory, feedback, control, and governance

We’ve encountered systems theory and associated concepts such as feedback and control throughout this book. The idea of “governance” predates these, but in an interesting way.

The term “governance” originates from the Greek word kubernao, which means “to steer,” as in a ship. Nautical navigation is a process of feedback and correction. The same Greek word is also the basis for the term “cybernetics,” another word closely associated with systems and control theory.

mechanism
Figure 170. Centrifugal governor

“Governors” have been part of mechanical systems for centuries (see Centrifugal governor [3]). These mechanisms have the effect of automatically controlling a system so that it (for example) operates at the desired revolutions per minute. Without governors, steam engines tended to blow up, or go out; applying devices such as the centrifugal governor to regulate them was an important step in the development of steam power. Importantly, such devices operated to control the process from variation on either side, whether too fast or too slow. They did not operate merely as brakes.
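
The two-sided nature of this kind of control can be illustrated with a toy feedback loop. This is a sketch only: it is a simple proportional controller, not a physical model of a centrifugal governor, and the target speed, starting speeds, gain, and step count are hypothetical values.

```python
def simulate_governor(target_rpm, initial_rpm, gain=0.5, steps=20):
    """Proportional feedback: at each step, correct the speed by a fraction
    of the gap between the target and the current speed."""
    rpm = float(initial_rpm)
    history = [rpm]
    for _ in range(steps):
        error = target_rpm - rpm   # positive when too slow, negative when too fast
        rpm += gain * error        # speeds up or slows down as needed
        history.append(rpm)
    return history

# The same rule corrects deviation in either direction.
too_fast = simulate_governor(target_rpm=100, initial_rpm=150)
too_slow = simulate_governor(target_rpm=100, initial_rpm=50)
```

In both runs the speed settles toward the target: the controller slows the engine that is running fast and speeds up the one that is running slow, rather than acting only as a brake.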

So, if you find yourself coping with arbitrary and bureaucratic “governance” processes, it might be good to remember the origins of the term. There is more to governance than just slowing a system down.

10.2.2. Analyzing governance

Governance context
governance environment
Figure 171. Governance in context

Enough theory. Governance is also a practical concern for you because, at your scale, you have a complex set of environmental forces to cope with (see Governance in context). You started with a focus on the customer, and the market they represented. Sooner or later, you encountered regulators and adversaries: competitors and cybercriminals.

Mark Burgess and Promise Theory

Promise Theory is . . . an engineering framework for coping with uncertainty in information systems.
— Mark Burgess
Promise Theory

We use the term “promise” here in a manner intended to be roughly consistent with Promise Theory. Promise Theory is a way of reasoning about relationships and systems developed by Mark Burgess. [42]

Promise Theory is a sophisticated, mathematically-founded framework for reasoning about intent, outcome, control, and related topics of interest to digital governance. It is not possible to discuss it in depth here, but one basic heuristic we will follow is that a promise is better than a command, because “A promise expresses intent about the end point, or ultimate outcome, instead of indicating what to do at the starting point.” We have explored similar themes in our chapter section on culture.

These external parties intersect with your reality via various channels:

  • Your brand, which represents a sort of general promise to the market (see [261], p.16)

  • Contracts, which represent more specific promises to suppliers and customers

  • Laws, regulations, and standards, which can be seen as promises you must make and keep in order to function in civil society, or in order to obtain certain contracts.

  • Threats, which may be of various kinds:

    • legal

    • operational

    • intentional

    • unintentional

    • illegal

    • environmental

We will return to the role of external forces in our discussion of assurance. For now, we will turn to how digital governance, within an overall system of digital delivery, reflects our emergence model.

Governance and the emergence model

In terms of our emergence model, one of the most important distinctions between a “team of teams” and an “enterprise” is the existence of formalized organizational governance.

governance emergence
Figure 172. Governance emerges at the enterprise level

As illustrated in Governance emerges at the enterprise level, formalized governance is represented by the establishment of a governing body, responsive to some stakeholders who seek to recognize value from the organization or “entity” (in this case, a digital delivery organization).

Corporate governance is a broad and deep topic, essential to the functioning of society and its organized participants. These include for-profit, non-profit, and even governmental organizations. Any legally organized entity of significant scope has governance needs.

One well-known structure for organizational governance is seen in the regulated, publicly owned company (such as those listed on stock exchanges). In this model, shareholders elect a governing body (usually termed the Board of Directors), and this group provides the essential direction for the enterprise as a whole.

However, organizational governance takes other forms. Public institutions of higher education may have a Board of Regents or Board of Governors, perhaps appointed by elected officials. Nonprofits and incorporated private companies still require some form of governance, as well. One of the less well-known but very influential forms of governance is the venture capital portfolio approach, very different from a public, mission-driven company. We will talk more about this in the digital governance section.

These are well known topics in law, finance, and social organization, and there are many sources you can turn to if you have further interest. If you are taking any courses in Finance or Accounting, you will likely cover governance objectives and processes.

Illustrated in Governance and management with interface [4] is a more detailed visual representation of the relationship between governance and management in a digital context.

Reading the figure from the top down:

Value recognition is the fundamental objective of the stakeholder. We discussed in Chapter 4 the value objectives of effectiveness, efficiency, and risk (aka top line, bottom line, and risk). These are useful final targets for impact mapping, to demonstrate that lower-level, perhaps more “technical,” product capabilities do ultimately contribute to organization outcomes.

Note
The term “value recognition” as the stakeholder goal is chosen over “value creation” as “creation” requires the entire system. Stakeholders do not “create” without the assistance of management, delivery teams, and the individual.

Here, we see them from the stakeholder perspective of

  • Benefits realization

  • Cost optimization

  • Risk optimization

(Adapted from [136 p. 23])

Both ISO 38500 [142] and CObIT [136] specify that the fundamental activities of governance are:

  • Direct

  • Evaluate

  • Monitor

architecture of governance
Figure 173. Governance and management with interface

Evaluation is the analysis of current state, including current proposals and plans. Directing is the establishment of organizational intent as well as the authorization of resources. Monitoring is the ongoing attention to organizational status, as an input to evaluation and direction.

Direct, Evaluate, and Monitor may also be ordered as Evaluate, Direct, and Monitor. These are highly general concepts that in reality are performed simultaneously, not as any sort of strict sequence.

The governance/management interface is an essential component. The information flows across this interface are typically some form of the following:

From the governing side

  • Goals (e.g., product and go-to-market strategies)

  • Resource authorizations (e.g., organizational budget approvals)

  • Principles and policies (e.g., personnel and expense policies)

From the governed side

  • Plans & proposals (at a high level, e.g., budget requests)

  • Performance reports (e.g., sales figures)

  • Conformance/compliance indicators (e.g., via audit and assurance)

Notice also the circular arrow at the center of the Governance/Management interface. Governance is not a one-way street. Its principles may be stable, but approaches, tools, practices, processes, and so forth (what we will discuss below as “enablers,” in CObIT terminology) are variable and require ongoing evolution.

We often hear of “bureaucratic” governance processes. But the problem may not be “governance” per se. It is more often the failure to correctly manage the governance/management interface. Of course, if the board is micro-managing, demanding many different kinds of information and intervening in operations, then governance and its management response are all much the same thing. In reality, however, burdensome organizational “governance” processes may be an overdone, bottom-up management response to perceived Board-level mandates.

Or they may be point-in-time requirements no longer needed. The policies of 1960 are unsuited to the realities of 2020. But if policies are always dictated top-down, they may not be promptly corrected or retired when no longer applicable. Hence, the scope and approach of governance in terms of its enablers must always be a topic of ongoing, iterative negotiation between the governed and the governing.

Finally, the lowermost digital delivery chevron (aka the value chain) represents most of what we have discussed in Sections I, II, and III:

  • the individual working to create value using digital infrastructure and lifecycle pipelines

  • the team collaborating to discover and deliver valuable digital products

  • the team of teams coordinating to deliver higher-order value while balancing effectiveness with efficiency and consistency

Ultimately, governance is about managing results and risk. It’s about objectives and outcomes. It’s about “what,” not “how.” In terms of practical usage, it is advisable to limit the “governance” domain, including the use of the term, to a narrow scope of board- or director-level concerns, and to the existence of certain capabilities, including:

  • organizational policy management

  • external and internal assurance and audit

  • risk management, including security aspects

  • compliance

We turn to a useful concept for the implementation of digital governance: the concept of enablers.

10.3. Enablers

10.3.1. An introduction to enablers

Enablers are factors that, individually and collectively, influence whether something will work — in this case, governance and management over enterprise IT.
— CObIT 5 Framework
ISACA

As we explore the governance/management interface further, we encounter the CObIT concept of enablers [136] (see CObIT enablers across the governance interface).

Enablers are the fundamental components of any purposeful organization. CObIT has a detailed structure that positions enablers in a broader context of stakeholder objectives, enterprise and IT goals, and various quality criteria. One easy-to-understand example of a governance-level enabler concern is when processes serve as risk controls; we will discuss this further in the next chapter section.

CObIT’s 7 enablers are:

  • Principles, Policies, and Frameworks

  • Processes

  • Organizational Structures

  • Culture, Ethics, and Behavior

  • Information

  • Services, Infrastructure, and Applications

  • People, Skills, and Competencies

CObIT enablers
Figure 174. CObIT enablers across the governance interface

In the illustration, the varying lengths of the enabler bars are deliberate. The further upward a bar extends, the more directly that enabler is a concern for governance. All of the enablers are discussed elsewhere in the book.

Table 18. CObIT enablers

Enabler                                      Covered here
Principles, Policies, and Frameworks         Principles & policies covered in this chapter. Frameworks covered in Chapter 8 (PMBOK), Chapter 9 (CMMI, ITIL, CObIT, TOGAF), Chapter 11 (DMBOK), Chapter 12 (further TOGAF).
Processes                                    Chapter 7
Organizational Structures                    Chapter 9
Culture, Ethics, and Behavior                Chapter 9
Information                                  Chapter 11
Services, Infrastructure, and Applications   Part I
People, Skills, and Competencies             Chapter 9

Here, we are concerned with the aspect of each enabler that is presented to the governance interface. Some notes on each follow.

Principles, policies, and frameworks

Principles are the most general statement of organizational vision and values. Policies will be discussed in detail in the next section. In general, they are binding organization mandates or regulations, sometimes grounded in external laws. Frameworks were defined in Chapter 8. We discuss all of these further in this chapter section.

Note
Some companies may need to institute formal policies quite early. Even a startup may need written policies if it is concerned with regulations such as HIPAA. However, this may be done on an ad hoc basis, perhaps outsourced to a consultant. (A startup cannot afford a dedicated VP of Policy and Compliance.) This topic is covered in detail in this section because, at enterprise scale, ongoing policy management and compliance must be formalized. Recall that “emergence means formalization” is the basis of our emergence model.

People, skills, and competencies

People and their skills and competencies (covered in Chapter 8) are an enabler upon which all the other enablers rest. “People are our #1 asset” may seem to be a cliche, but it is ultimately true. Formal approaches to understanding and managing this base of skills are therefore needed. A basic “Human Resources” capability is a start, but sophisticated and ambitious organizations institute formal organizational learning capabilities to ensure that talent remains a critical focus.

Culture, ethics and behavior

Culture, ethics, and behavior as an enabler can drive revenue as well as risk and cost. Culture and hiring are discussed in Chapter 7.

Organizational structures

We discussed basic organizational structure in Chapter 7. However, governance also may make use of some of the scaling approaches discussed in Chapter 8. Cross-organization coordination techniques (similar to those discussed in Chapter 8) are frequently used in governance (e.g., cross-organizational coordinating committees, such as an enterprise security council).

Processes

Process was defined in Chapter 9. We will discuss enablers as controls in the upcoming chapter section on risk management. A control is a role that an enabler may play. Processes are the primary form of enabler used in this sense.

Information

Information is a general term; in the sense of an enabler, it is based on data in its various forms, with overlays of concepts (such as syntax and semantics) that transform raw “data” into a resource that is useful and valuable for given purposes. From a governance perspective, information carries governance direction to the governed system, and the monitoring that is fed back is also transmitted as information. Information resource management and related topics such as data governance and data quality are covered in Chapter 11; it is helpful to understand governance at an overall level before going into these more specific domains.

Services, infrastructure, and applications

Services, infrastructure, and applications of course are the critical foundation of digital value. These fundamental topics were covered in Part I. In the sense of enablers, they have a recursive or self-reflexive quality. Digital technology automates business objectives; at scale, a digital pipeline becomes a nontrivial business concern in and of itself, requiring considerable automation [24], [204]. Applications that serve as digital governance enablers might include:

  • Source control

  • Build management

  • Package management

  • Deployment and configuration management

  • Monitoring

  • Portfolio management

10.3.2. One-way policy begins

Your company was incorporated long ago, but the “board” was always a bit of a joke. The three people who started the company were the directors of record, and they would have an annual “meeting” at the local bar where enough paperwork would be done to satisfy the company lawyer.

Your company did well and accumulated enough cash to purchase another company, run in much the same way. The people who owned the company being acquired were good, and your company didn’t want to lose them, so in addition to senior management positions, they were offered equity: a share of ownership in the new combined firm.

This raised the topic, “how is the new firm directed?” One of the incoming shareholders wanted a seat on the “board,” even though neither company had done much with board-level governance.

The lawyer and accountant hired to assist with the merger also recommended that, as part of the acquisition, a formal audit be conducted of both firms (which had never been done).

This audit came back generally clean but shone a light on differences in how the companies had operated, and unearthed some irregularities.

For example, your company had started to purchase phones for all employees, while the acquired company was pure BYOD (Bring Your Own Device). One company had corporate credit cards, while the other was requiring people to carry their own expenses for reimbursement. One company had an informal “understanding” that first class travel was OK for Asian trips at least, while the other didn’t, but neither had written anything down. And so on.

The lawyer said, “I think you need some policies,” and everybody groaned. One person said, “I just read about Nordstrom. All they say is ‘Use Good Judgment.’ Why do we need anything more?”

The lawyer said, “Um, that’s an urban legend. The actual Nordstrom Code of Business Conduct and Ethics, while it starts off with that, runs about 8,000 words and covers a variety of topics such as handling customer information, using technology, social media, and so forth.”

And the new CFO said, “Look, I get that we want to stay agile, and keep our informal culture. I’m no fan of policy for the sake of policy. But I need those policies to keep my staff costs down. Two different expense approaches don’t add any value to us, and that’s only one of twenty issues we’ve uncovered here. ‘Do the right thing’ doesn’t cut it. We’ve got to have some means of establishing a baseline for new employees, someplace people can turn to when they don’t know what the expectation is.”

The HR director chimed in. “If we don’t document our official position on things like harassment we are going to have problems. We could fire someone who has done something really bad, and they could sue us for wrongful termination. Or their victims could sue us for failing to prevent the issue. That could cost us real money.” The lawyer nodded, and the company owners looked thoughtful.

Another person spoke up. “I came from a company that had a 500-page policy manual. It went down into way too much detail and was always out of date. No-one could find anything in it, and there would be stuff that was wrong because the revision process was broken.”

The lawyer said, “You need to keep your policies light and on the general side (like Nordstrom) and cover more detailed topics elsewhere. For example, the exact approach on how to reimburse employee expenses probably doesn’t belong in the policy manual. Of course, that means that somewhere you need to lay out how your principles inform your policies, which are implemented by processes, procedures, guidelines, and so forth. Your actual employee handbook will probably be thirty or forty pages. Sorry. You also should take advantage of your intranet and make sure people can find just the policy they need, with related guidance, instead of having to page through a huge document.

“Finally, you need to carefully distribute the authorship and revision control, especially for lower levels of the guidance (e.g., technical standards that can change quickly). This is both because the people most affected should have a stronger voice in the policy, and also because centralized policy groups become bottlenecks if they are doing all the work.”

Another said, “This is all getting complicated.”

“Yes, complexity is to some extent unavoidable as you move to this new scale. I’m a big fan of sunset dates on policies and supporting materials, so you are periodically questioning whether something is still needed. Of course, this drives demand for someone to analyze and update policies — please don’t forget that.

“Overall, you need to always keep your outcomes in mind, and continue to push as much decision making down to individuals as you can. CObIT recognizes that culture is one of the critical enablers for governance, and so ‘use good judgment’ is still a great place to start, IF you can hire people with good judgment, and continually reinforce them in using it.”

see [199], [175]

10.3.3. Mission, principle, strategy, and policy

Carefully drafted and standardized policies and procedures save the company countless hours of management time. The consistent use and interpretation of such policies, in an evenhanded and fair manner, reduces management’s concern about legal issues becoming legal problems.
— Michael Griffin
“How To Write a Policy Manual”
Figure 175. Vision/mission/policy hierarchy

Illustrated in Vision/mission/policy hierarchy is one way to think about policy in the context of our overall governance objective of value recognition.

The organization’s vision and mission should be terse and high level, perhaps something that could fit on a business card. They should express the organization’s reason for being in straightforward terms. Mission is reason for being; vision is a “picture” of the future, preferably inspirational.

The principles and codes should also be brief. (“Codes” can include codes of ethics or codes of conduct.) For example, Nordstrom’s is about 8,000 words, perhaps about 10 pages.

Policies are more extensive. There are various kinds of policies:

In a non-IT example, a compliance policy might identify the Foreign Corrupt Practices Act and make it clear that bribery of foreign officials is unacceptable. Similarly, an HR policy might spell out acceptable and unacceptable side jobs (e.g., someone in the banking industry might be forbidden from also being a mortgage broker on their own account).

Policies are often independently maintained documents, perhaps organized along lines similar to:

  • Employment and HR policies

  • Whistleblower policy (non-retaliation)

  • Records retention

  • Privacy

  • Workplace guidelines

  • Travel and expense

  • Purchasing and vendor relationships

  • Use of enterprise resources

  • Information security

  • Conflicts of interest

  • Regulatory

(not a comprehensive list)

Policies, even though more detailed than codes of ethics/conduct, still should be written fairly broadly. In many organizations, they must be approved by the governing board. They should, therefore, be independent of technology specifics. An information security policy may state that the hardening guidelines must be followed, but the hardening guidelines (stipulating for example what services and open ports are allowable on Debian Linux) are not policy. There may be various levels or classes of policy.

Finally, policies reference standards and processes and other enablers as appropriate. This is the management level, where documentation is specific and actionable. Guidance here may include:

  • Standards

  • Baselines

  • Guidelines

  • Processes and procedures

These concepts may vary according to organization, and can become quite detailed. Greater detail is achieved in hardening guidelines. A behavioral baseline might be “Guests are expected to sign in and be accompanied when on the data center floor.” We will discuss technical baselines further in the chapter section on security, and also in our discussion of the technology product lifecycle in Chapter 12. See also Shon Harris' excellent CISSP Exam Guide [119] for much more detail on these topics.

The ideal end state is a policy that is completely traceable to various automation characteristics, such as approved “infrastructure as code” settings applied automatically by configuration management software (as discussed in “The DevOps Audit Toolkit” [80]; more on this to come). However, there will always be organizational concerns that cannot be fully automated in this way.
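
For the concerns that can be automated, that traceability might look like the following minimal sketch. The policy identifier, the list of allowed ports, and the function name are all hypothetical; none of them comes from a real hardening guide or from “The DevOps Audit Toolkit.” The point is only that each automated finding can name the policy it traces back to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """A technical baseline that traces back to a board-level policy."""
    policy_id: str             # hypothetical policy identifier
    allowed_ports: frozenset   # hypothetical values, for illustration only


def find_deviations(baseline, observed_ports):
    """Return one finding per observed open port the baseline does not allow."""
    return [
        f"{baseline.policy_id}: port {port} is open but not in the baseline"
        for port in sorted(set(observed_ports) - baseline.allowed_ports)
    ]


# Assumed baseline: only SSH (22) and HTTPS (443) may be open.
web_server_baseline = Baseline(
    policy_id="INFOSEC-01",
    allowed_ports=frozenset({22, 443}),
)

findings = find_deviations(web_server_baseline, observed_ports={22, 443, 8080})
for finding in findings:
    print(finding)
```

In practice the observed values would come from configuration management or scanning tools rather than a hard-coded set.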

Policies (and their implementation as processes, standards, and the like) must be enforced. As Steve Schlarman notes, “Policy without a corresponding compliance measurement and monitoring strategy will be looked at as unrealistic, ignored dogma.” [236]

Policies and their derivative guidance are developed, just like systems, via a lifecycle. They require some initial vision and an understanding of what the requirements are. Again, Schlarman: “policy must define the why, what, who, where and how of the IT process” [236]. User stories have been used effectively to understand policy needs.

Finally, an important point to bear in mind:

Company policies can breed and multiply to a point where they can hinder innovation and risk-taking. Things can get out of hand as people generate policies to respond to one-time infractions or out of the ordinary situations [112 p. 17].

It’s advisable to institute sunset dates or some other mechanism that forces their periodic review, with the understanding that any such approach generates demand on the organization that must be funded. We will discuss this more in the chapter section on digital governance.
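
One lightweight way to operationalize sunset dates is a register that is checked on a schedule. The sketch below is illustrative only: the policy names echo the list earlier in this section, while the dates and the function name are hypothetical.

```python
from datetime import date

# Hypothetical policy register: (policy name, sunset date).
policy_register = [
    ("Travel and expense", date(2024, 6, 30)),
    ("Information security", date(2026, 1, 31)),
    ("Records retention", date(2025, 3, 15)),
]


def due_for_review(register, today):
    """Return the names of policies whose sunset date is on or before `today`."""
    return [name for name, sunset in register if sunset <= today]


overdue = due_for_review(policy_register, today=date(2025, 4, 1))
print(overdue)
```

Each name returned represents review work that someone must be funded to do, which is the cost noted above.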

10.3.4. Standards, frameworks, methods, and the innovation cycle

We used the term “standards” above without fully defining it. We have discussed a variety of industry influences throughout this book: PMBOK, ITIL, CObIT, Scrum, Kanban, ISO/IEC 38500 and so on. We need to clarify their roles and positioning further. All of these can be considered various forms of “guidance” and as such are governance enablers. However, their origins, stakeholders, format, content, and usage vary greatly.

First, the term “standard” especially has multiple meanings. A “standard” in the policy sense may be a set of compulsory rules. Also, “standard” or “baseline” may refer to some intended or documented state the organization uses as a reference point. An example might be “we run Debian Linux 16_10 as a standard unless there is a compelling business reason to do otherwise.”

This last usage shades into a third meaning of uniform, normative standards such as are produced by the IEEE, IETF and ISO/IEC.

  • ISO/IEC: International Organization for Standardization/International Electrotechnical Commission

  • IETF: Internet Engineering Task Force

  • IEEE: Institute of Electrical and Electronics Engineers

The International Organization for Standardization, or ISO, occupies a central place in this ecosystem. It possesses “general consultative status” with the United Nations, and has over 250 technical committees that develop the actual standards.

The IEEE standardizes such matters as wireless networking (e.g., WiFi). The IETF (Internet Engineering Task Force) standardizes lower level Internet protocols such as TCP/IP and HTTP. The W3C (World Wide Web Consortium) standardizes higher-level web standards such as HTML. Sometimes standards are first developed by a group such as the IEEE and then given further authority through publication by ISO/IEC. The ISO/IEC in particular, in addition to its technical standards, also develops higher order management/“best practice” standards. One well known example of such an ISO standard is the ISO 9000 series on quality management.

Some of these standards may have a great effect on the digital organization. We’ll discuss this further in the chapter section on compliance.

Frameworks were discussed in Chapter 9. Frameworks have two major meanings. First, software frameworks are created to make software development easier. Examples include Struts, AngularJS, and many others. This is a highly volatile area of technology, with new frameworks appearing every year and older ones gradually losing favor.

In general we are not concerned with these kinds of specific frameworks in this book, except governing them as part of the technology product lifecycle. We are concerned with “process” frameworks such as ITIL, PMBOK, CObIT, CMMI, and the TOGAF framework. These frameworks are not “standards” in and of themselves. However, they often have related ISO standards:

Table 19. Frameworks and related standards

Framework    Standard
ITIL         ISO/IEC 20000
CObIT        ISO/IEC 38500
PMBOK        ISO/IEC 21500
CMMI         ISO/IEC 15504
TOGAF        ISO/IEC 42010

Frameworks tend to be lengthy and verbose. The ISO/IEC standards are brief by comparison, perhaps on average 10% of the related framework. Methods (aka methodologies) in general are more action oriented and prescriptive. Scrum and XP are methods. It is at least arguable that PMBOK is a method as well as a framework.

Note
There is little industry consensus on some of these definitional issues, and the student is advised not to be overly concerned about such abstract debates. If you need to comply with something to win a contract, it doesn’t matter whether it’s a “standard,” “framework,” “guidance,” “method,” or what have you.

Finally, there are terms that indicate technology cycles, movements, communities of interest, or cultural trends: Agile and DevOps being two of the most current and notable. These are neither frameworks, standards, nor methods. However, commercial interests often attempt to build frameworks and methods representing these organic trends. Examples include the Scaled Agile Framework, Disciplined Agile Delivery, the DevOps Institute, the DevOps Agile Skills Association, and many others.

Figure 176. Innovation cycle

Innovation cycle illustrates the innovation cycle. Innovations produce value, but innovation presents change management challenges, such as cost and complexity. The natural response is to standardize for efficiency, and standardization taken to its end state results in commoditization, where costs are optimized as far as possible, and the remaining concern is managing the risk of the commodity (as either consumer or producer). While efficient, commoditized environments offer little competitive value, and so the innovation cycle starts again.

Note that the innovation cycle corresponds to the elements of value recognition:

  • Innovation corresponds to Benefits Realization

  • Standardization corresponds to Cost Optimization

  • Commoditization corresponds to Risk Optimization

10.4. Risk management

10.4.1. Risk management fundamentals

Risk is defined as the possibility that an event will occur and adversely affect the achievement of objectives.
— Committee of Sponsoring Organizations of the Treadway Commission
Internal Control — Integrated Framework

Risk is a fundamental concern of governance. Management (as we have defined it in this chapter section) may focus on effectiveness and efficiency well enough, but too often disregards risk.

As we noted above, the shop manager may have incentives to maximize income but usually does not stand to lose their life savings. The owner, however, does. Fire, theft, disaster — without risk management, the owner does not sleep well.

For this reason, risk management is a large element of governance, as indicated by the popular GRC acronym: governance, risk, and compliance.

Defining “Risk”

The definition of “risk” is surprisingly controversial. The ISO 31000 standard [143] and the Project Management Institute's PMBOK [214] both define risk as including positive outcomes (benefits). This definition has been strongly criticized by (among others) Douglas Hubbard in The Failure of Risk Management [126]. Hubbard points out that traditionally, risk has meant the chance and consequences of loss.

As this is an overview text, we will use the more pragmatic, historical definition. Practically speaking, operational risk management as a function focuses on loss. The possibility (“risk”) of benefits is eagerly sought by the organization as a whole and does not need “management” by a dedicated function.

“Loss,” however, can also equate to “failure to achieve anticipated gains.” This form of risk applies (for example) to product and project investments.

Risk management can be seen as both a function and a process (see Risk management context). As a function, it may be managed by a dedicated organization (perhaps called Enterprise Risk Management or Operational Risk Management). As a process, it conducts the following activities:

  • Identifying risks

  • Assessing and prioritizing them

  • Coordinating effective responses to risks

  • Ongoing monitoring and reporting of risk management

Figure 177. Risk management context

Risk impacts some asset. Examples in the digital and IT context would include:

  • Operational IT systems

  • Hardware (e.g., computers) and facilities (e.g., data centers)

  • Information (customer or patient records)

It is commonly said that organizations have an “appetite” for risk [139 p. 79], in terms of the amount of risk the organization is willing to accept. This is a strategic decision, usually reserved for organizational governance.

Risk management typically has strong relationships with the following organizational capabilities:

  • Enterprise governance (e.g., board-level committees)

  • Security

  • Compliance

  • Audit

  • Policy management

For example, security requires risk assessment as an input, so that security resources focus on the correct priorities. Risk additionally may interact with:

  • Project management

  • Asset management

  • Processes such as Change Management

and other digital activities. More detail on core risk management activities follows, largely adapted from the CObIT for Risk publication [139].

Risk identification

There are a wide variety of potential risks, and many accounts and anecdotes constantly circulating. It is critical that risk identification begins with a firm understanding of the organization’s objectives and context.

Risk identification can occur both in a “top down” and “bottom up” manner. Industry guidance can assist the operational risk management function in identifying typical risks. For example, the CObIT for Risk publication includes a useful eight-page “Generic Risk Scenarios” section [139 pp. 67-74] identifying risks such as:

  • “Wrong programmes are selected for implementation and are misaligned with corporate strategy and priorities”

  • “There is an earthquake.”

  • “Sensitive data is lost/disclosed through logical attacks.”

These are only three of dozens of scenarios. Risks, of course, extend over a wide variety of areas:

  • Investment

  • Sourcing

  • Operations

  • Availability

  • Continuity

  • Security

and so forth. The same guidance also strongly cautions against over-reliance on these generic scenarios.

Risk assessment

Risk management has a variety of concepts and techniques both qualitative and quantitative. Risk is often assumed to be the product of probability * impact. For example, if the chance of a fire in a facility is 5% over a given year, and the damage of the fire is estimated at $100,000, the annual risk is $5,000. An enterprise risk management function may attempt to quantify all such risks into an overall portfolio.
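
The fire example generalizes to a simple portfolio calculation. In the sketch below, only the fire row uses the figures from the text; the other two risks and their numbers are hypothetical placeholders, and the function name is illustrative.

```python
# Each entry: (risk name, annual probability, estimated impact in dollars).
# Only the fire row comes from the text; the other rows are hypothetical.
risk_register = [
    ("Facility fire", 0.05, 100_000),
    ("Extended website outage", 0.10, 250_000),
    ("Laptop theft", 0.30, 5_000),
]


def expected_annual_loss(probability, impact):
    """Annual risk expressed as the product of probability and impact."""
    return probability * impact


portfolio = {
    name: expected_annual_loss(probability, impact)
    for name, probability, impact in risk_register
}
total_exposure = sum(portfolio.values())

for name, loss in portfolio.items():
    print(f"{name}: ${loss:,.0f}")
print(f"Total: ${total_exposure:,.0f}")
```

Summing point estimates in this way hides how uncertain each probability and impact figure is, so the total should be read as a rough planning number rather than a forecast.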

Where quantitative approaches are perceived to be difficult, risk may be assessed using simple ordinal scales (e.g., 1-5, where 1 is low risk, and 5 is high risk). CObIT for Risk expresses concern regarding “The use of ordinal scales for expressing risk in different categories, and the mathematical difficulties or dangers of using these numbers to do any sort of calculation” [139 p. 75]. Such approaches are criticized by Doug Hubbard in The Failure of Risk Management as misleading and potentially more harmful than not managing risk at all [126].

Hubbard argues that quantitative techniques such as Monte Carlo analysis are rarely infeasible, and recommends applying them instead of subjective scales.
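
To give a feel for what a Monte Carlo analysis involves, here is a minimal sketch. It is not a reproduction of Hubbard’s method: the scenarios, the impact ranges, the assumption that events are independent, and the choice of a uniform distribution for impact are all simplifying assumptions made for illustration.

```python
import random


def simulate_annual_losses(scenarios, trials=10_000, seed=42):
    """Simulate total annual loss across independent risk scenarios.

    Each scenario is (annual_probability, low_impact, high_impact). When an
    event occurs in a trial, its impact is drawn uniformly between low and
    high. Returns the simulated totals sorted in ascending order.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = []
    for _ in range(trials):
        total = 0.0
        for probability, low, high in scenarios:
            if rng.random() < probability:
                total += rng.uniform(low, high)
        totals.append(total)
    return sorted(totals)


# Hypothetical inputs: a fire and a website outage, each with an impact range.
scenarios = [
    (0.05, 50_000, 150_000),
    (0.10, 100_000, 400_000),
]

losses = simulate_annual_losses(scenarios)
mean_loss = sum(losses) / len(losses)
percentile_95 = losses[len(losses) * 95 // 100]

print(f"Mean annual loss: ${mean_loss:,.0f}")
print(f"95th percentile:  ${percentile_95:,.0f}")
```

Unlike a single probability-times-impact figure, the simulation yields a distribution, so a bad year (the 95th percentile) can be discussed separately from the average year.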

The enterprise can also consider evaluating scenarios that have a chance of occurring simultaneously. This is frequently referred to as “stress” testing.

Risk response
He who fights and runs away lives to fight another day.
— Menander
342 BC — 291 BC

Risk response includes several approaches:

  • Avoidance

  • Acceptance

  • Transference

  • Mitigation

Avoidance means ending the activities or conditions causing the risk; e.g., not engaging in a given initiative or moving operations away from risk factors.

Acceptance means no action is taken. Typically, such “acceptance” must reside with an executive.

Transference means that some sharing arrangement, usually involving financial consideration, is established. Common transfer mechanisms include outsourcing and insurance. (Recall our discussion of Agile approaches to contract management and risk sharing).

Mitigation means that some compensating mechanism (one or more “controls”) is established. This topic is covered in the next section and comprises the remainder of the material on risk management.

(The above discussion was largely derived from ISACA’s CObIT 5 for Risk [139].)

Controls
The term 'control objective' is no longer a mainstream term used in CObIT 5, and the word 'control' is used only rarely. Instead, CObIT 5 uses the concepts of process practices and process activities.
— ISACA
CObIT 5 for Assurance

The term “control” is problematic.

It has distasteful connotations to those who casually encounter it, evoking images of “command and control” management, or “controlling” personalities. CObIT, which once stood for Control Objectives for IT, now deprecates the term control (see the above quote). Yet it retains a prominent role in many discussions of enterprise governance and risk management, as we saw at the start of this chapter in the discussion of COSO’s general concept of control.

And (as discussed in our coverage of Scrum’s origins) it is a technical term of art in systems engineering. As such it represents principles essential to understanding large scale digital organizations.

In this section, we are concerned with controls in a narrower sense, as risk mitigators. As noted above, ISACA replaced the term “controls” with process practices & activities, which are specific examples of enablers. As controls, enablers such as policies, procedures, organizational structures, and the rest are used and intended to ensure that:

  • investments achieve their intended outcomes;

  • resources are used responsibly, and protected from fraud, theft, abuse, waste, and mismanagement;

  • laws and regulations are adhered to; and

  • timely and reliable information is employed for decision making.

Note
You will still likely encounter the term “control,” because “testing enablers” or “testing process practices” is not how security personnel and auditors talk.

But what are examples of “controls”? Take a risk, such as the risk of a service (e.g., e-commerce website) outage resulting in loss of critical revenues. There are a number of ways we might attempt to mitigate this risk:

  • Configuration management (a preventative control)

  • Effective monitoring of system alerts (a detective control)

  • Documented operational responses to detected issues (a corrective control)

  • Clear recovery protocols that are practiced and well understood (a recovery control)

  • System redundancy of key components where appropriate (a compensating control)

and so forth. Another kind of control appropriate to other risks is deterrent (e.g., an armed guard at a bank).
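
The control types above can be recorded against a risk so that gaps become visible. The following is a minimal sketch using the website outage controls from the list; the enum and function names are illustrative and are not drawn from CObIT or any audit standard.

```python
from enum import Enum


class ControlType(Enum):
    PREVENTATIVE = "preventative"
    DETECTIVE = "detective"
    CORRECTIVE = "corrective"
    RECOVERY = "recovery"
    COMPENSATING = "compensating"
    DETERRENT = "deterrent"


# Controls for the website outage risk, typed as in the list above.
outage_controls = {
    "Configuration management": ControlType.PREVENTATIVE,
    "Monitoring of system alerts": ControlType.DETECTIVE,
    "Documented operational responses": ControlType.CORRECTIVE,
    "Practiced recovery protocols": ControlType.RECOVERY,
    "Redundancy of key components": ControlType.COMPENSATING,
}


def uncovered_types(controls):
    """Return control types with no control assigned, in declaration order."""
    covered = set(controls.values())
    return [control_type for control_type in ControlType if control_type not in covered]


gaps = uncovered_types(outage_controls)
print([control_type.value for control_type in gaps])
```

A gap is not automatically a deficiency; here the missing deterrent type may simply not apply to an outage risk, which is a judgment for risk management to record.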

Other types of frequently seen controls include:

  • Separation of duties

  • Audit trails

  • Documentation

  • Standards and guidelines

A control type such as “separation of duties” is very general and might be specified by activity type, e.g.:

  • Purchasing

  • System development and release

  • Sales revenue recognition

Each of these would require distinct approaches to separation of duties. Some of this may be explicitly defined; if there is no policy or control specific to a given activity, an auditor may identify this as a deficiency.
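
For system development and release, a separation of duties check can be partly automated. The sketch below assumes a hypothetical rule (the author of a change may neither approve nor deploy it) and a hypothetical record format; a real organization would define its own rules in policy, and many delivery pipelines handle deployment differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    change_id: str
    author: str
    approver: str
    deployer: str


def sod_violations(release):
    """Check one release against the assumed separation of duties rule."""
    violations = []
    if release.author == release.approver:
        violations.append("author approved own change")
    if release.author == release.deployer:
        violations.append("author deployed own change")
    return violations


# Hypothetical release records.
releases = [
    Release("CHG-101", author="ana", approver="raj", deployer="lee"),
    Release("CHG-102", author="ana", approver="ana", deployer="lee"),
]

for release in releases:
    print(release.change_id, sod_violations(release) or "ok")
```

Output like this doubles as an audit trail: it shows which changes were checked and what was found.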

Policies and processes in their aspect as controls are often what auditors test. In the case of the website above, an auditor might test the configuration management approach and the operational processes, inspect the system redundancy, and so forth. And risk management would maintain an ongoing interest in the system in between audits.

As with most topics in this book, risk management (in and of itself, as well as applied to IT and digital) is an extensive and complex domain, and this discussion was necessarily brief. The student is referred to the readings at the end of the chapter for further information.

Business continuity

Business continuity is an applied domain of digital and IT risk, like security. Continuity is concerned with large scale disruptions to organizational operations, such as:

  • Floods

  • Earthquakes

  • Tornadoes

  • Terrorism

  • Hurricanes

  • Industrial catastrophes (e.g., large scale chemical spills)

A distinction is commonly made between business continuity planning and disaster recovery.

Disaster recovery is more tactical, including the specific actions taken during the disaster to mitigate damage and restore operations, and often with an IT-specific emphasis.

Continuity planning takes a longer term view of matters such as long-term availability of replacement space and computing capacity.

There are a variety of standards covering business continuity planning, including:

  • NIST Special Publication 800-34

  • ISO/IEC 27031:2011

  • ISO 22301

In general, continuity planning starts with understanding the business impact of various disaster scenarios and developing plans to counter them. Traditional guidance suggests that this should be achieved in a centralized fashion; however, large, centralized efforts of this nature tend to struggle for funding.

While automation alone cannot solve problems such as “where do we put the people if our main call center is destroyed,” it can help considerably in terms of recovering from disasters. If a company has been diligent in applying infrastructure as code techniques, and loses its data center, it can theoretically re-establish its system configurations readily, which can otherwise be a very challenging process, especially under time constraints. (Data still needs to have been backed up to multiple locations).

10.4.2. Compliance

Compliance is a very general term meaning conformity or adherence to

  • laws

  • regulations

  • policies

  • contracts

  • standards

and the like. Corporate compliance functions may first be attentive to legal and regulatory compliance, but the other forms of compliance are matters of concern as well.

A corporate compliance office may be responsible for the maintenance of organizational policies and related training and education, perhaps in partnership with the human resources department. They also may monitor and report on the state of organizational compliance. Compliance offices may also be responsible for codes of ethics. Finally, they may manage channels for anonymous reporting of ethics and policy violations by whistleblowers (individuals who become aware of and wish to report violations while receiving guarantees of protection from retaliation).

Compliance uses techniques similar to risk management, and in fact, non-compliance can be managed as a form of risk, and prioritized and handled much the same way. However, compliance is an information problem as well as a risk problem. There is an ongoing stream of regulations to track, which keeps compliance professionals very busy. In the U.S. alone, these include:

  • HIPAA

  • SOX

  • FERPA

  • PCI DSS

  • GLBA PII

Some of these regulations specifically call for policy management, and therefore companies that are subject to them may need to institute formal governance earlier than other companies, in terms of the emergence model. Many of them provide penalties for the mismanagement of data, which we will discuss further in the next chapter section and in Chapter 11. Compliance also includes compliance with the courts (e.g., civil and criminal actions). This will be discussed in the Chapter 11 section on cyberlaw.

10.5. Assurance and audit

10.5.1. Assurance

Trust, but verify.
— Russian proverb

Assurance is a broad term. In this book, it is associated with governance. It represents a form of additional confirmation that management is performing its tasks adequately. Go back to the example that started the chapter, of the shop owner hiring a manager. Suppose that this relationship has continued for some time, and while things seem to be working well, the owner has doubts about the arrangement. Over time, things have gone well enough that the owner does not worry about the shop being opened on time, having sufficient stock, or paying suppliers. But there are any number of doubts the owner might retain:

  • Is money being accounted for honestly and accurately?

  • Is the shop clean? Is it following local regulations? For example, fire, health and safety codes?

  • If the manager selects a new supplier, are they trustworthy? Or is the shop at risk of selling counterfeit or tainted merchandise?

  • Are the shop’s prices competitive? Is it still well regarded in the community? Or has its reputation declined with the new manager?

  • Is the shop protected from theft and disaster?

Figure 178. Assurance in context

These kinds of concerns remain with the owner, by and large, even with a reliable and trustworthy manager. If not handled correctly, the owner’s entire investment is at risk. The manager may only have a salary (and perhaps a profit share) to worry about, but if the shop is closed due to violations, or lawsuit, or lost to a fire, the owner’s entire life investment may be lost. These concerns give rise to the general concept of assurance, which applies to digital business just as it does to small retail shops. The following diagram, derived from previous illustrations, shows how this book views assurance: as a set of practices overlaid across the enablers, and in particular concerned with external forces (see Assurance in context).

Assurance is like out-of-band management

In terms of the governance-management interface, assurance is fundamentally distinct from the information provided by management and must travel through distinct communication channels. This is why auditors (for example) forward their reports directly to the audit committee and do not route them through the executives who have been audited.

Technologists, especially those with a background in networking, may have heard of the concept of “out-of-band control.” With regard to out-of-band management or control of IT resources, the channel over which management commands travel is distinct from the channel over which the system provides its services. This channel separation is done to increase security, confidence, and reliability, and is analogous to assurance.

Figure 179. Assurance is an objective, external mechanism

As ISACA stipulates,

The IS audit and assurance function shall be independent of the area or activity being reviewed to permit objective completion of the audit and assurance engagement [141 p. 9].

Assurance can be seen as an external, additional mechanism of control and feedback. This independent, out-of-band aspect is essential to the concept of assurance (see Assurance is an objective, external mechanism).

Three party foundation
Assurance means that pursuant to an accountability relationship between two or more parties, an IT audit and assurance professional may be engaged to issue a written communication expressing a conclusion about the subject matters to the accountable party.
— CObIT 5 for Assurance

There are broader and narrower definitions of assurance. But all reflect some kind of three-party arrangement (see Assurance is based on a three-party model [5]).

Figure 180. Assurance is based on a three-party model

The above diagram is one common scenario:

  1. The stakeholder (e.g., the audit committee of the board of directors) engages an assurance professional (e.g., an audit firm). The scope and approach of this are determined by the engaging party, although the accountable party in practice often has input as well.

  2. The accountable party, at the stakeholder’s direction, responds to the assurance professional’s inquiries on the audit topic.

  3. The assurance professional provides the assessment back to the engaging party, and/or other users of the report (potentially including the accountable party).

This is a simplified view of what can be a more complex process and set of relationships. The ISAE3000 standard states that there must be at least three parties to any assurance engagement:

  • The responsible (accountable) party

  • The practitioner

  • The intended users (of the assurance report)

But there may be additional parties:

  • The engaging party

  • The measuring/evaluating party (sometimes not the practitioner, who may be called on to render an opinion on someone else’s measurement)

ISAE 3000 goes on to stipulate a complex set of business rules for the allowable relationships between these parties [133 pp. 95-96]. Perhaps the most important rule is that the practitioner cannot be the same as either the responsible party or the intended users. There must be some level of professional objectivity.

What’s the difference between assurance and simple consulting? There are two major factors:

  • Consulting can be simply a two-party relationship — a manager hires someone for advice

  • Consultants do not necessarily apply strong assessment criteria. Indeed, with complex problems, there may not be any such criteria. Assurance, in general, presupposes some existing standard of practice, or at least some benchmark external to the organization being assessed.

Finally, the concept of assurance criteria is key. Some assurance is executed against the responsible party’s own criteria. In this form of assurance, the primary questions are “are you documenting what you do, and doing what you document?” That is, for example, do you have formal process management documentation (as discussed in Chapter 9)? And are you following it?

Other forms of assurance use external criteria. A good example is the Uptime Institute's data center tier certification criteria, discussed below. If criteria are weak or non-existent, the assurance engagement may be more correctly termed an advisory effort. Assurance requires clarity on this topic.

Types of assurance
Exercise caution in your business affairs; for the world is full of trickery.
— Max Ehrmann
“Desiderata”

The general topic of “assurance” implies a spectrum of activities. In the strictest definitions, assurance is provided by licensed professionals under highly formalized arrangements. However, while all audit is assurance, not all assurance is audit. As noted in CObIT for Assurance, “assurance also covers evaluation activities not governed by internal and/or external audit standards.” [138 p. 15].

This is a blurry boundary in practice, as an assurance engagement may be undertaken by auditors, and then might be casually called an “audit” by the parties involved. And there is a spectrum of organizational activities that seem at least to be related to formal assurance:

  • Brand assurance

  • Quality assurance

  • Vendor assurance

  • Capability assessments

  • Attestation services

  • Certification services

  • Compliance

  • Risk management

  • Benchmarking

  • Other forms of “due diligence”

Some of these activities may be managed primarily internally, but even in the case of internally-managed activities, there is usually some sense of governance, some desire for objectivity.

From a purist perspective, internally directed assurance is a contradiction in terms. In terms of the three-party model above, there is a conflict of interest because the accountable party is also the practitioner.

However, it may well be less expensive for an organization to fund and sustain internal assurance capabilities and get many of the same benefits as from external parties. This requires that sufficient organizational safeguards be instituted. Internal auditors typically report directly to the Board-level audit committee, and are generally not seen as having a conflict of interest.

In another example, an internal compliance function might report to the corporate general counsel (chief lawyer), and not to any executive whose performance is judged based on their organization’s compliance, as that would be a conflict of interest. However, because the internal compliance function is ultimately under the CEO, its concerns can be overruled.

The various ways that internal and external assurance arrangements can work, and can go wrong, have a long history. If you are interested in the topic, review the histories of Enron, WorldCom, the 2008 mortgage crisis, and other such failures.

Assurance and risk management

Risk management (discussed in the previous chapter section) may be seen as part of a broader assurance ecosystem. For evidence of this, consider that the Institute of Internal Auditors offers a certificate in Risk Management Assurance; The Open Group also operates the Open FAIR™ Certification Program. Assurance in practice may seem to be biased towards risk management, but (as with governance in general) assurance as a whole relates to all aspects of IT and digital governance, including effectiveness and efficiency.

Audit practices may be informed of known risks and particularly concerned with their mitigation, but risk management remains a distinct practice. Audits may have scope beyond risks, and audits are only one tool used by risk management (see Assurance and risk management).

In short, and as shown in the following diagram, assurance plays a role across value recognition, while risk management specifically targets the value recognition objective of risk optimization.

assurance and risk
Figure 181. Assurance and risk management

Non-audit assurance examples
Businesses must find a level of trust between each other . . . 3rd party reports provide that confidence. Those issuing the reports stake their name and liability with each issuance.
— James DeLuccia
“Successfully Establishing and Representing DevOps in an Audit”

Before we turn to a more detailed discussion of the audit, we’ll discuss some specifically non-audit examples of assurance seen in IT and digital management.

Example 1: Due diligence on a cloud provider

Your company is considering a major move to cloud infrastructure for its systems. The agility value proposition (the ability to minimize cost of delay) is compelling, and there may be some cost structure advantages as well.

But you are aware of some cloud failures:

  • In 2013, UK cloud provider 2e2 went bankrupt, and customers were given “24 to 48 hours to get … data and systems out and into a new environment” [85]. Subsequently, the provider demanded nearly 1 million pounds (roughly $1.5 million) from its customers in exchange for uninterrupted access to services (i.e., their data) [275].

  • Also in 2013, cloud storage provider Nirvanix went bankrupt, and its customers also had a limited time to remove their data. MegaCloud went out of business with no warning two months later, and all customers lost all data. [48], [49]

  • In mid-2014, online source code repository Code Spaces (an early GitHub competitor) was taken over by hackers and destroyed. All data was lost. [274], [179]

The question is, how do you manage the risks of trusting your data and organizational operations to a cloud provider? This is not a new question, as computing has been outsourced to specialist firms for many years. You want to be sure that their operations meet certain standards:

  • Financial standards

  • Operational standards

  • Security standards

Data center evaluations of cloud providers are a form of assurance. Two well known approaches are:

  • The Uptime Institute’s Tier Certification

  • The American Institute of Certified Public Accountants' (AICPA) SOC 3 “Trust Services Report” certifying “Service Organizations” (based in turn on the SSAE 16 standard)

The Uptime Institute provides the well-known “Tier” concept for certifying data centers, from Tier I to Tier IV. In their words, “Certification provides assurances that there are not shortfalls or weak links anywhere in the data center infrastructure.” [273]. The Tiers progress as follows [272]:

  • Tier I: Basic Capacity

  • Tier II: Redundant Capacity Components

  • Tier III: Concurrently Maintainable

  • Tier IV: Fault Tolerance

Uptime Institute certification is a generic form of assurance in terms of the 3-party model; the data center operator must work with the Uptime Institute, which provides an independent opinion based on its criteria as to the data center’s tier (and therefore effectiveness).

The SOC 3 report is considered an “assurance” standard as well. However, as mentioned above, this is the kind of “assurance” done in general by licensed auditors, and which might casually be called an “audit” by the participants. A qualified professional, again in the 3-party model, examines the data center in terms of the SSAE 16 reporting standard.

Your internal risk management organization might look to both Uptime Institute and SOC 3 certification as indicators that your cloud provider risk is mitigated. (More on this in the chapter section on Risk Management.)

Example 2: Internal process assessment

You may also have concerns about your internal operations. Perhaps your process for selecting technology vendors is unsatisfactory in general; it takes too long and yet vendors with critical weaknesses have been selected.

More generally, the actual practices of various areas in your organization may be assessed by external consultants using the related guidance:

  • Enterprise Architecture with the TOGAF ADM

  • Project Management with PMBOK

  • IT processes such as Incident Management, Change Management, and Release Management with ITIL or CMMI-SVC

These assessments may be performed using a maturity scale, e.g., one derived from CMM. The CMM-influenced ISO/IEC 15504 standard may be used as a general process assessment framework. (Remember that we have discussed the problems with the fundamental CMM assumptions on which such assessments are based.)

According to [21], “In our own experience, we have seen that the maturity models have their limitations.” They warn that maturity assessments of Enterprise Architecture at least are prone to being:

  • Subjective

  • Academic

  • Easily manipulated

  • Bureaucratic

  • Superfluous

  • Misleading

Those issues may well apply to all forms of maturity assessments. Let the buyer beware. At least, the concept of maturity should be very carefully defined in a manner relevant to the organization being assessed.

Example 3: Competitive benchmarking

Finally, you may wonder, “how does my digital operation compare to other companies?” Now, it is difficult to go to a competitor and ask this. It’s also not especially practical to go and find some non-competing company in a different industry you don’t understand well. An entire industry has emerged to assist with this question.

We talked about the role of industry analysts in Chapter 8. Benchmarking firms play a similar role, and in fact, some analyst firms provide benchmarking services.

There are a variety of ways benchmarking is conducted, but it is similar to assurance in that it often follows the 3-party model. Some stakeholder directs an accountable party to be benchmarked within some defined scope. For example, the number of staff required to manage a given quantity of servers (aka admin:server) has been a popular benchmark. (Note that with cloud, virtualization, and containers, the usefulness of this metric is increasingly in question.)

An independent authority is retained. The benchmarker collects, or has collected, information on similar operations; for example, they may have collected data from 50 organizations of similar size on admin:server ratios. This data is aggregated and/or anonymized so that competitive concerns are reduced. Wells Fargo will not be told “JP Morgan Chase has an overall ratio of 1:300;” they will be told, “The average for financial services is 1:250.”
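
The aggregation step can be illustrated with a minimal sketch. The participant names, the figures, and the minimum peer-group rule below are all hypothetical, chosen only to show how a benchmarker might report a peer average without disclosing any single participant's ratio.

```python
from statistics import mean

# Hypothetical survey data: servers managed per administrator, by participant.
# Names and figures are invented for illustration only.
SERVERS_PER_ADMIN = {
    "Bank A": 300,
    "Bank B": 220,
    "Bank C": 260,
    "Bank D": 220,
}

# Assumed rule: never report on a peer group small enough to re-identify.
MIN_PEER_GROUP = 3


def peer_benchmark(data, client):
    """Return the peer-average admin:server ratio, excluding the client itself."""
    peers = [ratio for name, ratio in data.items() if name != client]
    if len(peers) < MIN_PEER_GROUP:
        raise ValueError("peer group too small to anonymize")
    return f"1:{round(mean(peers))}"


def compare(data, client):
    """Report the client's own ratio alongside the anonymized peer average."""
    return {
        "client": f"1:{data[client]}",
        "peer_average": peer_benchmark(data, client),
    }
```

Calling `compare(SERVERS_PER_ADMIN, "Bank B")` would give the client its own ratio and a single aggregate figure; no individual peer's number appears in the result.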

In terms of formal assurance principles, the benchmark data becomes the assessment criteria. A single engagement might consider dozens of different metrics, and where simple quantitative ratios do not apply, the benchmarker may have a continuously maintained library of case studies for more qualitative analysis. This starts to shade into the kind of work also performed by industry analysts. As the work becomes more qualitative, it also becomes more advisory and less about “assurance” per se.

10.5.2. Audit

The Committee, therefore, recommends that all listed companies should establish an audit committee.
— Cadbury Report
Agile or not, a team ultimately has to meet legal and essential organizational needs and audits help to ensure this.
— Scott Ambler
Disciplined Agile Delivery

If you look up “audit” online or in a dictionary, you will see it mainly defined in terms of finance: an audit is a formal examination of an organization’s finances (sometimes termed “books”). Auditors look for fraud and error so that investors (like our shop owner) have confidence that accountable parties (e.g., the shop manager) are conducting business honestly and accurately.

Audit is critically important to the functioning of the modern economy because there are great incentives for theft and fraud, and owners (in the form of shareholders) are remote from the business operations.

But what does all this have to do with information technology and digital transformation?

Digital organizations, of course, have budgets and must account for how they spend money. Since financial accounting and its associated audit practices are well established, we won’t discuss them here. (We discussed IT financial management and service accounting in Chapter 8.)

money over history
Figure 182. Money, from physical to virtual

Money represents a form of information, that of value. Money once was stored as a precious metal. When carrying large amounts of precious metal became impossible, it was stored in banks and managed through paper record keeping. Paper record keeping migrated onto computing machines, which now represent the value once associated with gold and silver. Bank deposits (our digital user’s bank account balance from Chapter 1) are now no more than a computer record (digital bits in memory) made meaningful by tradition and law, and secured through multiple layers of protection and assurance (see Money, from physical to virtual [6]).

Because of the increasing importance of computers to financial fundamentals, auditors became increasingly interested in information technology. Clearly, these new electronic computers could be used to commit fraud in new and powerful ways. Auditors had to start asking, “How do you know the data in the computer is correct?” This led to the formation in 1967 of the Electronic Data Processing Auditors Association (EDPAA), which eventually became ISACA (developer of CObIT).

It also became clear that computers and their associated operation were a notable source of cost and risk for the organization, even if they were not directly used for financial accounting. This has led to the direct auditing of information technology practices and processes, as part of the broader assurance ecosystem we are discussing in this chapter section.

A wide variety of IT practices and processes may be audited. Auditors may take a general interest in whether the IT organization is “documenting what it does and doing what it documents” and therefore this author has seen nearly every IT process audited.

IT auditors may audit projects, checking that the expected project methodology is being followed. They may audit IT performance reporting, such as claims of meeting Service Level Agreements. And they audit the organization’s security approach — both its definition of security policies and controls, as well as their effectiveness.

External versus internal audit

There are two major kinds of auditors of interest to us:

  • External auditors

  • Internal auditors

Here is a definition of external auditor:

An external auditor is chartered by a regulatory authority to visit an enterprise or entity and to review and independently report the results of that review. [187 p. 319].

Many accounting firms offer external audit services, and the largest accounting firms (such as PricewaterhouseCoopers and Ernst & Young) provide audit services to the largest organizations (corporations, non-profits, and governmental entities). External auditors are usually certified public accountants, licensed by their state, and follow industry standards (e.g., from the American Institute of Certified Public Accountants).

By contrast, internal auditing is housed internally to the organization, as defined by the Institute of Internal Auditors:

Internal auditing is an independent appraisal function established within an organization to examine and evaluate its activities as a service to the organization. [187 p. 320].

Internal audit is considered a distinct but complementary function to external audit. [69], 4_39. The internal audit function usually reports to the audit committee. As with assurance in general, independence is critical — auditors must have organizational distance from those they are auditing, and must not be restricted in any way that could limit the effectiveness of their findings.

Audit practices

As with other forms of assurance, audit follows the three-party model. There is a stakeholder, an accountable party, and an independent practitioner.

The typical internal audit lifecycle consists of (derived from [138]):

  • Planning/scoping

  • Performing

  • Communicating

In the scoping phase, the parties are identified (e.g., the board audit committee, the accountable and responsible parties, the auditors, and other stakeholders). The scope of the audit is very specifically established, including objectives, controls, and enablers (e.g., processes) to be tested. Appropriate frameworks may be utilized as a basis for the audit, and/or the organization’s own process documentation.

The audit is then performed. A variety of techniques may be used by the auditors:

  • Performance of processes or their steps

  • Inspection of previous process cycles and their evidence (e.g., documents, recorded transactions, reports, logs, etc.)

  • Interviews with staff

  • Physical inspection or walkthroughs of facilities

  • Direct inspection of system configurations and validation against expected guidelines

  • Attempting what should be prevented (e.g., trying to access a secured system or view data over the authorization level)

A fundamental principle is, “expected versus actual.” There must be some expected result to a process step, calculation, etc., that the actual result can be compared to.
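
To make “expected versus actual” concrete, here is a minimal sketch of how an inspection of system configurations might be expressed. The setting names and expected values are hypothetical, not taken from any particular guideline; the point is only that each finding pairs an expected result with what was actually observed (or notes that no evidence exists).

```python
# Hypothetical hardening guideline: setting name -> expected value.
EXPECTED = {
    "password_min_length": 12,
    "ssh_root_login": "disabled",
    "audit_logging": "enabled",
}


def audit_findings(expected, actual):
    """Compare actual settings with expected ones and list every deviation."""
    findings = []
    for setting, want in sorted(expected.items()):
        if setting not in actual:
            # No evidence is itself a finding: the auditor cannot confirm the control.
            findings.append(f"{setting}: expected {want!r}, but no evidence was found")
        elif actual[setting] != want:
            findings.append(f"{setting}: expected {want!r}, actual {actual[setting]!r}")
    return findings


# Example: one setting deviates and one has no evidence at all.
observed = {"password_min_length": 8, "ssh_root_login": "disabled"}
for finding in audit_findings(EXPECTED, observed):
    print(finding)
```

An empty list of findings means every expected result was matched by an actual one; anything else becomes input to the reporting phase.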

Finally, the audit results are reported to the agreed users (often with a preliminary “heads up” cycle so that people are not surprised by the results). Deficiencies are identified in various ways and typically are taken into system and process improvement projects.

10.6. Security

A measure of a system’s ability to resist unauthorized attempts at usage or behavior modification, while still providing service to legitimate users.
— Sandy Bacik
Enernex
In reality, organizations have many other things to do than practice security.
— Shon Harris
Guide to the CISSP

You have been practicing security since you first selected your initial choice of development platform in Chapter 2. But by now, your security capability is a well-established organization with processes spanning the enterprise, and a cross-functional steering committee composed of senior executives and with direct access to Board governance channels.

Security is a significant and well-known domain in and of itself. Ultimately, however, security is an application of the governance and risk management principles discussed in the previous sections. Deriving ultimately from the stakeholder’s desire to sustain and protect the enterprise (see Security context), security relies on

security context
Figure 183. Security context

So what distinguishes security from more general concepts of risk? The definition at the top of this section is a good start, with its mention of “unauthorized attempts” to access or modify systems. The figure Security context uses the term “sanctionable,” meaning violations might lead to legal or at least organizational penalties.

Many risks might involve carelessness, or incompetence, or random technical failure, or accidents of nature. They might even involve fraud or misrepresentation. But security focuses on violations (primarily intentional, but also unintentional) of policies protecting organizational assets. In fact, “Assets Protection” is a common alternate name for corporate security.

“Authorization” is a key concept. Given some valuable resource, is access restricted to those who ought to have it? Are they who they say they are? Do they have the right to access what they claim is theirs? Are they conducting themselves in an expected and approved manner?

The security mentality is very different from the mentality found in a startup. A military analogy might be helpful. Being in a startup is like engaging in missions: search, extract, destroy, etc. One travels to a destination and operates with a single-point focus on completion.

Security, on the other hand, is like defending a perimeter. You have to think broadly across a large area, assessing weaknesses and distributing limited resources where they will have the greatest effect.

This is where a systematic approach, including an accepted set of terminology, becomes key. The CISSP (Certified Information Systems Security Professional) Guide proposes the following taxonomy:

  • Vulnerability

  • Threat agent

  • Threat

  • Risk

  • Control

  • Exposure

  • Safeguard (e.g., control or enabler)

These terms are best understood in terms of their relationships, which are graphically illustrated in Security taxonomy (similar to CISSP [119]).

security taxonomy
Figure 184. Security taxonomy

In implementing controls, the primary principles are:

  • Availability. The asset must be accessible to those entitled to it.

  • Integrity. The asset must be protected from destruction or corruption.

  • Confidentiality. The asset must not be accessible to those not entitled to it.

10.6.1. Information classification

At a time when the significance of information and related technologies is increasing in every aspect of business and public life, the need to mitigate information risk, which includes protecting information and related IT assets from ever-changing threats, is constantly intensifying.
— ISACA
CObIT 5 for Security

Before we turn to security engineering and security operations, we need to understand the business context of security. The assets at risk are an important factor, and risk management gives us a good general framework. One additional technique associated with security is information classification. A basic hierarchy is often used, such as:

  • Public

  • Internal

  • Confidential

  • Restricted

The military uses the well known levels of:

  • Unclassified

  • Confidential

  • Secret

  • Top Secret

These classifications assist in establishing the security risk and necessary controls for a given digital system and/or process.
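
As a rough illustration of how classifications can drive control selection, the sketch below orders the basic hierarchy and derives the controls a system needs from the most sensitive data it handles. The control names, the mapping, and the “highest classification wins” rule are assumptions made for this example, not requirements drawn from any standard.

```python
from enum import IntEnum


class Classification(IntEnum):
    """The basic hierarchy from the text, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical mapping from classification to minimum required controls.
REQUIRED_CONTROLS = {
    Classification.PUBLIC: set(),
    Classification.INTERNAL: {"authentication"},
    Classification.CONFIDENTIAL: {"authentication", "encryption_at_rest"},
    Classification.RESTRICTED: {"authentication", "encryption_at_rest", "access_logging"},
}


def system_classification(data_elements):
    """Assumed rule: a system inherits the highest classification of its data."""
    return max(data_elements.values(), default=Classification.PUBLIC)


def missing_controls(data_elements, implemented):
    """Return the required controls that the system has not yet implemented."""
    level = system_classification(data_elements)
    return REQUIRED_CONTROLS[level] - set(implemented)
```

For example, a system holding both public marketing copy and a confidential account balance would be treated as Confidential, and `missing_controls` would report whichever of the Confidential-level controls are not yet in place.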

Information also can be categorized by subject area. This becomes important from a compliance point of view. This will be discussed in Chapter 11, in the chapter section on records management.

10.6.2. Security engineering

For the next two sections, we will adopt a “2-axis” view, first proposed in [24].

In this model, the systems lifecycle is considered along the horizontal axis, and the user experience is considered along the vertical axis (which also maps to the “stack”). In the following picture, we see the distinct concerns of the various stakeholders in the dual-axis model (see Security and the dual-axis value chain).

dual axis view of security
Figure 185. Security and the dual-axis value chain

Consumer versus sponsor perspective

The consumer of the digital service has different concerns from the sponsor/customer (in our 3-party model). The consumer (our woman checking her bank balance) is concerned with immediate aspects of confidentiality, integrity, and availability:

  • Is this communication private?

  • Is my money secure?

  • Can I view my balance and do other operations with it (e.g., transfer it) confident of no interference?

The sponsor, on the other hand, has derivative concerns:

  • Are we safe from the bad publicity that would result from a breach?

  • Are we compliant with laws and regulations, or are we risking penalties for non-compliance (as well as risking security issues)?

  • Are our security activities as cost-efficient as possible, given our risk appetite?

Security architecture and engineering

Security engineering is concerned with the fundamental security capabilities of the system, as well as ensuring that any initial principles established for the system are adhered to as development proceeds, and/or as vendors are selected and perhaps replaced over time.

There are multitudes of books written on security from an engineering, architecture and development perspective. The tools, techniques, and capabilities evolve quickly every year, which is why having a fundamental business understanding based on a stable framework of risk and control is essential.

This is a book on management, so we are not covering technical security practices and principles, just as we are not covering specific programming languages or distributed systems engineering specifics. Studying for the Certified Information Systems Security Professional exam will provide both an understanding of security management, as well as current technical topics. A glance at the CISSP guide shows how involved such topics can be:

  • The Harrison-Ruzzo-Ullman security model

  • The Diffie-Hellman Asymmetrical Encryption Algorithm

  • Functions and Protocols in the OSI Model

Again, the issue is mapping such technical topics to the fundamentals of risk and control. Key topics we note here include:

  • Authentication & authorization

  • Network security

  • Cryptography

Authentication and authorization are the cornerstones of access, i.e., the gateway to the asset. Authentication confirms that a person is who they say they are. Authorization is the management of their access rights (can they see the payroll? reset others' passwords?).
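
The distinction can be sketched in a few lines. The example below is illustrative only: the roles, permissions, and iteration count are assumptions, and a production system would rely on a vetted identity platform rather than hand-written code. It uses only the Python standard library (`hashlib.pbkdf2_hmac` for password hashing and `hmac.compare_digest` for comparison).

```python
import hashlib
import hmac
import os

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "payroll_clerk": {"view_payroll"},
    "help_desk": {"reset_password"},
}


def hash_password(password, salt):
    """Derive a password hash with PBKDF2 so the plain password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_user(name, password, role):
    """Create a user record holding a random salt and the derived hash."""
    salt = os.urandom(16)
    return {"name": name, "salt": salt, "hash": hash_password(password, salt), "role": role}


def authenticate(user, password):
    """Authentication: is this person who they say they are?"""
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(candidate, user["hash"])


def authorize(user, permission):
    """Authorization: does this (already authenticated) user hold the right?"""
    return permission in ROLE_PERMISSIONS.get(user["role"], set())
```

Note that the two checks are independent: a payroll clerk who authenticates successfully can see the payroll, but is still not authorized to reset others' passwords.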

Network security is a complex sub-domain in and of itself. Because attacks typically transpire over the Internet and/or internal organizational networks, the structure and capabilities of networks are of critical concern, including topics such as:

  • Routing

  • Firewalls

  • The Domain Name System (DNS)

Finally, cryptography is the “storage and transmission of data in a form that only those it is intended for can read and process” [119].

All of these topics require in-depth study and staff development. At this writing (mid-2016), there is a notable shortage of skilled security professionals. Therefore, a critical risk is that your organization might not be able to hire people with the needed skills (consider our section on resource management).

Security and the systems lifecycle

Security is a concern throughout the systems lifecycle. You already know this. Otherwise, you would not have reached enterprise scale. But now you need to formalize it with some consistency, as that is what regulators and auditors expect, and it also makes it easier for your staff to work on various systems.

Security should be considered throughout the SDLC, including systems design, but this is easier said than done. Organizations will always be more interested in a system’s functionality than its security. However, a security breach can ruin a company.

The CISSP recommends (among other topics) consideration of the following throughout the systems lifecycle:

  • The role of environmental (e.g., operating system-level) safeguards versus internal application controls

  • The challenges of testing security functionality

  • Default implementation issues

  • Ongoing monitoring

Increasingly important controls during the construction process in particular are:

  • Code reviews

  • Automated code analysis
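
To give a sense of what automated code analysis involves, here is a deliberately tiny sketch that parses Python source (without running it) and flags direct calls to built-ins that often warrant manual review. The rule set is an assumption made for this example; real analysis tools apply far larger and more sophisticated rule sets.

```python
import ast

# Assumed rule set for this sketch: built-in calls that often warrant manual review.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source):
    """Return (line number, function name) for each direct call to a risky built-in."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


# Example: the second line evaluates user-supplied text and would be flagged.
sample = "x = input()\nresult = eval(x)\nprint(result)\n"
print(find_risky_calls(sample))
```

A check like this can run automatically on every change, leaving human code reviewers free to concentrate on design-level concerns.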

Finally, we previously discussed the Netflix Simian Army, which can serve as a form of security control.

Sourcing and security

Vendors come and go in the digital marketplace, offering thousands of software-based products across every domain of interest (we call this the technology product lifecycle). Inevitably, these products have security errors. A vendor may issue a “patch” for such an error, which must be applied to all instances of the running software. Such patches are not without risk, and may break existing systems; they, therefore, require testing under conditions of urgency.

Increasingly, software is offered as a service, in which case it is the vendor’s responsibility to patch their own code. But what if they are slow to do this? Any customer relying on their service is running a risk, and other controls may be required to mitigate the exposure.

One important source of information on vulnerabilities is the National Vulnerability Database (NVD) supported by the National Institute of Standards and Technology. In this database, you can look up various products and see if they have known security holes. Using the NVD is complex and not something that can be simply and easily “implemented” in a given environment, but it does represent an important, free, taxpayer-supported resource of use to security managers.

An important type of vulnerability is the “zero-day” vulnerability. With this kind of vulnerability, knowledge of a security “hole” becomes widespread before any patches are available (i.e., the software’s author and users have “zero days” to create and deploy a fix). Zero-day exploits require the fast and aggressive application of alternate controls, which leads us to the topic of security operations.

10.6.3. Security operations

Networks and computing environments are evolving entities; just because they are secure one week does not mean they are secure three weeks later.
— Shon Harris
Guide to the CISSP

Security requires ongoing operational attention. Security operations is first and foremost a form of operations, as discussed in Chapter 6. It requires on-duty and on-call personnel, and some physical or virtual point of shared awareness (for example, a physical Security Operations Center, perhaps co-located with a Network Operations Center). Beyond the visible presence of a Security Operations Center, various activities must be sustained. These can be categorized into four major areas:

  • Prevention

  • Detection

  • Response

  • Forensics

Prevention

An organization’s understanding of what constitutes a “secure” system is continually evolving. New threats continually emerge, and the alert security administrator has an ongoing firehose of bulletins, alerts, patch notifications, and the like to keep abreast of.

These inputs must be synthesized by an organization’s security team into a set of security standards for what constitutes a satisfactorily-configured (“hardened”) system. Ideally, such standards are automated into policy-driven systems configuration approaches; in less ideal situations, manual configuration (and double-checking) is required.

Prevention activities include:

  • maintaining signatures for intrusion detection and anti-virus systems

  • software patching (e.g., driven by the technology product lifecycle and updates to the National Vulnerability Database)

  • ongoing maintenance of user authorizations and authentication levels

  • ongoing testing of security controls (e.g., firewalls, configurations, etc.)

  • updating security controls appropriately for new or changed systems

Detection

There are many kinds of events that might indicate some security issue; systems exposed to the open Internet are continually scanned by a wide variety of often-hostile actors. Internal events, such as unscheduled/unexplained system restarts, may also indicate security issues. The challenge with security monitoring is identifying patterns indicating more advanced or persistent threats. When formalized, such patterns are called “signatures.”

One particular form of event that can be identified for systems under management is the configuration state change.

For example, if a core operating system file (one that is well known and not expected to change) changes in size one day with no explanation, this might be indicative of a security exploit. Perhaps an attacker has substituted this file with one containing a “backdoor” allowing access. Tools such as Tripwire are deployed to scan and inventory such files and key information about them (“metadata”) and raise alerts if unexpected changes occur. Infrastructure managers such as Chef and Puppet may also serve as inputs into security event management systems; for example, they may detect attempts to alter critical configuration files, and their re-converging of the managed resource back to its desired state can be a source of valuable information about potential exploits. Such tools also may be cited as controls for various kinds of security risks.
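
The idea of inventorying files and alerting on unexpected changes can be illustrated with a short sketch. This is not how Tripwire itself is implemented; the function names are hypothetical, and the only metadata recorded here are file size and a SHA-256 digest.

```python
import hashlib
import os


def snapshot(paths):
    """Record size and SHA-256 digest for each file path."""
    result = {}
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        result[path] = {"size": os.path.getsize(path), "sha256": digest}
    return result


def detect_changes(baseline, current):
    """Return the sorted baseline paths whose metadata no longer matches.

    A path missing from the current snapshot also counts as changed.
    """
    return sorted(
        path for path in baseline
        if current.get(path) != baseline[path]
    )
```

In practice the baseline would be taken when the system is in a known-good state and stored where an attacker cannot alter it; any path returned by detect_changes would then be raised as a security event.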

We have discussed the importance of configuration management in both Chapter 2 and Chapter 6. In Chapter 2, we discussed the important concept of Infrastructure as Code and policy-driven configuration management; we revisited the importance of configuration management from an operational perspective in Chapter 6. Configuration management also will re-appear in Chapters 11 and 12.

Important
It should be clear by now that configuration management is one of the most critical enabling capabilities for digital management, regardless of whether you look to traditional IT service management practices or modern DevOps approaches.

Detection activities include:

  • Monitoring events and alerts from intrusion detection and related operational systems

  • Analyzing logs and other artifacts for evidence of exploits

Response

Security incidents require responses. Activities include:

  • Declaring security incidents

  • Marshalling resources (staff, consultants, law enforcement) to combat the incident

  • Developing immediate tactical understanding of the situation

  • Developing a response plan, under time constraints

  • Executing the plan, including ongoing monitoring of effectiveness and tactical correction as needed

  • Keeping stakeholders informed about the situation

Forensics

Finally, security incidents require careful after-the-fact analysis:

  • Analyzing logs and other artifacts for evidence of exploits

  • Researching security incidents to establish causal factors and develop new preventative approaches (thus closing the loop)

Relationship to other processes

As with operations as a whole, there is ongoing monitoring and reporting to various stakeholders, and interaction with other processes.

One of the most important operational processes from a security perspective is Change Management. Configuration state changes (potentially indicating an exploit in progress) should be reconciled first to Change Management records. Security response may also require emergency Change processes. ITSM Event and Incident Management may be leveraged as well.
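
Reconciling detected configuration changes against Change Management records might be sketched as follows. The record layout (a resource identifier plus an approved time window) is an assumption made for illustration; real ITSM tools have their own data models.

```python
from datetime import datetime


def unreconciled_changes(detected, change_records):
    """Return detected changes not covered by an approved change window.

    detected: list of (resource, timestamp) tuples.
    change_records: list of dicts with 'resource', 'start', 'end' keys
    (a hypothetical layout, not any specific tool's schema).
    """
    suspicious = []
    for resource, when in detected:
        authorized = any(
            record["resource"] == resource
            and record["start"] <= when <= record["end"]
            for record in change_records
        )
        if not authorized:
            suspicious.append((resource, when))
    return suspicious
```

Anything this function returns is a state change with no matching approved change, and is therefore a candidate for security investigation.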

Note
The particular concerns of security may interfere with cross-process coordination. This is a topic beyond the scope of this book.

10.6.4. Security and assurance

Quis custodiet ipsos custodes?
— Latin for “Who watches the watchers?"

Given the critical importance of security in digital organizations, it is an essential matter for governance attention at the highest levels.

Security management professionals are accountable to governance concerns just as any other manager in the digital organization. Security policies, processes, and standards are frequently audited by both internal auditors and external assurance professionals (not only auditors but other forms of assurance as well).

The idea that an “Assets Protection” group might itself be audited may be hard to understand, but security organizations such as police organizations have Internal Affairs units for just such purposes.

Security auditors might review the security processes mentioned above, or system configuration baselines, or log files, or any number of other artifacts, depending on the goals and scope of a security audit. Actual penetration testing is a frequently used approach: the hiring of skilled “white-hat” hackers to probe an organization’s defenses. Such hackers might be given license to probe as far as they possibly can and return with comprehensive evidence of what they were able to access (customer records, payrolls, account numbers, and balances, etc.).

10.7. Digital governance

…​transforming to an agile delivery approach is difficult. Companies or government agencies that have been in existence for decades have built up layers of deep and wide waste, many of them around governance.

Business or IT governance constructs and processes that do not support new technology-centric go-to-market models and products must be either replaced or substantially evolved; otherwise, they increase risk on the agile projects [emphasis added].
— Scott Ambler and Mark Lines
Disciplined Agile Delivery

As with other chapters, we have presented the material defining this chapter’s topic “on its own terms.” Certainly, there is little chance that the core principles of governance will cease to have importance, even in the new digital economy.

However, as Ambler and Lines note above, the legacy of IT governance is wide, deep, and often wasteful. Approaches based on mis-applied Taylorism and misguided, CMM-inspired statistical process control have resulted in the creation of ineffective, large-scale IT bureaucracies whose sole mission seems to be the creation and exchange of non-value-add secondary artifacts, while lacking any clear concept of an execution model.

What is to be done? Governance will not disappear any time soon. Simply arguing against governance is unlikely to succeed. Instead, this book argues the most effective answer lies in a re-examination of the true concerns of governance:

  • Sustaining innovation and effective value delivery

  • Maintaining efficiency

  • Optimizing risk

These fundamental principles (“top-line,” “bottom-line,” “risk”) define value for organizations around the world, whether for-profit, non-profit, or governmental. After considering the failings of IT governance, we’ll re-examine it in light of these objectives and come up with some new answers for the digital era.

10.7.1. The failings of IT governance

…​many GRC management processes within enterprises are designed and implemented within a command-and-control paradigm. They are highly centralized and are viewed as the purview of specialized GRC teams, who are not held accountable for the outcomes of the processes they mandate. The processes and controls these teams decree are often derived from popular frameworks without regard to the context in which they will be applied and without considering their impact on the entire value stream of the work they affect. They often fail to keep pace with technology changes and capabilities that would allow the desired outcomes to be achieved by more lightweight and responsive means. This forces delivery teams to complete activities adding no overall value, create bottlenecks, and increase the overall risk of failure to deliver in a timely manner.
— Jez Humble et al.
Lean Enterprise

From the perspective of digital transformation, there are many issues with traditional IT governance and the assumptions and practices characterizing it.

The new digital operating model

Consider the idea of “programmability” mentioned at the start of this chapter. A highly “programmable” position is one where the responsibilities can be specified in terms of their activities. And what is the fundamental reality of digital transformation? It is no accident that such positions are called “programmable.” In fact, they are being “programmed away” or “eaten by software,” leaving only higher-skill positions that are best managed by objective, and which are more sensitive to cultural dynamics.

Preoccupation with “efficiency” fades as a result of the decreasingly “programmable” component of work. The term “efficiency” signals a process that has been well defined (is “programmable”) to the point where it is repeatable and scalable. Such processes are ripe for automation, commoditization, and outsourcing, and this is in fact happening.

If the repetitive process is retained within an organization, the drive for efficiency leads to automation, and eventually, efficiency is expressed through concerns for capacity management and the consumption of computing resources. And when such repetitive concerns are not retained by the organization, but instead become a matter of sourcing rather than execution, the emphasis shifts to risk management and governance of the supplier.

The remaining uncertain and creative processes should not be managed just for “efficiency”; they need to be managed for effectiveness, including fast feedback, collaboration, culture, and so forth.

Project versus operations as operating model

As illustrated in Governance based on project versus operations (which is similar to [142]), ISO/IEC 38500 assumes a specific IT operating model, one in which projects are distinct from operations. We have discussed the growing influence of product-centric digital management throughout this book, but as of 2016 the major IT governance standard still does not recognize it. The ripple effects are seen throughout other guidance and commentary. In particular, the project-centric mindset is closely aligned with the idea of IT and the CIO as primarily order-takers.

projects and operations as governance
Figure 186. Governance based on project versus operations

CIO as order-taker

Throughout much of the IT governance guidance, certain assumptions become evident:

  • There is an entity that can be called “The Business.”

  • There is a distinct entity called “IT” (for “Information Technology”).

  • It is the job of “IT” to take direction (i.e., orders) from “The Business” and to fulfill them.

  • There is a significant risk that “IT” activities (and by extension, dollars spent on them) may not be correctly allocated to the preferred priorities of “The Business.” IT may spend money unwisely, on technology for its own sake. This risk needs to be controlled.

  • The needs of “The Business” can be precisely defined and it is possible for “IT” to support those needs with a high degree of predictability as to time and cost. This predictability is assumed even when those activities imply multi-million dollar investments and months or years of complex implementation activities.

  • When such activities do not proceed according to initial assumptions, this is labeled an “IT failure.” It is assumed that the failure could have been prevented through more competent management, “rigorous” process, or diligent governance, especially on the IT side.

There may be organizations where these are reasonable assumptions. (This book does not claim they do not exist.) But there is substantial evidence for the existence of organizations for whom these assumptions are untrue.

The fallacies of “rigor” and repeatability

…it takes more time than you have to prove less than you’d like.
— Cem Kaner et al.
Testing Computer Software

Rigor? Or rigor mortis?
— anonymous

One of the most critical yet poorly understood facts of software development and by extension complex digital system innovation is the impossibility of “rigor.” Software engineers are taught early that “completely” testing software is impossible [147], yet it seems that this simple fact (grounded in fundamentals of computer science and information theory) is not understood by many managers.

A corollary fallacy is that of repeatable processes when complexity is involved. We may be able to understand repeatability at a higher level, through approaches like case management or the Checklist Manifesto’s submittal schedules, but process control in the formal sense within complex, R&D-centric digital systems delivery is simply impossible, and the quest for it is essentially cargo cult management.

And yet quotes like the following are common in IT governance discussions:

…the questions a senior manager should ask include: “How good are my IT governance processes at effectively delivering strategic business value year after year?” and “Are my processes repeatable, predictable, and scalable, and are they truly meeting the needs of my business (outside of IT) and my customers?” [187 p. 6].

With all due respect to the author, the value that can be delivered “repeatably,” “year after year” is for the most part commodity production, not innovative product development. Strategy is notably difficult to commoditize.

Another way to view this is in terms of the decline of traditional IT. As you review those diagrams, understand that much of IT governance has emerged from the arguably futile effort to deliver product innovation in a low-risk, “efficient” manner. This desire has led, as Ambler and Lines note at the top of this chapter section, to the creation of layers and layers of bureaucracy and secondary artifacts.

The cynical term for this is “theater,” as in an act that is essentially unreal but presented for the entertainment and distraction of an audience.

As we noted above, a central reality of digital transformation is that commoditized, predictable, programmable, repeatable, “efficient” activities are being quickly automated, leaving governance to focus more on the effectiveness of innovation (e.g., product development) and management of supplier risk. Elaborate IT operating models specifying hundreds of interactions and deliverables, in a futile quest for “rigor” and “predictability,” are increasingly relics of another time.

10.7.2. Digital effectiveness

Let’s return to the first value objective: effectiveness. We define effectiveness as “top-line” benefits: new revenues and preserved revenues. New revenues may come from product innovation, as well as successful marketing and sales of existing products to new markets (which itself is a form of innovation).

Traditionally, “back-office” information technology was rarely seen as something contributing to effectiveness, innovation, and top-line revenue. Instead, the first computers were used to increase efficiency, through automating clerical work. The same processes and objectives could be executed for less money, but they were still the same back-office processes.

With digital transformation, product innovation and effectiveness are now much more important drivers. Yet product-centric management is still poorly addressed by traditional IT governance, with its emphasis on distinguishing projects from operations.

One tool that becomes increasingly important is a portfolio view. While project management offices may use a concept of “portfolio” to describe temporary initiatives, such project portfolios rarely extend to tracking ongoing operational benefits. Alternative approaches, such as an options approach, should also be considered.

10.7.3. Digital efficiency

Efficiency is a specific, technical term, and although often inappropriately prioritized, is always an important concern. Even a digitally-transforming, product-centric organization can still have governance objectives of optimizing efficiency. Here are some thoughts on how to re-interpret the concept of efficiency.

Consolidate the pipelines

One way in which digital organizations can become more efficient is to consolidate development as much as possible into common pipelines. Traditionally, application teams have owned their own development and deployment pipelines, at the cost of great, non-value-add variability. Even centralizing source control has been difficult.

This is challenging for organizations with large legacy environments, but full-lifecycle pipeline automation is becoming well understood across various environments (including the mainframe).

Reduce internal service friction

Another way of increasing efficiency is to standardize integration protocols across internal services, as Amazon has done. This reduces the need for detailed analysis of system interaction approaches every time two systems need to exchange data. This is a form of reducing transaction costs and therefore consistent with Coase’s theory of the firm [63].

Within the boundary of a firm, collaboration between internal services should be easier because of reduced transaction costs. It’s not hard to see that this would be the case for digital organizations: security, accounting, and customer relationship management would all be more challenging and expensive for externally-facing services.

However, since a firm is a system, a service within the boundaries of a firm will have more constraints than a service constrained only by the market. The internal service may be essential to other, larger-scoped services, and may derive its identity primarily from that context.

Because the need for the service is well-understood, the engineering risk associated with the service may also be reduced. It may be more of a component than a product. See the parable of the Flower and the Cog. Reducing service redundancy is a key efficiency goal within the bounds of a system; more to come on this in Chapter 12.

Manage the process portfolio

Processes require ongoing scrutiny. The term “organizational scar tissue” is used when specific situations result in new processes and policies that in turn increase transactional friction and reduce efficiency throughout the organization.

Processes can be consolidated, especially if specific procedural detail can be removed in favor of larger-grained case management or Checklist Manifesto concepts including the submittal schedule. As part of eventual automation and digital transformation, processes can be ranked as to how “heavyweight” they are. A typical hierarchy, from “heaviest” to “lightest,” might be:

  • Project

  • Release

  • Change

  • Service request

  • Automated self-service

The organization might ask itself:

  • Do we need to manage this as a project? Why not just a release?

  • Do we need to manage this as a release? Why not just a change?

  • Do we need to manage this as a change? Why not just a service request?

  • Do we need to manage this as a service request? Why is it not fully automated self-service?

As we saw in our examination of the Chubby locking service, there may be a good reason to retain some formality. The point is to keep asking the question. Do we really need a separate process? Or can the objectives be achieved as part of an existing process or another enabler?

Treat governance as demand

A steam engine’s governor imposes some load, some resistance, on the engine. In the same way, governance activities and objectives, unless fully executed by the directing body (e.g., the board), themselves impose a demand on the organization.

This points to the importance of having a clear demand/execution framework in place to manage governance demand. The organization does not have an unlimited capacity for audit response, reporting, and the like. In order to understand the organization as a system, governance demand needs to be tracked and accounted for and challenged for efficiency just as any other sort of demand.

Leverage the digital pipeline

Finally, efficiency asks: can we leverage the digital pipeline itself to achieve governance objectives? This is not a new idea. The governance/management interface must be realized via specific enablers, such as processes. Processes can (and often should) be automated. Automation is the raison d’être of the digital pipeline; if the process can be expressed as user stories, behavior-driven design, or other forms of requirement, it simply is one more state change moving from dev to ops.

In some cases, the governance stories must be applied to the pipeline itself. This is perhaps more challenging, but there is no reason the pipeline itself cannot be represented as code and managed using the same techniques. The automated enablers then can report their status up to the Monitoring activity of governance, closing the loop. Auditors should periodically re-assess their effectiveness.

10.7.4. Digital risk management

Poorly governed and managed information and technology will destroy value or fail to deliver benefits…​
— CObIT 5 for Risk

Finally, from an IT governance perspective, what is the role of IT risk management in the new digital world? It’s not that risk management goes away. Many risks that are well understood today will remain risks for the foreseeable future. But there are significant new classes of risk that need to be better understood and managed:

  • Unmanaged demand and disorganized execution models leading to multi-tasking, which is destructive of value and results

  • High queue wait states, resulting in uncompetitive speed to deliver value

  • Slow feedback due to large batch sizes, reducing effectiveness of product discovery

  • New forms of supplier risk, as services become complex composites spanning the Internet ecosystem

  • Toxic cultural dynamics destructive of high team performance

  • Failure to incorporate cost of delay in resource allocation and work prioritization decisions

All of these conditions can reduce or destroy revenues, erode organizational effectiveness, and worse. It is hard to see them as other than risks, yet there is little attention to such factors in the current (as of late 2016) “best practice” guidance on risk.

Cost of delay as risk

In today’s digital governance there must be a greater concern for outcome and effectiveness, especially in terms of time to market (minimizing cost of delay). Previously, concerns for efficiency might lead a company to overburden its staff, resulting in queuing gridlock, too much work in process, destructive multitasking, and ultimately failure to deliver timely results (or deliver at all).

Such failure to deliver was tolerated because it seemed to be a common problem across most IT departments. Also relevant is the fact that digital transformation had not taken hold yet. IT systems were often a back office, and delays in delivering them (or significant issues in their operation) were not quite as damaging.

Now, the effectiveness of delivery is essential. The interesting, and to some degree unexpected, result is that both efficiency and risk seem to be benefiting as well. Cross-functional, focused teams are both more effective and more efficient, and are able to manage risk better as well. Systems are being built with both increased rapidity and improved stability, and the automation enabling this provides robust audit support.
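
Cost of delay can be made concrete with simple arithmetic. The sketch below assumes a constant weekly value for each work item and uses the "CD3" heuristic (cost of delay divided by duration) for ordering work; CD3 is a commonly cited prioritization rule that this chapter does not itself describe, and all figures are invented for illustration.

```python
def delay_cost(weekly_value, weeks_delayed):
    """Value forgone when delivery slips by the given number of weeks.

    Assumes value accrues at a constant weekly rate (a simplification).
    """
    return weekly_value * weeks_delayed


def prioritize_by_cd3(items):
    """Order work by cost of delay divided by duration, highest first.

    items: list of (name, weekly_cost_of_delay, duration_weeks) tuples.
    """
    return sorted(items, key=lambda item: item[1] / item[2], reverse=True)
```

For instance, a feature worth 10,000 per week that slips six weeks forgoes 60,000 in value, which is the kind of figure an overburdened, multitasking organization rarely sees on any report.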

Team dynamics as risk

We’ve covered culture in some depth in Chapter 7. Briefly, from a governance perspective:

The importance of organizational culture has been discussed by management thinkers since at least W. Edwards Deming. In a quote often attributed to Peter Drucker, “culture eats strategy for breakfast.” But it has been difficult at best to quantify what we mean by culture.

Quantify? Some might even say quantification is impossible. But Google and the State of DevOps research have done so. Google has established the importance of psychological safety in forming effective, high-performing teams [231]. And the State of DevOps research, applying the Westrum typology, has similarly confirmed that pathological, controlling cultures are destructive of digital value [215].

These facts should be taken seriously in digital governance discussions. So-called “toxic” leadership (an increasing concern in the military itself [276]) is destructive of organizational goals and stakeholder value. It can be measured and managed and should be a matter of attention at the highest levels of organizational governance.

Sourcing risk

We have already covered contracting in terms of software and cloud. But in terms of the emergence model, it is typical that companies enter into contracts before having a fully mature sourcing and contract management capability with input from the governance, risk, and compliance perspective.

We’ve touched on the issues of cloud due diligence and sourcing and security in this chapter. The 2e2 case discussed is interesting; it seems that due diligence had actually been performed. Additional controls could have made a key difference, in particular business continuity planning.

There are a wide variety of supplier-side risks that must be managed in cloud contracts:

  • Access

  • Compliance

  • Data location

  • Multi-tenancy

  • Recovery

  • Investigation

  • Viability (assurance)

  • Escrow

We’ve emphasized throughout this book the dynamic nature of digital services. This presents a challenge for risk management of digital suppliers. This year’s audit is only a point-in-time snapshot; how can assurance be maintained with a fast-evolving supplier? This leading edge of cloud sourcing is represented in discussions such as “Dynamic certification of Cloud services: Trust, but verify!”:

the on-demand, automated, location-independent, elastic, and multi-tenant nature of cloud computing systems is in contradiction with the static, manual, and human process-oriented evaluation and certification process designed for traditional IT systems…​

Common certificates are a backward look at the fulfillment of technical and organizational measures at the time of issue and therefore represent a snapshot. This creates a gap between the common certification of one to three years and the high dynamics of the market for cloud services and providers.

The proposed dynamic certification approach adopts the common certification process to the increased flexibility and dynamics of cloud computing environments through using of automation potential of security controls and continuous proof of the certification status [171].

It seems likely that such ongoing dynamic evaluation of cloud suppliers would require something akin to Simian Army techniques, discussed below.

Beyond increasing supply-side dynamism, risk management in a full SIAM (Service Integration and Management) sense is compounded by the complex interdependencies of the services involved. All of the cloud contracting risks need to be covered, as well as further questions such as:

  • If a given service depends on two sub-services (“underpinning contracts”), what are the risks for the failure of either or both of the underpinning services?

  • What are the controls?
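
The first of these questions has a simple probabilistic core, sketched below. The calculation assumes the two underpinning services fail independently, which is an assumption that shared infrastructure (for example, a common cloud region or network provider) can easily invalidate.

```python
def dependency_failure_risk(p_first, p_second):
    """Failure probabilities for a service with two underpinning services.

    p_first, p_second: probability that each sub-service fails in some
    period. Independence of the two failures is assumed.
    """
    both = p_first * p_second
    either = 1 - (1 - p_first) * (1 - p_second)
    return {"either": either, "both": both}
```

If the service needs both sub-services to function, the "either" figure is the relevant risk and is always at least as large as the worse of the two inputs; if the sub-services are redundant alternatives, the much smaller "both" figure applies.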

10.7.5. Automating digital governance

Digital exhaust

One governance principle we will suggest here is to develop a governance architecture as an inherent part of the delivery system, not as an additional set of activities. We use the concept of “digital exhaust” to reinforce this.

What is “digital exhaust”?

Digital exhaust, for the purposes of this book, consists of the extraneous data, and information that can be gleaned from it, originating from the development and delivery of IT services.

Consider an automobile’s exhaust. It does not help you get to where you are going, but it’s an inevitable aspect of having an internal combustion engine. Since you have it, you can monitor it and gain certain insights as to whether your engine is running efficiently and effectively. You might even be able to identify if you are at risk of an engine failure.

The term “digital exhaust” is also applied to the data generated from the Internet of Things. This usage is conceptually aligned to our usage here, but somewhat different in emphasis.

To leverage digital exhaust, focus on the critical, always-present systems that enable digital delivery:

These systems constitute a core digital pipeline, one that can be viewed as an engine producing digital exhaust. This is in contrast to fragmented, poorly-automated pipelines, or organizations with little concept of pipeline at all. Such organizations wind up relying on secondary artifacts and manual processes to deliver digital value.

governance based on secondary artifacts
Figure 187. Governance based on activities and artifacts

The illustration in Governance based on activities and artifacts represents fragmented delivery pipelines, with many manual processes, activities, and secondary artifacts (waterfall stage approvals, designs, plans, manual ticketing, and so forth). Much IT governance assumes this model, and also assumes that governance must often rely on aggregating and monitoring the secondary artifacts.

In contrast, with a rationalized continuous delivery pipeline (see Governance of digital exhaust), governance can increasingly focus on monitoring the digital exhaust.

governance based on digital exhaust
Figure 188. Governance of digital exhaust

What can we monitor with digital exhaust for the purposes of governance?

  • Development team progress against backlog

  • Configuration management

  • Conformance to architectural standards (through inspection of source and package managers, code static analysis, and other techniques)

  • Complexity and technical debt

  • Performance and resource consumption of services

  • Performance of standards against automated hardening activities (e.g., Simian Army)

As noted above, certain governance objectives may require the pipeline itself to be adapted, e.g., the addition of static code analysis, or implementation of hardening tools such as Simian Army.
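
As a rough illustration of monitoring digital exhaust, the sketch below aggregates pipeline events into a few governance indicators. The event format (a 'type' field and a 'static_analysis_passed' flag on builds) is a hypothetical assumption; real pipelines emit their own event schemas.

```python
def summarize_exhaust(events):
    """Aggregate pipeline events into simple governance indicators.

    events: list of dicts with a 'type' key ('build' or 'deploy') and,
    for builds, a boolean 'static_analysis_passed' key. This layout is
    an illustrative assumption, not a real tool's schema.
    """
    builds = [e for e in events if e["type"] == "build"]
    deploys = [e for e in events if e["type"] == "deploy"]
    passed = sum(1 for b in builds if b["static_analysis_passed"])
    return {
        "builds": len(builds),
        "deploys": len(deploys),
        # None signals "no data" rather than a misleading 0% or 100%.
        "static_analysis_pass_rate": passed / len(builds) if builds else None,
    }
```

The point of the design is that nothing here is produced for governance's sake: the events already exist as a by-product of delivery, and the summary is derived from them.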

Additional automation

The DevOps Audit Toolkit

The DevOps Audit Toolkit provides an audit perspective on pipeline automation [80]. This report provides an important set of examples demonstrating how modern DevOps toolchain automation can fulfill audit objectives as well as or better than “traditional” approaches. This includes a discussion of alternate approaches to the traditional control of “separation of duties” for building and deploying code. These approaches include automated code analysis and peer review as a required control.
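
A pipeline gate enforcing peer review and automated code analysis as required controls might be sketched as follows. This is an illustrative assumption about how such a control could be coded, not an example taken from the DevOps Audit Toolkit; the field names are hypothetical.

```python
def may_deploy(change):
    """Return (allowed, reasons) for a proposed deployment.

    change: dict with 'author' (str), 'reviewers' (list of str), and
    'static_analysis_passed' (bool). Field names are hypothetical.
    """
    reasons = []
    # Peer review only counts if someone other than the author did it.
    independent = [r for r in change["reviewers"] if r != change["author"]]
    if not independent:
        reasons.append("no reviewer other than the author")
    if not change["static_analysis_passed"]:
        reasons.append("static analysis failed")
    return (not reasons, reasons)
```

Because the gate records why a change was blocked, its output doubles as audit evidence that the control operated.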

There are a variety of ways the IT pipeline can be automated. The Calavera simulation [27] shows a simplified end-to-end approach. Many additional components are seen in real-world pipelines:

  • Static code analyzers

  • Automated user interface (UI) testing

  • Load testing

  • More sophisticated continuous deployment infrastructure

and much more.

Additionally, there may still be a need for systems that are secondary to the core pipeline:

  • Service or product portfolio

  • Workflow and Kanban-based systems (one notable example is workflow to ensure peer review of code)

  • Document management

There may also be a risk repository if the case can’t be made to track risks using some other system. The important thing to remember when automating risk management is that risks are always with respect to some thing.

A risk repository needs to be integrated with subject inventories, such as the service portfolio and relevant source repositories and entries in the package manager. Otherwise, risk management will remain an inefficient, highly manual process.

What are the things that may present risks?

  • Products/services

    • Their ongoing delivery

    • Their changes & transformations (Releases)

    • Their revenues

  • Customers and their data

  • Employees and their positions

  • Assets

  • Vendors

  • Other critical information
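
The principle that a risk is always with respect to some thing can be sketched as a data structure in which every risk carries a reference to its subject, plus a check that the reference resolves against the relevant inventory. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    description: str
    subject_type: str  # e.g. "service", "vendor", "asset"
    subject_id: str    # key in the corresponding inventory


def orphaned_risks(risks, inventories):
    """Return risks whose subject is missing from its inventory.

    inventories: dict mapping subject_type -> set of known subject ids,
    standing in for systems such as the service portfolio.
    """
    return [
        risk for risk in risks
        if risk.subject_id not in inventories.get(risk.subject_type, set())
    ]
```

A risk that cannot be tied to a known product, vendor, or asset is a sign that the repository has drifted from the subject inventories and is sliding back toward manual upkeep.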

10.8. Conclusion

Governance and its related topics are a broad and complex domain, one that represents a career path as you gain experience and seniority in your organization. It is critical to understand the motivations and objectives of governance-driven initiatives. While it can be tempting to dismiss governance-related activities as “bureaucracy,” hopefully after this chapter you have a greater understanding, and will be able to engage with governance in a constructive manner.

The following principles may be useful:

  • Treat governance objectives, as much as possible, as simply a form of digital requirement to be handled through the delivery pipeline.

  • Consider the new classes of risk to digital effectiveness and innovation: overburden, multi-tasking, overloaded queues, and toxic culture.

  • Automate governance objectives throughout the pipeline. Avoid manual processing wherever possible.

  • Understand that poorly conceived governance strategies and tactics can themselves constitute a risk to effective digital delivery.
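
To make the first and third principles concrete, here is a minimal sketch of governance objectives expressed as automated checks that a pipeline stage could run against each change. The Change record, the two example controls (independent peer review and a passing static analysis run), and the function names are all assumptions made for illustration; they are not taken from any particular pipeline tool.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """Hypothetical record describing one change moving through the pipeline."""
    change_id: str
    author: str
    reviewers: list = field(default_factory=list)
    static_analysis_passed: bool = False


def check_peer_review(change):
    # Control: at least one reviewer other than the author.
    independent = [r for r in change.reviewers if r != change.author]
    if independent:
        return None
    return "no reviewer other than the author"


def check_static_analysis(change):
    # Control: the static code analyzer must have passed.
    if change.static_analysis_passed:
        return None
    return "static analysis has not passed"


# Adding a governance objective means adding a function to this list.
CONTROLS = [check_peer_review, check_static_analysis]


def evaluate(change):
    """Return a list of control violations (empty if the change complies)."""
    violations = []
    for control in CONTROLS:
        message = control(change)
        if message is not None:
            violations.append(message)
    return violations


def gate(change):
    """Return True if the change may proceed to the next pipeline stage."""
    return not evaluate(change)


if __name__ == "__main__":
    change = Change("C-42", author="alice", reviewers=["alice"])
    for violation in evaluate(change):
        print(f"{change.change_id}: {violation}")
```

Because each control is an ordinary function, the list of controls can be versioned, reviewed, and tested like any other requirement, and the violation messages double as audit evidence.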

10.8.1. Discussion questions

  1. Research further and discuss one of these IT disasters.

  2. Propose how the disaster could have been avoided or mitigated using the governance concepts presented in this chapter.

  3. Does the term “governance” bother you? Why or why not? What about the term “control”?

  4. Debate, as a team, the proposition that “There is no realistic way to distinguish between governance and management.” One side of the team to argue for, the other to argue against. Present your best arguments to the rest of the class.

10.8.2. Research & practice

  • Beyond the brief discussion in this chapter, research how venture capital firms govern their portfolios of startup investments. Compare and contrast to public company governance.

  • Find a copy of the classic William Gibson short story, “Burning Chrome.” Write an analysis of it using the concepts described in this chapter.

  • Find a brief discussion of the human immune system, and compare and contrast it with the Netflix Simian Army.

  • Compare some well-known technologies you use against the NIST National Vulnerability Database. Are there any issues? How would you fix them?

  • Using infrastructure as code, develop a demonstration environment employing some member of the Simian Army.


1. Credit: I got this analogy from hearing Brian Barnier speak at an ISACA meeting around 2011.
2. Image credit https://www.flickr.com/photos/garryknight/11240024613, commercial use permitted.
3. Public domain image via Wikipedia
4. Synthesized from various sources including ISO 38500 and COBIT
5. Reflects concepts from [140, 133]
6. Image credits https://www.flickr.com/photos/tao_zhyn/442965594, https://www.flickr.com/photos/peagreenchick/396463634/, https://www.flickr.com/photos/intelfreepress/6722296265/, commercial use allowed for all