Tracking solution architecture risk is a critical activity for a great solution architect. In prior articles, we provided a primer on solution architecture risks and strategies for identifying them. Now we are going to delve into the solution architecture risk register.
First, and this is important, while all risks are worth tracking, the risks that are the responsibility of a solution architect are the risks related to architecture decisions. If you refer back to the primer on solution architecture risks, you may remember that we defined risk as uncertain outcomes with negative consequences. Therefore, when a solution architect makes an architecture decision and the outcome is not certain, he or she should identify and track the potential negative consequences of that decision.
In the risk primer, we introduced the concept of delivery vs. production risks, separating the risks leading up to production deployment (delivery risks) from those that exist once the solution is live in production (production risks). There was a method to that madness! Both can be a result of solution architecture decisions. Still, the delivery manager (project manager, scrum master, etc.) who is responsible for the delivery is the best person to manage the delivery risks. As a result, our recommended best practice for delivery risks is to hand them off to the delivery manager.
If you recall from the primer, production risks are the category of risks that exist once the solution has gone live. In an organization that has formal Enterprise Risk Management, these would be tracked and managed in an operational risk register. If that does not describe your organization, then they can be tracked in the architecture document for the design to which they apply. Position them near the top of the document so they are always top of mind for you and your audience.
Risk: What could happen that has a negative consequence? Describe the risk that an architecture choice has created.
Business Impact: Why do we care? What will the impact on the organization be if the risk is realized, i.e., it actually happens?
Severity: Identify the severity of the risk on a scale: negligible, marginal, critical, catastrophic. Provide rubrics for each value appropriate for your business or organization. For example, if you are a bank, a non-financial impact on a single customer might be negligible, while unrecoverable financial transaction errors for all customers might be catastrophic. You can also use an alternative scale, such as high, medium, low. Whatever scale you use, you should always provide a rubric.
Likelihood: Identify the likelihood of the risk occurring on a scale: rare, unlikely, possible, likely, certain. As with severity, it can be helpful to provide a rubric. Rubrics could be based on likelihood percentage, the expected frequency of occurrence, or even a comparison with some event with which the audience would be familiar. For example, a meteor hitting the data center is rare, but a power brownout is likely. As with severity, you can also use an alternative scale, but you should provide a rubric.
Response: The narrative for handling the risk. All the different ways to treat risk could fill an article, or a book, or a Wikipedia article. Some common ones for architecture risks are:
| Field | Example entry |
| --- | --- |
| Risk | Customer information in the Operational Data Store (ODS) will not match admin platform if the data warehouse daily loading batch cycle does not complete within the processing window. |
| Business Impact | Customer-facing systems utilizing the ODS (call center, web, mobile) will report incorrect balances. |
| Severity | Critical |
| Likelihood | Possible |
| Response | 1. Prioritize ODS data loads that include customer accessed data. 2. Add “last updated” field to customer-facing systems and display warning if more than 24h. 3. Monitor critical load processes and define manual procedures to load. |
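If you keep the register in a repository alongside the architecture document, the same five fields can be captured as structured data. The following is a minimal sketch in Python, not a prescribed format: the class names `Severity`, `Likelihood`, and `RiskEntry` and the helper `by_priority` are hypothetical names chosen for illustration, and ordering entries by severity and then likelihood is our assumption about how you might want to review them, not a rule from this article.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Severity scale from the register; the numeric values only give an ordering."""
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Likelihood(IntEnum):
    """Likelihood scale from the register; the numeric values only give an ordering."""
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    CERTAIN = 5


@dataclass
class RiskEntry:
    """One row of the solution architecture risk register."""
    risk: str
    business_impact: str
    severity: Severity
    likelihood: Likelihood
    response: list[str] = field(default_factory=list)


def by_priority(entries):
    """Hypothetical review order: most severe first, then most likely."""
    return sorted(entries, key=lambda e: (e.severity, e.likelihood), reverse=True)


# The example entry from the table above.
ods_risk = RiskEntry(
    risk=(
        "Customer information in the ODS will not match admin platform if the "
        "data warehouse daily loading batch cycle does not complete within "
        "the processing window."
    ),
    business_impact=(
        "Customer-facing systems utilizing the ODS (call center, web, mobile) "
        "will report incorrect balances."
    ),
    severity=Severity.CRITICAL,
    likelihood=Likelihood.POSSIBLE,
    response=[
        "Prioritize ODS data loads that include customer accessed data.",
        "Add 'last updated' field to customer-facing systems and display "
        "warning if more than 24h.",
        "Monitor critical load processes and define manual procedures to load.",
    ],
)

for entry in by_priority([ods_risk]):
    print(f"{entry.severity.name.title()} / {entry.likelihood.name.title()}: {entry.risk}")
```

Using enumerations rather than free text keeps every entry on the same scale, which is what makes a shared rubric meaningful. The rubric itself still has to be written down for your organization; the code only records which value was chosen.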
Now what? Well, that depends. It depends on how formal and mature risk processes are within your organization. If processes exist, we recommend aligning with those existing processes for approving and managing operational risks. If such processes are not in place, tracking these risks is still worth the effort. Capturing the risks, even as an embedded solution architecture risk register in the architecture products and processes, enables better fact-based discussions, which result in better design outcomes, and provides a trigger for improvement in the future when further design occurs.
Close! Regarding risk articles, anyway. We may be done with the solution architecture risk register, but we are not yet done with solution architecture risk. We have one more article left in us on how to use risk as a lever for better architecture.
Risk is just one of the topics included in our solution architecture training curriculum. Drop us a line, call (401) 340-1400, or contact us to learn more. Like the tagline says, our reputation is our success. If we can do great things for you, we will. If we can’t, we’ll say so.