Understanding how information flows through a solution is critical to understanding its architecture. An information flow diagram is a simple way to depict the data movement aspect of any architecture.
Capturing the flow of information is a popular and effective approach for understanding a solution architecture, so various notations have been used over the years. Before I dig into our favorite, I present to you a rogues' gallery of information flows!
As mentioned above, we use a notation based on UML to capture information flows, but it is a very basic use of "boxes and lines."
You label each box with the name of a component, show the flow of information using a dotted line with an arrow, and label the information near the arrow. Seem simple? Perfect!
Each rectangular box is called a component. In the UML notation, a flow can be identified between any two items (components, classes, objects, nodes, etc.), but we keep things simple and focus on architecturally significant components. In this case, we use components to identify any place in the architecture where information "rests." It may be the system of record for the information or a temporary storage location. See the caveats below about information hubs!
A dotted line with an arrow indicates the direction of information flow between any two components.
The label on the line provides a high-level description of the information that is flowing. In formal UML, the line would also have a "<<flow>>" stereotype on the label. Since we are *only* showing flows on the diagram, we leave that off to keep it simple. The label should not describe technical details of the information transfer (we have other diagrams for that), but the nature of the information, e.g. "accounts."
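The three elements above (component boxes, dotted arrows, and labels) can be sketched as a small data model that emits Graphviz DOT text. This is only an illustration, not the notation's official tooling: the `Flow` class, the `to_dot` function, and the component names are hypothetical, and the dashed edge style is an assumption chosen to approximate the dotted line described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str  # component where the information originates
    target: str  # component where the information lands
    label: str   # nature of the information, e.g. "accounts"


def to_dot(flows: list[Flow]) -> str:
    """Render flows as DOT text: boxes for components, dashed arrows for flows.

    Note: this sketch does not escape quotes inside names or labels.
    """
    components = sorted({c for f in flows for c in (f.source, f.target)})
    lines = [
        "digraph information_flow {",
        "  node [shape=box];",
        "  edge [style=dashed];",
    ]
    lines += [f'  "{name}";' for name in components]
    lines += [f'  "{f.source}" -> "{f.target}" [label="{f.label}"];' for f in flows]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical example systems
flows = [
    Flow("CRM", "Data Warehouse", "accounts"),
    Flow("Billing", "Data Warehouse", "invoices"),
]
dot_text = to_dot(flows)
print(dot_text)
```

The labels stay at the level of "accounts" and "invoices"; nothing in the model records protocols or file formats, which matches the advice to keep transfer details on other diagrams.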
These are all ways to add information to an information flow diagram, but use them sparingly. Less is always more when diagramming: anything on the page that does not contribute to communicating your message distracts from the message.
You can use a UML Note during drafting to capture open questions and when publishing to provide clarifications. Use notes sparingly on completed diagrams: too many notes with additional information are often a sign that you are trying to communicate something that you could depict better using a different notation.
You can use bounding boxes, swimlanes (tiers), or even placement on the page to add another dimension to the components. For example, a subset of components could be within a bounding box labeled "partner systems." Remember to draw bounding boxes with dotted borders, to differentiate them from nested components.
Nested components can come in handy when some of the information flows from a component (like a data warehouse) and other information flows from a nested component *within* that component (like a reference master), and it is important to your architecture to highlight that fact. Both nested components and the bounding boxes described earlier are, in essence, boxes inside boxes, with two primary differences: (1) you depict bounding boxes with a dotted border and nested components with a solid border, and (2) arrows cannot be connected to bounding boxes.
Parentheticals on labels are sometimes helpful for calling out differences. A batch information flow diagram can have a note at the top that reads "all transfers are daily unless otherwise noted." You then label specific flows with "monthly" or "hourly" in parentheses after the description of the flow.
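The "default in the note, exceptions in parentheses" convention can be sketched in a few lines. The function name `flow_label`, the constant `DEFAULT_FREQUENCY`, and the example flows are hypothetical, chosen only to illustrate the idea.

```python
DEFAULT_FREQUENCY = "daily"  # stated once, in the note at the top of the diagram


def flow_label(description: str, frequency: str = DEFAULT_FREQUENCY) -> str:
    """Append the frequency in parentheses only when it differs from the default."""
    if frequency == DEFAULT_FREQUENCY:
        return description
    return f"{description} ({frequency})"


diagram_note = f"All transfers are {DEFAULT_FREQUENCY} unless otherwise noted."
labels = [
    flow_label("accounts"),             # follows the default, so no parenthetical
    flow_label("invoices", "monthly"),
    flow_label("prices", "hourly"),
]
print(diagram_note)
print(labels)
```

Only the exceptions carry extra text, so the diagram stays quiet where nothing unusual is happening.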
While you can use fonts, colors, and line styles to add additional information to the diagram, be careful. It is a very short trip from additional helpful information to an eyesore peppered with distractions.
We keep this diagram pretty simple by design, so we don't have too many advanced maneuvers, but...
If you are depicting a sequence of information flows where the order of the flows is essential, you can number them. However, needing to do this is usually a good indicator that you would be better off creating a UML sequence diagram.
If you have a lot of flows in both directions between systems, you can connect them with a dotted line and then include each flow's direction on the label. This is the style of most of the information flows we create!
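One way to picture the bidirectional style is as a grouping step: every flow between the same pair of systems collapses onto one connecting line, and each label carries its own direction. The sketch below assumes flows are plain `(source, target, label)` tuples and uses arrow glyphs in the label text; both choices, along with the function name `merge_bidirectional`, are illustrative assumptions rather than part of the notation.

```python
from collections import defaultdict


def merge_bidirectional(flows):
    """Group (source, target, label) flows by unordered component pair.

    Returns {(a, b): [label lines]} with a before b alphabetically. Each label
    line ends with an arrow showing its direction relative to the (a, b) order.
    """
    merged = defaultdict(list)
    for source, target, label in flows:
        a, b = sorted((source, target))
        arrow = "→" if source == a else "←"
        merged[(a, b)].append(f"{label} {arrow}")
    return dict(merged)


# Hypothetical example systems
flows = [
    ("CRM", "ERP", "accounts"),
    ("ERP", "CRM", "invoices"),
    ("CRM", "ERP", "contacts"),
]
merged = merge_bidirectional(flows)
print(merged)
```

Three arrows between two systems become a single line with three labels, which is where the visual savings come from.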
By "information hub," I refer to any component that acts as a nexus for the information - e.g., a data hub, an ETL hub, service hub, or file management hub. It serves as a "FedEx" shipping hub of information: all the information flows through it, in from sources and out to destinations. The problem is that by including it on the diagram, you lose sight of where the information is actually flowing because it all flows in and out of the hub.
By including the information hub, we hide where the data is actually flowing. The example shows an ETL hub, but the same holds for API gateways, Enterprise Service Buses, and File Management hubs. The issue becomes worse as you add more systems.
There are some strategies for handling information hubs.
If it is not material to the information flow, leave it off. If an organization always uses an integration hub to move data, then there is no clarity lost by leaving it off. If needed, add as a "global" note to the diagram (near the title), e.g., "all data moved using enterprise ETL hub."
Add a bounding box around components using each information hub and label accordingly.
The only time it is appropriate to include an information hub as a component on an information flow is if you are doing a very abstracted view of the architecture and all the components are abstractions. This usually only occurs if you create an abstract view of the entire enterprise's data flow or diagram the hub architecture itself.
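The first strategy above (leave the hub off and say so in a global note) can be sketched as a transformation on a list of flows. This is a simplified illustration: flows are assumed to be `(source, target, label)` tuples, and inbound and outbound hub flows are paired purely by matching labels, which a real hub will not always make so easy. The function name `collapse_hub` and the system names are hypothetical.

```python
def collapse_hub(flows, hub):
    """Replace source -> hub -> destination hops with direct flows.

    Inbound and outbound hub flows are paired when they carry the same label
    (an assumption of this sketch).
    """
    inbound = [(s, label) for s, t, label in flows if t == hub]
    outbound = [(t, label) for s, t, label in flows if s == hub]
    # Keep flows that never touched the hub exactly as they were.
    direct = [f for f in flows if hub not in (f[0], f[1])]
    for source, label in inbound:
        for target, out_label in outbound:
            if label == out_label:
                direct.append((source, target, label))
    return direct


# Hypothetical example systems
flows = [
    ("CRM", "ETL Hub", "accounts"),
    ("Billing", "ETL Hub", "invoices"),
    ("ETL Hub", "Data Warehouse", "accounts"),
    ("ETL Hub", "Data Warehouse", "invoices"),
    ("ETL Hub", "Marketing", "accounts"),
]
direct = collapse_hub(flows, "ETL Hub")
global_note = "All data moved using enterprise ETL hub."
print(direct)
```

After collapsing, the diagram shows that accounts travel from the CRM to both the data warehouse and marketing, a fact the hub-centered version hides.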
Here are some tips to get you on your way. I am going to skip general diagramming best practices and focus on some specific to the information flow.
So much! They are the best solution architecture tool in an architect's toolbox, next to the solution user diagram, because most people intuitively understand a diagram showing information flowing between systems. Since we keep the solution user diagram firmly focused on the people, the information flow is the solution user diagram's sibling, focused on the systems! Some typical uses are:
Enterprise Information Flows. It is excellent for enterprise-level views of the information lifeblood of your organization (although those with tons of systems can be a real challenge to create).
Integration Architectures. "Black-boxing" a technology solution in the middle of the page, then capturing all the information flows in and out, is a great way to inventory all the integrations for a new (or existing) solution. Not only do we use these for solution architecture design, but often as a quick and visual way to inventory all the required integrations during requirement facilitation.
They really can be used effectively to understand architecture at almost any level of abstraction. However, the lower you get, the more likely it is that a UML sequence diagram would bring more value.
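The integration inventory use described above amounts to picking one solution and listing everything that flows in and out of it. A minimal sketch, assuming flows are `(source, target, label)` tuples; the function name `integration_inventory` and the system names are hypothetical.

```python
def integration_inventory(flows, solution):
    """Split (source, target, label) flows into those entering and leaving `solution`."""
    inbound = sorted((s, label) for s, t, label in flows if t == solution)
    outbound = sorted((t, label) for s, t, label in flows if s == solution)
    return {"inbound": inbound, "outbound": outbound}


# Hypothetical example systems
flows = [
    ("CRM", "Order Portal", "accounts"),
    ("Order Portal", "Billing", "orders"),
    ("Product Catalog", "Order Portal", "prices"),
    ("CRM", "Billing", "contacts"),  # does not touch the solution, so it is ignored
]
inventory = integration_inventory(flows, "Order Portal")
print(inventory)
```

Each entry in the result corresponds to one integration that requirements work would need to cover.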
There you have it. Let the information flow diagrams begin! You will find them highly effective for designing, communicating, and untangling architectures of all shapes and sizes. If you'd like any help with any of that (we love the untangling) or training you or your team to use visual design more effectively, drop us a line!