Risk analysis is often viewed as a “black art”—part fortune telling, part mathematics. Successful architecture risk analysis, however, is nothing more than a business-level decision-support tool: it’s a way of gathering the requisite data to make a good judgment call based on knowledge about vulnerabilities, threats, impacts, and probability.
Established risk-analysis methodologies possess distinct advantages and disadvantages, but almost all of them share some good principles as well as limitations when applied to modern software design. What separates a great software risk assessment from a merely mediocre one is its ability to apply classic risk definitions to software design and then generate accurate mitigation requirements. A high-level approach to iterative risk analysis should be deeply integrated throughout the software development life cycle.1 In case you’re keeping track, Figure 1 shows you where we are in our series of articles about software security’s place in the software development life cycle.
Example risk-analysis methodologies for software usually fall into two basic categories: commercial (including Microsoft’s STRIDE, Sun’s ACSM/SAR, Insight’s CRAMM, and Black Duck's SQM) and standards-based (from the National Institute of Standards and Technology’s ASSET or the Software Engineering Institute’s OCTAVE). An in-depth analysis of all existing methodologies is beyond our scope, but we’ll look at basic approaches, common features, strengths, weaknesses, and relative advantages and disadvantages.
As a corpus, “traditional” methodologies are varied and view risk from different perspectives. Examples of basic approaches include
Each basic approach has its distinctly different merits, but they almost all share some valuable concepts that should be considered in any risk analysis. We can capture these commonalities in a set of basic definitions:
Although they start with these basic definitions, risk methodologies usually diverge on how to arrive at specific values. Many methods calculate a nominal value for an information asset, for example, and attempt to determine risk as a function of loss and event probability. Others rely on checklists of threats and vulnerabilities to determine a basic risk measurement.
One classic risk-analysis method expresses risk as a financial loss, or annualized loss expectancy, based on the following equation:
ALE = SLE × ARO,
where SLE is the single loss expectancy, and ARO is the annualized rate of occurrence (or the predicted frequency of a loss event happening).
Let’s consider an Internet-based equities trading application with a vulnerability that could result in unauthorized access (the implication being that unauthorized stock trades can be made). Assume a risk analysis determines that middle- and back-office procedures will catch and negate any malicious transaction such that the loss associated with the event is simply the cost of backing out of the trade. We’ll assign a cost of $150 for any such event, so SLE=$150. With an ARO of just 100 such events per year, the cost to the company (or ALE) will be $15,000. The resulting dollar figure provides no more than a rough yardstick, albeit a useful one, for determining whether to invest in fixing the vulnerability. Of course, for our fictional equities trading company, a $15,000 annual loss might not be worth getting out of bed for (typically, a proprietary trading company’s intraday market risk dwarfs such an annual loss figure).
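The arithmetic in this example can be sketched in a few lines of Python. The function name and the input check are our own illustrative choices; only the SLE and ARO values come from the example above.

```python
def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """Return ALE = SLE x ARO.

    sle: single loss expectancy, the cost of one loss event (dollars)
    aro: annualized rate of occurrence, expected loss events per year
    """
    if sle < 0 or aro < 0:
        raise ValueError("SLE and ARO must be non-negative")
    return sle * aro


# Values from the equities-trading example: $150 to back out one trade,
# 100 unauthorized-trade events per year.
ale = annualized_loss_expectancy(sle=150, aro=100)
print(f"ALE = ${ale:,.2f}")
```

The result is only as good as the two inputs, which is why the figure serves as a rough yardstick rather than a precise forecast.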
Other methods take a more qualitative route. In the case of a Web server providing a company’s face to the world, the Web site’s defacement might be difficult to quantify as a financial loss (although some studies indicate a link between security events and negative stock-price movements2). In cases in which “intangible assets” (such as reputation) are involved, qualitative risk assessment might be a more appropriate way to capture the loss.
Regardless of the technique used, most practitioners advocate a return on investment study to determine whether a given countermeasure is cost-effective for achieving the desired security goal. Adding applied cryptography to an application server via native APIs without the aid of dedicated hardware acceleration might be cheap in the short term, for example, but if it results in a significant loss in transaction volume throughput, a better ROI might come from investing up front in crypto acceleration hardware. Interested organizations should adopt the risk-calculation methodology that best reflects their needs.
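To make the crypto trade-off concrete, here is a minimal sketch of one common way to compare countermeasures: ROI as the reduction in expected annual loss, net of total cost, divided by total cost. The formula is a widely used convention rather than something this article prescribes, and every dollar figure below is hypothetical, chosen only to show how a throughput penalty can reverse an apparently cheap option.

```python
def roi(loss_reduction: float, total_cost: float) -> float:
    """Return (benefit - cost) / cost for one countermeasure."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (loss_reduction - total_cost) / total_cost


# Hypothetical annual figures, in dollars. Both options are assumed to
# remove the same amount of expected loss.
LOSS_REDUCTION = 100_000

options = {
    # Cheap to deploy, but assumed to cost revenue through lost throughput.
    "software crypto via native APIs": {"direct": 10_000, "lost_throughput": 60_000},
    # More expensive up front, with no assumed throughput penalty.
    "crypto acceleration hardware": {"direct": 40_000, "lost_throughput": 0},
}

results = {
    name: roi(LOSS_REDUCTION, cost["direct"] + cost["lost_throughput"])
    for name, cost in options.items()
}

for name, value in sorted(results.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: ROI = {value:.0%}")
```

The point of the sketch is that indirect costs belong in the denominator alongside the purchase price; leaving them out would rank the two options the other way around.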
Most risk-analysis process descriptions emphasize identification, ranking, and mitigation as continuous processes and not just a single step to be completed at one stage of the development life cycle. Risk-analysis results and risk categories tie in with both requirements (early in the life cycle) and testing (where developers can use results to define and plan particular tests).
Because it’s a specialized subject, risk analysis is not always best performed solely by the design team. Rigorous risk analysis relies heavily on an understanding of business impacts, which requires an understanding of laws and regulations as well as the business model supported by the software. Because developers and designers build up certain assumptions regarding their system and the risks it faces, risk and security specialists should, at a minimum, assist in challenging those assumptions against generally accepted best practice. They’re in a better position to “assume nothing.”
Putting the right people together for an analysis is important: consider the risk team very carefully. Knowledge and experience cannot be overemphasized because risk analysis is not a science, and broad knowledge of vulnerabilities, bugs, flaws, and threats is a critical success factor.
A prototypical analysis involves several major activities that often include several basic substeps:
Learn as much as possible about the analysis target (substeps include reading and understanding specifications, architecture documents, and other design materials; discussing and brainstorming with the group; determining system boundary and data sensitivity/criticality; playing with the software if it exists in an executable form; studying the code and other software artifacts; and identifying threats and agreeing on relevant sources of threat).
Figure A on Black Duck's solution shows one commercial example that follows this basic approach.
Figure A. Black Duck's risk-management framework. Many aspects of frameworks such as this can be automated—for example, risk storage, business risk to technical risk mapping, and the display of status over time.
Design-level analysis is knowledge intensive. Microsoft’s STRIDE model, for example, involves the understanding and application of several threat categories during analysis.3 Similarly, Black Duck's SQM approach uses attack patterns4 and exploit graphs to understand attack resistance, knowledge of design principles for ambiguity analysis,5 and knowledge regarding commonly used frameworks (.NET and J2EE being two examples) and software components.
A central activity in design-level risk analysis is to build up a consistent view of the target system at a reasonably high level. The idea is to see the forest, not get lost in the trees. The most appropriate level for this description is the typical “white board” view of boxes and arrows describing the interaction of various critical design components. The nature of software systems leads many developers and analysts to assume (incorrectly) that a code-level description of software is sufficient for spotting design problems. Although this might occasionally be true, it does not generally hold. Extreme programming’s claim that “the code is the design” represents one radical end of this approach. Without a white-board level of description, an architectural risk analysis is likely to overlook important risks related to flaws.
Previous articles in this series consider security requirements definitions and discuss abuse cases as a method for generating requirements. In the purest sense, risk analysis begins at this point: design requirements should take into account the risks you’re trying to counter. Let’s look at three approaches to interjecting a risk-based philosophy into the requirements phase (note that the requirements systems based on UML tend to focus more attention on security functionality than they do on misuse and abuse cases):
A key variable in the risk equation is impact. Business impacts generally boil down into three broad categories:
The first step to risk analysis at the requirements stage is to break down requirements into three simple categories: must haves, important to haves, and nice but unnecessary. Unless you’re running an illegal operation, you should always class laws and regulations into the first category—these requirements should be instantly mandatory and not subject to further risk analysis (although an ROI study can help you select the most cost-effective mitigations). If the law requires you to protect private information, for example, this requirement is compulsory and should not be subject to a risk-based decision. Why? Because the government has the power to put you out of business, which is the mother of all risks (if you want to test government regulators on this one, go right ahead).
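The triage rule described above can be sketched as a small data model. The class names, field names, and sample requirements are hypothetical; the only rule taken from the text is that anything mandated by law or regulation lands in the first category and is not subject to further risk-based debate.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    MUST_HAVE = 1
    IMPORTANT = 2
    NICE_BUT_UNNECESSARY = 3


@dataclass(frozen=True)
class Requirement:
    name: str
    proposed: Priority       # priority suggested by stakeholders
    legally_mandated: bool   # required by a law or regulation


def triage(req: Requirement) -> Priority:
    """Legal mandates override whatever priority was proposed."""
    return Priority.MUST_HAVE if req.legally_mandated else req.proposed


def needs_risk_decision(req: Requirement) -> bool:
    """Only requirements that are not legally mandated go on to risk analysis."""
    return not req.legally_mandated


# Hypothetical requirements for illustration.
requirements = [
    Requirement("protect private customer information", Priority.IMPORTANT, True),
    Requirement("lock accounts after repeated failed logons", Priority.IMPORTANT, False),
    Requirement("show last logon time to the user", Priority.NICE_BUT_UNNECESSARY, False),
]

for req in requirements:
    print(f"{triage(req).name:22} risk decision: {needs_risk_decision(req)!s:5} {req.name}")
```

An ROI study can still inform how a mandated requirement is met; the sketch only captures that whether to meet it is not up for debate.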
You’re then left with risk impacts—the ones that have as variables potential impact and probability—that must be managed in other ways. Examples of mitigations range from technical protections and controls, to business decisions for living with the risk. At the initial requirements definition stage, you might be able to make some assumptions regarding which controls are necessary.
Evenly applying these simple ideas will put you ahead of most application developers. As you move toward the design and build stages, risk analysis should begin to test your first assumptions from the requirements stage by testing the threats and vulnerabilities inherent in the design.
Traditional risk-analysis output is difficult to apply directly to modern software design. Even assuming a high level of confidence in the ability to predict the dollar loss for a given event and performing Monte Carlo distribution analysis of prior events to derive a statistically sound probability distribution for future events, there’s still a large gap between an ALE’s raw dollar figure (as discussed earlier) and a detailed software security mitigation definition.
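For readers unfamiliar with this kind of distribution analysis, the following is a minimal Monte Carlo sketch using only the Python standard library. The modeling choices are assumptions made for illustration: loss events arrive as a Poisson process at 100 per year, and each event's cost follows a triangular distribution between $50 and $400 with a most likely value of $150. None of these distributions or parameters come from a real data set.

```python
import random
import statistics


def simulate_annual_loss(rng, aro, loss_low, loss_mode, loss_high):
    """Simulate one year's total loss.

    Events arrive as a Poisson process with rate `aro` per year, modeled
    by exponential gaps between events. Each event's cost is drawn from
    a triangular distribution. Both choices are assumptions.
    """
    total = 0.0
    elapsed = rng.expovariate(aro)  # time of the first event, in years
    while elapsed < 1.0:
        total += rng.triangular(loss_low, loss_high, loss_mode)
        elapsed += rng.expovariate(aro)
    return total


def summarize(trials=2000, seed=1):
    """Run many simulated years and report the mean and 95th percentile."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    losses = sorted(
        simulate_annual_loss(rng, aro=100, loss_low=50, loss_mode=150, loss_high=400)
        for _ in range(trials)
    )
    return {
        "mean": statistics.fmean(losses),
        "p95": losses[int(0.95 * trials)],
    }


result = summarize()
print(f"mean annual loss ~ ${result['mean']:,.0f}")
print(f"95th percentile  ~ ${result['p95']:,.0f}")
```

Even with a full distribution in hand, the output is still a set of dollar figures. Nothing in it says which component to change or which control to add, which is the gap the paragraph above describes.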
A more worrying concern is that traditional risk-analysis techniques do not necessarily provide an easy guide (not to mention an exhaustive list) of all potential vulnerabilities and threats to consider at a component/environment level. This is why a large knowledge base and lots of experience are invaluable. The thorny knowledge problem arises in part because modern applications, including Web services applications, are designed to span multiple boundaries of trust. The vulnerability of, and threat to, any given component varies with the platform on which that component exists (think C# on a Windows .NET server versus J2EE on Tomcat/Apache/Linux) and the environment in which it lives (think secure DMZ versus directly exposed LAN). However, few traditional methodologies adequately address the contextual variability of risk given changes in the core environment. This is a fatal flaw when considering highly distributed applications or Web services.
In modern frameworks such as .NET and J2EE, security methods exist at almost every layer, yet too many applications today rely on a “reactive” protection infrastructure that only provides protection at the network transport layer. This is too often summed up by saying, “We’re secure because we use SSL and implement firewalls,” which opens the door to all sorts of problems such as those engendered by port 80 attacks, SQL injection, class spoofing, and method overwriting (to name just a few).
One approach to overcoming these problems is to start looking at software risk analysis on a component-by-component, tier-by-tier, and environment-by-environment level and then apply the principles of measuring threats, vulnerabilities, and impacts at each level.
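One way to picture this approach is as a small risk register keyed by component, tier, and environment. The sketch below is illustrative only: the three-level qualitative scale, the likelihood-times-impact score, and the sample entries are all assumptions, not part of any methodology named in this article.

```python
from dataclasses import dataclass

# Hypothetical three-level qualitative scale.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Risk:
    component: str
    tier: str
    environment: str
    threat: str
    likelihood: str  # one of the LEVELS keys
    impact: str      # one of the LEVELS keys

    @property
    def score(self) -> int:
        """Simple ordinal score: likelihood level times impact level."""
        return LEVELS[self.likelihood] * LEVELS[self.impact]


def rank(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical entries for illustration.
register = [
    Risk("order service", "application", "internal LAN", "method overwriting", "low", "high"),
    Risk("logon form", "Web", "DMZ", "SQL injection", "high", "high"),
    Risk("static content", "Web", "DMZ", "defacement", "medium", "medium"),
]

ranked = rank(register)
for r in ranked:
    print(f"{r.score}  {r.tier}/{r.environment}: {r.component} ({r.threat})")
```

Recording the tier and environment alongside each risk keeps the context visible, so the same component can be scored differently when it moves from one zone to another.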
At the design stage, any risk-analysis process should be tailored to software design. Recall that the object of this exercise is to determine specific vulnerabilities and threats that exist for the software and assess their impact. A functional decomposition of the application into major components, processes, data stores, and data communication flows, mapped against the environments across which the software will be deployed, allows for a desktop review of threats and potential vulnerabilities. We cannot overemphasize the importance of using a forest-level view of a system during risk analysis. Some sort of high-level model of the system (from a whiteboard with boxes and arrows to a formally specified mathematical model) makes risk analysis at the architectural level possible.
Although we could contemplate using modeling languages such as UMLsec to attempt to model threats, even the most rudimentary analysis approaches can yield meaningful results. Consider Figure 2, which shows a simple four-tier deployment design pattern for a standard-issue Web-based application. If we apply risk-analysis principles to this level of design, we can immediately draw some useful conclusions about the application’s security design.
During the risk-analysis process, we use the high-level design to consider
In the simple example shown in Figure 2, each tier exists in a different security realm or trust zone. This fact immediately gives us the context of the threat each tier faces. If we go on to superimpose data types (such as user-logon credentials, records, and orders), their flows (logon requests, record queries, and order entries), and, more importantly, their security classifications, we can draw conclusions about the protection for these data elements and their transmission given the current design.
Suppose that SSL protects user-logon flows between the client and the Web server. Our deployment pattern indicates that the encrypted tunnel terminates at the Web tier; because of the inherent threat in the zones occupied by the Web and application tiers, however, we really must prevent eavesdropping inside and between these two tiers as well. This might indicate the need to establish yet another encrypted tunnel or to consider a different approach to securing this data (maybe message-level encryption instead of tunneling).
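The reasoning in this example can be expressed as a simple check over a deployment model. The sketch below is an assumption-laden illustration: the tier and zone names, the boolean notion of a "trusted" zone, and the sample flows are hypothetical and do not reproduce Figure 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    zone: str
    trusted: bool  # hypothetical: is the zone trusted against eavesdropping?


@dataclass(frozen=True)
class Flow:
    data: str
    source: Tier
    dest: Tier
    sensitive: bool   # derived from the data's security classification
    encrypted: bool   # protected in transit (tunnel or message level)


def exposed_flows(flows):
    """Sensitive, unencrypted flows that touch at least one untrusted zone."""
    return [
        f for f in flows
        if f.sensitive
        and not f.encrypted
        and not (f.source.trusted and f.dest.trusted)
    ]


# Hypothetical tiers and zones.
client = Tier("client", "Internet", trusted=False)
web = Tier("Web server", "DMZ", trusted=False)
app = Tier("application server", "application zone", trusted=False)

flows = [
    Flow("user-logon credentials", client, web, sensitive=True, encrypted=True),   # SSL
    Flow("user-logon credentials", web, app, sensitive=True, encrypted=False),     # tunnel ended
    Flow("public page content", web, client, sensitive=False, encrypted=False),
]

findings = exposed_flows(flows)
for f in findings:
    print(f"unprotected: {f.data} from {f.source.name} to {f.dest.name}")
```

The check flags the hop between the Web and application tiers, matching the conclusion above. It says nothing about how to fix the exposure, which remains a design decision between a second tunnel and message-level encryption.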
Considering the communications risks, it becomes clear why a deployment pattern is valuable, because it lets us consider infrastructure (operating system and network) security mechanisms and application-level mechanisms as risk-mitigation measures.
Decomposing software on a component-by-component basis to establish trust zones is a comfortable way for most software developers and auditors to begin adopting a risk-management approach to software security. Because most systems, especially those exhibiting the n-tier architecture, rely on several third-party components and a variety of programming languages, defining zones of trust and taking an outside/in perspective similar to the one normally found in traditional security has clear benefits. In any case, interaction of different products and languages is an architectural element likely to be a vulnerability hotbed.
At its heart, decomposition is a natural way to partition a system. Given a simple decomposition, security professionals will be able to advise developers and architects about aspects of security they’re familiar with, such as network-based component boundaries and authentication. However, the composition problem is unsolved and very tricky—even the most secure components can be assembled into an insecure mess.
As organizations become adept at identifying vulnerability and its business impact, the risk-analysis team should evolve the basic approach to include additional assessment of the risks found within individual tiers or encompassing all of them. This evolution can uncover technology-specific vulnerabilities based on failings other than trust issues across tier boundaries. Examples of more subtle risks that can only be flushed out with a more sophisticated approach include transaction management risks and luring attacks.
Risk analysis is, at best, a good general-purpose yardstick by which we can judge our security design’s effectiveness. Because roughly 50 percent of security problems are the result of design flaws, performing a risk analysis at the design level is an important part of a solid software security program. Taking the trouble to apply risk-analysis methods at the design level for any application often yields valuable, business-relevant results. The process of risk analysis is continuous and applies to many different levels, at once identifying system-level vulnerabilities, assigning probability and impact, and determining reasonable mitigation strategies. By considering the resulting ranked risks, business stakeholders can determine how to manage particular risks and what the most cost-effective controls might be.
We thank John Steven and Stan Wisseman for their insightful comments on early drafts of this work. We also thank Bruce Phillips of Fidelity National Financial.
1. G. McGraw, “Software Security,” IEEE Security & Privacy, vol. 2, no. 2, 2004, pp. 80–83.
2. H. Cavusoglu, B. Mishra, and S. Raghunathan, The Effect of Internet Security Breach Announcements on Market Value of Breached Firms and Internet Security Developers, tech. report, Univ. of Texas at Dallas, School of Management, Feb. 2002; www.utdallas.edu/~huseyin/breach.pdf.
3. M. Howard and D. LeBlanc, Writing Secure Code, 2nd ed., Microsoft Press, 2003.
4. G. Hoglund and G. McGraw, Exploiting Software, Addison-Wesley, 2004.
5. J. Viega and G. McGraw, Building Secure Software: How to Avoid Security Problems the Right Way, Addison-Wesley, 2001.
6. G. Sindre and A.L. Opdahl, “Eliciting Security Requirements by Misuse Cases,” Proc. 37th Technology of Object-Oriented Languages and Systems (TOOLS-37), IEEE CS Press, 2000.
Denis Verdon is senior vice president of corporate information security at Fidelity National Financial. He has 21 years of experience in information security and IT, much of it gained while working both as a senior information security executive and as a consultant to senior security executives at Global 200 companies across 19 countries. Contact him at [email protected].