Combination Attacks and Metrics

I think the most important benefit of blackbox and graybox application testing is the screenshots. It's not very helpful to a developer to just tell them "you had x number of XSS"; it's far more helpful to show the steps that were taken to do something really condemning to the application (steal credentials or money, install malware, start a worm, etc.) so that those who receive the report get a better picture of how it really impacts them.

However, at some point, metrics have to be made. Somebody has to know if site A is "better" than site B, or if site A is improved over last year.

But do combination attacks change the picture? Measured alone, 9.76% of websites were vulnerable to HTTP Response Splitting attacks, but only a trace had Insufficient Authentication, Insufficient Authorization, or Insufficient Session Expiration problems. I would assume Session Fixation lies in one of those three. But how many sites actually had both? Picture this: an attacker sends a victim a URL with a known session ID, the client accepts that ID because of a response splitting (actually, header injection in this case) problem, and the attacker waits for the victim to authenticate (without receiving a new session ID). Either problem alone is pretty bad. In combination, they're very serious.
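As a rough sketch of the header-injection half of that combination: a handler that places unvalidated user input into a response header lets an attacker smuggle in a `Set-Cookie` line that fixes the victim's session ID before they log in. Everything here (function names, cookie name, payload) is invented for illustration, not taken from any real framework.

```python
def build_redirect(target: str) -> str:
    """Vulnerable: user input flows into a header value unvalidated."""
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {target}\r\n"
        "\r\n"
    )

# Attacker-supplied value with an embedded CRLF pair: the injected
# Set-Cookie line plants a session ID the attacker already knows.
payload = "/home\r\nSet-Cookie: JSESSIONID=ATTACKER_CHOSEN"
response = build_redirect(payload)
assert "Set-Cookie: JSESSIONID=ATTACKER_CHOSEN\r\n" in response


def build_redirect_safe(target: str) -> str:
    """Mitigation sketch: reject CR/LF so the header can't be split."""
    if "\r" in target or "\n" in target:
        raise ValueError("CR/LF not allowed in header values")
    return f"HTTP/1.1 302 Found\r\nLocation: {target}\r\n\r\n"
```

If the site then fails to issue a fresh session ID at login, the fixation half of the attack completes on its own.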

Another example would be cross-site scripting combined with key exposure. If a site has a key exposure problem where an authenticated user can find the email address of every user in the system by enumerating key numbers, there's an opportunity for spearphishing. Combine that with an anonymous, reflected cross-site scripting exploit, and you can mount a very compelling attack against users you know to be legitimate.
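The key-exposure half of that pairing is easy to sketch: records keyed by small sequential integers, with no check that the caller owns the record they ask for, let an attacker simply walk the ID space. The data and function shape below are made up purely to illustrate the enumeration.

```python
# Hypothetical user store keyed by small sequential IDs.
USERS = {
    101: "alice@example.com",
    102: "bob@example.com",
    103: "carol@example.com",
}

def get_profile_email(user_id: int):
    """Vulnerable: returns the record for any ID, with no ownership check."""
    return USERS.get(user_id)

# An authenticated attacker harvests addresses by enumerating keys.
harvested = [e for uid in range(100, 110) if (e := get_profile_email(uid))]
```

Each harvested address is a confirmed, legitimate user, which is exactly what makes the follow-on spearphish with the reflected XSS link so convincing.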

How do folks roll those combinations into their metrics? I suppose that's where a consistent scoring algorithm such as CVSS (maybe not CVSS exactly), one that takes multiple factors like combinations of vulnerabilities into account, is helpful in producing metrics that still reflect the weight of combined problems.
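One naive way to picture such a combination-aware score: give each finding a base severity and award a bonus when a known-dangerous pair co-occurs, so that the combination outweighs the sum of its parts. This is emphatically not CVSS; the scores, names, and pairings are all invented for illustration.

```python
# Made-up base severities per finding type.
BASE = {"response_splitting": 6.0, "session_fixation": 5.0, "xss": 6.5}

# Made-up bonus for a pair that is far worse together than apart.
COMBO_BONUS = {frozenset({"response_splitting", "session_fixation"}): 3.0}

def site_score(findings: set[str]) -> float:
    """Sum base severities, then add a bonus for each dangerous pairing."""
    score = sum(BASE.get(f, 0.0) for f in findings)
    for pair, bonus in COMBO_BONUS.items():
        if pair <= findings:  # both members of the pair are present
            score += bonus
    return score
```

With these toy numbers, a site with both response splitting and session fixation scores 14.0 rather than the simple sum of 11.0, which is the kind of weighting the paragraph above is asking for.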


  1. Check out Chris Wysopal's CWE Scoring from Metricon 2.0.

    Also, my favorite on this subject comes from Beizer in "Software Testing Techniques, 2E":

    Consequences: What are the consequences of the bug? You might measure this by the mean size of the awards made by juries to the victims of your bug.

    A reasonable metric for bug importance is:

    importance($) = frequency*(correction_cost + installation_cost + consequential_cost)

    Frequency tends not to depend on application or environment, but correction, installation, and consequential costs do. As designers, testers, and QA workers, you must be interested in bug importance, not raw frequency. Therefore you must create your own importance model.
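    A literal rendering of Beizer's formula in code, with made-up dollar figures purely to show the arithmetic:

    ```python
    def importance(frequency: float, correction_cost: float,
                   installation_cost: float, consequential_cost: float) -> float:
        """importance($) = frequency * (correction + installation + consequential)."""
        return frequency * (correction_cost + installation_cost + consequential_cost)

    # e.g. a bug hit 2% of the time, costing $500 to fix, $200 to deploy,
    # and $10,000 in downstream consequences (all figures illustrative):
    assert importance(0.02, 500, 200, 10_000) == 214.0
    ```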

    That's just how the chapter on Bug Taxonomy starts off. It just gets better and better over the 100+ paragraphs that follow, with Table 2.1 on Bug Statistics right before the summary. Speaking of which - here it is:

    1. The importance of a bug depends on its frequency, the correction cost, the consequential cost, and the application. Allocate your limited testing resources in proportion to the bug's importance.
    2. Use the nightmare list as a guide to how much testing is required.
    3. Test techniques are like antibiotics - their effectiveness depends on the target - what works against a virus may not work against bacteria or fungi. The test techniques you use must be matched to the kind of bugs you have.
    4. Because programmers learn from their mistakes, the effectiveness of test techniques, just as antibiotics, erodes with time. TEST SUITES WEAR OUT.
    5. A comprehensive bug taxonomy is a prerequisite to gathering useful bug statistics. Adopt a taxonomy, simple or elaborate, but adopt one and classify all bugs within it.
    6. Continually gather bug statistics (in accordance with your taxonomy) to determine the kind of bugs you have and the effectiveness of your current test techniques against them.

  2. I'm curious; where does that 9.76% come from?

    I'd question that statistic covering session fixation, since I have seen very few applications without session fixation issues, and most of those had worse problems (say, like using user IDs in cookies for auth...).

    Or maybe people just aren't testing for Session Fixation?

    @kuza55 - the 9.76% for HTTP Response Splitting came from the Web Application Security Consortium statistics project at http://webappsec.org/projects/statistics/ . Their numbers for all sorts of session issues (Insufficient Authentication, Insufficient Authorization, and Insufficient Session Expiration) summed to just 0.01% of all sites.

    I suspect that the reason that Session Fixation doesn't show up as a proper category is because it's generally accomplished by some other failure - XSS or Response Splitting (or both), or that, as you said, it's often not checked for.