In an earlier field note, I explored the idea that testing might be understood as a form of assessment. Testing generates evidence. Assessment uses evidence to reach a judgment. But this raises a more difficult question:
If we want to assess the quality of a system, what would we assess it against?
For a capability assessment, the answer is relatively clear. A reference model describes the capability we expect to find. An assessment model explains how evidence should be collected, interpreted and judged.
For system quality, the equivalent reference model is less obvious. Requirements and acceptance criteria may seem to provide the answer. They describe what the system should do and the conditions under which it can be accepted.
But are they sufficient as a reference model for system quality? I do not think they are.
Requirements are not a complete model of quality
Requirements describe selected expectations. They may address functionality, performance, security, usability and other quality concerns. In a well-developed specification, they can provide a substantial basis for testing and acceptance.
But requirements are still the result of selection. Some expectations are documented. Others remain implicit. Some stakeholders are represented more strongly than others. Some concerns are recognised early, while others become visible only during operation.
A system can satisfy its documented requirements and still be difficult to operate, hard to maintain, inaccessible, fragile under unusual conditions, unsuitable for future growth or inappropriate for its business context.
This does not necessarily mean that the requirements were badly written. It means that requirements describe what has been specified, not necessarily everything that matters.
Acceptance criteria make selected expectations testable
Acceptance criteria make expectations concrete. For example:
Ninety-five per cent of transactions must complete within two seconds.
This gives us something observable and testable. We can determine whether the system meets the threshold. But it does not answer the wider questions:
- Why is two seconds acceptable?
- Which transactions and operating conditions are included?
- What happens to the slowest five per cent?
- Will the threshold remain adequate as demand grows?
- Does it reflect actual user and business needs?
The criterion defines a threshold. It does not explain the significance of that threshold.
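A minimal sketch may make this concrete. It assumes response times are available as a plain list of seconds; the function names and both samples are hypothetical and serve only to illustrate the point. The two samples satisfy the criterion equally, although the slowest transaction differs by a wide margin.

```python
def fraction_within(durations_s, threshold_s):
    """Share of transactions that completed within the threshold."""
    return sum(d <= threshold_s for d in durations_s) / len(durations_s)


def meets_criterion(durations_s, threshold_s=2.0, required_share=0.95):
    """Check the example criterion: 95% of transactions within two seconds."""
    return fraction_within(durations_s, threshold_s) >= required_share


# Hypothetical samples: 20 transactions each, 19 of them within two seconds.
sample_a = [1.0] * 19 + [2.5]
sample_b = [1.0] * 19 + [90.0]

# Both samples pass, but the criterion says nothing about the slowest 5%.
print(meets_criterion(sample_a), max(sample_a))  # True 2.5
print(meets_criterion(sample_b), max(sample_b))  # True 90.0
```

The check answers whether the threshold was met. Whether a 90-second transaction is tolerable is a question the criterion never asks.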
The same limitation applies to qualities that are difficult to reduce to a pass-or-fail statement. A system may be maintainable for its current team but difficult for another supplier to take over. It may be secure against the threats that were tested but exposed through dependencies outside the test scope. Acceptance criteria can express parts of these expectations. They cannot easily represent the entire quality picture.
A reference model also reveals what is missing
Testing against acceptance criteria primarily confirms whether specified expectations have been met. A system-quality assessment must also ask whether the relevant expectations were specified in the first place. A reference model should prompt questions such as:
- Have the relevant quality characteristics been considered?
- Are the important stakeholder groups represented?
- Have operational and lifecycle concerns been included?
- Are dependencies and consequences of failure understood?
- Have future changes in scale and use been considered?
- Which assumptions remain unvalidated?
This leads to an important distinction:
Acceptance criteria confirm what has been specified. A reference model also helps reveal what has not been specified.
That may be one of its most valuable contributions. It enables us to challenge the completeness of the test basis rather than only derive tests from it.
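One way to picture this challenge is a simple coverage check. The sketch below assumes that each acceptance criterion has been tagged with the quality characteristic it addresses; the tagging, the identifiers and the two sample criteria are hypothetical. The characteristic names follow those commonly listed in established quality models.

```python
# Characteristic names as commonly listed in established quality models.
QUALITY_CHARACTERISTICS = [
    "functional suitability",
    "reliability",
    "performance efficiency",
    "usability",
    "security",
    "compatibility",
    "maintainability",
    "portability",
]

# Hypothetical test basis: each criterion is tagged with one characteristic.
acceptance_criteria = [
    {"id": "AC-1", "characteristic": "performance efficiency"},
    {"id": "AC-2", "characteristic": "reliability"},
]


def unaddressed(characteristics, criteria):
    """Return the characteristics that no criterion in the test basis covers."""
    covered = {criterion["characteristic"] for criterion in criteria}
    return [name for name in characteristics if name not in covered]


gaps = unaddressed(QUALITY_CHARACTERISTICS, acceptance_criteria)
print(gaps)
```

A gap in this list does not prove that something is wrong. It only shows where nobody has yet decided whether an expectation exists.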
A taxonomy is not yet a reference model
Established quality models describe characteristics such as functional suitability, reliability, performance efficiency, usability, security, compatibility, maintainability and portability. These provide a useful starting point.
But a list of characteristics is not automatically an assessment reference model. For each quality area, a usable model should help us understand what the characteristic means, which questions should be asked, what strong or weak quality might look like, what evidence could support a conclusion and how the operating context affects the judgment.
Without this, the model tells us where to look but not how to judge what we find. It remains a classification system rather than an assessment foundation.
Reference model and assessment model
The distinction between a reference model and an assessment model is important. The reference model describes the quality space against which the system is assessed. The assessment model describes how the assessment is performed.
The reference model might identify recoverability as an important aspect of reliability. The assessment model would explain how to determine whether recoverability is adequate, perhaps through recovery tests, operational exercises, incident data, architecture analysis and interviews.
The reference model tells us that recoverability matters. The assessment model tells us how to reach a defensible conclusion about it.
A possible structure could be:
- quality characteristic;
- sub-characteristic;
- assessment question;
- evidence requirement;
- indicator or measure;
- acceptance criterion.
For example:
- quality characteristic: Reliability
- sub-characteristic: Recoverability
- assessment question: Can acceptable service be restored after failure?
- evidence requirement: Evidence from recovery tests and operational exercises
- indicator or measure: Recovery time, data loss and restoration success
- acceptance criterion: Service restored within 30 minutes with no more than five minutes of data loss
The acceptance criterion remains important, but it becomes one element within a wider reasoning framework.
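The structure above could be captured as a small record type. This is only a sketch: the class name, field names and the helper function are assumptions made for illustration, not part of any established model. The values are taken from the recoverability example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceModelEntry:
    """One row of a hypothetical reference model for system quality."""
    characteristic: str
    sub_characteristic: str
    assessment_question: str
    evidence_requirement: str
    indicators: tuple
    acceptance_criterion: str


recoverability = ReferenceModelEntry(
    characteristic="Reliability",
    sub_characteristic="Recoverability",
    assessment_question="Can acceptable service be restored after failure?",
    evidence_requirement="Evidence from recovery tests and operational exercises",
    indicators=("recovery time", "data loss", "restoration success"),
    acceptance_criterion=(
        "Service restored within 30 minutes "
        "with no more than five minutes of data loss"
    ),
)


def meets_recovery_criterion(recovery_minutes, data_loss_minutes,
                             max_recovery_minutes=30, max_data_loss_minutes=5):
    """Evaluate only the criterion; the other fields carry the reasoning."""
    return (recovery_minutes <= max_recovery_minutes
            and data_loss_minutes <= max_data_loss_minutes)


print(recoverability.assessment_question)
print(meets_recovery_criterion(recovery_minutes=25, data_loss_minutes=3))
```

The criterion is the only field that reduces to a Boolean. The remaining fields record why the criterion exists and what evidence would make its result credible.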
System quality is contextual
A reference model for system quality cannot describe only the intrinsic properties of the system. It must also address fitness for context. The same system may be suitable for one purpose and unsuitable for another. A service may be adequate for an internal pilot but unacceptable for a business-critical process.
The assessment therefore needs two connected perspectives. The first concerns the qualities the system possesses: functionality, reliability, security, usability, maintainability, operability and performance.
The second concerns whether those qualities are adequate for the intended context: business criticality, stakeholder needs, user groups, operational environment, expected scale, risk appetite, regulatory obligations, dependencies and consequences of failure.
A technically strong system can still be unfit for a particular context. A technically modest system can be entirely adequate for limited, low-risk use. Quality cannot be judged independently of purpose.
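A short sketch can illustrate the dependence on context. All names and numbers here are hypothetical: a measured recovery time of 45 minutes is compared with the tolerance of two invented contexts, an internal pilot and a business-critical process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A hypothetical usage context with its tolerance for downtime."""
    name: str
    max_recovery_minutes: float


def fit_for(measured_recovery_minutes, context):
    """The same measurement is judged against the context, not in isolation."""
    return measured_recovery_minutes <= context.max_recovery_minutes


# Invented tolerances, chosen only to show the contrast.
internal_pilot = Context("internal pilot", max_recovery_minutes=240)
business_critical = Context("business-critical process", max_recovery_minutes=30)

measured = 45  # minutes, the same system in both cases

for context in (internal_pilot, business_critical):
    print(context.name, fit_for(measured, context))
```

The system and the measurement are unchanged between the two evaluations. Only the context differs, and with it the judgment.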
The model must support judgment and uncertainty
Acceptance criteria are commonly propositions that can be verified. A reference model must support a broader judgment.
It may need to conclude that a quality characteristic is strong, weak, adequate only under certain conditions, acceptable for limited use, insufficiently evidenced or dependent on unverified assumptions.
It must also distinguish between the quality of the system and confidence in the conclusion.
A system may appear reliable because no serious failures have been observed. But if it has never been exposed to realistic load or failure conditions, confidence in that conclusion should be low.
A failed test can provide strong evidence of weakness. A passed test may provide weak evidence of quality when the environment, data or scope were unrepresentative.
The model must therefore connect quality characteristics not only to criteria, but also to evidence and uncertainty.
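The separation between quality and confidence can be sketched as two distinct values in a finding. The rule below is deliberately crude and entirely an assumption: the type names, the three rating levels and the way representativeness is recorded are invented for illustration. A real assessment model would weigh evidence with far more care.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    STRONG = "strong"
    WEAK = "weak"
    INSUFFICIENTLY_EVIDENCED = "insufficiently evidenced"


class Confidence(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Evidence:
    description: str
    passed: bool
    representative: bool  # were environment, data and scope realistic?


def assess(evidence):
    """Return (rating, confidence) as two separate judgments."""
    if not evidence:
        return Rating.INSUFFICIENTLY_EVIDENCED, Confidence.LOW
    if any(not item.passed for item in evidence):
        # A failed test is treated as strong evidence of weakness.
        return Rating.WEAK, Confidence.HIGH
    if all(item.representative for item in evidence):
        return Rating.STRONG, Confidence.HIGH
    # Everything passed, but some evidence came from unrealistic conditions.
    return Rating.STRONG, Confidence.LOW


quiet_pilot = [Evidence("no serious failures observed", True, False)]
failed_drill = [Evidence("recovery exercise exceeded target", False, True)]

print(assess(quiet_pilot))
print(assess(failed_drill))
```

The quiet pilot yields an apparently strong rating with low confidence, which is a different finding from a strong rating that realistic evidence supports.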
Perhaps the real limitation is not the test process
When testing is criticised for being too narrow, attention often turns to the process. We conclude that testing should begin earlier, include more automation, involve more stakeholders or extend into production. These may all be valid improvements.
But perhaps testing often appears narrow because the model against which the system is judged is narrow. When requirements and acceptance criteria define the entire quality horizon, testing can only confirm expectations that have already been made visible.
A stronger reference model would change the questions we ask before it changes the tests we perform. It would expose missing quality concerns, broaden the evidence we consider and help us distinguish between meeting criteria and being fit for purpose.
The challenge may therefore not merely be to improve the test process. It may be to develop a sufficiently rich reference model for system quality.
Because before we can assess whether a system is good, we need a credible answer to a more fundamental question:
What, exactly, are we assessing it against?