Field Notes from Quality & Delivery Transformations

Category: Product Quality

Field Note: What Would We Assess a System Against?

In an earlier field note, I explored the idea that testing might be understood as a form of assessment. Testing generates evidence. Assessment uses evidence to reach a judgment. But this raises a more difficult question:

If we want to assess the quality of a system, what would we assess it against?

For a capability assessment, the answer is relatively clear. A reference model describes the capability we expect to find. An assessment model explains how evidence should be collected, interpreted and judged.

For system quality, the equivalent reference model is less obvious. Requirements and acceptance criteria may seem to provide the answer. They describe what the system should do and the conditions under which it can be accepted.

But are they sufficient as a reference model for system quality? I do not think they are.

Requirements are not a complete model of quality

Requirements describe selected expectations. They may address functionality, performance, security, usability and other quality concerns. In a well-developed specification, they can provide a substantial basis for testing and acceptance.

But requirements are still the result of selection. Some expectations are documented. Others remain implicit. Some stakeholders are represented more strongly than others. Some concerns are recognized early, while others become visible only during operation.

A system can satisfy its documented requirements and still be difficult to operate, hard to maintain, inaccessible, fragile under unusual conditions, unsuitable for future growth or inappropriate for its business context.

This does not necessarily mean that the requirements were badly written. It means that requirements describe what has been specified, not necessarily everything that matters.

Acceptance criteria make selected expectations testable

Acceptance criteria make expectations concrete. For example:

Ninety-five per cent of transactions must complete within two seconds.

This gives us something observable and testable. We can determine whether the system meets the threshold. But it does not answer the wider questions:
- Why are two seconds acceptable?
- Which transactions and operating conditions are included?
- What happens to the slowest five per cent?
- Will the threshold remain adequate as demand grows?
- Does it reflect actual user and business needs?
The criterion defines a threshold. It does not explain the significance of that threshold.
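
To make the gap between threshold and significance concrete, here is a minimal Python sketch of how such a criterion might be checked. The function names, the default values and the sample durations are assumptions made for illustration; they do not come from any real system.

```python
def fraction_within(durations_s, threshold_s):
    """Return the fraction of transactions that completed within the threshold."""
    if not durations_s:
        raise ValueError("no transactions observed")
    within = sum(1 for d in durations_s if d <= threshold_s)
    return within / len(durations_s)


def meets_criterion(durations_s, threshold_s=2.0, required_fraction=0.95):
    """Check the criterion: required_fraction of transactions within threshold_s."""
    return fraction_within(durations_s, threshold_s) >= required_fraction


# Hypothetical sample: 19 fast transactions and one very slow one.
sample = [0.4] * 19 + [45.0]

passed = meets_criterion(sample)
slowest = max(sample)
print(passed, slowest)
```

The sample meets the criterion even though one transaction takes 45 seconds. The check is correct, yet it says nothing about whether that outcome is acceptable to the user who waited.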

The same limitation applies to qualities that are difficult to reduce to a pass-or-fail statement. A system may be maintainable for its current team but difficult for another supplier to take over. It may be secure against the threats that were tested but exposed through dependencies outside the test scope. Acceptance criteria can express parts of these expectations. They cannot easily represent the entire quality picture.

A reference model also reveals what is missing

Testing against acceptance criteria primarily confirms whether specified expectations have been met. A system-quality assessment must also ask whether the relevant expectations were specified in the first place. A reference model should prompt questions such as:
- Have the relevant quality characteristics been considered?
- Are the important stakeholder groups represented?
- Have operational and lifecycle concerns been included?
- Are dependencies and consequences of failure understood?
- Have future changes in scale and use been considered?
- Which assumptions remain unvalidated?
This leads to an important distinction:

Acceptance criteria confirm what has been specified. A reference model also helps reveal what has not been specified.

That may be one of its most valuable contributions. It enables us to challenge the completeness of the test basis rather than only derive tests from it.
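
One way to picture this completeness check is as a simple comparison between the reference model and the test basis. The sketch below is only an illustration: the set of characteristics, the tagged criteria and the function name are all assumptions, not a prescribed method.

```python
# Illustrative reference model: characteristics we expect to see considered.
# The selection is an assumption made for this example.
REFERENCE_CHARACTERISTICS = {
    "functional suitability",
    "reliability",
    "performance efficiency",
    "usability",
    "security",
    "compatibility",
    "maintainability",
    "portability",
}

# Hypothetical test basis: each criterion is tagged with the characteristic it addresses.
ACCEPTANCE_CRITERIA = [
    ("95% of transactions complete within two seconds", "performance efficiency"),
    ("Service restored within 30 minutes after failure", "reliability"),
    ("Only authenticated users can view account data", "security"),
]


def unaddressed_characteristics(reference, criteria):
    """Return the reference characteristics that no criterion addresses."""
    covered = {characteristic for _text, characteristic in criteria}
    return sorted(reference - covered)


gaps = unaddressed_characteristics(REFERENCE_CHARACTERISTICS, ACCEPTANCE_CRITERIA)
print(gaps)
```

An empty result would not prove that the test basis is complete. It would only show that every listed characteristic has at least one criterion, which is a much weaker claim.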

A taxonomy is not yet a reference model

Established quality models such as ISO/IEC 25010 describe characteristics such as functional suitability, reliability, performance efficiency, usability, security, compatibility, maintainability and portability. These provide a useful starting point.

But a list of characteristics is not automatically an assessment reference model. For each quality area, a usable model should help us understand what the characteristic means, which questions should be asked, what strong or weak quality might look like, what evidence could support a conclusion and how the operating context affects the judgment.

Without this, the model tells us where to look but not how to judge what we find. It remains a classification system rather than an assessment foundation.

Reference model and assessment model

The distinction between a reference model and an assessment model is important. The reference model describes the quality space against which the system is assessed. The assessment model describes how the assessment is performed.

The reference model might identify recoverability as an important aspect of reliability. The assessment model would explain how to determine whether recoverability is adequate, perhaps through recovery tests, operational exercises, incident data, architecture analysis and interviews.

The reference model tells us that recoverability matters. The assessment model tells us how to reach a defensible conclusion about it.

A possible structure could be:
- quality characteristic;
- sub-characteristic;
- assessment question;
- evidence requirement;
- indicator or measure;
- acceptance criterion.
For example:
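- Quality characteristic: Reliability
- Sub-characteristic: Recoverability
- Assessment question: Can acceptable service be restored after failure?
- Evidence requirement: Evidence from recovery tests and operational exercises
- Indicator or measure: Recovery time, data loss and restoration success
- Acceptance criterion: Service restored within 30 minutes with no more than five minutes of data loss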
The acceptance criterion remains important, but it becomes one element within a wider reasoning framework.
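
The structure above could be captured as a small record type. This is a rough sketch under stated assumptions: the class name, the field names and the helper function are hypothetical, and the values are taken from the recoverability example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReferenceModelEntry:
    """One entry in a hypothetical system-quality reference model."""
    characteristic: str
    sub_characteristic: str
    assessment_question: str
    evidence_requirement: str
    indicators: Tuple[str, ...]
    acceptance_criterion: str


recoverability = ReferenceModelEntry(
    characteristic="Reliability",
    sub_characteristic="Recoverability",
    assessment_question="Can acceptable service be restored after failure?",
    evidence_requirement="Evidence from recovery tests and operational exercises",
    indicators=("recovery time", "data loss", "restoration success"),
    acceptance_criterion=(
        "Service restored within 30 minutes with no more than "
        "five minutes of data loss"
    ),
)


def meets_recovery_criterion(recovery_minutes, data_loss_minutes):
    """Check only the example criterion: at most 30 min to restore, at most 5 min lost."""
    return recovery_minutes <= 30 and data_loss_minutes <= 5


print(recoverability.assessment_question)
print(meets_recovery_criterion(recovery_minutes=25, data_loss_minutes=3))
```

Only the last field maps to a pass-or-fail check. The other fields carry the reasoning that explains why the check exists and what evidence would make its result credible.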

System quality is contextual

A reference model for system quality cannot describe only the intrinsic properties of the system. It must also address fitness for context. The same system may be suitable for one purpose and unsuitable for another. A service may be adequate for an internal pilot but unacceptable for a business-critical process.

The assessment therefore needs two connected perspectives. The first concerns the qualities the system possesses: functionality, reliability, security, usability, maintainability, operability and performance.

The second concerns whether those qualities are adequate for the intended context: business criticality, stakeholder needs, user groups, operational environment, expected scale, risk appetite, regulatory obligations, dependencies and consequences of failure.

A technically strong system can still be unfit for a particular context. A technically modest system can be entirely adequate for limited, low-risk use. Quality cannot be judged independently of purpose.

The model must support judgment and uncertainty

Acceptance criteria are commonly propositions that can be verified. A reference model must support a broader judgment.

It may need to conclude that a quality characteristic is strong, weak, adequate only under certain conditions, acceptable for limited use, insufficiently evidenced or dependent on unverified assumptions.

It must also distinguish between the quality of the system and confidence in the conclusion.

A system may appear reliable because no serious failures have been observed. But if it has never been exposed to realistic load or failure conditions, confidence in that conclusion should be low.

A failed test can provide strong evidence of weakness. A passed test may provide weak evidence of quality when the environment, data or scope were unrepresentative.

The model must therefore connect quality characteristics not only to criteria, but also to evidence and uncertainty.
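
A small sketch may help show what it means to keep the judgment and the confidence apart. Everything here is an assumption made for illustration: the type names, the reduced set of ratings and, above all, the deliberately crude rule that turns evidence into a conclusion. A real assessment would rely on human judgment rather than a fixed rule.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    STRONG = "strong"
    WEAK = "weak"
    INSUFFICIENTLY_EVIDENCED = "insufficiently evidenced"


class Confidence(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Evidence:
    description: str
    passed: bool
    representative: bool  # were environment, data and scope realistic?


def assess(evidence):
    """Return (rating, confidence) using a deliberately crude, illustrative rule."""
    if not evidence:
        return Rating.INSUFFICIENTLY_EVIDENCED, Confidence.LOW
    # A failed test is treated as strong evidence of weakness.
    if any(not item.passed for item in evidence):
        return Rating.WEAK, Confidence.HIGH
    # All evidence passed: confidence depends on how representative it was.
    representative = sum(1 for item in evidence if item.representative)
    if representative == len(evidence):
        return Rating.STRONG, Confidence.HIGH
    if representative > 0:
        return Rating.STRONG, Confidence.MEDIUM
    return Rating.STRONG, Confidence.LOW


# Hypothetical case: no failures observed, but never under realistic conditions.
pilot_only = [Evidence("no incidents during internal pilot", True, False)]
rating, confidence = assess(pilot_only)
print(rating.value, confidence.value)
```

The pilot-only case returns a strong rating with low confidence. Reporting the two values separately keeps an apparently reliable system from being mistaken for a well-evidenced one.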

Perhaps the real limitation is not the test process

When testing is criticized for being too narrow, attention often turns to the process. We conclude that testing should begin earlier, include more automation, involve more stakeholders or extend into production. These may all be valid improvements.

But perhaps testing often appears narrow because the model against which the system is judged is narrow. When requirements and acceptance criteria define the entire quality horizon, testing can only confirm expectations that have already been made visible.

A stronger reference model would change the questions we ask before it changes the tests we perform. It would expose missing quality concerns, broaden the evidence we consider and help us distinguish between meeting criteria and being fit for purpose.

The challenge may therefore not merely be to improve the test process. It may be to develop a sufficiently rich reference model for system quality.

Because before we can assess whether a system is good, we need a credible answer to a more fundamental question:

What, exactly, are we assessing it against?

2026-07-04

Field Note: What If Testing Is Just a Form of Assessment?

Over the past few weeks, while working on assessment models for QA capability and Data Quality, I stumbled upon an idea that has been difficult to shake off.

The observation itself is surprisingly simple: a QA capability assessment and a system test share essentially the same structure. The only thing that changes is the object being assessed.

At first, this felt like an interesting analogy. The more I thought about it, however, the more it began to feel like something more fundamental.

Starting with System Testing

Most of us who have worked in testing have been taught a familiar model:

Requirements define what the system should do.
Test cases verify those requirements.
Defects are reported.
A release decision is made.

Nothing controversial there.

But viewed through a different lens, the process can be described in another way:

Requirements define a reference model.
Testing is an assessment process.
Test results are evidence.
Defects are findings.
The test report is an evaluation.
The release recommendation is decision support.

In other words, system testing can be described as a quality assessment of a software system. That may sound like a semantic distinction, but it has interesting consequences.

Looking Beyond Testing

Around the same time, I was working with assessment models for QA capability and Data Quality.

Those assessments typically look like this:

Define a reference model.
Gather evidence through interviews, reviews and metrics.
Identify gaps.
Evaluate current capability.
Recommend improvements.

Structurally, the process is identical. The only difference is that the object being assessed is no longer a software system. Instead of assessing a product, we assess:

A QA organization
A data platform
An operating model
A security posture
An architecture

The pattern remains the same:

Object → reference model → assessment activities → evidence → findings → evaluation → decision support

Once you see it, it becomes difficult to unsee.

Is QA Already Solving This?

One could argue that Quality Assurance and later Quality Engineering have already attempted to broaden testing into something larger. There is certainly truth in that:

Testing evolved from defect detection. QA expanded the focus toward process quality. Quality Engineering expanded it further toward prevention, automation and quality built into delivery. Yet even Quality Engineering remains largely anchored in the vocabulary of testing:

Test automation
Test coverage
Test strategy
Test management
Test environments

The underlying mental model often remains:

Software → testing → quality

What if the model should instead be:

Object → assessment → evidence → quality evaluation

That is a subtle but important shift.

A Different Perspective on Testing

If assessment becomes the primary concept, testing becomes one assessment technique among many.

Consider the following examples:

Object	Assessment Technique
Source Code	Unit Testing
Software System	System Testing
Architecture	Architecture Review
Data Platform	Data Profiling
Security Posture	Penetration Testing
QA Organization	Interviews and Document Review

Different techniques, same assessment pattern.

This perspective also changes the questions we ask.

Instead of “How many test cases have we executed?”, we ask “Have we collected sufficient evidence?”

Instead of “What is our test coverage?”, we ask “Which quality attributes have been assessed?”

Instead of “How many defects remain?”, we ask “What is our confidence level and residual risk?”

These are fundamentally different conversations. In my experience, they are also conversations that executives tend to follow much more readily than traditional testing metrics.

A Broader Quality Assessment Framework?

Perhaps the most interesting implication is organizational rather than technical. Many consulting organizations offer services such as:

System testing
Data quality assessments
Security reviews
Architecture assessments
QA capability assessments

These are often positioned as separate offerings with separate methodologies.

But what if they are all instances of the same underlying discipline? A generic quality assessment framework could potentially provide:

A common assessment model
A common reference model structure
A common evidence model
A common reporting approach
A common maturity model

The assessed object changes. The assessment pattern does not.

An Open Question

I do not yet know whether this idea is genuinely new or merely a reframing of concepts that already exist within Quality Engineering.

What I do know is that it has changed the way I think about testing. For years, I viewed testing as a discipline concerned with finding defects.

Today, I increasingly see testing as one technique for generating evidence about quality. That may sound like a small distinction. I suspect it is not.

If the observation holds, then perhaps testing is not the overarching discipline after all. Perhaps quality assessment is. And testing is simply one of its most visible techniques.

2026-06-20

Field Note: Revisiting Quality

For a long time, I have had a somewhat complicated relationship with the word “quality.” Like many people who started their careers in testing and QA, I gradually became uncomfortable with the label. Not because quality is unimportant, but because the term often seemed too narrow.

When people hear “quality,” they frequently think about testing. When they hear “testing,” they often think about defects. And when they think about defects, the conversation quickly becomes operational rather than strategic.

Over time, I found myself increasingly interested in topics that appeared to sit outside the traditional quality domain:
- Architecture
- Data
- Governance
- Operating models
- Organizational capability
- Decision-making
These seemed like larger and more interesting questions. Or so I thought.

An Unexpected Observation

Recently, while working with different kinds of assessments, I started noticing a recurring pattern. The object being assessed varied considerably:
- A software system
- A data platform
- An organizational capability
- A process
- An architecture
Yet the assessment itself always seemed to follow the same structure.

First, there was an idea of what “good” looked like. Then there was a way of collecting evidence. Then findings were identified. Finally, a judgment was made about fitness, capability, risk or readiness. The terminology changed, but the pattern did not.

The Question Behind the Question

At first, I thought I was moving away from quality. Now I am no longer sure. Perhaps I was not moving away from quality at all. Perhaps I was moving away from a narrow interpretation of quality. Traditional quality discussions often focus on products:
- Does the software work?
- Does it meet requirements?
- How many defects remain?
Important questions, certainly. But they are not the only quality questions.

We also ask:
- Is the architecture sustainable?
- Is the data trustworthy?
- Is the organization capable?
- Is the process effective?
- Is the operation resilient?
These are quality questions too. We simply tend to classify them differently.

Quality as Fitness for Purpose

The more I think about it, the more useful the classic definition of quality becomes:

Fitness for purpose.

The phrase is deceptively simple. It does not limit quality to software. It does not limit quality to testing. It does not even limit quality to technology. Anything can be evaluated in terms of fitness for purpose:
- A system.
- A dataset.
- A process.
- An organization.
- An operating model.
- An architecture.
The assessed object changes, but the underlying question remains remarkably consistent.

Assessment as the Missing Link

This realization led me to another thought: perhaps quality is not primarily about testing. Perhaps quality is primarily about assessment. Assessment is the mechanism through which we determine fitness for purpose.

Testing is one assessment technique. Reviews are another. Interviews are another. Metrics analysis is another. Profiling is another.

The specific techniques matter less than the purpose they serve. They generate evidence. That evidence supports an evaluation. That evaluation supports a decision.

Viewed this way, testing becomes part of a much larger family of activities.

A Wider Perspective

Ironically, the more I tried to move away from quality, the more frequently I encountered it. Not the operational version of quality, but the broader version. The version concerned with evidence, confidence, risk and fitness for purpose. The version that appears whenever people need to make informed decisions about systems, data, organizations or processes.

Perhaps quality is not a niche after all. Perhaps it only appears that way when viewed through the lens of testing.

When viewed through the lens of assessment, quality seems to show up almost everywhere.

An Open Thought

I am not yet sure where this line of reasoning leads. It may turn out to be nothing more than a useful way of organizing ideas. Or it may suggest that testing, quality assurance, data quality, architecture reviews and capability assessments are all manifestations of the same underlying pattern. For now, I am content to leave that question open.

What I find interesting is that an attempt to move beyond quality has unexpectedly led me back to it. Only this time, it looks much bigger than before.

2026-06-19
