The evaluation imperative: Measuring what matters in AI


Idea In Brief

Don’t confuse AI adoption with real impact

Uptake is easy to track, but as a metric it can hide poor-quality use and even incentivise low-value “workslop” that only creates more work.

Always evaluate AI across multiple domains

A useful evaluation spans adoption and capability, system performance, governance and risk, value delivery, and competitiveness, tailored to your goals and context.

Use systems thinking to measure real change

Treat AI as part of a wider system of people, processes, data, and governance, and test for flow-on effects like shifting bottlenecks and overreliance on outputs.

Do you know how your organisation will measure the impact of AI? 

Right now, most don’t. Some organisations, for fear of being left behind by their competitors, are rolling out AI tools rapidly, without much thought about how they will measure the impact these tools are having. Other organisations are holding back, concerned by the risks, both regulatory and reputational. This concern is often underpinned by a strong intuition that the reality of these tools just doesn’t match the hype. 

Both these organisational archetypes are missing the opportunity to systematically measure the impact that AI is having on their organisation, so that they can best realise the gains and mitigate or prevent the harms. After all, AI is rapidly changing how we work. It is driving economy-wide change. It is a complex system change, and it is happening now. 

It is imperative that organisations understand the impacts it is having to best unlock the value that it can produce and to ensure that investments in AI tools actually produce value for money. As we explain in this article, this often requires a systems-thinking approach.

Adoption metrics are just the tip of the iceberg

Where organisations are conducting formal evaluations of the impact of AI, many focus on uptake of AI tools. This is unsurprising, and it is hardly a novel problem: a common challenge in all evaluations is that inputs and activities are much easier to measure than outcomes and impacts. 

One problem with this is that uptake or adoption of AI tools tells you nothing about whether these tools are being used well: for example, whether they are enhancing productivity, improving quality and lifting organisational capability. 

More significantly, incentives matter. An overemphasis on measuring uptake of AI tools risks being counterproductive. It can lead to a rise in so-called "workslop" – a now-common term for AI-generated output that does not create real value, and that can actually create a lot of unnecessary work for the employees who have to navigate and interpret it. 

None of this is to say that adoption and uptake are not important – especially in risk-averse organisations beset by institutional inertia – but they are only one piece of the puzzle. 

AI evaluations should focus on a range of domains

Evaluating the impact of AI requires considering a range of domains. This recognises that AI is a complex technology, with the potential to greatly enhance individual and organisational productivity, but also with considerable risks if it is not ethically and responsibly deployed. 

The appropriate domains will differ depending on the context, but the list below provides a rough guide to the focus areas that AI evaluations need to consider.  

They can be articulated and operationalised in different ways depending on the objective of your evaluation; for example, as a maturity model (if you are conducting a developmental evaluation) or in terms of a value chain (if your concern is to make the case for further investment). 

Evaluation domain | Focus question | Example methods or metrics

Adoption

Confidence, experience and capability 

Are employees adopting and effectively using AI tools in their daily work? 

User analytics give insight into metrics like frequency of use and feature adoption.

User surveys supplement these numbers with qualitative insights into confidence, experience and perceived usefulness. 

Pairing quantitative usage data with targeted survey questions gives a more complete picture of how adoption is translating into capability.

System performance

Reliability and integration 

Do the tools function reliably in your organisation’s operating context? 

Benchmarks and error rate tracking provide objective measures of reliability, but these need to be contextualised within your organisation's operating environment. 

A/B testing or time-on-task comparisons can demonstrate whether tools are genuinely improving workflows. 

Governance & risk 

Governance, risk and responsible use

Is the use of the tools being managed in a safe, responsible and compliant way? 

Internal policy reviews and compliance checks establish whether use is occurring within agreed boundaries. 

Stakeholder interviews and awareness assessments help gauge whether those boundaries are understood in practice. 

Value

Strategic alignment, value and impact

What is your current strategic intent and are AI tools delivering outcomes aligned with this? 

Outcome mapping against objectives provides a qualitative picture of alignment. 

Quantitative measures such as increases in efficiency or productivity can provide greater rigour. 

Subjective measures of quality or customer satisfaction can round out the picture.

Competition

Competitiveness and future readiness 

Is your adoption and use of AI tools competitive with your peers in the sector?

Sector benchmarking through maturity models or industry surveys offers a relative positioning. 

Capability gap analysis – mapping your current AI use against emerging applications in your sector – can highlight where you're falling behind or pulling ahead. 
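Several of the quantitative methods in the table can be prototyped directly from a tool's usage log. As a minimal sketch, the code below computes an adoption rate, per-feature uptake, and a rough frequency-of-use measure; the event log fields, users and features are entirely hypothetical, and a real analysis would draw on your platform's actual analytics export.

```python
from collections import defaultdict
from datetime import date

# Hypothetical usage log: one record per AI-tool interaction (invented data).
events = [
    {"user": "a", "feature": "summarise", "day": date(2025, 3, 3)},
    {"user": "a", "feature": "draft", "day": date(2025, 3, 4)},
    {"user": "b", "feature": "summarise", "day": date(2025, 3, 3)},
    {"user": "c", "feature": "summarise", "day": date(2025, 3, 10)},
]
licensed_users = {"a", "b", "c", "d"}  # everyone with access to the tool

# Adoption rate: share of licensed users who used the tool at all.
active = {e["user"] for e in events}
adoption_rate = len(active) / len(licensed_users)

# Feature adoption: how many distinct users touched each feature.
by_feature = defaultdict(set)
for e in events:
    by_feature[e["feature"]].add(e["user"])

# Frequency of use: distinct active days per user (a rough engagement proxy).
days_per_user = defaultdict(set)
for e in events:
    days_per_user[e["user"]].add(e["day"])

print(f"adoption rate: {adoption_rate:.0%}")       # 75%
print({f: len(u) for f, u in by_feature.items()})  # users per feature
```

Numbers like these are the easy part; as the article stresses, they only become meaningful when paired with survey evidence on confidence and perceived usefulness.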

The importance of a systems-thinking approach

Evaluating the impact of AI in your organisation is often best enabled through a systems-thinking approach. This involves treating a system itself as the object of an evaluation, rather than a single program, tool, or intervention. System-level evaluations are a core part of Nous’ repertoire (see the callout box below).

This approach is well suited to AI because of the complex ways that AI tools can affect organisational outcomes. A systems-level evaluation begins from the recognition that AI models don't exist in a vacuum. AI models are deployed in applications, which are deployed in systems that include the people, processes, data and governance arrangements that shape how AI is actually used. 

A systems-thinking approach to an AI evaluation involves asking questions about how people adapt their workflows, how decision-making processes change, whether new risks or dependencies emerge, and whether the intended benefits actually materialise in practice.

Nous experience conducting system-level evaluations

Nous has considerable experience conducting system-level evaluations for clients across the private, not-for-profit and government sectors. We have evaluated some of society's most complex policy reforms, programs and investments.

Our approach to evaluations is documented here.

System-level evaluation may sound complicated, but at a practical level it involves being attentive to the various flow-on effects of the use of AI tools. For example, if you have recently deployed Copilot (or ChatGPT Enterprise or similar) and are seeking to understand its effects, a systems-based approach involves asking questions like: 

  • Is AI-assisted work actually better, or just faster?
  • Does what you're seeing connect to the outcomes your organisation actually cares about?
  • Have efficiency gains in one part of a workflow created a bottleneck somewhere else?
  • Are employees sufficiently critical of Copilot outputs (e.g., meeting and document summaries)?
  • Is there evidence that Copilot is becoming a substitute rather than a complement for thought? 
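The first question – better, or just faster? – can be made concrete with even simple measurements. The sketch below compares a hypothetical AI-assisted group against a control group on both time-on-task and reviewer-scored quality, using Welch's t-statistic for each difference; all numbers are invented, and a real evaluation would use a proper statistical package and a larger sample.

```python
import math
from statistics import mean, variance

# Hypothetical data: task times (minutes) and reviewer quality scores (1-10)
# for work produced with and without AI assistance. All numbers are invented.
ai_time = [31, 28, 35, 30, 27, 33, 29, 32]
control_time = [42, 39, 45, 41, 38, 44, 40, 43]
ai_quality = [7, 6, 8, 7, 6, 7, 8, 7]
control_quality = [7, 8, 7, 7, 8, 7, 7, 8]

def welch_t(a, b):
    """Welch's t-statistic: difference in means over the pooled standard error."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

speed_gain = (mean(control_time) - mean(ai_time)) / mean(control_time)
t_time = welch_t(ai_time, control_time)
t_quality = welch_t(ai_quality, control_quality)

print(f"{speed_gain:.0%} faster (t = {t_time:.1f})")
# A large |t| on time but a small |t| on quality suggests the work is
# faster but not demonstrably better -- exactly the distinction the
# first question above is probing.
```

The same pattern extends to the other questions: instrument the step before and after the AI-assisted one to spot shifting bottlenecks, or sample outputs for critical review to test for overreliance.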

This case study provides an example of how Nous approached evaluating the world's largest whole-of-government Copilot trial.

In these ways, organisations can understand the impact of AI tools through systems thinking.

Get in touch to discuss how your organisation can evaluate the impact of AI. 

Connect with Joshua Sidgwick, Charlotte Bradley, and Heidi Wilcoxon on LinkedIn.