Benchmarking Clojure Code with Criterium

How to benchmark Clojure code with Criterium, interpret JVM-aware timing output, and avoid misleading microbenchmarks.

Criterium is the standard Clojure answer when you want a serious microbenchmark instead of a quick (time ...) call. It exists because benchmarking on the JVM is full of traps: warm-up effects, JIT compilation, GC state, measurement overhead, and code that looks fast or slow only because the benchmark was poorly shaped.

Used well, Criterium gives you a better signal about a narrow question such as “is this transducer pipeline faster than the intermediate-sequence version?” Used badly, it gives you very precise numbers for a question that was never meaningful in the first place.

What Criterium Actually Solves

At the time of writing, the project’s README describes 0.4.6 as the stable release and the newer 0.5.x line as alpha, so the examples below stick to 0.4.6. More importantly, many older tutorials teach benchmarking as if “run it a few times” were enough. Criterium goes further by addressing several JVM-specific problems:

  • repeated sampling instead of one-shot timings
  • a warm-up period so the JIT can optimize code paths
  • GC handling to reduce noise from previous program state
  • statistical reporting instead of a single vanity number
  • overhead estimation for very small benchmarks

That does not make every benchmark valid. It just removes several common sources of self-deception.
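To see the difference in practice, compare a one-shot (time ...) call with a Criterium run. The sketch below keeps the criterium call in a rich comment block so the file loads even without the dependency on the classpath; the data and workload are illustrative.

```clojure
;; A one-shot `time` call measures a single run, possibly before the
;; JIT has compiled the hot path, so the number varies wildly.
(def data (vec (range 100000)))

(time (reduce + (map inc data)))
;; "Elapsed time: ..." -- different on every invocation.

;; Criterium instead warms up, samples repeatedly, and reports statistics.
;; Kept in a rich comment so this namespace loads without criterium:
(comment
  (require '[criterium.core :as crit])
  (crit/quick-bench (reduce + (map inc data))))
```

The rich-comment style is a common REPL workflow: the benchmark forms are there to evaluate interactively, not to run on every load.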

Start With a Modern Project Setup

For a stable setup today, prefer deps.edn and the stable Criterium release:

{:paths ["src" "test"]
 :deps {org.clojure/clojure {:mvn/version "1.12.0"}
        criterium/criterium {:mvn/version "0.4.6"}}}

If you are working in an established Leiningen project, keep it there. But for new examples, deps.edn is the clearer default.

Benchmark a Concrete Question

Good benchmarking starts with a narrow hypothesis. For example:

“Does using transduce help this hot path enough to matter, or is the simpler sequence pipeline already fine?”

(ns myapp.bench
  (:require [criterium.core :as crit]))

(def data (vec (range 100000)))

(defn mapped-then-summed []
  (reduce + (map inc data)))

(defn transduced-sum []
  (transduce (map inc) + data))

(comment
  (crit/quick-bench (mapped-then-summed))
  (crit/bench (transduced-sum)))

This is a decent Criterium question because:

  • the workload is clear
  • the input shape is stable
  • the comparison is between two implementations of the same job
  • the benchmark isolates computation rather than network or storage behavior

The goal is not to “win the benchmark.” The goal is to learn whether the trade-off is real enough to justify the less obvious implementation.

Read the Output Like an Engineer

A typical Criterium result gives you:

  • mean execution time
  • standard deviation
  • lower and upper quantiles
  • overhead information
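For reference, a bench report has roughly this shape; the numbers here are purely illustrative, not measured results:

```
Evaluation count : 1860 in 60 samples of 31 calls.
             Execution time mean : 1.65 ms
    Execution time std-deviation : 0.02 ms
   Execution time lower quantile : 1.63 ms ( 2.5%)
   Execution time upper quantile : 1.70 ms (97.5%)
                   Overhead used : 8.04 ns
```

The quantiles and standard deviation are the parts a single (time ...) call never gives you.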

That output should change your habits:

  • do not stare only at the mean
  • check whether the spread is wide
  • be suspicious of tiny differences on noisy workloads
  • remember that a microbenchmark can show a real local gain that still does not matter at system level

If two versions are close and the simpler one is easier to understand, the benchmark result may be telling you to stop optimizing.

Use quick-bench and bench for Different Jobs

Use quick-bench during exploration:

(crit/quick-bench (transduced-sum))

Use bench when the comparison is important enough to spend more time on:

(crit/bench (transduced-sum))

That split is practical. quick-bench helps you narrow candidates. bench is for results you want to trust more before making a code-level decision.
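Full bench runs can take a minute or more per expression. Criterium’s with-progress-reporting macro prints warm-up and sampling progress so a long run is less opaque; the sketch below reuses the transduced-sum example and keeps the criterium forms in a rich comment so the file loads without the dependency.

```clojure
(def data (vec (range 100000)))

(defn transduced-sum []
  (transduce (map inc) + data))

(comment
  (require '[criterium.core :as crit])
  ;; Progress reporting shows which phase (warm-up, sampling) is running;
  ;; :verbose adds extra detail such as JIT and GC events to the report.
  (crit/with-progress-reporting
    (crit/bench (transduced-sum) :verbose)))
```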

Benchmark Design Still Matters More Than The Tool

Criterium cannot rescue a bad experiment.

Common failure modes include:

  • benchmarking code that performs network I/O or database calls
  • benchmarking unrealistically tiny inputs
  • ignoring allocations when allocations are the real production issue
  • comparing two implementations with different semantics
  • measuring cold-start behavior when the production question is steady-state throughput
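A specifically Clojure-flavored version of the “different semantics” trap is laziness: benchmarking a lazy expression measures the cost of building a lazy seq, not of doing the work. The helper names below (lazy-noop, forced, eager) are mine, for illustration.

```clojure
(def data (vec (range 100000)))

;; Trap: `map` is lazy, so this only allocates an unrealized lazy seq.
;; A benchmark of it reports a tiny time regardless of input size.
(defn lazy-noop [] (map inc data))

;; Fix: force realization, or use an eager formulation.
(defn forced [] (doall (map inc data)))
(defn eager  [] (into [] (map inc) data))

(comment
  (require '[criterium.core :as crit])
  (crit/quick-bench (lazy-noop)) ; misleadingly fast
  (crit/quick-bench (forced))    ; measures the real work
  (crit/quick-bench (eager)))
```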

If your actual problem is API latency under load, queue backpressure, or cache churn, use system-level tools such as tracing, metrics, profiling, and load testing. Criterium is for microbenchmarks, not for replacing observability.

Watch Out For Very Fast Expressions

For very small expressions, measurement overhead can distort the result. Criterium explicitly documents this and exposes overhead estimation utilities for that reason. If you are benchmarking something extremely fast, the safer move is usually one of these:

  • benchmark a larger batch of work
  • reformulate the question around a meaningful unit
  • use profiling and allocation analysis alongside benchmarking

That is more honest than pretending a few nanoseconds in a noisy benchmark settle a real engineering decision.
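Batching is usually the simplest of those moves: run the tiny operation a known number of times inside one benchmarked call, then divide the reported mean by the batch size. The helper name inc-batch is mine, for illustration.

```clojure
;; A single `inc` sits near measurement overhead, so benchmark a batch
;; of known size and reason about per-operation cost afterwards.
(defn inc-batch [n]
  (loop [i 0 acc 0]
    (if (< i n)
      (recur (inc i) (inc acc))
      acc)))

(comment
  (require '[criterium.core :as crit])
  ;; per-op cost is roughly the reported mean divided by 1000
  (crit/quick-bench (inc-batch 1000)))
```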

A Better Benchmarking Workflow

    flowchart LR
        A["Performance Complaint"] --> B["Form A Narrow Hypothesis"]
        B --> C["Isolate The Hot Computation"]
        C --> D["Run quick-bench"]
        D --> E["Run bench On Final Candidates"]
        E --> F["Validate With Profiling Or System Metrics"]

The key thing to notice is the last step. A microbenchmark result should usually be checked against reality before it changes a production design.

When Criterium Is The Right Tool

Use Criterium when you need to compare:

  • alternate implementations of a hot pure function
  • data-structure choices
  • transducer versus intermediate-sequence pipelines
  • serialization or parsing approaches in a controlled local context
  • allocation-sensitive transformations

It is a weaker fit when the dominant cost is:

  • I/O
  • remote dependencies
  • contention across the whole process
  • database query planning
  • end-user latency across multiple services

Key Takeaways

  • Criterium exists to make JVM microbenchmarks less misleading.
  • The stable release line is still 0.4.6, while 0.5.x is documented as alpha in the project README.
  • Ask a narrow engineering question before you benchmark anything.
  • Read quantiles and spread, not only the mean.
  • Use system-level measurement tools when the bottleneck is not a local computation.

Revised on Thursday, April 23, 2026