Benchmarks

Maestro has been extensively benchmarked across diverse quantum workloads, covering single-circuit execution, batched multi-tenant scenarios, GPU acceleration, and distributed quantum computing simulation. Full details are available in the Maestro paper (arXiv:2512.04216).

Summary of Results

Benchmark                Key Result
Automode vs QCSim        9.2× speedup on heterogeneous 90-circuit batch
Automode vs Qiskit       8.4× speedup on the same batch
Automode vs Qiskit Auto  1.8× speedup: deeper optimization and broader circuit detection
MPS fidelity             Adaptive bond dimension achieves ≥ 0.95 mirror fidelity
GPU acceleration         Significant speedup for wide state vector circuits
DQC simulation           Circuits with 1000+ qubits simulated via p-block mode
External validation      NPL (UK National Physical Laboratory) confirmed competitive performance

General Benchmarks

Circuits tested include GHZ states (entanglement generation), QFT (interference), random Clifford+T (universal gate sets), and QAOA layers of increasing depth (variational workloads).

MPS Simulation

MPS simulation uses adaptive bond dimension (χ) to balance speed and accuracy:

  1. Start with a low bond dimension (χ = 4)
  2. Simulate the circuit and measure mirror fidelity
  3. If fidelity is below 0.95, double χ and repeat step 2; stop once fidelity ≥ 0.95

This keeps each MPS run at or above the 0.95 fidelity target without paying for a much larger bond dimension than the circuit needs. Maestro’s MPS engine (QCSim) is benchmarked against Qiskit Aer’s MPS, with results showing competitive or superior performance across circuit widths.
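
The doubling loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Maestro's API: the `mirror_fidelity` callable, the `chi_max` safety cap, and the toy fidelity model are all hypothetical. In a real run the callable would presumably simulate the circuit followed by its inverse at the given χ and report how much of the initial state is recovered.

```python
from typing import Callable, Tuple


def adaptive_bond_dimension(
    mirror_fidelity: Callable[[int], float],  # hypothetical: runs the simulation at bond dimension chi
    chi_start: int = 4,
    target: float = 0.95,
    chi_max: int = 1024,  # assumed safety cap so the loop always terminates
) -> Tuple[int, float]:
    """Double chi until the fidelity target is met or the cap is reached."""
    chi = chi_start
    while True:
        fidelity = mirror_fidelity(chi)
        if fidelity >= target or chi >= chi_max:
            return chi, fidelity
        chi *= 2


def toy_fidelity(chi: int) -> float:
    """Stand-in fidelity model for demonstration only; not a real simulator."""
    return 1.0 - 2.0 / chi


chi, fidelity = adaptive_bond_dimension(toy_fidelity)
print(f"selected chi={chi}, fidelity={fidelity}")
```

With the toy model the loop tries χ = 4, 8, 16, 32 and stops at 64, the first value whose fidelity clears 0.95. Doubling keeps the number of trial simulations logarithmic in the final bond dimension.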

GPU Acceleration

GPU offloading provides significant speedups for state vector simulations, but the advantage depends on circuit characteristics:

  • Wide circuits (many qubits): GPU advantage is clear
  • Narrow circuits: CPU may be faster due to GPU transfer overhead
  • GPU acceleration applies to both state vector and MPS backends

Currently, GPU offloading is a user-configurable toggle. Future versions will automate this decision based on predicted transfer overhead.

Batched Circuit Execution (Torture Test)

The most representative benchmark is the Torture Test — a batch of 90 heterogeneous circuits designed to span the full complexity spectrum:

  • Clifford circuits: Composed entirely of Clifford gates (ideal for stabilizer simulation)
  • Low-entanglement circuits: GHZ states, shallow QAOA (p=1), hardware-efficient ansätze (ideal for MPS)
  • High-entanglement circuits: Densely entangled circuits requiring state vector or high-bond-dimension MPS

Comparison

Four policies were compared:

Policy        Strategy
QCSim         State vector for ≤30 qubits, MPS otherwise (χ tuned for fidelity ≥ 0.95)
Qiskit        Same strategy using Qiskit Aer backends
Qiskit Auto   Qiskit Aer’s automatic backend selection
Maestro Auto  Feature-based prediction selecting optimal backend per circuit
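
The fixed-threshold baseline in the first two rows is simple enough to write down directly. The sketch below also shows what a dense state vector costs at a given width, assuming double-precision complex amplitudes (16 bytes each). The page does not say why 30 qubits was chosen as the cutoff; the memory arithmetic is offered only as context, and the function names are hypothetical.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a dense state vector: 2**n amplitudes (complex128 assumed)."""
    return (2 ** n_qubits) * bytes_per_amplitude


def baseline_backend(n_qubits: int, max_statevector_qubits: int = 30) -> str:
    """Width-only rule from the table: state vector up to the cutoff, MPS beyond it."""
    return "statevector" if n_qubits <= max_statevector_qubits else "mps"


for n in (20, 30, 40):
    gib = statevector_bytes(n) / 2**30
    print(f"{n} qubits -> {baseline_backend(n)}, dense state vector = {gib} GiB")
```

At 30 qubits a dense state vector already needs 16 GiB, and each additional qubit doubles that, so a width-only rule has to hand wider circuits to MPS regardless of their structure.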

Results

Maestro Auto achieved 9.2× speedup over QCSim, 8.4× over Qiskit, and 1.8× over Qiskit Auto.

The key insight: in heterogeneous batches, a few highly entangled circuits typically dominate the total runtime. Maestro addresses this by matching every individual circuit to its optimal backend — routing Clifford circuits to the stabilizer simulator (orders-of-magnitude faster), MPS-friendly circuits to MPS, and reserving state vector for circuits that truly need it. The prediction overhead is negligible compared to execution time.
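
To make the routing idea concrete, here is a rule-based sketch. It is not Maestro's predictor, which this page describes only as feature-based prediction; the `Circuit` type, the Clifford gate list, and the two-qubit-gate density threshold used as an entanglement proxy are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Gate names treated as Clifford in this sketch (a common subset, not exhaustive).
CLIFFORD_GATES = {"h", "s", "sdg", "x", "y", "z", "cx", "cz", "swap"}


@dataclass
class Circuit:
    n_qubits: int
    gates: List[Tuple[str, Tuple[int, ...]]]  # (gate name, qubits acted on)


def choose_backend(
    circuit: Circuit,
    max_statevector_qubits: int = 30,            # cutoff taken from the baseline policy
    max_two_qubit_gates_per_qubit: float = 2.0,  # hypothetical entanglement proxy
) -> str:
    # Circuits made only of Clifford gates go to the stabilizer simulator.
    if all(name in CLIFFORD_GATES for name, _ in circuit.gates):
        return "stabilizer"
    # Crude proxy: few two-qubit gates per qubit suggests low entanglement.
    two_qubit = sum(1 for _, qubits in circuit.gates if len(qubits) == 2)
    if two_qubit / circuit.n_qubits <= max_two_qubit_gates_per_qubit:
        return "mps"
    # Dense circuits use the state vector when it fits, otherwise high-chi MPS.
    if circuit.n_qubits <= max_statevector_qubits:
        return "statevector"
    return "mps"


ghz = Circuit(4, [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2)), ("cx", (2, 3))])
shallow = Circuit(4, [("h", (0,)), ("rz", (0,)), ("cx", (0, 1)), ("cx", (1, 2))])
pairs = [(a, b) for a in range(3) for b in range(3) if a != b]
dense = Circuit(3, [("t", (q,)) for q in range(3)] + [("cx", p) for p in pairs] * 2)

for name, circ in [("ghz", ghz), ("shallow", shallow), ("dense", dense)]:
    print(name, "->", choose_backend(circ))
```

A real predictor would use richer features than a gate count, but the shape of the decision is the same: check the cheapest specialized simulator first and fall back to the state vector only when nothing else fits.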

Distributed Quantum Computing

Maestro’s p-block simulation mode enables simulating distributed quantum computing (DQC) by partitioning circuits across multiple virtual QPUs (vQPUs) connected through entanglement links:

  • Qubits are allocated across vQPUs to minimize inter-node entangling gates
  • Each vQPU runs its own local simulator, reducing peak memory requirements
  • Results are validated against monolithic state vector simulations using Hellinger fidelity
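
For reference, the Hellinger fidelity between two measurement distributions p and q is commonly defined as (Σᵢ √(pᵢ qᵢ))², the convention used, for example, by Qiskit's `hellinger_fidelity`. The sketch below computes it from raw shot counts; whether Maestro's validation uses exactly this convention is an assumption, and the example counts are made up.

```python
import math
from typing import Dict


def hellinger_fidelity(counts_a: Dict[str, int], counts_b: Dict[str, int]) -> float:
    """Squared Bhattacharyya coefficient of two count dictionaries."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    overlap = 0.0
    # Outcomes missing from either distribution contribute zero to the sum.
    for outcome in counts_a.keys() & counts_b.keys():
        p = counts_a[outcome] / total_a
        q = counts_b[outcome] / total_b
        overlap += math.sqrt(p * q)
    return overlap ** 2


# Illustrative counts only: a monolithic run and a distributed run of a Bell pair.
monolithic = {"00": 500, "11": 500}
distributed = {"00": 480, "11": 500, "01": 20}
print(hellinger_fidelity(monolithic, distributed))
```

The value is 1 for identical distributions and 0 for distributions with no outcomes in common, which makes it a convenient single-number check that partitioning has not changed the output statistics.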

Key Findings

  • Single vQPU: Performance matches monolithic simulation exactly
  • Multiple vQPUs: Communication overhead introduces runtime cost, but enables simulation of circuits far beyond single-device limits
  • Deep circuits (1000+ qubits): Achievable with constant runtime when the right simulator is selected per block
  • Limitation: High-entanglement circuits (e.g., W states) become bottlenecked by entanglement sharing — efficient protocols are needed for general-purpose DQC
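
The communication overhead noted above grows with the number of entangling gates that span two vQPUs. A minimal sketch of counting those gates for a given qubit-to-vQPU assignment follows; the function name and data layout are hypothetical, and this is only the cost function a partitioner might minimize, not Maestro's allocation algorithm.

```python
from typing import Dict, Iterable, Tuple


def count_cut_gates(
    two_qubit_gates: Iterable[Tuple[int, int]],
    assignment: Dict[int, int],  # qubit index -> vQPU index
) -> int:
    """Number of two-qubit gates whose qubits sit on different vQPUs."""
    return sum(1 for a, b in two_qubit_gates if assignment[a] != assignment[b])


# A nearest-neighbour chain of entangling gates on four qubits.
chain = [(0, 1), (1, 2), (2, 3)]

contiguous = {0: 0, 1: 0, 2: 1, 3: 1}   # qubits 0-1 on vQPU 0, qubits 2-3 on vQPU 1
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}  # alternating assignment

print("contiguous cut gates:", count_cut_gates(chain, contiguous))
print("interleaved cut gates:", count_cut_gates(chain, interleaved))
```

For this chain the contiguous split cuts one gate while the alternating split cuts all three, which illustrates why allocation matters and why circuits whose entanglement cannot be localized run into the bottleneck described in the limitation above.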

External Validation

Maestro was independently benchmarked by the UK National Physical Laboratory (NPL) as part of the M4Q program. Their assessment:

"[The] Maestro framework [is] well-suited for HPC environments due to [its] ability to exploit parallelism through multithreading and multiprocessing. Features such as Maestro Auto for batched execution and distributed simulation strategies enable efficient scaling across clusters and reduce overhead compared to single-threaded runs."

HPC Integration

Maestro is integrated into two major European HPC centers:

  • CESGA (Spain): Integrated via the CUNQA platform
  • LRZ (Germany): Integrated as a QDMI (Quantum Device Management Interface) backend

For full benchmark methodology and figures, see the paper: Maestro: Intelligent Execution for Quantum Circuit Simulation (arXiv:2512.04216).