
Advanced Usage

ModestBench provides two engines with different performance characteristics and statistical approaches.

Choose an engine based on your requirements:

Terminal window
# Tinybench engine (default) - fast development iteration
modestbench --engine tinybench
# Accurate engine - high-precision measurements
node --allow-natives-syntax ./node_modules/.bin/modestbench --engine accurate

Both engines use IQR (Interquartile Range) outlier removal to filter extreme values caused by:

  • Garbage collection pauses
  • System interruptions
  • Background processes
  • OS scheduler variations

This yields more stable, reliable measurements than analyzing the raw samples directly.
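The filtering rule itself is simple; a minimal sketch (illustrative only, not ModestBench's actual implementation):

// Illustrative sketch of IQR outlier filtering, not ModestBench's source.
function removeOutliers(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const q1 = sorted[Math.floor(sorted.length * 0.25)];
  const q3 = sorted[Math.floor(sorted.length * 0.75)];
  const iqr = q3 - q1;
  // Keep only values within [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
  return sorted.filter((v) => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr);
}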

The accurate engine provides enhanced statistical analysis:

  1. V8 Optimization Guards: Uses V8 intrinsics (%NeverOptimizeFunction) to prevent JIT compiler interference with measurements
  2. IQR Outlier Removal: Automatically removes extreme outliers (beyond Q1 - 1.5×IQR and Q3 + 1.5×IQR)
  3. Comprehensive Statistics:
    • Mean, min, max execution times
    • Standard deviation and variance
    • Coefficient of Variation (CV): Measures relative variability (stdDev / mean × 100)
    • 95th and 99th percentiles
    • Margin of error (95% confidence interval)

The CV metric helps assess benchmark quality:

CV < 5% → Excellent (very stable)
CV 5-10% → Good (acceptable variance)
CV 10-20% → Fair (consider more samples)
CV > 20% → Poor (investigate noise sources)
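As a rough sketch of how these numbers relate (illustrative only, not the engine's internals):

// Illustrative: CV = stdDev / mean × 100, then classify stability.
function coefficientOfVariation(samples) {
  const mean = samples.reduce((sum, v) => sum + v, 0) / samples.length;
  const variance =
    samples.reduce((sum, v) => sum + (v - mean) ** 2, 0) / samples.length;
  return (Math.sqrt(variance) / mean) * 100;
}

function classifyCV(cv) {
  if (cv < 5) return 'Excellent (very stable)';
  if (cv < 10) return 'Good (acceptable variance)';
  if (cv < 20) return 'Fair (consider more samples)';
  return 'Poor (investigate noise sources)';
}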

Example output showing CV:

Terminal window
$ node --allow-natives-syntax ./node_modules/.bin/modestbench --engine accurate --reporter json
{
  "name": "Array.push()",
  "mean": 810050,          // nanoseconds
  "stdDev": 19842,
  "cv": 2.45,              // 2.45% - excellent stability
  "marginOfError": 0.024,
  "p95": 845200,
  "p99": 862100
}

Real-world comparison using examples/bench:

Terminal window
# Tinybench (fast iteration)
$ modestbench --engine tinybench --reporter json
# Typical run time: 3-5 seconds for 5 benchmark files
# Accurate (high precision)
$ node --allow-natives-syntax ./node_modules/.bin/modestbench --engine accurate --reporter json
# Typical run time: 8-12 seconds for 5 benchmark files

The accurate engine takes ~2-3x longer but provides:

  • More consistent results between runs
  • Better outlier filtering with V8 guards
  • Higher confidence in micro-optimizations

Use Case                       Recommended Engine
Development iteration          tinybench
CI/CD regression tests         tinybench
Blog post/publication          accurate
Library optimization           accurate
Micro-benchmark comparison     accurate
Algorithm selection            Either (results typically consistent)

Organize related benchmarks into separate suites with independent setup and teardown:

const state = {
  data: [],
  sortedData: [],
};

export default {
  suites: {
    Sorting: {
      setup() {
        state.data = generateTestData(1000);
      },
      teardown() {
        state.data = [];
      },
      benchmarks: {
        'Quick Sort': () => quickSort(state.data),
        'Merge Sort': () => mergeSort(state.data),
        'Bubble Sort': () => bubbleSort(state.data),
      },
    },
    Searching: {
      setup() {
        state.sortedData = generateSortedData(10000);
      },
      teardown() {
        state.sortedData = [];
      },
      benchmarks: {
        'Binary Search': () => binarySearch(state.sortedData, 5000),
        'Linear Search': () => linearSearch(state.sortedData, 5000),
        'Jump Search': () => jumpSearch(state.sortedData, 5000),
      },
    },
  },
};
Within each suite, the lifecycle is:

  1. setup() - Called once before any tasks in the suite run
  2. Tasks execute - Each task runs with its configured iterations
  3. teardown() - Called once after all tasks complete

ModestBench fully supports asynchronous benchmarks:

export default {
  suites: {
    'Async Performance': {
      benchmarks: {
        // Simple async benchmark
        'Promise.resolve()': async () => {
          return await Promise.resolve('test');
        },
        // With configuration
        'Fetch Simulation': {
          async fn() {
            const response = await simulateApiCall();
            return response.json();
          },
          config: {
            iterations: 100, // Fewer iterations for slow operations
          },
        },
      },
    },
  },
};

Setup and teardown hooks can be async as well:

export default {
  suites: {
    'Database Operations': {
      async setup() {
        this.db = await connectDatabase();
        await this.db.seed();
      },
      async teardown() {
        await this.db.close();
      },
      benchmarks: {
        'Read Query': async function () {
          return await this.db.query('SELECT * FROM users LIMIT 100');
        },
        'Write Query': async function () {
          return await this.db.insert({ name: 'Test User' });
        },
      },
    },
  },
};

Tags cascade from file → suite → task levels:

export default {
  // File-level tags (inherited by all suites and tasks)
  tags: ['performance', 'core'],
  suites: {
    'String Operations': {
      // Suite-level tags (inherited by all tasks in this suite)
      tags: ['string', 'fast'],
      benchmarks: {
        // Task inherits: ['performance', 'core', 'string', 'fast', 'regex']
        'RegExp Test': {
          fn: () => /pattern/.test(str),
          tags: ['regex'], // Task-specific tags
        },
        // Task inherits: ['performance', 'core', 'string', 'fast']
        'String Includes': () => str.includes('pattern'),
      },
    },
    'Array Operations': {
      tags: ['array', 'slow'],
      benchmarks: {
        // Task inherits: ['performance', 'core', 'array', 'slow']
        'Array spread': () => {
          let arr = [];
          for (let i = 0; i < 1000; i++) {
            arr = [...arr, i];
          }
          return arr;
        },
      },
    },
  },
};
Terminal window
# Run only fast benchmarks
modestbench --tag fast
# Runs: 'RegExp Test', 'String Includes'
# Run string OR array benchmarks
modestbench --tag string --tag array
# Runs: All tasks in 'String Operations' and 'Array Operations'
# Exclude slow benchmarks
modestbench --exclude-tag slow
# Runs: Only 'String Operations' tasks
# Combine: run fast benchmarks except experimental
modestbench --tag fast --exclude-tag experimental

Suite setup() and teardown() only run if at least one task in the suite matches the filter:

export default {
  suites: {
    'Expensive Setup': {
      setup() {
        console.log('This only runs if at least one task will execute');
        this.expensiveResource = createExpensiveResource();
      },
      teardown() {
        console.log('This only runs if setup ran');
        this.expensiveResource.destroy();
      },
      benchmarks: {
        'Fast Task': {
          fn() { /* ... */ },
          tags: ['fast'],
        },
        'Slow Task': {
          fn() { /* ... */ },
          tags: ['slow'],
        },
      },
    },
  },
};
Terminal window
# Setup and teardown run (Fast Task matches)
modestbench --tag fast
# Setup and teardown DON'T run (Slow Task excluded)
modestbench --exclude-tag slow

Configure individual tasks with specific settings:

export default {
  suites: {
    'Custom Configs': {
      benchmarks: {
        // Default configuration
        'Standard Task': () => someOperation(),
        // Custom iterations
        'High Sample Task': {
          fn: () => criticalOperation(),
          config: {
            iterations: 10000,
            warmup: 200,
          },
        },
        // Custom timeout for slow operations
        'Slow Operation': {
          fn: async () => await slowAsyncOperation(),
          config: {
            timeout: 60000, // 60 seconds
            iterations: 10, // Fewer samples
          },
        },
      },
    },
  },
};

Use JavaScript config files for dynamic configuration:

modestbench.config.js
const isCI = process.env.CI === 'true';
const isProd = process.env.NODE_ENV === 'production';

export default {
  iterations: isCI ? 5000 : 100,
  warmup: isCI ? 100 : 0,
  reporters: isCI ? ['json', 'csv'] : ['simple'], // JSON + CSV in CI, simple reporter locally
  quiet: isCI,
  outputDir: isCI ? './benchmark-results' : undefined,
  // Only run critical benchmarks in CI
  tags: isCI ? ['critical'] : [],
  // Exclude slow benchmarks in development
  excludeTags: isProd ? [] : ['slow'],
};

A typical GitHub Actions workflow for running benchmarks in CI:

name: Performance Tests
on: [push, pull_request]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 20
      - name: Install dependencies
        run: npm ci
      - name: Build project
        run: npm run build
      - name: Run benchmarks
        run: |
          modestbench \
            --reporter json \
            --reporter csv \
            --output ./results \
            --quiet \
            --tag critical
      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: ./results/
      - name: Check for regressions
        run: node scripts/check-regression.js
scripts/check-regression.js
import { execSync } from 'child_process';
import { readFileSync } from 'fs';

// Run current benchmarks
execSync('modestbench --reporter json --output ./current', {
  stdio: 'inherit',
});
const current = JSON.parse(readFileSync('./current/results.json', 'utf8'));

// Load baseline results
const baseline = JSON.parse(readFileSync('./baseline/results.json', 'utf8'));

let hasRegression = false;

// Check for significant regressions
for (const result of current.results) {
  const baselineResult = baseline.results.find(
    (r) => r.file === result.file && r.task === result.task
  );
  if (baselineResult) {
    const regression =
      (baselineResult.opsPerSecond - result.opsPerSecond) /
      baselineResult.opsPerSecond;
    if (regression > 0.1) {
      // 10% regression threshold
      console.error(
        `❌ Performance regression in ${result.task}: ${(regression * 100).toFixed(1)}% slower`
      );
      console.error(`   Baseline: ${baselineResult.opsPerSecond.toFixed(2)} ops/sec`);
      console.error(`   Current:  ${result.opsPerSecond.toFixed(2)} ops/sec`);
      hasRegression = true;
    } else if (regression < -0.1) {
      // 10% improvement
      console.log(
        `✅ Performance improvement in ${result.task}: ${(Math.abs(regression) * 100).toFixed(1)}% faster`
      );
    }
  }
}

if (hasRegression) {
  console.error('\n❌ Performance regressions detected!');
  process.exit(1);
} else {
  console.log('\n✅ No performance regressions detected!');
}

ModestBench automatically saves results to .modestbench/history/. Use the history commands for performance analysis, regression detection, and trend visualization.

List and filter historical benchmark runs:

Terminal window
# List recent runs
modestbench history list
# List with details (JSON format)
modestbench history list --format json
# Limit number of runs shown
modestbench history list --limit 10
# Filter by date range
modestbench history list --since "7 days ago"
modestbench history list --since 2025-01-01 --until 2025-12-31
# Filter by pattern (file path matching)
modestbench history list --pattern "**/*string*"
# Filter by tags
modestbench history list --tag performance --tag critical

ModestBench supports flexible date formats for filtering:

Terminal window
# ISO 8601 dates
modestbench history list --since 2025-10-01T00:00:00Z
# Relative dates
modestbench history list --since "1 week ago"
modestbench history list --since "3 days ago"
# Shorthand formats
modestbench history list --since 1d # 1 day ago
modestbench history list --since 2w # 2 weeks ago
modestbench history list --since 1m # 1 month ago
modestbench history list --since 6h # 6 hours ago

View detailed information about a specific benchmark run:

Terminal window
# Human-readable format
modestbench history show run-2025-10-07-001
# JSON format for parsing
modestbench history show run-2025-10-07-001 --format json
# Partial ID matching (like Git commits)
modestbench history show 5a63ucbo9w

The show command displays:

  • Run metadata (ID, date, duration, environment)
  • CPU and Node.js version information
  • Git branch and commit (if in a repository)
  • Task-by-task results with mean, margin of error, ops/sec, and coefficient of variation (CV)
  • File organization

Compare two benchmark runs with detailed task-by-task analysis:

Terminal window
# Compare two specific runs
modestbench history compare run-2025-10-07-001 run-2025-10-07-002
# JSON output for scripting
modestbench history compare run-2025-10-07-001 run-2025-10-07-002 --format json
# Using partial IDs
modestbench history compare 5a63ucbo9w 7f2k9x1m3p

Output Details:

  • Mean: Shows percent change in parentheses; higher values are highlighted in bright magenta
  • Min/Max: Arrows are dimmed; higher values highlighted
  • Iterations: “vs” is dimmed; higher iteration count is bolded
  • CV: Coefficient of Variation helps assess measurement consistency (higher = more variable)

JSON Output Structure:

{
  "run1": {
    "id": "run-2025-10-07-001",
    "startTime": "2025-10-07T10:30:45.123Z",
    "summary": { "totalFiles": 3, "totalTasks": 12, "passedTasks": 12, "failedTasks": 0 }
  },
  "run2": {
    "id": "run-2025-10-07-002",
    "startTime": "2025-10-07T11:45:12.789Z",
    "summary": { "totalFiles": 3, "totalTasks": 12, "passedTasks": 12, "failedTasks": 0 }
  },
  "taskComparisons": [
    {
      "file": "benchmarks/string.bench.js",
      "suite": "String Operations",
      "task": "concat vs join",
      "percentChange": -7.7,
      "run1": {
        "mean": 52000,
        "min": 48000,
        "max": 68000,
        "iterations": 1000,
        "cv": 2.1
      },
      "run2": {
        "mean": 48000,
        "min": 45000,
        "max": 62000,
        "iterations": 1000,
        "cv": 1.9
      }
    }
  ]
}
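This structure is easy to consume in scripts. A minimal sketch, assuming the comparison above was saved to comparison.json:

import { readFileSync } from 'fs';

// Flag any task whose mean got more than 5% slower between the two runs.
// Positive percentChange = run2's mean is higher (slower) than run1's.
const { taskComparisons } = JSON.parse(readFileSync('./comparison.json', 'utf8'));
for (const c of taskComparisons) {
  if (c.percentChange > 5) {
    console.warn(`${c.suite} › ${c.task}: ${c.percentChange.toFixed(1)}% slower`);
  }
}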

Analyze performance trends across multiple runs with statistical analysis and visualizations:

Terminal window
# Show trends for all tasks
modestbench history trends
# Analyze last N runs only (default: 20)
modestbench history trends --limit 50
# Analyze ALL runs without limit
modestbench history trends --all
# Filter by date range
modestbench history trends --since 1w
# JSON format for custom analysis
modestbench history trends --format json
# Filter by pattern
modestbench history trends --pattern "**/*array*"

Trend Analysis Features:

  • Trend Icons: ▲ improving, ▼ degrading, → stable
  • Sparklines: Scaled to data points (longer lines = more runs)
  • Percent Change: Overall change from first to last run
  • Regression Detection:
    • High-confidence (5+ runs, 5%+ slower): Shown with red ▼
    • Low-confidence (2-4 runs, 5%+ slower): Shown with yellow ! for user awareness
  • Most Variable Task: Distribution histogram shows the task with highest measurement variability (most important to investigate)
  • Bar Chart: Empty buckets are omitted for clarity

Regression Detection Logic:

ModestBench uses a statistically sound approach:

  1. Requires minimum 5 runs for high-confidence regression flagging
  2. Trend direction must be degrading (negative slope)
  3. Percent change must exceed 5% threshold
  4. Low-confidence warnings (yellow) shown for 2-4 runs with same conditions

This prevents false alarms from single outliers while still alerting to potential issues with limited data.
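In code terms, the documented rules amount to something like this (a sketch of the conditions above, not the actual source):

// Sketch of the documented flagging rules (positive percentChange = slower).
function classifyTrend(t) {
  const degradingAndSignificant = t.trend === 'degrading' && t.percentChange > 5;
  if (degradingAndSignificant && t.runs >= 5) return 'regression';     // red ▼
  if (degradingAndSignificant && t.runs >= 2) return 'low-confidence'; // yellow !
  return 'ok';
}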

JSON Output Structure:

{
  "runs": 12,
  "summary": {
    "totalTasks": 27,
    "improvingTasks": 4,
    "degradingTasks": 2,
    "stableTasks": 21
  },
  "timespan": {
    "start": "2025-10-13T10:00:00.000Z",
    "end": "2025-10-24T15:30:00.000Z"
  },
  "trends": [
    {
      "task": "TypeScript Array Processing › Array.reduce()",
      "trend": "improving",
      "runs": 12,
      "percentChange": -79.6,
      "confidence": 95,
      "statistics": {
        "mean": 48500,
        "median": 48000,
        "variance": 16000000,
        "stdDeviation": 4000
      },
      "dataPoints": [
        { "date": "2025-10-13T10:00:00.000Z", "mean": 225000 },
        { "date": "2025-10-14T10:00:00.000Z", "mean": 198000 },
        { "date": "2025-10-24T15:30:00.000Z", "mean": 48000 }
      ]
    }
  ],
  "regressions": [
    {
      "task": "Sorting Algorithms › Quick Sort",
      "percentChange": 5.3,
      "runs": 12
    }
  ],
  "lowConfidenceRegressions": [
    {
      "task": "Async Operations › Fetch Simulation",
      "percentChange": 6.5,
      "runs": 4
    }
  ]
}

Export benchmark history for external analysis or archival:

Terminal window
# Export to CSV for analysis
modestbench history export \
--format csv \
--output historical-data.csv
# Export to JSON
modestbench history export \
--format json \
--output historical-data.json
# Export filtered data
modestbench history export \
--since 1m \
--pattern "**/*critical*" \
--format json \
--output critical-benchmarks.json

Manage historical data storage:

Terminal window
# Clean runs older than 30 days
modestbench history clean --older-than 30d
# Keep only last 10 runs
modestbench history clean --keep 10
# Clean by size
modestbench history clean --max-size 100mb

Track performance trends over time in your CI pipeline:

.github/workflows/performance.yml
name: Performance Monitoring
on:
  push:
    branches: [main]
  pull_request:
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node
        uses: actions/setup-node@v3
        with:
          node-version: 20
      - name: Install
        run: npm ci
      - name: Run benchmarks
        run: modestbench --reporter json
      - name: Check for regressions
        run: |
          # Compare with baseline
          LATEST=$(modestbench history list --format json | jq -r '.[0].id')
          BASELINE=$(modestbench history list --format json | jq -r '.[1].id')
          # Get comparison data
          modestbench history compare "$BASELINE" "$LATEST" --format json > comparison.json
          # Check for regressions (>5% slower)
          node scripts/check-trends.js

Regression Check Script:

scripts/check-trends.js
import { execSync } from 'child_process';

// Get trends data
const trendsOutput = execSync(
  'modestbench history trends --format json --limit 10',
  { encoding: 'utf8' }
);
const { regressions, lowConfidenceRegressions } = JSON.parse(trendsOutput);

let hasIssues = false;

if (regressions.length > 0) {
  console.error('⚠️ Performance Regressions Detected:\n');
  for (const regression of regressions) {
    console.error(
      `${regression.task}: ${regression.percentChange.toFixed(1)}% slower`
    );
  }
  hasIssues = true;
}

if (lowConfidenceRegressions.length > 0) {
  console.warn('⚡ Potential Regressions (insufficient data):\n');
  for (const regression of lowConfidenceRegressions) {
    console.warn(
      `  ! ${regression.task}: ${regression.percentChange.toFixed(1)}% slower (${regression.runs} runs)`
    );
  }
}

if (hasIssues) {
  process.exit(1);
} else {
  console.log('✅ No performance regressions detected');
}

Use modestbench programmatically in your own tools:

import { modestbench, HumanReporter } from 'modestbench';

// Initialize the engine
const engine = modestbench();

// Register reporters
engine.registerReporter('human', new HumanReporter());

// Execute benchmarks
const result = await engine.execute({
  pattern: '**/*.bench.js',
  iterations: 1000,
  warmup: 50,
  reporters: ['human'],
});

// Process results
if (result.summary.failedTasks > 0) {
  console.error('Some benchmarks failed');
  process.exit(1);
}

Extremely fast operations (<1 ns) can cause overflow errors. ModestBench handles this automatically:

export default {
  suites: {
    'Ultra Fast Operations': {
      benchmarks: {
        // ModestBench will automatically adjust the time budget for very fast ops
        'Variable Read': () => {
          const x = 42;
          return x;
        },
        // For ultra-fast operations, reduce iterations
        'Constant Return': {
          fn: () => 42,
          config: {
            iterations: 100, // Lower sample count
          },
        },
      },
    },
  },
};

Benchmark results include memory information:

{
  "environment": {
    "memory": {
      "total": 51539607552,
      "totalGB": 48.0,
      "free": 12884901888,
      "freeGB": 12.0
    }
  }
}

Track memory usage across runs to identify memory-intensive operations.
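For example, the snapshot can be read back from a saved results file (a minimal sketch, assuming a results.json that contains the environment block shown above):

import { readFileSync } from 'fs';

// Log the memory snapshot recorded with a benchmark run.
// Assumes ./results/results.json includes the environment block shown above.
const results = JSON.parse(readFileSync('./results/results.json', 'utf8'));
const { freeGB, totalGB } = results.environment.memory;
console.log(`Free memory at run time: ${freeGB} GB of ${totalGB} GB`);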

Run benchmark files concurrently for faster execution:

Terminal window
modestbench --concurrent

Considerations:

  • Files run in parallel, but tasks within a file run sequentially
  • May cause resource contention on systems with limited CPU/memory
  • Results may vary between runs due to system load
  • Not recommended for accurate performance measurements

If benchmarks show a high margin of error (>5%):

  1. Increase warmup iterations: --warmup 100
  2. Increase sample size: --iterations 2000
  3. Close other applications to reduce system load
  4. Use time-based limiting: --time 10000 --limit-by time

If benchmarks timeout:

  1. Increase timeout: --timeout 60000
  2. Reduce iterations: --iterations 10
  3. Check for infinite loops in benchmark code

If results vary significantly between runs:

  1. Use warmup iterations: --warmup 100
  2. Increase sample size: --iterations 5000
  3. Run in isolation (no other processes)
  4. Check for async operations completing outside the benchmark scope (see the sketch below)
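The last point is a common pitfall: if an async benchmark never awaits its work, only promise creation is measured. A hedged illustration (simulateApiCall stands in for any async operation):

// ❌ Bad: the promise escapes the benchmark; only its creation is timed
'Unawaited Work': () => {
  simulateApiCall(); // fire-and-forget, completes outside the measurement
},

// ✅ Good: awaiting keeps the whole operation inside the measured scope
'Awaited Work': async () => {
  return await simulateApiCall();
},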

Each benchmark should test one specific operation:

// ❌ Bad: Testing multiple things
'Bad Benchmark': () => {
  const arr = [];
  for (let i = 0; i < 1000; i++) {
    arr.push(i);
  }
  return arr.sort();
},

// ✅ Good: Isolated operations
'Array Push': () => {
  const arr = [];
  for (let i = 0; i < 1000; i++) {
    arr.push(i);
  }
  return arr;
},
'Array Sort': () => {
  const arr = Array.from({ length: 1000 }, (_, i) => i);
  return arr.sort();
},

Keep benchmarks pure and repeatable:

// ❌ Bad: Modifying external state
let counter = 0;
'Bad Benchmark': () => {
  counter++;
  return counter;
},

// ✅ Good: No external state
'Good Benchmark': () => {
  let counter = 0;
  counter++;
  return counter;
},

Enable warmup for operations that benefit from JIT optimization:

export default {
  suites: {
    'JIT-Optimized Operations': {
      benchmarks: {
        'Math Operations': {
          fn: () => Math.sqrt(42) * Math.PI,
          config: {
            warmup: 100,
            iterations: 5000,
          },
        },
      },
    },
  },
};

Use tags to organize and filter benchmarks:

export default {
  tags: ['core'], // Project-wide tag
  suites: {
    'Critical Path': {
      tags: ['critical', 'fast'], // Important, quick benchmarks
      benchmarks: { /* ... */ },
    },
    'Edge Cases': {
      tags: ['edge-case', 'slow'], // Thorough but slow tests
      benchmarks: { /* ... */ },
    },
  },
};