# Advanced Usage
## Benchmark Engines

ModestBench provides two engines with different performance characteristics and statistical approaches.
### Engine Selection

Choose an engine based on your requirements:

```sh
# Tinybench engine (default) - fast development iteration
modestbench --engine tinybench

# Accurate engine - high-precision measurements
node --allow-natives-syntax ./node_modules/.bin/modestbench --engine accurate
```

### Statistical Improvements
Both engines now use IQR (interquartile range) outlier removal to filter extreme values caused by:
- Garbage collection pauses
- System interruptions
- Background processes
- OS scheduler variations
This results in more stable and reliable measurements than analyzing the raw samples directly.
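As a rough sketch, a standard 1.5×IQR filter looks like this (illustration only, not ModestBench's internal code):

```js
// Illustrative 1.5×IQR outlier filter (not ModestBench's internal code).
function removeOutliers(samples) {
  const sorted = [...samples].sort((a, b) => a - b);

  // Linear-interpolation quantile over the sorted samples
  const quantile = (p) => {
    const idx = (sorted.length - 1) * p;
    const lo = Math.floor(idx);
    const hi = Math.ceil(idx);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
  };

  const q1 = quantile(0.25);
  const q3 = quantile(0.75);
  const iqr = q3 - q1;

  // Keep only samples within [Q1 - 1.5×IQR, Q3 + 1.5×IQR]
  return sorted.filter((s) => s >= q1 - 1.5 * iqr && s <= q3 + 1.5 * iqr);
}
```

For example, `removeOutliers([5, 5, 6, 7, 250])` drops the 250 (say, a GC pause) while keeping the rest.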
### AccurateEngine Statistical Features

The accurate engine provides enhanced statistical analysis:

- V8 Optimization Guards: Uses V8 intrinsics (`%NeverOptimizeFunction`) to prevent JIT compiler interference with measurements
- IQR Outlier Removal: Automatically removes extreme outliers (beyond Q1 - 1.5×IQR and Q3 + 1.5×IQR)
- Comprehensive Statistics:
  - Mean, min, max execution times
  - Standard deviation and variance
  - Coefficient of Variation (CV): Measures relative variability (`stdDev / mean × 100`)
  - 95th and 99th percentiles
  - Margin of error (95% confidence interval)
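As a rough illustration, these statistics can be derived from the outlier-filtered samples along these lines (a sketch only; the percentile and margin-of-error conventions are assumptions rather than ModestBench's exact formulas, and the field names mirror the JSON example below):

```js
// Sketch of the summary statistics. The nearest-rank percentile and the
// normal-approximation margin of error are illustrative conventions.
function summarize(samples) {
  const n = samples.length;
  const mean = samples.reduce((sum, s) => sum + s, 0) / n;
  const variance = samples.reduce((sum, s) => sum + (s - mean) ** 2, 0) / n;
  const stdDev = Math.sqrt(variance);
  const sorted = [...samples].sort((a, b) => a - b);
  const percentile = (p) => sorted[Math.min(n - 1, Math.ceil(n * p) - 1)];

  return {
    mean,
    min: sorted[0],
    max: sorted[n - 1],
    stdDev,
    variance,
    cv: (stdDev / mean) * 100, // coefficient of variation, percent
    p95: percentile(0.95),
    p99: percentile(0.99),
    // 95% CI half-width relative to the mean (one common convention)
    marginOfError: (1.96 * stdDev) / Math.sqrt(n) / mean,
  };
}
```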
### Coefficient of Variation (CV)

The CV metric helps assess benchmark quality:

```
CV < 5%    → Excellent (very stable)
CV 5-10%   → Good (acceptable variance)
CV 10-20%  → Fair (consider more samples)
CV > 20%   → Poor (investigate noise sources)
```

Example output showing CV:

```
$ modestbench --engine accurate --allow-natives-syntax --reporter json
{
  "name": "Array.push()",
  "mean": 810050,        // nanoseconds
  "stdDev": 19842,
  "cv": 2.45,            // 2.45% - excellent stability
  "marginOfError": 0.024,
  "p95": 845200,
  "p99": 862100
}
```

### Performance Comparison
Real-world comparison using `examples/bench`:

```sh
# Tinybench (fast iteration)
$ modestbench --engine tinybench --reporter json
# Typical run time: 3-5 seconds for 5 benchmark files

# Accurate (high precision)
$ node --allow-natives-syntax ./node_modules/.bin/modestbench --engine accurate --reporter json
# Typical run time: 8-12 seconds for 5 benchmark files
```

The accurate engine takes ~2-3x longer but provides:
- More consistent results between runs
- Better outlier filtering with V8 guards
- Higher confidence in micro-optimizations
### Choosing the Right Engine

| Use Case | Recommended Engine |
|---|---|
| Development iteration | tinybench |
| CI/CD regression tests | tinybench |
| Blog post/publication | accurate |
| Library optimization | accurate |
| Micro-benchmark comparison | accurate |
| Algorithm selection | Either (results typically consistent) |
## Multiple Suites

Organize related benchmarks into separate suites with independent setup and teardown:

```js
const state = {
  data: [],
  sortedData: [],
};

export default {
  suites: {
    Sorting: {
      setup() {
        state.data = generateTestData(1000);
      },
      teardown() {
        state.data = [];
      },
      benchmarks: {
        'Quick Sort': () => quickSort(state.data),
        'Merge Sort': () => mergeSort(state.data),
        'Bubble Sort': () => bubbleSort(state.data),
      },
    },

    Searching: {
      setup() {
        state.sortedData = generateSortedData(10000);
      },
      teardown() {
        state.sortedData = [];
      },
      benchmarks: {
        'Binary Search': () => binarySearch(state.sortedData, 5000),
        'Linear Search': () => linearSearch(state.sortedData, 5000),
        'Jump Search': () => jumpSearch(state.sortedData, 5000),
      },
    },
  },
};
```

### Suite Lifecycle
- `setup()` - Called once before any tasks in the suite run
- Tasks execute - Each task runs with its configured iterations
- `teardown()` - Called once after all tasks complete
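A minimal suite that makes this order visible (hypothetical; the `console.log` calls exist only to show sequencing):

```js
export default {
  suites: {
    'Lifecycle Demo': {
      setup() {
        console.log('1) setup runs once, before any task');
      },
      teardown() {
        console.log('3) teardown runs once, after all tasks');
      },
      benchmarks: {
        // 2) each task then runs for its configured iterations
        'Task A': () => 1 + 1,
        'Task B': () => [1, 2, 3].map((n) => n * 2),
      },
    },
  },
};
```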
## Async Operations

ModestBench fully supports asynchronous benchmarks:

### Async Functions

```js
export default {
  suites: {
    'Async Performance': {
      benchmarks: {
        // Simple async benchmark
        'Promise.resolve()': async () => {
          return await Promise.resolve('test');
        },

        // With configuration
        'Fetch Simulation': {
          async fn() {
            const response = await simulateApiCall();
            return response.json();
          },
          config: {
            iterations: 100, // Fewer iterations for slow operations
          },
        },
      },
    },
  },
};
```

### Async Setup/Teardown
```js
export default {
  suites: {
    'Database Operations': {
      async setup() {
        this.db = await connectDatabase();
        await this.db.seed();
      },

      async teardown() {
        await this.db.close();
      },

      benchmarks: {
        'Read Query': async function () {
          return await this.db.query('SELECT * FROM users LIMIT 100');
        },

        'Write Query': async function () {
          return await this.db.insert({ name: 'Test User' });
        },
      },
    },
  },
};
```

## Tagging and Filtering

### Tag Cascading

Tags cascade from file → suite → task levels:
```js
export default {
  // File-level tags (inherited by all suites and tasks)
  tags: ['performance', 'core'],

  suites: {
    'String Operations': {
      // Suite-level tags (inherited by all tasks in this suite)
      tags: ['string', 'fast'],

      benchmarks: {
        // Task inherits: ['performance', 'core', 'string', 'fast', 'regex']
        'RegExp Test': {
          fn: () => /pattern/.test(str),
          tags: ['regex'], // Task-specific tags
        },

        // Task inherits: ['performance', 'core', 'string', 'fast']
        'String Includes': () => str.includes('pattern'),
      },
    },

    'Array Operations': {
      tags: ['array', 'slow'],

      benchmarks: {
        // Task inherits: ['performance', 'core', 'array', 'slow']
        'Array spread': () => {
          let arr = [];
          for (let i = 0; i < 1000; i++) {
            arr = [...arr, i];
          }
          return arr;
        },
      },
    },
  },
};
```

### Filtering Examples
Section titled “Filtering Examples”# Run only fast benchmarksmodestbench --tag fast# Runs: 'RegExp Test', 'String Includes'
# Run string OR array benchmarksmodestbench --tag string --tag array# Runs: All tasks in 'String Operations' and 'Array Operations'
# Exclude slow benchmarksmodestbench --exclude-tag slow# Runs: Only 'String Operations' tasks
# Combine: run fast benchmarks except experimentalmodestbench --tag fast --exclude-tag experimentalSuite Lifecycle with Filtering
Section titled “Suite Lifecycle with Filtering”Suite setup() and teardown() only run if at least one task in the suite matches the filter:
```js
export default {
  suites: {
    'Expensive Setup': {
      setup() {
        console.log('This only runs if at least one task will execute');
        this.expensiveResource = createExpensiveResource();
      },

      teardown() {
        console.log('This only runs if setup ran');
        this.expensiveResource.destroy();
      },

      benchmarks: {
        'Fast Task': {
          fn() { /* ... */ },
          tags: ['fast'],
        },
        'Slow Task': {
          fn() { /* ... */ },
          tags: ['slow'],
        },
      },
    },
  },
};
```

```sh
# Setup and teardown run (Fast Task matches)
modestbench --tag fast

# Setup and teardown DON'T run (Slow Task excluded)
modestbench --exclude-tag slow
```

## Custom Task Configuration

Configure individual tasks with specific settings:
```js
export default {
  suites: {
    'Custom Configs': {
      benchmarks: {
        // Default configuration
        'Standard Task': () => someOperation(),

        // Custom iterations
        'High Sample Task': {
          fn: () => criticalOperation(),
          config: {
            iterations: 10000,
            warmup: 200,
          },
        },

        // Custom timeout for slow operations
        'Slow Operation': {
          fn: async () => await slowAsyncOperation(),
          config: {
            timeout: 60000, // 60 seconds
            iterations: 10, // Fewer samples
          },
        },
      },
    },
  },
};
```

## Environment-Specific Benchmarks

Use JavaScript config files for dynamic configuration:
```js
const isCI = process.env.CI === 'true';
const isProd = process.env.NODE_ENV === 'production';

export default {
  iterations: isCI ? 5000 : 100,
  warmup: isCI ? 100 : 0,
  reporters: isCI ? ['json', 'csv'] : ['simple'], // JSON/CSV reporters in CI, simple reporter locally
  quiet: isCI,
  outputDir: isCI ? './benchmark-results' : undefined,

  // Only run critical benchmarks in CI
  tags: isCI ? ['critical'] : [],

  // Exclude slow benchmarks in development
  excludeTags: isProd ? [] : ['slow'],
};
```

## CI/CD Integration
### GitHub Actions

```yaml
name: Performance Tests
on: [push, pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-node@v3
        with:
          node-version: 20

      - name: Install dependencies
        run: npm ci

      - name: Build project
        run: npm run build

      - name: Run benchmarks
        run: |
          modestbench \
            --reporter json \
            --reporter csv \
            --output ./results \
            --quiet \
            --tag critical

      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: ./results/

      - name: Check for regressions
        run: node scripts/check-regression.js
```

### Performance Regression Detection
Section titled “Performance Regression Detection”import { execSync } from 'child_process';import { readFileSync } from 'fs';
// Run current benchmarksexecSync('modestbench --reporter json --output ./current', { stdio: 'inherit',});
const current = JSON.parse( readFileSync('./current/results.json', 'utf8'));
// Load baseline resultsconst baseline = JSON.parse( readFileSync('./baseline/results.json', 'utf8'));
let hasRegression = false;
// Check for significant regressionsfor (const result of current.results) { const baselineResult = baseline.results.find( (r) => r.file === result.file && r.task === result.task );
if (baselineResult) { const regression = (baselineResult.opsPerSecond - result.opsPerSecond) / baselineResult.opsPerSecond;
if (regression > 0.1) { // 10% regression threshold console.error( `❌ Performance regression in ${result.task}: ${( regression * 100 ).toFixed(1)}% slower` ); console.error(` Baseline: ${baselineResult.opsPerSecond.toFixed(2)} ops/sec`); console.error(` Current: ${result.opsPerSecond.toFixed(2)} ops/sec`); hasRegression = true; } else if (regression < -0.1) { // 10% improvement console.log( `✅ Performance improvement in ${result.task}: ${( Math.abs(regression) * 100 ).toFixed(1)}% faster` ); } }}
if (hasRegression) { console.error('\n❌ Performance regressions detected!'); process.exit(1);} else { console.log('\n✅ No performance regressions detected!');}Historical Tracking
Section titled “Historical Tracking”ModestBench automatically saves results to .modestbench/history/. Use the history commands for performance analysis, regression detection, and trend visualization.
### Viewing Run History

List and filter historical benchmark runs:

```sh
# List recent runs
modestbench history list

# List with details (JSON format)
modestbench history list --format json

# Limit number of runs shown
modestbench history list --limit 10

# Filter by date range
modestbench history list --since "7 days ago"
modestbench history list --since 2025-01-01 --until 2025-12-31

# Filter by pattern (file path matching)
modestbench history list --pattern "**/*string*"

# Filter by tags
modestbench history list --tag performance --tag critical
```

### Date Range Filtering
ModestBench supports flexible date formats for filtering:

```sh
# ISO 8601 dates
modestbench history list --since 2025-10-01T00:00:00Z

# Relative dates
modestbench history list --since "1 week ago"
modestbench history list --since "3 days ago"

# Shorthand formats
modestbench history list --since 1d   # 1 day ago
modestbench history list --since 2w   # 2 weeks ago
modestbench history list --since 1m   # 1 month ago
modestbench history list --since 6h   # 6 hours ago
```

### Show Specific Run

View detailed information about a specific benchmark run:
```sh
# Human-readable format
modestbench history show run-2025-10-07-001

# JSON format for parsing
modestbench history show run-2025-10-07-001 --format json

# Partial ID matching (like Git commits)
modestbench history show 5a63ucbo9w
```

The `show` command displays:
- Run metadata (ID, date, duration, environment)
- CPU and Node.js version information
- Git branch and commit (if in a repository)
- Task-by-task results with mean, margin of error, ops/sec, and coefficient of variation (CV)
- File organization
### Comparing Runs

Compare two benchmark runs with detailed task-by-task analysis:

```sh
# Compare two specific runs
modestbench history compare run-2025-10-07-001 run-2025-10-07-002

# JSON output for scripting
modestbench history compare run-2025-10-07-001 run-2025-10-07-002 --format json

# Using partial IDs
modestbench history compare 5a63ucbo9w 7f2k9x1m3p
```

Output Details:
- Mean: Shows percent change in parentheses; higher values are highlighted in bright magenta
- Min/Max: Arrows are dimmed; higher values highlighted
- Iterations: “vs” is dimmed; higher iteration count is bolded
- CV: Coefficient of Variation helps assess measurement consistency (higher = more variable)
JSON Output Structure:
{ "run1": { "id": "run-2025-10-07-001", "startTime": "2025-10-07T10:30:45.123Z", "summary": { "totalFiles": 3, "totalTasks": 12, "passedTasks": 12, "failedTasks": 0 } }, "run2": { "id": "run-2025-10-07-002", "startTime": "2025-10-07T11:45:12.789Z", "summary": { "totalFiles": 3, "totalTasks": 12, "passedTasks": 12, "failedTasks": 0 } }, "taskComparisons": [ { "file": "benchmarks/string.bench.js", "suite": "String Operations", "task": "concat vs join", "percentChange": -7.7, "run1": { "mean": 52000, "min": 48000, "max": 68000, "iterations": 1000, "cv": 2.1 }, "run2": { "mean": 48000, "min": 45000, "max": 62000, "iterations": 1000, "cv": 1.9 } } ]}Performance Trends Analysis
Analyze performance trends across multiple runs with statistical analysis and visualizations:

```sh
# Show trends for all tasks
modestbench history trends

# Analyze last N runs only (default: 20)
modestbench history trends --limit 50

# Analyze ALL runs without limit
modestbench history trends --all

# Filter by date range
modestbench history trends --since 1w

# JSON format for custom analysis
modestbench history trends --format json

# Filter by pattern
modestbench history trends --pattern "**/*array*"
```

Trend Analysis Features:
- Trend Icons: ▲ improving, ▼ degrading, → stable
- Sparklines: Scaled to data points (longer lines = more runs)
- Percent Change: Overall change from first to last run
- Regression Detection:
  - High-confidence (5+ runs, 5%+ slower): Shown with red ▼
  - Low-confidence (2-4 runs, 5%+ slower): Shown with yellow ! for user awareness
- Most Variable Task: Distribution histogram shows the task with the highest measurement variability (the most important to investigate)
- Bar Chart: Empty buckets are omitted for clarity
Regression Detection Logic:

ModestBench uses a statistically sound approach:

- Requires a minimum of 5 runs for high-confidence regression flagging
- Trend direction must be degrading (negative slope)
- Percent change must exceed the 5% threshold
- Low-confidence warnings (yellow) are shown for 2-4 runs meeting the same conditions

This prevents false alarms from single outliers while still alerting to potential issues when data is limited.
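For illustration, the rules above could be expressed like this (a hypothetical sketch, not ModestBench's actual source; `trend` is one entry from the trends JSON shown below, where a positive `percentChange` on a degrading task means slower):

```js
// Hypothetical classification of a single task trend (sketch only).
function classifyRegression(trend) {
  const degrading = trend.trend === 'degrading'; // slope must be degrading
  const overThreshold = trend.percentChange > 5; // must exceed the 5% threshold

  if (!degrading || !overThreshold) return 'none';
  if (trend.runs >= 5) return 'high-confidence'; // red ▼
  if (trend.runs >= 2) return 'low-confidence'; // yellow !
  return 'none'; // a single run cannot establish a trend
}
```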
JSON Output Structure:
{ "runs": 12, "summary": { "totalTasks": 27, "improvingTasks": 4, "degradingTasks": 2, "stableTasks": 21 }, "timespan": { "start": "2025-10-13T10:00:00.000Z", "end": "2025-10-24T15:30:00.000Z" }, "trends": [ { "task": "TypeScript Array Processing › Array.reduce()", "trend": "improving", "runs": 12, "percentChange": -79.6, "confidence": 95, "statistics": { "mean": 48500, "median": 48000, "variance": 16000, "stdDeviation": 4000 }, "dataPoints": [ { "date": "2025-10-13T10:00:00.000Z", "mean": 225000 }, { "date": "2025-10-14T10:00:00.000Z", "mean": 198000 }, { "date": "2025-10-24T15:30:00.000Z", "mean": 48000 } ] } ], "regressions": [ { "task": "Sorting Algorithms › Quick Sort", "percentChange": 5.3, "runs": 12 } ], "lowConfidenceRegressions": [ { "task": "Async Operations › Fetch Simulation", "percentChange": 3.2, "runs": 4 } ]}Export Historical Data
Export benchmark history for external analysis or archival:

```sh
# Export to CSV for analysis
modestbench history export \
  --format csv \
  --output historical-data.csv

# Export to JSON
modestbench history export \
  --format json \
  --output historical-data.json

# Export filtered data
modestbench history export \
  --since 1m \
  --pattern "**/*critical*" \
  --format json \
  --output critical-benchmarks.json
```

### Cleanup Old Data
Manage historical data storage:

```sh
# Clean runs older than 30 days
modestbench history clean --older-than 30d

# Keep only last 10 runs
modestbench history clean --keep 10

# Clean by size
modestbench history clean --max-size 100mb
```

### Using History in CI/CD
Track performance trends over time in your CI pipeline:

```yaml
name: Performance Monitoring

on:
  push:
    branches: [main]
  pull_request:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node
        uses: actions/setup-node@v3
        with:
          node-version: 20

      - name: Install
        run: npm ci

      - name: Run benchmarks
        run: modestbench --reporter json

      - name: Check for regressions
        run: |
          # Compare with baseline
          LATEST=$(modestbench history list --format json | jq -r '.[0].id')
          BASELINE=$(modestbench history list --format json | jq -r '.[1].id')

          # Get comparison data
          modestbench history compare "$BASELINE" "$LATEST" --format json > comparison.json

          # Check for regressions (>5% slower)
          node scripts/check-trends.js
```

Regression Check Script:
```js
import { execSync } from 'child_process';

// Get trends data
const trendsOutput = execSync(
  'modestbench history trends --format json --limit 10',
  { encoding: 'utf8' }
);

const { regressions, lowConfidenceRegressions } = JSON.parse(trendsOutput);

let hasIssues = false;

if (regressions.length > 0) {
  console.error('⚠️ Performance Regressions Detected:\n');

  for (const regression of regressions) {
    console.error(
      `  ▼ ${regression.task}: ${regression.percentChange.toFixed(1)}% slower`
    );
  }

  hasIssues = true;
}

if (lowConfidenceRegressions.length > 0) {
  console.warn('⚡ Potential Regressions (insufficient data):\n');

  for (const regression of lowConfidenceRegressions) {
    console.warn(
      `  ! ${regression.task}: ${regression.percentChange.toFixed(1)}% slower (${regression.runs} runs)`
    );
  }
}

if (hasIssues) {
  process.exit(1);
} else {
  console.log('✅ No performance regressions detected');
}
```

## Programmatic API

Use ModestBench programmatically in your own tools:
```js
import { modestbench, HumanReporter } from 'modestbench';

// Initialize the engine
const engine = modestbench();

// Register reporters
engine.registerReporter('human', new HumanReporter());

// Execute benchmarks
const result = await engine.execute({
  pattern: '**/*.bench.js',
  iterations: 1000,
  warmup: 50,
  reporters: ['human'],
});

// Process results
if (result.summary.failedTasks > 0) {
  console.error('Some benchmarks failed');
  process.exit(1);
}
```

## Handling Fast Operations

Extremely fast operations (<1 ns) can cause overflow errors. ModestBench handles this automatically:
```js
export default {
  suites: {
    'Ultra Fast Operations': {
      benchmarks: {
        // ModestBench will automatically adjust the time budget for very fast ops
        'Variable Read': () => {
          const x = 42;
          return x;
        },

        // For ultra-fast operations, reduce iterations
        'Constant Return': {
          fn: () => 42,
          config: {
            iterations: 100, // Lower sample count
          },
        },
      },
    },
  },
};
```

## Memory Profiling Context

Benchmark results include memory information:
{ "environment": { "memory": { "total": 51539607552, "totalGB": 48.0, "free": 12884901888, "freeGB": 12.0 } }}Track memory usage across runs to identify memory-intensive operations.
## Concurrent Execution

Run benchmark files concurrently for faster execution:

```sh
modestbench --concurrent
```

Considerations:
- Files run in parallel, but tasks within a file run sequentially
- May cause resource contention on systems with limited CPU/memory
- Results may vary between runs due to system load
- Not recommended for accurate performance measurements
## Troubleshooting

### High Margin of Error

If benchmarks show a high margin of error (>5%):

- Increase warmup iterations: `--warmup 100`
- Increase sample size: `--iterations 2000`
- Close other applications to reduce system load
- Use time-based limiting: `--time 10000 --limit-by time`
### Timeouts

If benchmarks time out:

- Increase the timeout: `--timeout 60000`
- Reduce iterations: `--iterations 10`
- Check for infinite loops in benchmark code
### Inconsistent Results

If results vary significantly between runs:

- Use warmup iterations: `--warmup 100`
- Increase sample size: `--iterations 5000`
- Run in isolation (no other processes)
- Check for async operations completing outside benchmark scope
## Best Practices

### 1. Isolate Benchmarks

Each benchmark should test one specific operation:

```js
// ❌ Bad: Testing multiple things
'Bad Benchmark': () => {
  const arr = [];
  for (let i = 0; i < 1000; i++) {
    arr.push(i);
  }
  return arr.sort();
},

// ✅ Good: Isolated operations
'Array Push': () => {
  const arr = [];
  for (let i = 0; i < 1000; i++) {
    arr.push(i);
  }
  return arr;
},
'Array Sort': () => {
  const arr = Array.from({ length: 1000 }, (_, i) => i);
  return arr.sort();
},
```

### 2. Avoid Side Effects
Keep benchmarks pure and repeatable:

```js
// ❌ Bad: Modifying external state
let counter = 0;
'Bad Benchmark': () => {
  counter++;
  return counter;
},

// ✅ Good: No external state
'Good Benchmark': () => {
  let counter = 0;
  counter++;
  return counter;
},
```

### 3. Use Warmup for JIT
Enable warmup for operations that benefit from JIT optimization:

```js
export default {
  suites: {
    'JIT-Optimized Operations': {
      benchmarks: {
        'Math Operations': {
          fn: () => Math.sqrt(42) * Math.PI,
          config: {
            warmup: 100,
            iterations: 5000,
          },
        },
      },
    },
  },
};
```

### 4. Tag Strategically
Use tags to organize and filter benchmarks:

```js
export default {
  tags: ['core'], // Project-wide tag

  suites: {
    'Critical Path': {
      tags: ['critical', 'fast'], // Important, quick benchmarks
      benchmarks: { /* ... */ },
    },

    'Edge Cases': {
      tags: ['edge-case', 'slow'], // Thorough but slow tests
      benchmarks: { /* ... */ },
    },
  },
};
```