Performance & Load Testing: From Functional Testing to Performance Engineering

How to test performance, find bottlenecks, and ensure stability under load


Performance Testing

📍 You are here:

[✓] Article 1: QA Fundamentals
[✓] Article 2: QA Practice  
[✓] Article 3: DSA for QA
[✓] Article 4: Automation Frameworks
[✓] Article 5: CI/CD
[✓] Article 6: Test Design Techniques
[→] Article 7: Performance Testing ← Currently reading
[ ] Article 8: Landing Your Dream Job at Apple

Progress: 87% ✨

Your application works great on your laptop. But what happens when 10,000 users log in simultaneously? When the database grows to 10 million records? When the API receives 1,000 requests per second?

Welcome to Performance Testing, a critical skill that separates senior QA engineers from mid-level ones.

From a real Apple Software Quality Engineer job posting:

“Experience with performance testing and analysis”

In this article you will learn:

  • Understanding performance metrics
  • Conducting load and stress testing
  • Using JMeter, K6, Gatling
  • Finding and analyzing bottlenecks
  • Integrating performance tests into CI/CD

📋 Table of Contents

  1. Types of Performance Testing
  2. Key Metrics
  3. JMeter: Industry Standard
  4. K6: Modern JavaScript Approach
  5. Gatling: Scala Power
  6. Real-World Scenarios
  7. Analyzing Results & Finding Bottlenecks
  8. Performance Testing in CI/CD
  9. Best Practices
  10. Learning Resources

🎯 Types of Performance Testing

1. Load Testing

Goal: Check how the system behaves under expected load

Example:

  • Normal: 1,000 concurrent users
  • Black Friday: 50,000 concurrent users

Questions:

  • Will the system handle the planned load?
  • What is the response time under normal load?

// K6 Load Test Example
import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
    stages: [
        { duration: '2m', target: 100 },  // Ramp up to 100 users
        { duration: '5m', target: 100 },  // Stay at 100 users
        { duration: '2m', target: 0 },    // Ramp down to 0
    ],
    thresholds: {
        http_req_duration: ['p(95)<500'], // 95% requests under 500ms
        http_req_failed: ['rate<0.01'],   // <1% errors
    },
};

export default function() {
    const res = http.get('https://api.example.com/products');
    
    check(res, {
        'status is 200': (r) => r.status === 200,
        'response time < 500ms': (r) => r.timings.duration < 500,
    });
    
    sleep(1);
}

2. Stress Testing

Goal: Find the breaking point of the system

Scenario: Gradually increase load until the system crashes

Questions:

  • When does the system start to degrade?
  • How does it recover after a crash?
  • Which components fail first?

// K6 Stress Test
export const options = {
    stages: [
        { duration: '2m', target: 100 },   // Normal load
        { duration: '5m', target: 200 },   // Around breaking point
        { duration: '2m', target: 300 },   // Beyond capacity
        { duration: '5m', target: 400 },   // Way over capacity
        { duration: '10m', target: 0 },    // Recovery
    ],
};

export default function() {
    const res = http.get('https://api.example.com/products');
    
    check(res, {
        'status is not 500': (r) => r.status !== 500,
    });
    
    sleep(1);
}

3. Spike Testing

Goal: Check reaction to sudden traffic spikes

Example:

  • An ad campaign launches
  • Viral social media post
  • Flash sale starts

// K6 Spike Test
export const options = {
    stages: [
        { duration: '10s', target: 100 },  // Normal
        { duration: '1m', target: 2000 },  // SPIKE!
        { duration: '3m', target: 2000 },  // Hold spike
        { duration: '10s', target: 100 },  // Return to normal
        { duration: '3m', target: 100 },   // Recovery
    ],
};

4. Soak Testing (Endurance Testing)

Goal: Check stability under prolonged load

Problems it finds:

  • Memory leaks
  • Database connection leaks
  • Disk space issues
  • Log file growth

// K6 Soak Test - runs for hours
export const options = {
    stages: [
        { duration: '2m', target: 400 },    // Ramp up
        { duration: '3h56m', target: 400 }, // Stay for ~4 hours
        { duration: '2m', target: 0 },      // Ramp down
    ],
};

📊 Key Performance Metrics

1. Response Time

Definition: Time from sending a request to receiving a response

Metrics:

  • Average Response Time - mean time
  • Median (50th percentile) - half the requests are faster
  • 90th percentile - 90% of requests are faster
  • 95th percentile - 95% of requests are faster
  • 99th percentile - 99% of requests are faster

Why percentiles matter more than average:

Example:
9 requests: 100ms each
1 request: 10,000ms (timeout)

Average: 1,090ms (looks bad)
90th percentile: 100ms (what 90% of users actually experienced)
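To see why, here is a small sketch; the `average` and `percentile` helpers below are illustrative utilities (not part of any tool), and the percentile uses the nearest-rank method:

```javascript
// Compare the average against a percentile for the sample above.
function average(samples) {
    return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Nearest-rank percentile: the value at or below which p% of samples fall
function percentile(samples, p) {
    const sorted = [...samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[rank - 1];
}

// 9 fast requests and 1 timeout
const times = [100, 100, 100, 100, 100, 100, 100, 100, 100, 10000];

console.log(average(times));        // 1090 (skewed by one outlier)
console.log(percentile(times, 90)); // 100 (what most users actually saw)
```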

Target values:

| Operation Type | Target |
|----------------|--------|
| Page Load | < 2 seconds |
| API GET | < 200ms |
| API POST | < 500ms |
| Search | < 1 second |
| Database Query | < 100ms |

2. Throughput

Definition: Number of requests per unit of time

Metrics:

  • Requests per second (RPS)
  • Transactions per second (TPS)
  • Pages per minute

Example:

Good: 1000 RPS with 200ms response time
Bad: 1000 RPS with 5000ms response time
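Throughput is just completed requests divided by wall-clock time; a quick sketch with made-up figures:

```javascript
// Throughput: completed requests divided by duration in seconds
function requestsPerSecond(totalRequests, durationSeconds) {
    return totalRequests / durationSeconds;
}

// e.g. 810,000 requests completed over a 30-minute run
console.log(requestsPerSecond(810000, 30 * 60)); // 450 RPS
```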

3. Error Rate

Definition: Percentage of failed requests

Target values:

  • Production: < 0.1% (99.9% success)
  • Staging: < 1%
  • Critical APIs: < 0.01% (99.99% success)
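These targets translate directly into an error budget: how many requests are allowed to fail. A hypothetical helper (the function name is made up for illustration):

```javascript
// How many failed requests does a success-rate target allow?
// Math.round guards against floating-point artifacts like 99.9999...
function allowedFailures(totalRequests, successTarget) {
    return Math.round(totalRequests * (1 - successTarget));
}

console.log(allowedFailures(1000000, 0.999));  // 1000 failures per million (99.9%)
console.log(allowedFailures(1000000, 0.9999)); // 100 for a critical API (99.99%)
```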

Error types:

  • 4xx errors (client errors)
  • 5xx errors (server errors)
  • Timeouts
  • Connection errors

4. Concurrent Users

Definition: Number of users active simultaneously

Important to understand the difference:

Total Users: 10,000
Active Users: 3,000 (online now)
Concurrent Users: 500 (making requests NOW)

Calculation formula (Little's Law):

Concurrent Users = Throughput × (Response Time + Think Time)

Example: at 100 requests/second, with 0.5s response time and 9.5s think time,
Concurrent Users = 100 × (0.5 + 9.5) = 1,000
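One common way to estimate this is Little's Law, N = X × (R + Z); a numeric sketch with illustrative figures:

```javascript
// Little's Law: concurrency N = throughput X * (response time R + think time Z)
const throughput = 100;   // requests per second (X)
const responseTime = 0.5; // seconds per request (R)
const thinkTime = 9.5;    // seconds a user pauses between requests (Z)

const concurrentUsers = throughput * (responseTime + thinkTime);
console.log(concurrentUsers); // 1000
```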

5. Apdex Score

Definition: Application Performance Index (0 to 1)

Formula:

Apdex = (Satisfied + (Tolerating / 2)) / Total Samples

Where:
- Satisfied: Response Time ≤ T
- Tolerating: T < Response Time ≤ 4T
- Frustrated: Response Time > 4T

Example:

T = 500ms (target)

100 requests:
- 70 under 500ms (Satisfied)
- 20 between 500ms-2000ms (Tolerating)
- 10 over 2000ms (Frustrated)

Apdex = (70 + 20/2) / 100 = 0.8 (Fair)
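The formula is easy to encode; a minimal sketch reproducing the worked example (the `apdex` helper is hypothetical):

```javascript
// Apdex = (satisfied + tolerating / 2) / total, with target T in ms
function apdex(samples, T) {
    let satisfied = 0, tolerating = 0;
    for (const ms of samples) {
        if (ms <= T) satisfied++;
        else if (ms <= 4 * T) tolerating++;
        // anything slower than 4T is frustrated and adds nothing
    }
    return (satisfied + tolerating / 2) / samples.length;
}

// 70 satisfied, 20 tolerating, 10 frustrated, T = 500ms
const samples = [
    ...Array(70).fill(200),   // under 500ms
    ...Array(20).fill(1500),  // between 500ms and 2000ms
    ...Array(10).fill(3000),  // over 2000ms
];
console.log(apdex(samples, 500)); // 0.8
```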

Rating:

  • 1.0 - 0.94 = Excellent
  • 0.93 - 0.85 = Good
  • 0.84 - 0.70 = Fair
  • 0.69 - 0.50 = Poor
  • < 0.50 = Unacceptable

🔨 JMeter: Industry Standard

Why JMeter?

Advantages:

  • ✅ Industry standard (20+ years)
  • ✅ Huge community
  • ✅ GUI for test creation
  • ✅ Supports many protocols
  • ✅ Free and open-source

Disadvantages:

  • ❌ Java-based (heavy)
  • ❌ GUI not suitable for CI/CD
  • ❌ Steep learning curve

Basic Installation

# Download JMeter
wget https://dlcdn.apache.org/jmeter/binaries/apache-jmeter-5.6.3.tgz
tar -xzf apache-jmeter-5.6.3.tgz
cd apache-jmeter-5.6.3/bin

# Run GUI
./jmeter

# Run in CLI (for CI/CD)
./jmeter -n -t test-plan.jmx -l results.jtl -e -o report/

Creating a Simple Test

Test Plan Structure:

Test Plan
├── Thread Group (Users)
│   ├── HTTP Request (API call)
│   ├── HTTP Header Manager
│   ├── JSON Assertions
│   └── Response Time Assertions
└── Listeners (Results)
    ├── View Results Tree
    ├── Summary Report
    └── Aggregate Report

Example: API Load Test

Thread Group Settings:

Number of Threads: 100
Ramp-up Period: 60 seconds
Loop Count: 10

HTTP Request:

Server: api.example.com
Port: 443
Protocol: https
Method: GET
Path: /api/v1/products

Assertions:

// Response Code
Response Code: 200

// Response Time
Response Time: <= 500ms

// JSON Body
$.data.length > 0

Advanced: Parametrization

CSV Data Set Config:

users.csv:
email,password
user1@test.com,pass123
user2@test.com,pass456
user3@test.com,pass789

HTTP Request with variables:

POST /api/login
Body:
{
    "email": "${email}",
    "password": "${password}"
}

⚡ K6: Modern JavaScript Approach

Why K6?

Advantages:

  • ✅ JavaScript (familiar syntax for QA)
  • ✅ CLI-first (great for CI/CD)
  • ✅ Light and fast
  • ✅ Built-in JSON results
  • ✅ Cloud integration

Disadvantages:

  • ❌ No GUI
  • ❌ Fewer protocols than JMeter
  • ❌ Relatively young (2017)

Installation

# macOS
brew install k6

# Linux
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6

# Windows
choco install k6

# Docker
docker pull grafana/k6

Basic Test

simple-test.js:

import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
    vus: 10,        // Virtual Users
    duration: '30s', // Test duration
};

export default function() {
    const res = http.get('https://api.example.com/products');
    
    check(res, {
        'status is 200': (r) => r.status === 200,
        'response time < 200ms': (r) => r.timings.duration < 200,
    });
    
    sleep(1);
}

Run:

k6 run simple-test.js

Advanced: Scenarios

multi-scenario.js:

import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
    scenarios: {
        // Scenario 1: Browse products
        browse: {
            executor: 'constant-vus',
            vus: 50,
            duration: '5m',
            exec: 'browseProducts',
        },
        
        // Scenario 2: Search
        search: {
            executor: 'ramping-vus',
            startVUs: 0,
            stages: [
                { duration: '2m', target: 20 },
                { duration: '5m', target: 20 },
                { duration: '2m', target: 0 },
            ],
            exec: 'searchProducts',
        },
        
        // Scenario 3: Checkout (heavy operation)
        checkout: {
            executor: 'constant-arrival-rate',
            rate: 10,
            timeUnit: '1s',
            duration: '5m',
            preAllocatedVUs: 50,
            exec: 'checkoutFlow',
        },
    },
    
    thresholds: {
        'http_req_duration{scenario:browse}': ['p(95)<500'],
        'http_req_duration{scenario:search}': ['p(95)<1000'],
        'http_req_duration{scenario:checkout}': ['p(95)<2000'],
        'http_req_failed': ['rate<0.01'],
    },
};

export function browseProducts() {
    http.get('https://api.example.com/products');
    sleep(1);
}

export function searchProducts() {
    http.get('https://api.example.com/products/search?q=laptop');
    sleep(2);
}

export function checkoutFlow() {
    const jsonHeaders = { 'Content-Type': 'application/json' };
    
    // Login
    const loginRes = http.post('https://api.example.com/auth/login',
        JSON.stringify({ email: 'test@example.com', password: 'pass123' }),
        { headers: jsonHeaders }
    );
    
    const token = loginRes.json('token');
    const authHeaders = { ...jsonHeaders, 'Authorization': `Bearer ${token}` };
    
    // Add to cart
    http.post('https://api.example.com/cart', 
        JSON.stringify({ productId: 123, quantity: 1 }),
        { headers: authHeaders }
    );
    
    // Checkout
    http.post('https://api.example.com/checkout',
        JSON.stringify({ paymentMethod: 'card' }),
        { headers: authHeaders }
    );
    
    sleep(3);
}

Custom Metrics

import http from 'k6/http';
import { Trend, Counter } from 'k6/metrics';

// Custom metrics
const loginDuration = new Trend('login_duration');
const loginErrors = new Counter('login_errors');

export default function() {
    const res = http.post('https://api.example.com/login',
        JSON.stringify({ email: 'test@example.com', password: 'pass123' }), // example credentials
        { headers: { 'Content-Type': 'application/json' } }
    );
    
    loginDuration.add(res.timings.duration); // built-in timing is more accurate than Date.now()
    
    if (res.status !== 200) {
        loginErrors.add(1);
    }
}

K6 in CI/CD

GitHub Actions:

name: Performance Tests

on: [push]

jobs:
  k6-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Run K6 test
        uses: grafana/k6-action@v0.3.1
        with:
          filename: performance-tests/load-test.js
          flags: --out json=results.json
      
      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: k6-results
          path: results.json

🎯 Gatling: Scala Power

Why Gatling?

Advantages:

  • ✅ Very high performance
  • ✅ Excellent reports (best in industry)
  • ✅ Great for HTTP/WebSocket
  • ✅ Reusable simulation code

Disadvantages:

  • ❌ Scala (harder for QA without programming background)
  • ❌ Fewer protocols
  • ❌ Smaller community than JMeter

Installation

# Download
wget https://repo1.maven.org/maven2/io/gatling/highcharts/gatling-charts-highcharts-bundle/3.10.3/gatling-charts-highcharts-bundle-3.10.3-bundle.zip
unzip gatling-charts-highcharts-bundle-3.10.3-bundle.zip
cd gatling-charts-highcharts-bundle-3.10.3

# Run recorder (to record scenarios)
./bin/recorder.sh

# Run simulation
./bin/gatling.sh

Basic Simulation

BasicSimulation.scala:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class BasicSimulation extends Simulation {
  
  val httpProtocol = http
    .baseUrl("https://api.example.com")
    .acceptHeader("application/json")
    .userAgentHeader("Gatling Performance Test")
  
  val scn = scenario("Basic Load Test")
    .exec(
      http("Get Products")
        .get("/api/products")
        .check(status.is(200))
        .check(jsonPath("$.data").exists)
        .check(responseTimeInMillis.lte(500))
    )
    .pause(1)
  
  setUp(
    scn.inject(
      rampUsers(100).during(60.seconds) // 100 users over 60 seconds
    ).protocols(httpProtocol)
  ).assertions(
    global.responseTime.max.lt(5000),
    global.successfulRequests.percent.gt(95)
  )
}

Advanced: E-commerce Simulation

EcommerceSimulation.scala:

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class EcommerceSimulation extends Simulation {
  
  val httpProtocol = http
    .baseUrl("https://example.com")
    .acceptHeader("application/json")
  
  // Feeders (test data)
  val userFeeder = csv("users.csv").random
  val productFeeder = csv("products.csv").circular
  
  // Scenarios
  val browse = scenario("Browse Products")
    .exec(
      http("Homepage")
        .get("/")
        .check(status.is(200))
    )
    .pause(2)
    .exec(
      http("Product List")
        .get("/products")
        .check(status.is(200))
        .check(jsonPath("$.products[*].id").findAll.saveAs("productIds"))
    )
    .pause(3)
  
  val purchase = scenario("Purchase Flow")
    .feed(userFeeder)
    .feed(productFeeder)
    .exec(
      http("Login")
        .post("/api/auth/login")
        .body(StringBody("""{"email":"#{email}","password":"#{password}"}"""))
        .check(status.is(200))
        .check(jsonPath("$.token").saveAs("authToken"))
    )
    .pause(1)
    .exec(
      http("Add to Cart")
        .post("/api/cart")
        .header("Authorization", "Bearer #{authToken}")
        .body(StringBody("""{"productId":"#{productId}","quantity":1}"""))
        .check(status.is(200))
    )
    .pause(2)
    .exec(
      http("Checkout")
        .post("/api/checkout")
        .header("Authorization", "Bearer #{authToken}")
        .body(StringBody("""{"paymentMethod":"card"}"""))
        .check(status.is(200))
    )
  
  // Load injection
  setUp(
    browse.inject(
      rampUsers(200).during(5.minutes),
      constantUsersPerSec(20).during(10.minutes)
    ),
    purchase.inject(
      rampUsers(50).during(5.minutes),
      constantUsersPerSec(5).during(10.minutes)
    )
  ).protocols(httpProtocol)
   .assertions(
     global.responseTime.percentile3.lt(1000),
     global.successfulRequests.percent.gt(99)
   )
}

🔍 Real-World Performance Scenarios

Scenario 1: E-commerce Black Friday

Requirement:

  • Normal: 5,000 concurrent users
  • Black Friday: 100,000 concurrent users
  • Midnight (0:00) spike expected

Test Strategy:

// K6 Black Friday Simulation
export const options = {
    stages: [
        // Pre-midnight: normal traffic
        { duration: '30m', target: 5000 },
        
        // Midnight spike!
        { duration: '2m', target: 100000 },
        
        // Hold peak
        { duration: '1h', target: 100000 },
        
        // Gradual decrease
        { duration: '30m', target: 50000 },
        { duration: '1h', target: 20000 },
        { duration: '2h', target: 5000 },
    ],
    
    thresholds: {
        'http_req_duration{critical:yes}': ['p(99)<1000'], // Critical pages
        'http_req_duration{critical:no}': ['p(99)<5000'],  // Non-critical
        'http_req_failed': ['rate<0.1'],                   // up to 10% errors tolerated at peak
    },
};

Scenario 2: API Rate Limiting

Requirement:

  • API limit: 1000 requests/minute per user
  • Need to verify rate limiting works

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
    scenarios: {
        // Normal usage - should succeed
        normal: {
            executor: 'constant-arrival-rate',
            rate: 900,                    // 900 req/min (under limit)
            timeUnit: '1m',
            duration: '5m',
            preAllocatedVUs: 50,
            exec: 'normalRequest',
        },
        
        // Excessive usage - should be rate limited
        excessive: {
            executor: 'constant-arrival-rate',
            rate: 1500,                   // 1500 req/min (over limit!)
            timeUnit: '1m',
            duration: '5m',
            preAllocatedVUs: 100,
            exec: 'excessiveRequest',
        },
    },
};

export function normalRequest() {
    const res = http.get('https://api.example.com/data');
    check(res, {
        'normal request succeeds': (r) => r.status === 200,
    });
}

export function excessiveRequest() {
    const res = http.get('https://api.example.com/data');
    check(res, {
        'excessive request rate limited': (r) => r.status === 429,
        'has retry-after header': (r) => r.headers['Retry-After'] !== undefined,
    });
}

Scenario 3: Database Connection Pool

Problem: App has connection pool of 100 connections

Test: Find when pool exhaustion happens

export const options = {
    scenarios: {
        database_intensive: {
            executor: 'ramping-arrival-rate',
            startRate: 10,
            timeUnit: '1s',
            preAllocatedVUs: 200,
            maxVUs: 500,
            stages: [
                { duration: '5m', target: 50 },   // Slowly increase
                { duration: '5m', target: 100 },  // Around pool size
                { duration: '5m', target: 150 },  // Over pool size
                { duration: '5m', target: 200 },  // Way over
            ],
        },
    },
    
    thresholds: {
        'http_req_duration': ['p(95)<3000'],
        'http_req_failed': ['rate<0.05'],
    },
};

export default function() {
    // Database-heavy endpoint
    const res = http.post('https://api.example.com/reports/generate');
    
    check(res, {
        'no connection timeout': (r) => r.status !== 504,
        'no pool exhaustion error': (r) => !r.body.includes('connection pool'),
    });
    
    sleep(1);
}

📈 Analyzing Results & Finding Bottlenecks

Reading Performance Test Reports

Key sections to analyze:

  1. Summary Statistics

Requests: 10,000
Duration: 5 minutes
Success Rate: 98.5%
Avg Response Time: 450ms
p95 Response Time: 850ms
p99 Response Time: 2,150ms
Max Response Time: 5,200ms

Interpretation:

  • ⚠️ 98.5% success rate (below the > 99% target)
  • ✅ Average 450ms (acceptable for most APIs)
  • ⚠️ p99 2.15s (1% of users wait > 2s)
  • ❌ Max 5.2s (investigate!)

  2. Response Time Distribution

0-100ms:    15% (1,500 requests)
100-300ms:  45% (4,500 requests)
300-500ms:  25% (2,500 requests)
500-1000ms: 10% (1,000 requests)
1000-2000ms: 3% (300 requests)
2000ms+:     2% (200 requests) ← Investigate!

  3. Error Distribution

200 OK:        9,850 (98.5%)
400 Bad Request:  50 (0.5%)
429 Rate Limit:   75 (0.75%)
500 Server Error: 20 (0.2%) ← Critical!
504 Timeout:       5 (0.05%) ← Critical!
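As a sanity check, the overall error rate should be recoverable from the status breakdown; a small sketch using the numbers above:

```javascript
// Status counts from the report above
const statusCounts = { 200: 9850, 400: 50, 429: 75, 500: 20, 504: 5 };

const total = Object.values(statusCounts).reduce((a, b) => a + b, 0);
const failed = total - statusCounts[200];
const errorRate = failed * 100 / total;

console.log(total);                      // 10000 requests
console.log(errorRate.toFixed(1) + '%'); // "1.5%", i.e. 98.5% success
```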

Common Bottleneck Patterns

Pattern 1: Gradual Performance Degradation

Symptom:

Minute 1: p95 = 300ms
Minute 2: p95 = 350ms
Minute 3: p95 = 450ms
Minute 4: p95 = 650ms
Minute 5: p95 = 950ms

Likely Causes:

  • Memory leak
  • Connection pool not releasing
  • Cache not clearing
  • Database query plans degrading

How to Diagnose:

# Monitor memory usage
kubectl top pods --namespace=production

# Check database connections
SELECT count(*) FROM pg_stat_activity;

# Monitor garbage collection
jstat -gc <pid> 1000

Pattern 2: Sudden Spike in Errors

Symptom:

Minute 1-3: 0% errors
Minute 4: 15% errors (500 Internal Server Error)
Minute 5: 35% errors

Likely Causes:

  • Thread pool exhausted
  • Database connection pool full
  • Out of memory (OOM)
  • Circuit breaker opened

How to Diagnose:

// Add detailed error logging in test
import { check } from 'k6';

export default function() {
    const res = http.get('https://api.example.com/products');
    
    if (res.status >= 400) {
        console.error(`Error at ${Date.now()}: ${res.status} - ${res.body}`);
    }
    
    check(res, {
        'status is 200': (r) => r.status === 200,
    });
}

Pattern 3: Bi-modal Response Time

Symptom:

80% of requests: 100-200ms
20% of requests: 5000-6000ms

No gradual distribution!

Likely Causes:

  • Cache hit vs cache miss
  • Read replica lag
  • Cold start (serverless)
  • DNS issues

How to Diagnose:

import http from 'k6/http';
import { check } from 'k6';

export default function() {
    const start = Date.now();
    const res = http.get('https://api.example.com/products/123');
    const duration = Date.now() - start;
    
    if (duration > 1000) {
        console.log(`Slow request: ${duration}ms`);
        console.log(`Headers: ${JSON.stringify(res.headers)}`);
        console.log(`Cache: ${res.headers['X-Cache-Status']}`);
    }
}

Tools for Bottleneck Investigation

1. Application Performance Monitoring (APM)

New Relic:

// Instrument your code
const newrelic = require('newrelic');

app.get('/api/products', async (req, res) => {
    // Custom transaction
    newrelic.startWebTransaction('GET /api/products', async () => {
        let products;
        
        // Track database query
        await newrelic.startSegment('database', true, async () => {
            products = await db.query('SELECT * FROM products');
        });
        
        // Track external API call
        await newrelic.startSegment('external-api', true, async () => {
            await fetch('https://external-api.com/data');
        });
        
        res.json(products);
    });
});

DataDog:

const tracer = require('dd-trace').init();

app.get('/api/products', async (req, res) => {
    const span = tracer.startSpan('get.products');
    
    try {
        const products = await getProducts();
        span.setTag('product.count', products.length);
        res.json(products);
    } catch (error) {
        span.setTag('error', true);
        throw error;
    } finally {
        span.finish();
    }
});

2. Database Query Analysis

PostgreSQL:

-- Enable query logging
ALTER DATABASE mydb SET log_min_duration_statement = 1000; -- Log queries > 1s

-- Find slow queries
-- (on PostgreSQL 13+ these columns are total_exec_time, mean_exec_time, max_exec_time)
SELECT 
    query,
    calls,
    total_time,
    mean_time,
    max_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;

-- Analyze specific query
EXPLAIN ANALYZE
SELECT * FROM products WHERE category = 'laptops';

MySQL:

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Analyze slow queries
SELECT * FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;

3. Profiling Code

Node.js - Clinic.js:

# Install
npm install -g clinic

# Profile your app
clinic doctor -- node app.js

# Generate flame graph
clinic flame -- node app.js

# Analyze event loop
clinic bubbleprof -- node app.js

Python - cProfile:

import cProfile
import pstats

# Profile function
profiler = cProfile.Profile()
profiler.enable()

# Your code here
run_performance_tests()

profiler.disable()

# Print stats
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 slowest functions

🔄 Performance Testing in CI/CD

Integration Strategy

When to run performance tests:

  1. On Pull Request - Smoke performance tests
  2. Nightly - Full load tests
  3. Before Production Deploy - Stress tests
  4. Scheduled - Soak tests (weekly)

GitHub Actions Example

.github/workflows/performance-tests.yml:

name: Performance Tests

on:
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 2 * * *'  # Every night at 2 AM
  workflow_dispatch:

jobs:
  smoke-test:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Install K6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      
      - name: Run smoke test
        # k6 exits with a non-zero code when thresholds fail, which fails this step
        run: k6 run --summary-export=results.json tests/smoke-test.js
      
      - name: Comment PR
        uses: actions/github-script@v6
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('results.json'));
            
            const comment = `
            ## 🚀 Performance Test Results
            
            **Requests:** ${results.metrics.http_reqs.count}
            **Error Rate:** ${(results.metrics.http_req_failed.value * 100).toFixed(2)}%
            **Avg Response Time:** ${results.metrics.http_req_duration.avg.toFixed(0)}ms
            **p95:** ${results.metrics.http_req_duration['p(95)'].toFixed(0)}ms
            `;
            
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: comment
            });
  
  load-test:
    runs-on: ubuntu-latest
    if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
    
    steps:
      - uses: actions/checkout@v3
      
      - name: Install K6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      
      - name: Run full load test
        run: k6 run --out json=results.json tests/load-test.js
      
      - name: Upload to S3 (historical data)
        run: |
          aws s3 cp results.json s3://perf-test-results/$(date +%Y-%m-%d)/results.json
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      
      - name: Send Slack notification
        if: always()
        run: |
          curl -X POST ${{ secrets.SLACK_WEBHOOK }} \
            -H 'Content-Type: application/json' \
            -d '{
              "text": "Performance test completed",
              "attachments": [{
                "color": "${{ job.status == 'success' && 'good' || 'danger' }}",
                "fields": [
                  {"title": "Status", "value": "${{ job.status }}", "short": true},
                  {"title": "Branch", "value": "${{ github.ref }}", "short": true}
                ]
              }]
            }'

Performance Gates

Example thresholds:

// tests/thresholds.js
export const options = {
    thresholds: {
        // Response time thresholds
        'http_req_duration': [
            'p(95)<500',    // 95% under 500ms
            'p(99)<1000',   // 99% under 1s
            'max<5000',     // No request over 5s
        ],
        
        // Success rate
        'http_req_failed': [
            'rate<0.01',    // <1% errors
        ],
        
        // Throughput
        'http_reqs': [
            'count>10000',  // Must complete 10k requests
            'rate>100',     // >100 req/s
        ],
        
        // Custom business metrics
        'checkout_duration': [
            'p(95)<3000',   // Checkout under 3s
        ],
        'login_success_rate': [
            'rate>0.99',    // 99% successful logins
        ],
    },
};

Fail build if thresholds not met:

- name: Run K6 and check thresholds
  run: |
    k6 run tests/load-test.js || EXIT_CODE=$?
    if [ "${EXIT_CODE:-0}" -ne 0 ]; then
      echo "❌ Performance thresholds failed!"
      exit 1
    fi

✅ Best Practices

1. Start Small, Scale Up

Wrong approach:

// Don't do this!
export const options = {
    vus: 10000,          // Too much too fast
    duration: '10s',
};

Right approach:

// Do this!
export const options = {
    stages: [
        { duration: '2m', target: 10 },    // Start small
        { duration: '5m', target: 50 },    // Gradual increase
        { duration: '10m', target: 100 },  // Target load
        { duration: '5m', target: 200 },   // Push it
        { duration: '5m', target: 0 },     // Ramp down
    ],
};

2. Use Realistic Think Time

Wrong:

export default function() {
    http.get('/api/products');
    http.post('/api/cart', {...});
    http.post('/api/checkout', {...});
    // No pauses - unrealistic!
}

Right:

export default function() {
    http.get('/api/products');
    sleep(randomBetween(2, 5));  // User reads products
    
    http.post('/api/cart', {...});
    sleep(randomBetween(1, 3));  // User reviews cart
    
    http.post('/api/checkout', {...});
    sleep(randomBetween(5, 10)); // User fills checkout form
}

function randomBetween(min, max) {
    return Math.random() * (max - min) + min;
}

3. Monitor System Resources

While running tests, monitor:

# CPU usage
top

# Memory
free -h

# Disk I/O
iostat -x 1

# Network
iftop

# Application logs
tail -f /var/log/application.log

# Database connections
watch 'psql -c "SELECT count(*) FROM pg_stat_activity;"'

4. Test in Production-Like Environment

Checklist:

  • Same server specs (CPU, RAM, disk)
  • Same network latency
  • Same database size
  • Same external dependencies
  • Same load balancer configuration
  • Same CDN/caching layer

5. Isolate Tests

Don’t:

// Don't mix different test types
export default function() {
    http.get('/api/products');      // Light operation
    http.post('/api/heavy-report'); // Heavy operation
    // Results will be confusing!
}

Do:

// Separate scenarios
export const options = {
    scenarios: {
        light_operations: {
            exec: 'lightOps',
            executor: 'constant-vus',
            vus: 100,
        },
        heavy_operations: {
            exec: 'heavyOps',
            executor: 'constant-vus',
            vus: 10,  // Fewer VUs for heavy ops
        },
    },
};

export function lightOps() {
    http.get('/api/products');
}

export function heavyOps() {
    http.post('/api/heavy-report');
}

6. Version Control Your Tests

performance-tests/
├── scenarios/
│   ├── smoke-test.js
│   ├── load-test.js
│   ├── stress-test.js
│   └── soak-test.js
├── helpers/
│   ├── auth.js
│   ├── utils.js
│   └── custom-metrics.js
├── data/
│   ├── users.csv
│   └── products.csv
├── config/
│   ├── staging.js
│   └── production.js
└── README.md

7. Document Baseline Performance

Create performance baseline document:

# Performance Baseline - v2.5.0

## Test Environment
- Date: 2024-01-15
- Server: AWS EC2 t3.large (2 vCPU, 8GB RAM)
- Database: RDS PostgreSQL db.t3.medium
- Load Balancer: Application Load Balancer

## Results

### API Endpoints
| Endpoint | p50 | p95 | p99 | Max | Success Rate |
|----------|-----|-----|-----|-----|--------------|
| GET /products | 120ms | 250ms | 450ms | 890ms | 99.95% |
| POST /checkout | 340ms | 780ms | 1200ms | 2100ms | 99.80% |
| POST /search | 210ms | 450ms | 890ms | 1500ms | 99.90% |

### System Metrics
- CPU Average: 45%
- CPU Peak: 78%
- Memory Average: 60%
- Memory Peak: 82%
- Disk I/O: < 50% capacity
- Network: < 30% capacity

### Load Handled
- Concurrent Users: 1000
- Requests per Second: 450
- Duration: 30 minutes
- Total Requests: 810,000
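The p50/p95/p99 columns in such a baseline are computed from the raw response-time samples. A minimal sketch using the nearest-rank method (load-testing tools compute these for you; this just shows the idea):

```javascript
// Nearest-rank percentile: sort the samples, then take the value at
// rank ceil(p/100 * n), i.e. index ceil(p/100 * n) - 1.
function percentile(samples, p) {
    const sorted = [...samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length) - 1;
    return sorted[Math.max(0, rank)];
}

// Example: 100 simulated response times, 1ms..100ms
const times = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(times, 50)); // 50
console.log(percentile(times, 95)); // 95
console.log(percentile(times, 99)); // 99
```

Note how p99 can be far above the median: this is why baselines should record percentiles, not averages.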

🎓 Learning Resources

📺 YouTube Channels

Performance Testing:

  1. Software Testing Mentor

    • JMeter tutorials
    • Comprehensive coverage
  2. Raghav Pal - Automation Step by Step

    • Performance testing playlist
    • JMeter, K6, Gatling
  3. K6 Official Channel

    • Modern load testing
    • Best practices

💻 Online Courses

JMeter:

  1. “Apache JMeter: Performance Testing Basics” - Udemy

    • 💰 ~$13
    • Beginner friendly
  2. “JMeter - Step by Step for Beginners” - Udemy

    • 💰 ~$14
    • Hands-on projects

K6:

  1. “Performance Testing with K6” - Test Automation University
    • 🆓 FREE!
    • Modern approach
    • Highly recommended

General Performance:

  1. “Web Performance Testing and Optimization” - LinkedIn Learning

    • 1 month free trial
    • Comprehensive coverage
  2. “Performance Testing Foundations” - LinkedIn Learning

    • Concepts and principles

📚 Books

1. “The Art of Application Performance Testing” - Ian Molyneaux

  • Comprehensive guide
  • Industry standard

2. “Performance Testing with JMeter 3” - Bayo Erinle

  • Hands-on approach
  • Real examples

3. “Web Performance in Action” - Jeremy Wagner

  • Frontend performance
  • Optimization techniques

🛠️ Tools & Platforms

Cloud Load Testing:

  1. BlazeMeter

    • Cloud-based JMeter
    • Free tier available
  2. K6 Cloud

    • Cloud K6 execution
    • 50 free tests/month
  3. Gatling Enterprise

    • Enterprise Gatling

Monitoring & APM:

  1. New Relic

    • Free tier
  2. DataDog

    • 14-day free trial
  3. Grafana + Prometheus

    • Open-source
    • Free forever

✅ Performance Testing Checklist

Before Testing:

  • Define success criteria (SLA, SLO)
  • Identify critical user journeys
  • Prepare test data
  • Set up monitoring
  • Coordinate with team (don’t surprise production!)
  • Prepare a location to store and back up test results

During Testing:

  • Monitor system resources (CPU, memory, disk)
  • Watch for errors in logs
  • Check database connection pool
  • Monitor response times live
  • Take notes of anomalies

After Testing:

  • Analyze results thoroughly
  • Compare with baseline
  • Identify bottlenecks
  • Document findings
  • Share report with team
  • Create action items
  • Schedule follow-up tests

❓ Interview Questions & Answers

Question 1: “What is the difference between Load Testing and Stress Testing?”

Good Answer:

“Load Testing verifies the system performs well under expected load conditions - like testing if our e-commerce site handles 10,000 concurrent users during normal operations. Stress Testing pushes the system beyond its limits to find the breaking point - we keep increasing load until the system fails, then observe how it degrades and recovers. Load testing is about validation; stress testing is about finding limits.”

Question 2: “What metrics do you monitor during performance testing?”

Good Answer:

“I focus on five key metrics: Response Time (especially p95 and p99 percentiles, not just average), Throughput (requests per second), Error Rate (should be under 1%), Resource Utilization (CPU, memory, disk I/O), and Concurrent Users. I also track custom business metrics like checkout completion time or search latency.”

Question 3: “How would you integrate performance tests into CI/CD?”

Good Answer:

“I use a tiered approach: Quick smoke tests on every PR to catch obvious regressions, nightly load tests with realistic scenarios, weekly soak tests for memory leaks. I set up automated thresholds - if p95 response time exceeds 500ms or error rate goes above 1%, the pipeline fails. Results are posted to Slack and stored for historical comparison.”
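The automated gate described in this answer maps directly onto K6's built-in `thresholds` option; when a threshold fails, `k6 run` exits non-zero, which fails the pipeline step. A minimal config fragment (the limit values are examples):

```javascript
// K6 options fragment: the run is marked failed (non-zero exit code)
// when any threshold is breached, so CI can gate on it directly.
export const options = {
    thresholds: {
        http_req_duration: ['p(95)<500'],  // p95 response time under 500ms
        http_req_failed: ['rate<0.01'],    // error rate under 1%
    },
};
```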

Question 4: “You notice response times gradually increasing during a load test. What would you investigate?”

Good Answer:

“Gradual degradation usually indicates resource leaks. I’d check: memory usage for leaks, database connection pool for connections not being released, query execution times for degrading plans, and application logs for warnings. I’d also verify caching is working - maybe cache is filling up or eviction is broken.”

Question 5: “What tools have you used for performance testing?”

Good Answer:

“I primarily use K6 for its JavaScript syntax and CI/CD friendliness. For complex scenarios, I use JMeter with its extensive protocol support. For monitoring, I integrate with Grafana and Prometheus. I’ve also used Gatling for projects requiring detailed HTML reports. The choice depends on team expertise and CI/CD requirements.”


🎯 Key Takeaways

Performance Testing Types Summary

| Type | Purpose | Duration | Load Pattern |
|------|---------|----------|--------------|
| Load | Validate expected load | 10-60 min | Steady |
| Stress | Find breaking point | 30-60 min | Increasing |
| Spike | Handle sudden surges | 5-15 min | Sudden peaks |
| Soak | Find memory leaks | 4-24 hours | Constant |

Tool Comparison

| Tool | Best For | Learning Curve | CI/CD Ready |
|------|----------|----------------|-------------|
| JMeter | Complex protocols | Medium | ⚠️ Needs CLI |
| K6 | Modern web apps | Easy | ✅ Native |
| Gatling | High performance | Hard | ✅ Native |

Quick Reference

Response Time Targets:

  • API GET: < 200ms (p95)
  • API POST: < 500ms (p95)
  • Page Load: < 2 seconds
  • Search: < 1 second

Error Rate Targets:

  • Production: < 0.1%
  • Critical APIs: < 0.01%
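As a sanity check, these targets can be encoded in a few lines and applied to measured p95 values. An illustrative helper (not tied to any particular tool; the keys and limits mirror the targets above):

```javascript
// Illustrative latency gate based on the p95 targets above (values in ms)
const p95Targets = {
    'GET': 200,     // API GET p95 target
    'POST': 500,    // API POST p95 target
    'SEARCH': 1000, // Search p95 target
};

function withinTarget(kind, measuredP95Ms) {
    return measuredP95Ms <= p95Targets[kind];
}

console.log(withinTarget('GET', 180));  // true: under the 200ms target
console.log(withinTarget('POST', 620)); // false: over the 500ms target
```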

📍 What’s Next?

Tomorrow - THE FINALE!

🍎 Article 8: Landing Your Dream Job at Apple

  • Analyzing Apple job requirements
  • Resume optimization for FAANG
  • GitHub portfolio guide
  • Interview process (4 rounds)
  • Salary negotiation ($140K-$200K+)
  • First 90 days at Apple

This is the culmination of everything we’ve learned!


💡 Final Advice

Performance testing is not a one-time activity:

Week 1: Baseline tests
Week 2-4: Development
Week 4: Performance regression check
Week 5: Load test
Week 6: Stress test
Week 8: Soak test
Production: Continuous monitoring

Make it part of your development cycle, not an afterthought!


Was this article helpful? 👏

Questions? Feel free to ask in the comments!


Author: AAnnayev — Senior SDET

Tags: #PerformanceTesting #LoadTesting #JMeter #K6 #Gatling #QA #SDET