
Session Replay Architecture: How Modern Session Recording Systems Work

Understanding the complete architecture of session replay services—from client-side data collection through encryption, queuing, storage, and playback. A technical deep dive into how modern session recording systems are built.

TL;DR

Session replay systems use a multi-stage architecture: client-side SDKs capture user interactions, JavaScript loggers collect DOM and network data, Web Workers process data off the main thread, backend services encrypt data, message queues buffer traffic, and cloud storage (S3) persists recordings. Playback reverses this flow with decryption and streaming.

LogNroll Team

Engineering & Architecture

Introduction

Session replay technology has become essential for understanding user behavior, debugging issues, and improving product experiences. But how do these systems actually work under the hood? What happens from the moment a user interacts with your website to when you watch their session replay?

This article breaks down the complete architecture of modern session replay services, explaining each component, data flow, and design decision that makes reliable, scalable session recording possible.

High-Level Architecture Overview

Modern session replay systems follow a three-stage architecture pattern:

Stage 1: Data Collection Pipeline

WebPage → NPM Module/JS File → JS Logger → WebWorker

User interactions are captured on the client, processed by a logger, and sent to a Web Worker for background processing.

Stage 2: Data Processing & Storage Pipeline

WebWorker → Backend (Encrypting) → Queue → S3

Processed data is encrypted by backend services, queued for reliability, and stored in cloud storage.

Stage 3: Playback Pipeline

User → Web Admin (Player) → Backend (Decrypting) → S3

When viewing a session, data is retrieved from storage, decrypted by the backend, and streamed to the player interface.

Architecture Components

  • NPM Module / JS File: entry point; lightweight client-side SDK that initializes session recording
  • JS Logger: data collection; captures DOM mutations, user interactions, and network requests
  • WebWorker: background processing; offloads work to a background thread, preventing UI blocking
  • Backend (Encrypting): security layer; encrypts session data before storage for security and compliance
  • Queue: message broker; buffers and manages data flow, handling high-volume traffic
  • S3 Storage: data persistence; scalable object storage for encrypted session recordings

Stage 1: Data Collection Pipeline

1. WebPage Integration

The journey begins when a developer integrates a session replay SDK into their web application. This can be done via:

NPM Package Installation

npm install @lognroll/lib

// In your application
import LognRoll from '@lognroll/lib';

LognRoll.initSession('YOUR_APP_ID', {
  // Configuration options
});

Direct Script Tag

<script src="https://logger.lognroll.com/logger.lnr.1.0.1.js"></script>
<script>
  LognRoll.initSession('YOUR_APP_ID', {
    // Configuration options
  });
</script>

2. NPM Module / JS File

The SDK is a lightweight JavaScript module (typically 50-100KB gzipped) that provides:

  • Initialization API: Simple methods to start session recording with configuration options
  • Configuration Management: Privacy settings, sanitization rules, and feature toggles (a sketch follows this list)
  • Logger Initialization: Sets up the JavaScript logger that captures user interactions
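
As a hedged sketch of what such a configuration might look like, here is an initialization call with privacy and sampling options. The option names are illustrative assumptions, not a documented API:

// Option names below are illustrative assumptions, not a documented API.
import LognRoll from '@lognroll/lib';

LognRoll.initSession('YOUR_APP_ID', {
  maskAllInputs: true,          // privacy: never record raw keystroke values
  blockSelector: '.sensitive',  // sanitization: skip elements matching this selector
  captureNetwork: true,         // feature toggle: record fetch/XHR activity
  sampleRate: 0.5               // record roughly half of all sessions
});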

Key Design Decision: The SDK is intentionally lightweight to minimize impact on page load times and user experience. All heavy processing happens in Web Workers.

3. JavaScript Logger

The JavaScript logger is the core data collection engine. It uses browser APIs to capture:

DOM Mutations

Uses MutationObserver to track changes to the DOM tree, capturing element additions, removals, and attribute modifications.

User Interactions

Captures mouse movements, clicks, keyboard input, scroll events, and form submissions through event listeners.

Network Requests

Intercepts fetch and XMLHttpRequest to record API calls, responses, and timing data.

Console Logs

Monitors console.log, console.error, and other console methods for debugging context.
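
Network capture is commonly implemented by wrapping the browser's fetch function (XMLHttpRequest is patched similarly). A minimal sketch, assuming a logger.sendEvent transport to the worker:

// Wrap window.fetch to record request/response metadata.
const originalFetch = window.fetch;

window.fetch = async (input: RequestInfo | URL, init?: RequestInit) => {
  const start = performance.now();
  const response = await originalFetch(input, init);
  // Record metadata only; request/response bodies are typically sanitized or omitted.
  logger.sendEvent('NETWORK_REQUEST', {
    url: String(input instanceof Request ? input.url : input),
    method: init?.method ?? 'GET',
    status: response.status,
    durationMs: performance.now() - start
  });
  return response;
};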

Example: Logger Implementation

class Logger {
  private api: EventApi; // transport that forwards events to the Web Worker
  private domObserver: DOMObserver;
  private networkTracker: NetworkTracker;
  private mouseTracker: MouseTracker;
  private keyboardTracker: KeyboardTracker;

  constructor(api: EventApi) {
    this.api = api;

    // Initialize DOM mutation observer
    this.domObserver = new DOMObserver(this.api);

    // Track network requests
    this.networkTracker = new NetworkTracker(this.api);

    // Track user interactions
    this.mouseTracker = new MouseTracker(this.api);
    this.keyboardTracker = new KeyboardTracker(this.api);
  }

  // Captures DOM changes
  observeDOM() {
    const observer = new MutationObserver((mutations) => {
      mutations.forEach((mutation) => {
        this.api.sendEvent('DOM_MUTATION', {
          type: mutation.type,
          // DOM nodes are not serializable across threads; send a stable
          // node reference instead (serializeNode is an assumed helper)
          targetId: serializeNode(mutation.target),
          // ... sanitized data
        });
      });
    });

    observer.observe(document.body, {
      childList: true,
      attributes: true,
      subtree: true
    });
  }
}

4. Web Worker

Web Workers are critical for performance. They run in a separate thread, preventing the main UI thread from being blocked by data processing tasks.

Why Web Workers?

  • Non-blocking: Data serialization and batching happen off the main thread
  • Performance: UI remains responsive even during heavy data collection
  • Efficiency: Can process and batch multiple events before sending
  • Isolation: Worker errors don't crash the main application

Worker Responsibilities

  • Receives events from the logger via postMessage (see the main-thread sketch below)
  • Batches events into chunks (typically 50-100 events per batch)
  • Serializes data using Protocol Buffers or JSON
  • Sends batched data to the backend via HTTP POST
  • Handles retries and error recovery
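
On the main thread, the logger spawns the worker and forwards each captured event via postMessage. A minimal sketch; the file name and message shape are assumptions chosen to match the worker example below:

// main thread: spawn the worker and forward captured events to it
const worker = new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' });

function sendEvent(type: string, data: unknown) {
  // The worker batches, serializes, and uploads; the main thread only posts.
  worker.postMessage({ type: 'log', data: { type, data, ts: Date.now() } });
}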

Example: Web Worker Implementation

// worker.ts: runs in a background thread
const BATCH_SIZE = 50;          // events per batch (illustrative)
const chunks: LogEvent[] = [];  // pending events awaiting dispatch

self.onmessage = (event: MessageEvent) => {
  if (event.data.type === 'log') {
    chunks.push(event.data.data as LogEvent);

    // Batch events and send when threshold reached
    if (chunks.length >= BATCH_SIZE) {
      sendBatch(chunks.splice(0, BATCH_SIZE));
    }
  }
};

async function sendBatch(batch: LogEvent[]) {
  // Serialize the batch with generated Protocol Buffer bindings
  const logPoints = new proto.LogPoints();
  logPoints.setItemsList(batch);
  const buffer = logPoints.serializeBinary();

  await fetch(`${config.baseUrl}/lognroll/${config.companyId}/${config.sid}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-protobuf' },
    body: buffer
  });
}

Stage 2: Data Processing & Storage Pipeline

5. Backend (Encrypting)

When data arrives at the backend, it goes through several critical processing steps:

Encryption

All session data is encrypted before storage to ensure compliance with privacy regulations (GDPR, CCPA, HIPAA) and protect sensitive user information.

  • Uses AES-256 encryption for data at rest
  • Each session has a unique encryption key
  • Keys are managed separately from encrypted data
  • Supports field-level encryption for sensitive data

Data Validation & Sanitization

Backend services validate incoming data and apply additional sanitization rules:

  • Validates data structure and format
  • Applies server-side sanitization rules
  • Removes or redacts sensitive patterns such as SSNs and credit card numbers (see the sketch after this list)
  • Enforces data retention policies
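
As an illustration of pattern-based redaction, a minimal sketch; the regexes are simplified assumptions, and production rule sets are usually configurable per tenant:

// Simplified redaction rules; real systems use configurable, audited rule sets.
const REDACTION_RULES: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[REDACTED-SSN]'],    // US Social Security numbers
  [/\b(?:\d[ -]?){13,16}\b/g, '[REDACTED-CARD]']   // credit-card-like digit runs
];

function sanitize(text: string): string {
  return REDACTION_RULES.reduce((out, [pattern, mask]) => out.replace(pattern, mask), text);
}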

Session Management

Backend maintains session metadata and state:

  • Creates unique session IDs
  • Tracks session start/end times
  • Stores user metadata (IP, user agent, etc.)
  • Manages session lifecycle

Example: Backend Encryption Service

// Backend encryption service (Node.js crypto, AES-256-GCM)
import { createCipheriv, randomBytes } from 'crypto';

class EncryptionService {
  async encryptSessionData(sessionData: SessionData): Promise<EncryptedData> {
    // Generate or retrieve the session-specific encryption key (32 bytes)
    const encryptionKey = await this.getEncryptionKey(sessionData.sessionId);

    // GCM requires a unique IV per encryption; 96 bits is the recommended size
    const iv = randomBytes(12);
    const cipher = createCipheriv('aes-256-gcm', encryptionKey, iv);
    const ciphertext = Buffer.concat([
      cipher.update(JSON.stringify(sessionData), 'utf8'),
      cipher.final()
    ]);

    return {
      encryptedData: ciphertext,
      iv,
      tag: cipher.getAuthTag(), // authentication tag for integrity verification
      sessionId: sessionData.sessionId
    };
  }

  async getEncryptionKey(sessionId: string): Promise<Buffer> {
    // Retrieve or generate the key from the key management service;
    // keys are stored separately from the encrypted data
    return await keyManagementService.getKey(sessionId);
  }
}

6. Message Queue

Message queues are essential for handling high-volume traffic and ensuring data reliability:

Why Use a Queue?

  • Traffic Spikes: Buffers data during traffic surges
  • Reliability: Ensures no data loss if storage is temporarily unavailable
  • Decoupling: Separates data ingestion from storage processing
  • Scalability: Allows horizontal scaling of storage workers

NATS / RabbitMQ

Popular message queue systems used for session replay data pipelines

Kafka

High-throughput distributed streaming platform for large-scale deployments

AWS SQS

Managed queue service for cloud-native architectures

Queue Message Structure

{
  "sessionId": "abc123",
  "companyId": "company_xyz",
  "encryptedData": "base64_encoded_encrypted_data",
  "metadata": {
    "timestamp": "2026-01-13T10:30:00Z",
    "eventCount": 150,
    "dataSize": 102400
  },
  "retryCount": 0
}
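
A minimal sketch of publishing such a message with the nats.js client; the subject name and connection details are assumptions:

import { connect, JSONCodec } from 'nats';

async function publishEncryptedSession(message: object) {
  const nc = await connect({ servers: 'nats://localhost:4222' }); // connection details assumed
  const jc = JSONCodec();

  // Storage workers subscribe to this subject and persist payloads to S3.
  nc.publish('sessions.encrypted', jc.encode(message));

  await nc.drain(); // flush pending messages and close the connection
}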

7. S3 Storage

Cloud object storage (like AWS S3, Google Cloud Storage, or DigitalOcean Spaces) is used for persistent storage:

Scalability

Handles petabytes of data with automatic scaling, no capacity planning needed

Durability

99.999999999% (11 nines) durability, with automatic replication across multiple availability zones

Cost-Effective

Pay only for storage used, with lifecycle policies for automatic archival

Access Control

Fine-grained access controls, encryption at rest, and audit logging

Storage Structure

s3://lognroll-sessions/
  ├── company_xyz/
  │   ├── 2026/
  │   │   ├── 01/
  │   │   │   ├── 13/
  │   │   │   │   ├── session_abc123_encrypted.bin
  │   │   │   │   ├── session_def456_encrypted.bin
  │   │   │   │   └── ...
  │   │   │   └── ...
  │   │   └── ...
  │   └── ...
  └── ...
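
A storage worker might write into that layout with the AWS SDK v3. A minimal sketch; the bucket name and key format follow the example structure above, and the region is an assumption:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' }); // region is an assumption

async function storeSession(companyId: string, sessionId: string, encrypted: Buffer) {
  const now = new Date();
  // Key mirrors the date-partitioned layout shown above.
  const key = `${companyId}/${now.getUTCFullYear()}/` +
    `${String(now.getUTCMonth() + 1).padStart(2, '0')}/` +
    `${String(now.getUTCDate()).padStart(2, '0')}/` +
    `session_${sessionId}_encrypted.bin`;

  await s3.send(new PutObjectCommand({
    Bucket: 'lognroll-sessions',
    Key: key,
    Body: encrypted,
    ServerSideEncryption: 'AES256' // storage-level encryption on top of app-level encryption
  }));
}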

Stage 3: Playback Pipeline

8. Web Admin (Player)

When a user wants to view a session replay, the process reverses:

Player Interface Features

  • Timeline Navigation: Jump to specific moments in the session
  • Event Inspector: View individual DOM mutations, clicks, and network requests
  • Search & Filter: Find sessions by user, URL, error type, or custom events
  • Performance Metrics: View page load times, network request timing, and resource usage
  • Privacy Controls: Mask sensitive data during playback

9. Backend (Decrypting)

When a playback request is made, the backend:

  1. Retrieves the encrypted session data from S3
  2. Fetches the decryption key from the key management service
  3. Decrypts the session data using AES-256-GCM
  4. Applies any additional privacy filters based on user permissions
  5. Streams the decrypted data to the player interface

Example: Decryption Service

// Backend decryption service (Node.js crypto, AES-256-GCM)
import { createDecipheriv } from 'crypto';

class DecryptionService {
  async getSessionForPlayback(sessionId: string): Promise<SessionData> {
    // 1. Retrieve the encrypted payload (ciphertext, IV, auth tag) from S3
    const encryptedData = await s3Service.getObject(
      `sessions/${sessionId}/encrypted.bin`
    );

    // 2. Get the decryption key from the key management service
    const decryptionKey = await keyManagementService.getKey(sessionId);

    // 3. Decrypt the data, verifying the authentication tag
    const decipher = createDecipheriv('aes-256-gcm', decryptionKey, encryptedData.iv);
    decipher.setAuthTag(encryptedData.tag);
    const decrypted = Buffer.concat([
      decipher.update(encryptedData.ciphertext),
      decipher.final()
    ]).toString('utf8');

    // 4. Parse and return the session data
    return JSON.parse(decrypted);
  }
}

Data Flow Diagram

Recording:  WebPage → NPM Module → JS Logger → WebWorker → Backend (Encrypt) → Queue → S3
Playback:   User → Web Admin (Player) → Backend (Decrypt) → S3

Key Design Decisions

Why Web Workers?

Web Workers prevent UI blocking during data processing. By offloading serialization, batching, and network requests to a background thread, the main application remains responsive even during heavy data collection periods.

Why Encrypt Before Storage?

Encryption at the backend ensures compliance with privacy regulations and protects sensitive user data. Even if storage is compromised, encrypted data remains secure without the decryption keys.

Why Use a Queue?

Message queues provide reliability and scalability. They buffer traffic spikes, ensure no data loss during storage outages, and allow horizontal scaling of storage workers independently from ingestion services.

Why S3 Instead of a Database?

Session replay data is large, append-only, and infrequently accessed. Object storage like S3 is optimized for this use case—it's cost-effective, highly durable, and scales automatically without database maintenance overhead.

Performance Optimizations

Client-Side Optimizations

  • Event Batching: Groups multiple events before sending to reduce HTTP requests
  • Throttling: Limits capture rate for high-frequency events such as mouse movements (see the sketch after this list)
  • Debouncing: Delays processing until user activity pauses
  • Compression: Uses Protocol Buffers or gzip compression to reduce payload size
  • Sampling: Records only a percentage of sessions to reduce volume
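
As an example of throttling, mouse movements can be sampled at a fixed interval. A minimal sketch; the 50 ms interval and the logger.sendEvent transport are assumptions:

// Emit at most one event per interval for high-frequency sources.
function throttle<A extends unknown[]>(fn: (...args: A) => void, intervalMs: number) {
  let last = 0;
  return (...args: A) => {
    const now = Date.now();
    if (now - last >= intervalMs) {
      last = now;
      fn(...args);
    }
  };
}

// Sample mouse position at most every 50 ms.
document.addEventListener('mousemove', throttle((e: MouseEvent) => {
  logger.sendEvent('MOUSE_MOVE', { x: e.clientX, y: e.clientY, ts: Date.now() });
}, 50));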

Backend Optimizations

  • Async Processing: Non-blocking I/O for encryption and storage operations
  • Connection Pooling: Reuses database and storage connections
  • Caching: Caches frequently accessed session metadata
  • Parallel Processing: Processes multiple sessions concurrently
  • CDN Integration: Serves session data from edge locations for faster playback

Security Considerations

Data Encryption

  • All session data encrypted with AES-256-GCM before storage
  • Encryption keys stored separately in a key management service
  • Supports key rotation without data re-encryption
  • Field-level encryption for sensitive data fields

Access Control

  • Role-based access control (RBAC) for session viewing
  • Audit logs for all session access attempts
  • IP whitelisting and geolocation restrictions
  • Session-level permissions and sharing controls

Privacy Compliance

  • GDPR-compliant data handling and deletion
  • Configurable data retention policies
  • Client-side and server-side data sanitization
  • User consent management and opt-out support

Scalability & Reliability

Horizontal Scaling

All components are designed for horizontal scaling:

  • Stateless backend services
  • Load-balanced API endpoints
  • Distributed message queues
  • Multi-region S3 storage

Fault Tolerance

Built-in resilience mechanisms:

  • Automatic retries with exponential backoff (see the sketch after this list)
  • Queue-based buffering during outages
  • Redundant storage across availability zones
  • Circuit breakers for external dependencies
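
A minimal sketch of retrying an operation with exponential backoff and jitter; the attempt limit and delay caps are illustrative:

// Retry an async operation with exponential backoff plus random jitter.
async function withRetry<T>(op: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      const backoffMs = Math.min(1000 * 2 ** (attempt - 1), 30_000);
      const jitterMs = Math.random() * backoffMs * 0.2;
      await new Promise((resolve) => setTimeout(resolve, backoffMs + jitterMs));
    }
  }
}

// Example: upload a batch, retrying transient failures.
// await withRetry(() => sendBatch(batch));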

Conclusion

Modern session replay systems use a sophisticated multi-stage architecture that balances performance, security, and scalability. By leveraging Web Workers for non-blocking processing, encryption for security, message queues for reliability, and cloud storage for scalability, these systems can handle millions of sessions while maintaining low latency and high availability.

Understanding this architecture helps developers make informed decisions when implementing session replay, optimizing performance, and ensuring compliance with privacy regulations.

Ready to Implement Session Replay?

LogNroll provides a complete session replay solution with all the architectural components discussed in this article. Get started with just a few lines of code.