Session Replay Architecture: How Modern Session Recording Systems Work
Understanding the complete architecture of session replay services—from client-side data collection through encryption, queuing, storage, and playback. A technical deep dive into how modern session recording systems are built.
TL;DR
Session replay systems use a multi-stage architecture: client-side SDKs capture user interactions, JavaScript loggers collect DOM and network data, Web Workers process data off the main thread, backend services encrypt data, message queues buffer traffic, and cloud storage (S3) persists recordings. Playback reverses this flow with decryption and streaming.
LogNroll Team
Engineering & Architecture
Introduction
Session replay technology has become essential for understanding user behavior, debugging issues, and improving product experiences. But how do these systems actually work under the hood? What happens from the moment a user interacts with your website to when you watch their session replay?
This article breaks down the complete architecture of modern session replay services, explaining each component, data flow, and design decision that makes reliable, scalable session recording possible.
High-Level Architecture Overview
Modern session replay systems follow a three-stage architecture pattern:
Stage 1: Data Collection Pipeline
WebPage → NPM Module/JS File → JS Logger → WebWorker
User interactions are captured on the client, processed by a logger, and sent to a Web Worker for background processing.
Stage 2: Data Processing & Storage Pipeline
WebWorker → Backend (Encrypting) → Queue → S3
Processed data is encrypted by backend services, queued for reliability, and stored in cloud storage.
Stage 3: Playback Pipeline
User → Web Admin (Player) → Backend (Decrypting) → S3
When viewing a session, data is retrieved from storage, decrypted by the backend, and streamed to the player interface.
Architecture Components
NPM Module / JS File
Lightweight client-side SDK that initializes session recording
JS Logger
Captures DOM mutations, user interactions, and network requests
WebWorker
Offloads processing to background thread, prevents UI blocking
Backend (Encrypting)
Encrypts session data before storage for security and compliance
Queue
Buffers and manages data flow, handles high-volume traffic
S3 Storage
Scalable object storage for encrypted session recordings
Stage 1: Data Collection Pipeline
1. WebPage Integration
The journey begins when a developer integrates a session replay SDK into their web application. This can be done via:
NPM Package Installation
npm install @lognroll/lib
// In your application
import LognRoll from '@lognroll/lib';
LognRoll.initSession('YOUR_APP_ID', {
// Configuration options
});
Direct Script Tag
<script src="https://logger.lognroll.com/logger.lnr.1.0.1.js"></script>
<script>
LognRoll.initSession('YOUR_APP_ID', {
// Configuration options
});
</script>
2. NPM Module / JS File
The SDK is a lightweight JavaScript module (typically 50-100KB gzipped) that provides:
- Initialization API: Simple methods to start session recording with configuration options
- Configuration Management: Privacy settings, sanitization rules, and feature toggles
- Logger Initialization: Sets up the JavaScript logger that captures user interactions
Key Design Decision: The SDK is intentionally lightweight to minimize impact on page load times and user experience. All heavy processing happens in Web Workers.
3. JavaScript Logger
The JavaScript logger is the core data collection engine. It uses browser APIs to capture:
DOM Mutations
Uses MutationObserver to track changes to the DOM tree, capturing element additions, removals, and attribute modifications.
User Interactions
Captures mouse movements, clicks, keyboard input, scroll events, and form submissions through event listeners.
Network Requests
Intercepts fetch and XMLHttpRequest to record API calls, responses, and timing data (see the sketch below).
Console Logs
Monitors console.log, console.error, and other console methods for debugging context.
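To make network capture concrete, here is a minimal sketch of fetch interception. The sendEvent helper and the recorded fields are assumptions, and XMLHttpRequest would be patched in a similar way:
// Wrap window.fetch to record request metadata (sketch; sendEvent is hypothetical)
const originalFetch = window.fetch;
window.fetch = async (input: RequestInfo | URL, init?: RequestInit) => {
  const start = performance.now();
  const response = await originalFetch(input, init);
  // Record URL, method, status, and timing; bodies are omitted or sanitized
  sendEvent('NETWORK_REQUEST', {
    url: input instanceof Request ? input.url : String(input),
    method: init?.method ?? 'GET',
    status: response.status,
    durationMs: performance.now() - start
  });
  return response;
};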
Example: Logger Implementation
class Logger {
  private api: EventAPI; // transport that forwards captured events to the Web Worker
  private domObserver: DOMObserver;
  private networkTracker: NetworkTracker;
  private mouseTracker: MouseTracker;
  private keyboardTracker: KeyboardTracker;

  constructor(api: EventAPI) {
    this.api = api;
    // Initialize DOM mutation observer
    this.domObserver = new DOMObserver(this.api);
    // Track network requests
    this.networkTracker = new NetworkTracker(this.api);
    // Track user interactions
    this.mouseTracker = new MouseTracker(this.api);
    this.keyboardTracker = new KeyboardTracker(this.api);
  }

  // Captures DOM changes
  observeDOM() {
    const observer = new MutationObserver((mutations) => {
      mutations.forEach((mutation) => {
        this.api.sendEvent('DOM_MUTATION', {
          type: mutation.type,
          // DOM nodes can't be sent as-is; serializeNode (hypothetical helper)
          // converts the node to a plain, serializable reference
          target: serializeNode(mutation.target),
          // ... sanitized data
        });
      });
    });
    observer.observe(document.body, {
      childList: true,
      attributes: true,
      subtree: true
    });
  }
}
4. Web Worker
Web Workers are critical for performance. They run in a separate thread, preventing the main UI thread from being blocked by data processing tasks.
Why Web Workers?
- Non-blocking: Data serialization and batching happen off the main thread
- Performance: UI remains responsive even during heavy data collection
- Efficiency: Can process and batch multiple events before sending
- Isolation: Worker errors don't crash the main application
Worker Responsibilities
- Receives events from the logger via postMessage (see the main-thread sketch below)
- Batches events into chunks (typically 50-100 events per batch)
- Serializes data using Protocol Buffers or JSON
- Sends batched data to backend via HTTP POST
- Handles retries and error recovery
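On the main thread, the logger simply hands each captured event to the worker. A minimal sketch, assuming a bundler-style worker import and the { type: 'log', data } message shape used by the worker below:
// Main thread: forward captured events to the background worker
const worker = new Worker(new URL('./worker.ts', import.meta.url));
function sendEvent(type: string, data: unknown) {
  // Matches the message shape the worker's onmessage handler expects
  worker.postMessage({ type: 'log', data: { type, data, ts: Date.now() } });
}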
Example: Web Worker Implementation
// worker.ts
// BATCH_SIZE, config, and the generated proto bindings are defined elsewhere
const events: LogEvent[] = [];
self.onmessage = (event) => {
  if (event.data.type === 'log') {
    events.push(event.data.data);
    // Batch events and send when threshold reached
    if (events.length >= BATCH_SIZE) {
      sendBatch(events.splice(0, BATCH_SIZE));
    }
  }
};
async function sendBatch(batch: LogEvent[]) {
  const logPoints = new proto.LogPoints();
  logPoints.setItemsList(batch);
  const buffer = logPoints.serializeBinary();
  await fetch(`${config.baseUrl}/lognroll/${config.companyId}/${config.sid}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-protobuf' },
    body: buffer
  });
}
Stage 2: Data Processing & Storage Pipeline
5. Backend (Encrypting)
When data arrives at the backend, it goes through several critical processing steps:
Encryption
All session data is encrypted before storage to ensure compliance with privacy regulations (GDPR, CCPA, HIPAA) and protect sensitive user information.
- Uses AES-256 encryption for data at rest
- Each session has a unique encryption key
- Keys are managed separately from encrypted data
- Supports field-level encryption for sensitive data
Data Validation & Sanitization
Backend services validate incoming data and apply additional sanitization rules:
- Validates data structure and format
- Applies server-side sanitization rules
- Removes or redacts sensitive patterns such as SSNs and credit card numbers (see the sketch after this list)
- Enforces data retention policies
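As an illustration of server-side redaction, here is a minimal sketch; the patterns and placeholder strings are assumptions, and production systems typically use far more thorough detectors:
// Hypothetical redaction pass applied to captured text values
const SENSITIVE_PATTERNS: [RegExp, string][] = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[REDACTED-SSN]'],    // US Social Security numbers
  [/\b(?:\d[ -]?){13,16}\b/g, '[REDACTED-CARD]'],  // credit-card-like digit runs
];
function redact(text: string): string {
  return SENSITIVE_PATTERNS.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    text
  );
}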
Session Management
Backend maintains session metadata and state:
- Creates unique session IDs
- Tracks session start/end times
- Stores user metadata (IP, user agent, etc.)
- Manages session lifecycle (an illustrative metadata shape follows this list)
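An illustrative TypeScript shape for this metadata (field names are assumptions, not a documented schema):
// Hypothetical session metadata record maintained by the backend
interface SessionMetadata {
  sessionId: string;       // unique per recording
  companyId: string;
  startedAt: string;       // ISO-8601 timestamp
  endedAt?: string;        // set when the session closes
  userAgent: string;
  ipAddress: string;       // subject to privacy and retention rules
  eventCount: number;
  status: 'active' | 'completed' | 'expired';
}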
Example: Backend Encryption Service
// Backend encryption service (Node.js crypto, AES-256-GCM)
import { createCipheriv, randomBytes } from 'crypto';

class EncryptionService {
  async encryptSessionData(sessionData: SessionData): Promise<EncryptedData> {
    // Generate or retrieve session-specific encryption key
    const encryptionKey = await this.getEncryptionKey(sessionData.sessionId);
    // Encrypt the session data with a fresh 12-byte IV per operation
    const iv = randomBytes(12);
    const cipher = createCipheriv('aes-256-gcm', encryptionKey, iv);
    const ciphertext = Buffer.concat([
      cipher.update(JSON.stringify(sessionData), 'utf8'),
      cipher.final()
    ]);
    return {
      encryptedData: ciphertext,
      iv,
      tag: cipher.getAuthTag(), // GCM auth tag, required for decryption
      sessionId: sessionData.sessionId
    };
  }

  async getEncryptionKey(sessionId: string): Promise<Buffer> {
    // Retrieve or generate encryption key from key management service
    // Keys are stored separately from encrypted data
    return await keyManagementService.getKey(sessionId);
  }
}
6. Message Queue
Message queues are essential for handling high-volume traffic and ensuring data reliability:
Why Use a Queue?
- Traffic Spikes: Buffers data during traffic surges
- Reliability: Ensures no data loss if storage is temporarily unavailable
- Decoupling: Separates data ingestion from storage processing
- Scalability: Allows horizontal scaling of storage workers (a consumer sketch follows the list of queue systems below)
NATS / RabbitMQ
Popular message queue systems used for session replay data pipelines
Kafka
High-throughput distributed streaming platform for large-scale deployments
AWS SQS
Managed queue service for cloud-native architectures
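For a concrete example of that decoupling, here is a minimal storage-worker sketch using the nats.js client; the subject name, queue group, and s3Service helper are assumptions:
import { connect } from 'nats';
// Storage worker: drain encrypted session batches from the queue and persist them
const nc = await connect({ servers: 'nats://queue.internal:4222' });
const sub = nc.subscribe('sessions.encrypted', { queue: 'storage-workers' });
for await (const msg of sub) {
  const payload = JSON.parse(new TextDecoder().decode(msg.data));
  // Message shape matches "Queue Message Structure" below
  await s3Service.putObject(
    `${payload.companyId}/${payload.sessionId}_encrypted.bin`,
    Buffer.from(payload.encryptedData, 'base64')
  );
}
Adding workers to the same queue group spreads messages across consumers, which is how storage throughput scales independently of ingestion.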
Queue Message Structure
{
"sessionId": "abc123",
"companyId": "company_xyz",
"encryptedData": "base64_encoded_encrypted_data",
"metadata": {
"timestamp": "2026-01-13T10:30:00Z",
"eventCount": 150,
"dataSize": 102400
},
"retryCount": 0
}
7. S3 Storage
Cloud object storage (like AWS S3, Google Cloud Storage, or DigitalOcean Spaces) is used for persistent storage:
Scalability
Handles petabytes of data with automatic scaling, no capacity planning needed
Durability
99.999999999% (11 nines) durability, with automatic replication across multiple availability zones
Cost-Effective
Pay only for storage used, with lifecycle policies for automatic archival
Access Control
Fine-grained access controls, encryption at rest, and audit logging
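As a sketch of how a storage worker might write a recording with the AWS SDK v3 (bucket name and region are assumptions), with keys mirroring the layout shown under Storage Structure below:
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
const s3 = new S3Client({ region: 'us-east-1' });
// Persist an encrypted session blob under company/year/month/day/session
async function storeSession(companyId: string, sessionId: string, encrypted: Buffer) {
  const now = new Date();
  const key = [
    companyId,
    now.getUTCFullYear(),
    String(now.getUTCMonth() + 1).padStart(2, '0'),
    String(now.getUTCDate()).padStart(2, '0'),
    `session_${sessionId}_encrypted.bin`
  ].join('/');
  await s3.send(new PutObjectCommand({
    Bucket: 'lognroll-sessions',
    Key: key,
    Body: encrypted,
    ServerSideEncryption: 'AES256' // storage-level encryption on top of app-level AES-256-GCM
  }));
}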
Storage Structure
s3://lognroll-sessions/
├── company_xyz/
│ ├── 2026/
│ │ ├── 01/
│ │ │ ├── 13/
│ │ │ │ ├── session_abc123_encrypted.bin
│ │ │ │ ├── session_def456_encrypted.bin
│ │ │ │ └── ...
│ │ │ └── ...
│ │ └── ...
│ └── ...
└── ...
Stage 3: Playback Pipeline
8. Web Admin (Player)
When a user wants to view a session replay, the process reverses:
Player Interface Features
- Timeline Navigation: Jump to specific moments in the session
- Event Inspector: View individual DOM mutations, clicks, and network requests
- Search & Filter: Find sessions by user, URL, error type, or custom events
- Performance Metrics: View page load times, network request timing, and resource usage
- Privacy Controls: Mask sensitive data during playback
9. Backend (Decrypting)
When a playback request is made, the backend:
1. Retrieves the encrypted session data from S3
2. Fetches the decryption key from the key management service
3. Decrypts the session data using AES-256-GCM
4. Applies any additional privacy filters based on user permissions
5. Streams the decrypted data to the player interface
Example: Decryption Service
// Backend decryption service (Node.js crypto, AES-256-GCM)
import { createDecipheriv } from 'crypto';

class DecryptionService {
  async getSessionForPlayback(sessionId: string): Promise<SessionData> {
    // 1. Retrieve encrypted data from S3
    const encrypted = await s3Service.getObject(
      `sessions/${sessionId}/encrypted.bin`
    );
    // 2. Get decryption key
    const decryptionKey = await keyManagementService.getKey(sessionId);
    // 3. Decrypt the data, verifying the GCM authentication tag
    const decipher = createDecipheriv('aes-256-gcm', decryptionKey, encrypted.iv);
    decipher.setAuthTag(encrypted.tag);
    const decrypted = Buffer.concat([
      decipher.update(encrypted.ciphertext),
      decipher.final()
    ]).toString('utf8');
    // 4. Parse and return session data
    return JSON.parse(decrypted);
  }
}
Data Flow Diagram
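End to end, the three stages chain together as follows:
WebPage → NPM Module/JS File → JS Logger → WebWorker → Backend (Encrypting) → Queue → S3 → Backend (Decrypting) → Web Admin (Player)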
Key Design Decisions
Why Web Workers?
Web Workers prevent UI blocking during data processing. By offloading serialization, batching, and network requests to a background thread, the main application remains responsive even during heavy data collection periods.
Why Encrypt Before Storage?
Encryption at the backend ensures compliance with privacy regulations and protects sensitive user data. Even if storage is compromised, encrypted data remains secure without the decryption keys.
Why Use a Queue?
Message queues provide reliability and scalability. They buffer traffic spikes, ensure no data loss during storage outages, and allow horizontal scaling of storage workers independently from ingestion services.
Why S3 Instead of a Database?
Session replay data is large, append-only, and infrequently accessed. Object storage like S3 is optimized for this use case—it's cost-effective, highly durable, and scales automatically without database maintenance overhead.
Performance Optimizations
Client-Side Optimizations
- Event Batching: Groups multiple events before sending to reduce HTTP requests
- Throttling: Limits capture rate for high-frequency events such as mouse movements (see the sketch after this list)
- Debouncing: Delays processing until user activity pauses
- Compression: Uses Protocol Buffers or gzip compression to reduce payload size
- Sampling: Records only a percentage of sessions to reduce volume
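A minimal throttling sketch for mousemove capture; the 50 ms interval and the logger.sendEvent call are assumptions:
// Generic throttle: invoke fn at most once per interval
function throttle<T extends (...args: any[]) => void>(fn: T, intervalMs: number): T {
  let last = 0;
  return ((...args: any[]) => {
    const now = Date.now();
    if (now - last >= intervalMs) {
      last = now;
      fn(...args);
    }
  }) as T;
}
// Sample mouse position at most every 50 ms instead of on every event
document.addEventListener('mousemove', throttle((e: MouseEvent) => {
  logger.sendEvent('MOUSE_MOVE', { x: e.clientX, y: e.clientY, t: Date.now() });
}, 50));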
Backend Optimizations
- Async Processing: Non-blocking I/O for encryption and storage operations
- Connection Pooling: Reuses database and storage connections
- Caching: Caches frequently accessed session metadata
- Parallel Processing: Processes multiple sessions concurrently
- CDN Integration: Serves session data from edge locations for faster playback
Security Considerations
Data Encryption
- All session data encrypted with AES-256-GCM before storage
- Encryption keys stored separately in a key management service
- Supports key rotation without data re-encryption
- Field-level encryption for sensitive data fields
Access Control
- Role-based access control (RBAC) for session viewing
- Audit logs for all session access attempts
- IP whitelisting and geolocation restrictions
- Session-level permissions and sharing controls
Privacy Compliance
- GDPR-compliant data handling and deletion
- Configurable data retention policies
- Client-side and server-side data sanitization
- User consent management and opt-out support
Scalability & Reliability
Horizontal Scaling
All components are designed for horizontal scaling:
- Stateless backend services
- Load-balanced API endpoints
- Distributed message queues
- Multi-region S3 storage
Fault Tolerance
Built-in resilience mechanisms:
- Automatic retries with exponential backoff (see the sketch after this list)
- Queue-based buffering during outages
- Redundant storage across availability zones
- Circuit breakers for external dependencies
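A minimal retry helper with exponential backoff and jitter; the attempt count and delay caps are illustrative:
// Retry an async operation with exponential backoff plus random jitter
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // 200 ms, 400 ms, 800 ms, ... capped at 30 s, plus up to 100 ms jitter
      const delayMs = Math.min(30_000, 2 ** attempt * 100) + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
// Example: wrap a batch upload so transient failures are retried
await withRetry(() => sendBatch(batch));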
Conclusion
Modern session replay systems use a sophisticated multi-stage architecture that balances performance, security, and scalability. By leveraging Web Workers for non-blocking processing, encryption for security, message queues for reliability, and cloud storage for scalability, these systems can handle millions of sessions while maintaining low latency and high availability.
Understanding this architecture helps developers make informed decisions when implementing session replay, optimizing performance, and ensuring compliance with privacy regulations.
Ready to Implement Session Replay?
LogNroll provides a complete session replay solution with all the architectural components discussed in this article. Get started with just a few lines of code.