From Chaos to Control: Engineering an OTA Operations Dashboard
Building centralized operations management for online travel agencies that handle 400+ properties with sub-second response times
Online travel agencies face a brutal operational reality: critical data scattered across Airbnb's API, Booking.com's extranet, Stripe dashboards, custom booking systems, Gmail, and QuickBooks. For a mid-sized OTA managing 400+ properties across Europe, this fragmentation was costing hours daily and creating dangerous gaps in availability management.
The technical challenge: aggregate real-time data from 7 different sources, provide actionable insights, automate workflows, and maintain sub-second response times. This post breaks down the architecture, integration patterns, and hard-won lessons from building this system.
System Architecture
High-Level Design
┌─────────────────────────────────────────────────┐
│  User Interface Layer                           │
│  (React SPA + Native Mobile Apps)               │
└─────────────┬───────────────────────────────────┘
              │
┌─────────────▼───────────────────────────────────┐
│  API Gateway Layer                              │
│  (GraphQL + REST, Auth, Rate Limiting)          │
└─────────────┬───────────────────────────────────┘
              │
┌─────────────▼───────────────────────────────────┐
│  Business Logic / Service Layer                 │
│  • Booking Service                              │
│  • Inventory Service                            │
│  • Analytics Service                            │
│  • Communication Service                        │
│  • Financial Service                            │
└─────────────┬───────────────────────────────────┘
              │
┌─────────────▼───────────────────────────────────┐
│  Integration / Adapter Layer                    │
│  • Channel Manager Adapters                     │
│  • Payment Gateway Adapters                     │
│  • PMS Integrations                             │
└─────────────┬───────────────────────────────────┘
              │
┌─────────────▼───────────────────────────────────┐
│  Data Layer                                     │
│  • PostgreSQL (transactional)                   │
│  • Redis (caching, real-time state)             │
│  • Elasticsearch (search, analytics)            │
│  • S3 (documents, images)                       │
└─────────────────────────────────────────────────┘
Technology Stack
Frontend: React 18 with TypeScript, TanStack Query for server state, Recharts for visualisation, Tailwind CSS, React Native for mobile
Backend: Node.js with Express, GraphQL (Apollo Server), PostgreSQL, Redis for caching/pub-sub, Elasticsearch for search and analytics
Infrastructure: AWS (ECS, RDS, ElastiCache, S3), GitHub Actions for CI/CD, Datadog for monitoring, Terraform for IaC
Three Critical Integration Challenges
Challenge 1: Inconsistent External APIs
Every booking platform speaks a different language. Airbnb uses OAuth 2.0 with RESTful JSON and webhooks (50 requests/min limit). Booking.com uses legacy XML-RPC, SOAP for some endpoints, and custom authentication with limited real-time capabilities.
Solution: Adapter Pattern with Unified Interface
// Unified booking interface
interface BookingAdapter {
  authenticate(): Promise<AuthToken>;
  fetchBookings(startDate: Date, endDate: Date): Promise<Booking[]>;
  updateAvailability(propertyId: string, availability: Availability[]): Promise<void>;
  handleWebhook(payload: unknown): Promise<BookingEvent>;
}

// Airbnb adapter implementation (remaining interface methods omitted for brevity)
class AirbnbAdapter implements BookingAdapter {
  private client: AirbnbAPIClient;

  async fetchBookings(startDate: Date, endDate: Date): Promise<Booking[]> {
    const response = await this.client.get('/reservations', {
      params: {
        start_date: startDate.toISOString(),
        end_date: endDate.toISOString()
      }
    });
    // Arrow function keeps `this` bound if the transform ever needs instance state
    return response.data.reservations.map(
      (reservation: AirbnbReservation) => this.transformAirbnbBooking(reservation)
    );
  }

  private transformAirbnbBooking(airbnbData: AirbnbReservation): Booking {
    return {
      id: airbnbData.confirmation_code,
      source: 'airbnb',
      guestName: airbnbData.guest.name,
      checkIn: new Date(airbnbData.start_date),
      checkOut: new Date(airbnbData.end_date),
      // ... other unified fields
    };
  }
}

// Booking.com adapter handles XML-RPC complexity internally
// (remaining interface methods and XML helpers omitted for brevity)
class BookingDotComAdapter implements BookingAdapter {
  async fetchBookings(startDate: Date, endDate: Date): Promise<Booking[]> {
    const xmlPayload = this.buildXMLRequest(startDate, endDate);
    const response = await this.client.post('/xml', xmlPayload);
    const parsed = await this.parseXMLResponse(response.data);
    // Same `this`-safe mapping as in the Airbnb adapter
    return parsed.map((item) => this.transformBookingDotComBooking(item));
  }
}
This isolates external API complexity and provides a consistent interface for the service layer.
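An adapter is also a natural place to respect per-platform quotas such as the 50 requests/min limit mentioned above. The following is a minimal token-bucket sketch, not the production limiter: the `TokenBucket` class, its method names, and the injected clock are all assumptions made for illustration.

```typescript
// Token bucket sized for a per-minute request budget. The clock is injected
// so the refill logic can be exercised without real waiting.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerMinute: number,
    private readonly now: () => number = () => Date.now()
  ) {
    this.tokens = capacity;
    this.lastRefillMs = this.now();
  }

  // Returns true and consumes one token if a request may be sent right now.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Milliseconds until the next token becomes available (0 if one is ready).
  msUntilNextToken(): number {
    this.refill();
    if (this.tokens >= 1) return 0;
    const msPerToken = 60000 / this.refillPerMinute;
    return Math.ceil((1 - this.tokens) * msPerToken);
  }

  // Add tokens in proportion to the time elapsed since the last refill.
  private refill(): void {
    const nowMs = this.now();
    const elapsedMs = nowMs - this.lastRefillMs;
    const refilled = (elapsedMs / 60000) * this.refillPerMinute;
    this.tokens = Math.min(this.capacity, this.tokens + refilled);
    this.lastRefillMs = nowMs;
  }
}
```

An adapter could call `tryAcquire()` before each outbound request and, when it returns false, wait for `msUntilNextToken()` milliseconds instead of letting the platform reject the call.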
Challenge 2: Real-Time Synchronisation
When a booking happens on Airbnb at 10:03 AM, availability must update across all channels within seconds to prevent double-bookings.
Solution: Event-Driven Architecture with Redis Pub/Sub
// Event publisher
class BookingEventPublisher {
  private redis: Redis;

  async publishBookingCreated(booking: Booking): Promise<void> {
    const event = {
      type: 'BOOKING_CREATED',
      timestamp: Date.now(),
      data: booking
    };
    // Redis pub/sub is fire-and-forget: a subscriber that is offline misses the event
    await this.redis.publish('booking-events', JSON.stringify(event));
  }
}

// Event consumer for availability sync
class AvailabilitySyncConsumer {
  private adapters: Map<string, BookingAdapter>;

  async handleBookingCreated(event: BookingEvent): Promise<void> {
    const { propertyId, checkIn, checkOut } = event.data;
    // Update availability across all channels except the source
    const targets = Array.from(this.adapters.entries())
      .filter(([channel]) => channel !== event.data.source);
    const results = await Promise.allSettled(
      targets.map(([, adapter]) =>
        adapter.updateAvailability(propertyId, [{
          startDate: checkIn,
          endDate: checkOut,
          available: false
        }])
      )
    );
    // allSettled never rejects, so failed channels must be surfaced explicitly
    results.forEach((result, index) => {
      if (result.status === 'rejected') {
        console.error(`Availability sync failed for ${targets[index][0]}`, result.reason);
      }
    });
  }
}
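One detail worth spelling out: the publisher serialises events with `JSON.stringify`, so `Date` fields such as `checkIn` arrive at the subscriber as ISO strings. A minimal sketch of a parsing step on the consumer side follows. The `parseBookingEvent` helper and the simplified `Booking` shape are assumptions for illustration, not the system's actual types.

```typescript
// Simplified shapes based on the publisher above (fields are assumptions).
interface Booking {
  id: string;
  source: string;
  propertyId: string;
  checkIn: Date;
  checkOut: Date;
}

interface BookingEvent {
  type: "BOOKING_CREATED";
  timestamp: number;
  data: Booking;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function parseString(value: unknown, field: string): string {
  if (typeof value !== "string") throw new Error(`${field} must be a string`);
  return value;
}

// JSON has no date type, so ISO strings are converted back to Date objects.
function parseDate(value: unknown, field: string): Date {
  const date = new Date(parseString(value, field));
  if (Number.isNaN(date.getTime())) throw new Error(`${field} is not a valid date`);
  return date;
}

// Validate a raw pub/sub message and restore its Date fields.
function parseBookingEvent(raw: string): BookingEvent {
  const parsed: unknown = JSON.parse(raw);
  if (!isRecord(parsed)) throw new Error("Event must be a JSON object");
  const eventType = parsed.type;
  const timestamp = parsed.timestamp;
  const data = parsed.data;
  if (eventType !== "BOOKING_CREATED") throw new Error("Unsupported event type");
  if (typeof timestamp !== "number") throw new Error("timestamp must be a number");
  if (!isRecord(data)) throw new Error("Event has no data payload");
  return {
    type: "BOOKING_CREATED",
    timestamp,
    data: {
      id: parseString(data.id, "id"),
      source: parseString(data.source, "source"),
      propertyId: parseString(data.propertyId, "propertyId"),
      checkIn: parseDate(data.checkIn, "checkIn"),
      checkOut: parseDate(data.checkOut, "checkOut"),
    },
  };
}
```

Running every message through a step like this before it reaches `handleBookingCreated` keeps malformed or unexpected events from triggering availability updates.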
Challenge 3: Performance at Scale
With 400 properties and 3,000 bookings monthly, naive data fetching creates bottlenecks.
Solution: Multi-Layer Caching Strategy
class BookingService {
  private db: PostgreSQL;
  private cache: Redis;
  private elasticsearch: ElasticsearchClient;

  async getBookings(filters: BookingFilters): Promise<Booking[]> {
    const cacheKey = this.generateCacheKey(filters);

    // L1: Redis cache check
    const cached = await this.cache.get(cacheKey);
    if (cached) return JSON.parse(cached);

    // L2: Elasticsearch for complex queries
    if (this.isComplexQuery(filters)) {
      const results = await this.elasticsearch.search({
        index: 'bookings',
        body: this.buildElasticsearchQuery(filters)
      });
      const bookings = results.hits.hits.map(hit => hit._source);
      await this.cache.setex(cacheKey, 300, JSON.stringify(bookings));
      return bookings;
    }

    // L3: Database for simple queries
    const bookings = await this.db.query(
      'SELECT * FROM bookings WHERE ...',
      filters
    );
    await this.cache.setex(cacheKey, 300, JSON.stringify(bookings));
    return bookings;
  }
}
Performance Results: Average API response: 120ms, 95th percentile: 350ms, cache hit rate: 76%, database query reduction: 68%
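The hit rate of a cache like this depends heavily on `generateCacheKey`, which the service above calls but does not show. The sketch below is one plausible way to write it, with an assumed flat `BookingFilters` shape: keys are sorted so that logically identical filters always map to the same cache entry.

```typescript
// Assumed filter shape: a flat record of primitive or Date values.
type FilterValue = string | number | boolean | Date | null | undefined;
type BookingFilters = Record<string, FilterValue>;

// Build a deterministic key: same filters in any property order, same key.
function generateCacheKey(filters: BookingFilters, prefix = "bookings"): string {
  const parts = Object.keys(filters)
    .sort()
    // Unset filters should not produce a different key than absent ones.
    .filter((key) => filters[key] !== undefined && filters[key] !== null)
    .map((key) => {
      const value = filters[key];
      const normalized = value instanceof Date ? value.toISOString() : String(value);
      return `${key}=${encodeURIComponent(normalized)}`;
    });
  return `${prefix}:${parts.join("&")}`;
}
```

Without the sort, `{ channel, propertyId }` and `{ propertyId, channel }` would be cached separately, which lowers the hit rate without anyone noticing.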
Analytics Implementation
Real-Time Metrics Dashboard
interface AnalyticsQuery {
  metric: 'revenue' | 'occupancy' | 'adr' | 'revpar';
  groupBy: 'day' | 'week' | 'month' | 'property' | 'channel';
  startDate: Date;
  endDate: Date;
  filters?: Record<string, unknown>;
}

class AnalyticsService {
  async calculateMetrics(query: AnalyticsQuery): Promise<MetricResult[]> {
    // 'property' and 'channel' are not valid calendar intervals, so they use a
    // terms aggregation; the 'source' field name is an assumption about the index
    const grouping =
      query.groupBy === 'property' ? { terms: { field: 'property_id' } } :
      query.groupBy === 'channel' ? { terms: { field: 'source' } } :
      { date_histogram: { field: 'check_in_date', calendar_interval: query.groupBy } };

    // Leverage Elasticsearch aggregations for fast analytics
    const response = await this.elasticsearch.search({
      index: 'bookings',
      body: {
        query: this.buildFilterQuery(query),
        aggs: {
          grouped_metrics: {
            ...grouping,
            aggs: {
              total_revenue: { sum: { field: 'total_price' } },
              booking_count: { value_count: { field: 'id' } },
              unique_properties: { cardinality: { field: 'property_id' } }
            }
          }
        }
      }
    });
    return this.transformAggregationResults(response.aggregations);
  }
}
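The `metric` options above use standard hospitality definitions: occupancy is nights sold divided by nights available, ADR (average daily rate) is room revenue divided by nights sold, and RevPAR is room revenue divided by nights available. A minimal sketch of that arithmetic follows; the `PeriodTotals` shape and `computeKpis` name are assumptions for illustration, and how the totals are derived from the aggregation buckets is not shown.

```typescript
// Totals for one bucket (a day, a property, a channel, ...).
interface PeriodTotals {
  roomRevenue: number;     // revenue from stays in the period
  nightsSold: number;      // booked property-nights
  nightsAvailable: number; // bookable property-nights
}

interface HospitalityKpis {
  occupancy: number; // fraction between 0 and 1
  adr: number;       // average daily rate
  revpar: number;    // revenue per available night
}

// Guard every division so empty periods yield 0 instead of NaN or Infinity.
function computeKpis(totals: PeriodTotals): HospitalityKpis {
  const { roomRevenue, nightsSold, nightsAvailable } = totals;
  const occupancy = nightsAvailable > 0 ? nightsSold / nightsAvailable : 0;
  const adr = nightsSold > 0 ? roomRevenue / nightsSold : 0;
  const revpar = nightsAvailable > 0 ? roomRevenue / nightsAvailable : 0;
  return { occupancy, adr, revpar };
}
```

For example, 12,000 in revenue from 80 nights sold out of 100 available gives 80% occupancy, an ADR of 150, and a RevPAR of 120. RevPAR always equals ADR multiplied by occupancy, which makes a handy consistency check for dashboard numbers.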
Predictive Analytics
Lightweight ML pipeline for demand forecasting:
from sklearn.ensemble import RandomForestRegressor
import pandas as pd

class DemandPredictor:
    def __init__(self):
        # The model must be fitted (self.model.fit) before predict_occupancy is called
        self.model = RandomForestRegressor(n_estimators=100)

    def prepare_features(self, bookings_df):
        """Engineer features from booking data"""
        df = bookings_df.copy()  # avoid mutating the caller's DataFrame
        check_in = pd.to_datetime(df['check_in'])
        df['day_of_week'] = check_in.dt.dayofweek
        df['month'] = check_in.dt.month
        df['days_until_checkin'] = (
            check_in - pd.to_datetime(df['booked_at'])
        ).dt.days
        # property_type must already be numerically encoded; scikit-learn
        # estimators do not accept raw string categories
        return df[[
            'day_of_week', 'month', 'days_until_checkin', 'property_type'
        ]]

    def predict_occupancy(self, property_id, date_range):
        """Predict occupancy rates for given date range"""
        # prepare_features_for_prediction is not shown in this post; it must
        # produce the same columns as prepare_features
        features = self.prepare_features_for_prediction(property_id, date_range)
        return self.model.predict(features)
Security & Privacy
API Key Management
class SecretsManager {
  private aws: AWS.SecretsManager;
  private cache = new Map<string, CachedSecret>();

  async getAPIKey(platform: string): Promise<string> {
    // Check cache first
    const cached = this.cache.get(platform);
    if (cached && !this.isExpired(cached)) {
      return cached.value;
    }

    // Fetch from AWS Secrets Manager
    const response = await this.aws.getSecretValue({
      SecretId: `${process.env.STAGE}/booking-platforms/${platform}`
    }).promise();
    // SecretString is absent for binary secrets, so fail with a clear message
    if (!response.SecretString) {
      throw new Error(`Secret for ${platform} has no string value`);
    }
    const secret = JSON.parse(response.SecretString);
    this.cache.set(platform, {
      value: secret.apiKey,
      expiresAt: Date.now() + 3600000 // 1 hour
    });
    return secret.apiKey;
  }
}
GDPR Compliance
class DataPrivacyService {
  async anonymizeGuestData(bookingId: string): Promise<void> {
    await this.db.transaction(async (trx) => {
      // Anonymize PII
      await trx('bookings')
        .where({ id: bookingId })
        .update({
          guest_name: 'ANONYMIZED',
          guest_email: 'anonymized@privacy.local',
          guest_phone: null,
          anonymized_at: new Date()
        });

      // Audit trail
      await trx('privacy_logs').insert({
        action: 'ANONYMIZE',
        booking_id: bookingId,
        timestamp: new Date()
      });
    });
  }
}
Deployment & Monitoring
CI/CD Pipeline
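name: Deploy to Production
on:
  push:
    branches: [main]
env:
  # Assumed secret holding the full ECR registry host for this account and region
  ECR_REGISTRY: ${{ secrets.ECR_REGISTRY }}
jobs:
  test-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # AWS credentials and region must be configured for the job (step not shown)
      - name: Run Tests
        run: |
          npm ci
          npm run test:unit
          npm run test:integration
      - name: Build Docker Image
        run: docker build -t "$ECR_REGISTRY/ota-dashboard:${{ github.sha }}" .
      - name: Push to ECR
        run: |
          # docker login needs the registry host, and the pushed tag must include it
          aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_REGISTRY"
          docker push "$ECR_REGISTRY/ota-dashboard:${{ github.sha }}"
      - name: Deploy to ECS
        # Restarts tasks with the service's current task definition, so that
        # definition must reference the image tag that was just pushed
        run: |
          aws ecs update-service \
            --cluster production \
            --service ota-dashboard \
            --force-new-deployment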
Custom Metrics
import { StatsD } from 'hot-shots';

class MetricsCollector {
  private statsd: StatsD;

  recordBookingSync(channel: string, success: boolean, duration: number): void {
    this.statsd.timing('booking.sync.duration', duration, {
      channel,
      success: success.toString()
    });
    this.statsd.increment('booking.sync.count', 1, {
      channel,
      result: success ? 'success' : 'failure'
    });
  }
}
Four Critical Lessons
1. Start Simple, Scale Gradually
Our initial architecture was over-engineered with microservices for every component. We consolidated to a modular monolith first, then extracted services as bottlenecks emerged. This saved months of unnecessary complexity.
2. External API Reliability Is Your Problem
Third-party APIs will fail. Build retry logic, circuit breakers, and graceful degradation:
class ResilientAPIClient {
  async callWithRetry<T>(
    fn: () => Promise<T>,
    maxRetries: number = 3
  ): Promise<T> {
    for (let i = 0; i < maxRetries; i++) {
      try {
        return await fn();
      } catch (error) {
        if (i === maxRetries - 1) throw error;
        await this.exponentialBackoff(i);
      }
    }
    throw new Error('Max retries exceeded');
  }

  private async exponentialBackoff(attempt: number): Promise<void> {
    const delay = Math.pow(2, attempt) * 1000;
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}
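Retries cover transient failures, but a circuit breaker is what stops a dashboard from hammering an API that is already down. The sketch below shows only the state machine (closed, open, half-open) under assumed names and thresholds. It is not the production implementation, and the clock is injected purely to make the behaviour easy to test.

```typescript
type CircuitState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: CircuitState = "closed";
  private consecutiveFailures = 0;
  private openedAtMs = 0;

  constructor(
    private readonly failureThreshold: number,
    private readonly cooldownMs: number,
    private readonly now: () => number = () => Date.now()
  ) {}

  // Should the caller attempt the request at all?
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAtMs >= this.cooldownMs) {
      this.state = "half-open"; // cooldown elapsed: let trial traffic through
    }
    return this.state !== "open";
  }

  // Any success closes the circuit and resets the failure count.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  // Open the circuit after too many failures, or on any failed trial request.
  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAtMs = this.now();
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}
```

A caller would check `canRequest()` before invoking `callWithRetry`, and when it returns false, serve cached data or a "channel temporarily unavailable" state instead. That fallback is the graceful-degradation part.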
3. Cache Invalidation Is Hard
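We spent weeks debugging cache inconsistencies. What finally worked was simple: short TTLs (the booking cache above expires entries after 300 seconds) and cache versioning, where a version number in the key lets you retire a whole group of stale entries at once. Don't try to be clever with cache invalidation logic early on.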
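As a rough illustration of cache versioning, the sketch below puts a per-namespace version number into every key. Bumping the version makes all older entries unreachable in one step, and the TTL cleans them up later. The `VersionedCache` class is hypothetical and uses an in-memory `Map` as a stand-in for Redis so the logic is self-contained.

```typescript
interface CacheEntry {
  value: string;
  expiresAtMs: number;
}

// In-memory stand-in for Redis, so the versioning logic can run anywhere.
class VersionedCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly versions = new Map<string, number>();

  constructor(private readonly now: () => number = () => Date.now()) {}

  set(namespace: string, key: string, value: string, ttlSeconds: number): void {
    this.entries.set(this.versionedKey(namespace, key), {
      value,
      expiresAtMs: this.now() + ttlSeconds * 1000,
    });
  }

  get(namespace: string, key: string): string | undefined {
    const fullKey = this.versionedKey(namespace, key);
    const entry = this.entries.get(fullKey);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAtMs) {
      this.entries.delete(fullKey); // expired: the TTL is the safety net
      return undefined;
    }
    return entry.value;
  }

  // Bumping the version makes every old key in the namespace unreachable.
  invalidate(namespace: string): void {
    this.versions.set(namespace, this.currentVersion(namespace) + 1);
  }

  private currentVersion(namespace: string): number {
    return this.versions.get(namespace) ?? 1;
  }

  private versionedKey(namespace: string, key: string): string {
    return `${namespace}:v${this.currentVersion(namespace)}:${key}`;
  }
}
```

The appeal of this approach is that invalidation never has to enumerate or delete individual keys: a write path only bumps one counter, and a short TTL bounds how long orphaned entries occupy memory.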
4. Test Edge Cases Rigorously
The strangest bugs came from bookings spanning daylight saving time changes, multi-night stays with partial availability, currency conversion for international bookings, and time zone mismatches between systems. Build comprehensive test suites for these scenarios.
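The daylight saving case is easy to reproduce. A sketch under simple assumptions (check-in and check-out are stored as `YYYY-MM-DD` calendar dates, and the function names are illustrative): counting nights from calendar dates in UTC is stable, while dividing elapsed milliseconds by 24 hours silently loses a night when a stay spans the spring clock change.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Convert a calendar date such as "2024-03-30" into a whole UTC day number,
// so the result never depends on any time zone's offset or DST rules.
function toUtcDayNumber(isoDate: string): number {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(isoDate);
  if (!match) throw new Error(`Expected YYYY-MM-DD, got "${isoDate}"`);
  const [, year, month, day] = match;
  return Date.UTC(Number(year), Number(month) - 1, Number(day)) / MS_PER_DAY;
}

// Nights in a stay, computed purely from calendar dates.
function nightsBetween(checkIn: string, checkOut: string): number {
  const nights = toUtcDayNumber(checkOut) - toUtcDayNumber(checkIn);
  if (nights <= 0) throw new Error("checkOut must be after checkIn");
  return nights;
}

// The naive version: divide elapsed milliseconds by 24 hours.
function naiveNights(checkIn: Date, checkOut: Date): number {
  return Math.floor((checkOut.getTime() - checkIn.getTime()) / MS_PER_DAY);
}
```

For a stay from 30 March to 1 April 2024 in Berlin, local midnight to local midnight is only 47 hours because clocks moved forward on 31 March. The naive calculation therefore reports one night, while `nightsBetween("2024-03-30", "2024-04-01")` reports the correct two.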
Production Performance
After 6 months in production:
System uptime: 99.94%
Average API response time: 127ms
Peak requests/second handled: 450
Data synchronisation latency: <2 seconds
Database query time (p95): 45ms
Future Roadmap
GraphQL Federation - Split monolithic schema into domain-specific subgraphs for better team autonomy
Real-time Collaboration - WebSocket support for multi-user dashboard updates
Advanced ML Models - Deep learning for dynamic pricing optimisation
Mobile-First Redesign - Progressive Web App with offline capability
Conclusion
Building an operations dashboard for OTAs is fundamentally an integration problem disguised as a UI problem. The real challenges: inconsistent APIs, real-time synchronisation requirements, performance at scale, and complex business logic.
This architecture handles 400+ properties reliably, but the principles scale to thousands. Focus on solid integration patterns, aggressive caching, event-driven synchronisation, and observability from day one.
Questions about implementation details? Drop them in the comments or reach out directly.