DNS Filter Architecture

This document describes the architecture of CodexDNS’s filter system, including current optimizations and future scalability options.

Current Architecture

In-Memory Storage (Industry Standard)

CodexDNS uses an in-memory hashmap approach for filter rule storage, which is the same architecture used by industry leaders like AdGuard and Pi-hole. This approach provides:

~65ns lookup time for exact domain matches (O(1) hashmap lookup)
Zero allocations after cache warm-up
Sub-millisecond latency for all filter checks

Data Structures

┌─────────────────────────────────────────────────────────────┐
│                    FilterService                             │
├─────────────────────────────────────────────────────────────┤
│  domainRules    map[string]*compiledRule  │ O(1) exact     │
│  wildcardTrie   *SuffixTrie               │ O(log n) suffix│
│  regexRules     []*compiledRegexRule      │ Individual     │
│  batchedRegex   []*BatchedRegex           │ Batched regex  │
│  stringPool     *StringPool               │ Memory savings │
└─────────────────────────────────────────────────────────────┘

Phase 1 Optimizations (Implemented)

1. Cache Statistics Endpoint

Endpoint: GET /api/filters/cache/stats

Returns detailed cache metrics:

{
  "domainRulesCount": 150000,
  "wildcardRulesCount": 5000,
  "regexRulesCount": 100,
  "totalCachedRules": 155100,
  "enabledListsCount": 5,
  "estimatedMemoryMB": 25.5,
  "wildcardTrieSize": 5000,
  "batchedRegexCount": 2,
  "batchedPatternsTotal": 100,
  "loadDurationMs": 1250,
  "rulesPerSecond": 124080
}

Endpoint: POST /api/filters/cache/reload

Forces cache reload and returns updated stats.

2. Lazy Loading

The cache is loaded on-demand when the first DNS query arrives:

func (s *FilterService) CheckDomain(domain string, clientIP string) *FilterResult {
    // Lazy load cache if not loaded
    if !s.IsCacheLoaded() {
        if err := s.EnsureCacheLoaded(); err != nil {
            // Handle error gracefully
        }
    }
    // ... continue with filter check
}

Benefits:

Faster application startup
Memory not allocated until needed
Graceful degradation on load errors

3. String Interning

Reduces memory duplication for common strings:

type StringPool struct {
    mu      sync.RWMutex
    strings map[string]string
}

func (p *StringPool) Intern(s string) string {
    // Returns shared reference to existing string
    // or creates new entry if not found
}

Expected memory savings: 15-25% for large rule sets with common domain suffixes.

Phase 2 Optimizations (Implemented)

1. Suffix Trie for Wildcard Matching

DNS wildcard patterns like *.example.com are now matched using a suffix trie instead of linear scan:

Domain: ads.tracking.example.com
Lookup path: com → example → tracking → ads → *

             root
              │
          ┌───┴───┐
         com     org
          │
       example
          │
          *  ──────► matches *.example.com

Performance improvement: O(n) → O(log n) for wildcard matching

2. Batched Regex Compilation

Multiple regex patterns from the same filter list are combined using alternation:

Before: 50 separate regex.MatchString() calls
After:  1 combined regex.MatchString() call

Combined pattern: (pattern1|pattern2|...|pattern50)

Rules per batch: 50 patterns max to avoid regex explosion Benefit: Reduces regex engine overhead by ~40% for lists with many regex rules

Phase 3 Future Scalability Options

When to Consider Redis

Redis is NOT recommended for single-instance deployments because:

Network latency: ~1ms per call vs ~65ns for in-memory
Additional infrastructure complexity
No significant memory savings (just moves data elsewhere)

Redis IS recommended when:

Running multiple CodexDNS instances that need shared filter state
Kubernetes/microservices deployment with horizontal scaling
Filter rules updated frequently from external sources

Implementation Plan for Redis (If Needed)

type DistributedFilterService struct {
    local  *FilterService     // Fast local cache
    redis  *redis.Client      // Shared state
    ttl    time.Duration      // Local cache TTL
}

func (d *DistributedFilterService) CheckDomain(domain string) *FilterResult {
    // Check local cache first (fast path)
    if result := d.local.CheckDomain(domain); result != nil {
        return result
    }
    
    // Check Redis for recently added rules (slow path)
    return d.checkRedis(domain)
}

Memory-Mapped Files for Very Large Rule Sets (>10M rules)

For deployments with >10 million rules where memory is constrained:

type MmapFilterStore struct {
    file *os.File
    data []byte           // Memory-mapped region
    index map[string]int  // Domain → offset in mmap
}

Benefits:

OS handles paging, only hot rules in memory
Persistent across restarts
Can handle billions of rules

Trade-offs:

More complex implementation
Slightly slower lookups (~100ns vs ~65ns)

Rule Deduplication

Many filter lists contain overlapping rules. Deduplication can reduce memory:

type DeduplicatedRuleStore struct {
    rules  map[uint64]*Rule  // Hash → unique rule
    index  map[string]uint64 // Domain → hash
}

Expected savings: 30-50% memory reduction for multiple overlapping lists.

Memory Usage Estimates

Rules Count	Estimated Memory	Load Time
100,000	~15-25 MB	~100ms
500,000	~75-125 MB	~500ms
1,000,000	~150-250 MB	~1s
5,000,000	~750 MB - 1.2 GB	~5s
10,000,000+	Consider mmap	~10s

Performance Benchmarks

Operation	Time (ns/op)	Allocations
Exact domain lookup	~65	0
Wildcard trie match	~200	0
Batched regex match	~500	0
Individual regex match	~800	0

API Reference

Get Cache Statistics

GET /api/filters/cache/stats
Authorization: Required

Response: CacheStats JSON

Reload Cache

POST /api/filters/cache/reload
Authorization: Required

Response: { "message": "Cache reloaded successfully", "stats": CacheStats }

Check Domain

POST /api/filters/check
Authorization: Required
Body: { "domain": "example.com", "client_ip": "192.168.1.100" }

Response: { "blocked": true, "reason": "Wildcard pattern matched (trie)" }

Configuration

No additional configuration is needed for the optimizations. They are enabled by default.

For Redis integration (future), add to config.json:

{
  "filter": {
    "distributed": true,
    "redis_url": "redis://localhost:6379",
    "local_cache_ttl": "5m"
  }
}

Safe Search Enforcement