DNS Filter Architecture
This document describes the architecture of CodexDNS’s filter system, including current optimizations and future scalability options.
Current Architecture
In-Memory Storage (Industry Standard)
CodexDNS stores filter rules in in-memory hash maps, an approach similar in spirit to widely used DNS blockers such as AdGuard and Pi-hole. This approach provides:
- ~65ns lookup time for exact domain matches (O(1) average-case hashmap lookup)
- Zero allocations per lookup after cache warm-up
- Sub-millisecond latency for all filter checks
Data Structures
```
┌────────────────────────────────────────────────────────────┐
│ FilterService                                              │
├────────────────────────────────────────────────────────────┤
│ domainRules   map[string]*compiledRule │ O(1) exact        │
│ wildcardTrie  *SuffixTrie              │ O(k), k = labels  │
│ regexRules    []*compiledRegexRule     │ Individual        │
│ batchedRegex  []*BatchedRegex          │ Batched regex     │
│ stringPool    *StringPool              │ Memory savings    │
└────────────────────────────────────────────────────────────┘
```
Phase 1 Optimizations (Implemented)
1. Cache Statistics Endpoint
Endpoint: GET /api/filters/cache/stats
Returns detailed cache metrics:
```json
{
  "domainRulesCount": 150000,
  "wildcardRulesCount": 5000,
  "regexRulesCount": 100,
  "totalCachedRules": 155100,
  "enabledListsCount": 5,
  "estimatedMemoryMB": 25.5,
  "wildcardTrieSize": 5000,
  "batchedRegexCount": 2,
  "batchedPatternsTotal": 100,
  "loadDurationMs": 1250,
  "rulesPerSecond": 124080
}
```
Endpoint: POST /api/filters/cache/reload
Forces a cache reload and returns the updated stats.
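A client can decode the stats payload into a typed struct. The following is a minimal Go sketch. It assumes the field names from the sample response above. It also assumes that `totalCachedRules` is the sum of the three per-type counts, which holds for the sample. `parseStats` is a hypothetical helper, and the Go field types are guesses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CacheStats mirrors the JSON returned by GET /api/filters/cache/stats.
// Field names follow the sample response; the Go types are assumptions.
type CacheStats struct {
	DomainRulesCount     int     `json:"domainRulesCount"`
	WildcardRulesCount   int     `json:"wildcardRulesCount"`
	RegexRulesCount      int     `json:"regexRulesCount"`
	TotalCachedRules     int     `json:"totalCachedRules"`
	EnabledListsCount    int     `json:"enabledListsCount"`
	EstimatedMemoryMB    float64 `json:"estimatedMemoryMB"`
	WildcardTrieSize     int     `json:"wildcardTrieSize"`
	BatchedRegexCount    int     `json:"batchedRegexCount"`
	BatchedPatternsTotal int     `json:"batchedPatternsTotal"`
	LoadDurationMs       int64   `json:"loadDurationMs"`
	RulesPerSecond       int64   `json:"rulesPerSecond"`
}

// parseStats (hypothetical) decodes a stats payload and checks that the
// per-type counts add up to the reported total.
func parseStats(body []byte) (CacheStats, error) {
	var s CacheStats
	if err := json.Unmarshal(body, &s); err != nil {
		return s, err
	}
	sum := s.DomainRulesCount + s.WildcardRulesCount + s.RegexRulesCount
	if sum != s.TotalCachedRules {
		return s, fmt.Errorf("rule counts sum to %d, total says %d", sum, s.TotalCachedRules)
	}
	return s, nil
}

func main() {
	sample := []byte(`{"domainRulesCount":150000,"wildcardRulesCount":5000,
		"regexRulesCount":100,"totalCachedRules":155100,"loadDurationMs":1250,
		"rulesPerSecond":124080}`)
	s, err := parseStats(sample)
	if err != nil {
		panic(err)
	}
	// Throughput recomputed from the payload: rules per second of load time.
	fmt.Println(s.TotalCachedRules * 1000 / int(s.LoadDurationMs)) // 124080
}
```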
2. Lazy Loading
The cache is loaded on demand when the first DNS query arrives:
```go
func (s *FilterService) CheckDomain(domain string, clientIP string) *FilterResult {
	// Lazy load cache if not loaded
	if !s.IsCacheLoaded() {
		if err := s.EnsureCacheLoaded(); err != nil {
			// Handle error gracefully, e.g. log it and let the query
			// through (fail open) rather than blocking all traffic
		}
	}
	// ... continue with filter check
}
```
Benefits:
- Faster application startup
- Memory not allocated until needed
- Graceful degradation on load errors
3. String Interning
Reduces memory duplication for common strings:
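```go
type StringPool struct {
	mu      sync.RWMutex
	strings map[string]string // created with make before first use
}

// Intern returns the shared copy of s, storing s the first time it is seen.
func (p *StringPool) Intern(s string) string {
	p.mu.Lock() // exclusive lock: a miss inserts into the map
	defer p.mu.Unlock()
	if v, ok := p.strings[s]; ok {
		return v // reuse the existing backing array
	}
	p.strings[s] = s
	return s
}
```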
Expected memory savings: 15-25% for large rule sets with common domain suffixes.
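Savings for "common domain suffixes" come from sharing repeated labels such as `example` or `com`. Below is a minimal, single-goroutine sketch. It assumes rules are split into labels before interning, and the helper names are hypothetical. It uses `strings.Clone` so that a pooled label does not keep its whole parent domain string alive. `unsafe.StringData` (Go 1.20+) is used only to show that two rules end up sharing one copy.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// labelPool (hypothetical) interns individual DNS labels. It is not safe
// for concurrent use; a shared pool would need a lock like StringPool's.
type labelPool map[string]string

func (p labelPool) intern(s string) string {
	if v, ok := p[s]; ok {
		return v
	}
	c := strings.Clone(s) // copy, so the pool does not pin the parent domain
	p[c] = c
	return c
}

// splitInterned splits a domain into labels and swaps each for its pooled copy.
func (p labelPool) splitInterned(domain string) []string {
	labels := strings.Split(domain, ".")
	for i, l := range labels {
		labels[i] = p.intern(l)
	}
	return labels
}

// sameBacking reports whether two strings point at the same bytes.
func sameBacking(a, b string) bool {
	return unsafe.StringData(a) == unsafe.StringData(b)
}

func main() {
	pool := labelPool{}
	a := pool.splitInterned("ads.example.com")
	b := pool.splitInterned("cdn.example.com")
	fmt.Println(sameBacking(a[1], b[1]), sameBacking(a[2], b[2])) // true true
}
```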
Phase 2 Optimizations (Implemented)
1. Suffix Trie for Wildcard Matching
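DNS wildcard patterns like *.example.com are now matched using a suffix trie instead of a linear scan over every wildcard rule. The queried domain's labels are walked from right to left, and the walk stops at the first node that carries a wildcard marker:
Domain: ads.tracking.example.com
Lookup path: com → example (wildcard marker found; tracking and ads are never visited)
```
root
 ├── com
 │    └── example
 │         └── *  ──► matches *.example.com
 └── org
```
Performance improvement: O(n) → O(k) for wildcard matching, where n is the number of wildcard rules and k is the number of labels in the queried domain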
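The walk can be sketched as a trie keyed by labels taken from right to left. This minimal version makes a few assumptions the text does not state. Input is already lowercase. A trailing root dot is ignored. `*.example.com` matches subdomains but not `example.com` itself. The type and method names are illustrative, not the actual `SuffixTrie` API.

```go
package main

import (
	"fmt"
	"strings"
)

// trieNode is one DNS label in a reversed-label trie ("com" -> "example").
// wildcard marks that a "*.<path>" rule ends at this node.
type trieNode struct {
	children map[string]*trieNode
	wildcard bool
}

func newTrieNode() *trieNode { return &trieNode{children: map[string]*trieNode{}} }

// Insert adds a rule of the form "*.example.com".
func (n *trieNode) Insert(pattern string) {
	labels := strings.Split(strings.TrimPrefix(pattern, "*."), ".")
	node := n
	for i := len(labels) - 1; i >= 0; i-- { // from TLD to leftmost label
		child, ok := node.children[labels[i]]
		if !ok {
			child = newTrieNode()
			node.children[labels[i]] = child
		}
		node = child
	}
	node.wildcard = true
}

// Match visits at most one node per label, so its cost depends on the
// label count of the query, not on the number of rules.
func (n *trieNode) Match(domain string) bool {
	labels := strings.Split(strings.TrimSuffix(domain, "."), ".")
	node := n
	for i := len(labels) - 1; i >= 0; i-- {
		child, ok := node.children[labels[i]]
		if !ok {
			return false
		}
		node = child
		// A marker counts only if at least one label remains to its left,
		// so "*.example.com" does not match "example.com" itself.
		if node.wildcard && i > 0 {
			return true
		}
	}
	return false
}

func main() {
	root := newTrieNode()
	root.Insert("*.example.com")
	fmt.Println(root.Match("ads.tracking.example.com")) // true
	fmt.Println(root.Match("example.com"))              // false
	fmt.Println(root.Match("example.org"))              // false
}
```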
2. Batched Regex Compilation
Multiple regex patterns from the same filter list are combined using alternation:
Before: 50 separate regex.MatchString() calls
After: 1 combined regex.MatchString() call
Combined pattern: (pattern1|pattern2|...|pattern50)
Rules per batch: at most 50 patterns, to keep each combined expression's compiled size bounded
Benefit: reduces regex engine overhead by ~40% for lists with many regex rules
Phase 3 Future Scalability Options
When to Consider Redis
Redis is NOT recommended for single-instance deployments because:
- Network latency: ~1ms per call vs ~65ns for in-memory
- Additional infrastructure complexity
- No significant memory savings (just moves data elsewhere)
Redis IS recommended when:
- Running multiple CodexDNS instances that need shared filter state
- Kubernetes/microservices deployment with horizontal scaling
- Filter rules updated frequently from external sources
Implementation Plan for Redis (If Needed)
```go
type DistributedFilterService struct {
	local *FilterService // Fast local cache
	redis *redis.Client  // Shared state
	ttl   time.Duration  // Local cache TTL
}

// CheckDomain takes clientIP as well, matching FilterService.CheckDomain.
func (d *DistributedFilterService) CheckDomain(domain, clientIP string) *FilterResult {
	// Check local cache first (fast path); assumes nil means "no local match"
	if result := d.local.CheckDomain(domain, clientIP); result != nil {
		return result
	}
	// Check Redis for recently added rules (slow path); checkRedis is still to be written
	return d.checkRedis(domain)
}
```
Memory-Mapped Files for Very Large Rule Sets (>10M rules)
For deployments with >10 million rules where memory is constrained:
```go
type MmapFilterStore struct {
	file  *os.File
	data  []byte         // Memory-mapped region
	index map[string]int // Domain → offset in data (held on the Go heap)
}
```
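Benefits:
- The OS handles paging, so only hot rules stay resident in memory
- The mapped file persists across restarts
- Can scale to rule sets larger than available RAM
Trade-offs:
- More complex implementation
- Slightly slower lookups (~100ns vs ~65ns) when pages are resident; a cold page adds a page fault and possibly a disk read
- The in-memory index map grows with the rule count, so at this scale the index itself should live in the mapped file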
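One way to avoid a heap-resident index is to store rules in the file as sorted, fixed-width records and binary-search the mapped bytes directly. In this sketch a plain `[]byte` stands in for the mapped region. In a real build it would come from an mmap call such as `syscall.Mmap` on Unix. The 256-byte slot width and the helper names are assumptions. Text-form DNS names are at most 253 characters, so every name fits in one slot.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// recordSize is a hypothetical fixed slot width; names are NUL-padded.
const recordSize = 256

// buildRecords lays out sorted domains as fixed-width records, the format
// a generator would write to the file that is later memory-mapped.
func buildRecords(domains []string) ([]byte, error) {
	sorted := append([]string(nil), domains...)
	sort.Strings(sorted)
	data := make([]byte, len(sorted)*recordSize)
	for i, d := range sorted {
		if len(d) > recordSize {
			return nil, fmt.Errorf("domain too long: %d bytes", len(d))
		}
		copy(data[i*recordSize:], d)
	}
	return data, nil
}

// contains binary-searches the records in place, so no per-domain index
// is held on the heap; only pages touched by the search are read.
func contains(data []byte, domain string) bool {
	n := len(data) / recordSize
	key := []byte(domain)
	record := func(i int) []byte {
		return bytes.TrimRight(data[i*recordSize:(i+1)*recordSize], "\x00")
	}
	i := sort.Search(n, func(i int) bool { return bytes.Compare(record(i), key) >= 0 })
	return i < n && bytes.Equal(record(i), key)
}

func main() {
	data, err := buildRecords([]string{"tracker.net", "ads.example.com", "cdn.bad.org"})
	if err != nil {
		panic(err)
	}
	fmt.Println(contains(data, "cdn.bad.org"), contains(data, "example.com")) // true false
}
```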
Rule Deduplication
Many filter lists contain overlapping rules. Deduplication can reduce memory:
```go
type DeduplicatedRuleStore struct {
	rules map[uint64]*Rule  // Hash → unique rule (distinct rules may collide on a hash)
	index map[string]uint64 // Domain → hash
}
```
Expected savings: 30-50% memory reduction for multiple overlapping lists.
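Below is a sketch of cross-list deduplication keyed by a 64-bit hash, as in the struct above. Different patterns can share a hash. So each hash maps to a small bucket, and the stored pattern is compared before a rule is reused. This bucket is a change from the `map[uint64]*Rule` shown above. The normalization (lowercasing, dropping a trailing dot), the FNV-1a hash, and all names including this `Rule` are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// Rule is a hypothetical stand-in for a compiled rule.
type Rule struct {
	Pattern string
	Lists   []string // every list that contributed this rule
}

// dedupStore keeps one Rule per normalized pattern.
type dedupStore struct {
	buckets map[uint64][]*Rule
	unique  int
}

func hashPattern(p string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(p)) // hash.Hash.Write never returns an error
	return h.Sum64()
}

// Add records pattern from list, reusing an existing Rule when another
// list already contributed the same normalized pattern.
func (s *dedupStore) Add(list, pattern string) {
	p := strings.ToLower(strings.TrimSuffix(pattern, "."))
	k := hashPattern(p)
	for _, r := range s.buckets[k] {
		if r.Pattern == p { // same pattern, not merely the same hash
			r.Lists = append(r.Lists, list)
			return
		}
	}
	s.buckets[k] = append(s.buckets[k], &Rule{Pattern: p, Lists: []string{list}})
	s.unique++
}

func main() {
	s := &dedupStore{buckets: map[uint64][]*Rule{}}
	for _, d := range []string{"ads.example.com", "tracker.net"} {
		s.Add("listA", d)
	}
	for _, d := range []string{"ADS.example.com.", "cdn.bad.org"} {
		s.Add("listB", d)
	}
	fmt.Println(s.unique) // 3: ads.example.com is stored once
}
```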
Memory Usage Estimates
| Rules Count | Estimated Memory | Load Time |
|---|---|---|
| 100,000 | ~15-25 MB | ~100ms |
| 500,000 | ~75-125 MB | ~500ms |
| 1,000,000 | ~150-250 MB | ~1s |
| 5,000,000 | ~750 MB - 1.25 GB | ~5s |
| 10,000,000+ | Consider mmap | ~10s |
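The table's rows all scale linearly at roughly 150-250 bytes per rule (100,000 rules ≈ 15-25 MB). The small helper below turns that ratio into an estimate for other rule counts, assuming decimal megabytes. The bounds are read off the table, not measured.

```go
package main

import "fmt"

// Per-rule cost implied by the table (assumption: decimal MB).
const (
	bytesPerRuleLow  = 150
	bytesPerRuleHigh = 250
)

// estimateMB returns low and high memory estimates in megabytes.
// Converting to float64 first avoids int overflow on 32-bit platforms.
func estimateMB(rules int) (low, high float64) {
	r := float64(rules)
	return r * bytesPerRuleLow / 1e6, r * bytesPerRuleHigh / 1e6
}

func main() {
	lo, hi := estimateMB(5_000_000)
	fmt.Printf("%.0f-%.0f MB\n", lo, hi) // 750-1250 MB
}
```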
Performance Benchmarks
| Operation | Time (ns/op) | Allocations |
|---|---|---|
| Exact domain lookup | ~65 | 0 |
| Wildcard trie match | ~200 | 0 |
| Batched regex match | ~500 | 0 |
| Individual regex match | ~800 | 0 |
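Numbers like these can be reproduced with Go's benchmark harness. The sketch below calls `testing.Benchmark`, which works outside `go test` once `testing.Init()` has been called. It benchmarks a plain `map[string]bool`, a simplified stand-in for `domainRules`. Absolute timings depend on hardware, so no expected output is given.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the compiler cannot discard the lookups.
var sink bool

// benchExactLookup measures an exact-match map lookup (hypothetical helper;
// the real domainRules map holds *compiledRule values).
func benchExactLookup(rules int) testing.BenchmarkResult {
	m := make(map[string]bool, rules)
	for i := 0; i < rules; i++ {
		m[fmt.Sprintf("host%d.example.com", i)] = true
	}
	key := "host42.example.com"
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = m[key]
		}
	})
}

func main() {
	testing.Init() // registers testing flags; needed outside "go test"
	r := benchExactLookup(100_000)
	fmt.Printf("%d ns/op, %d allocs/op\n", r.NsPerOp(), r.AllocsPerOp())
}
```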
API Reference
Get Cache Statistics
GET /api/filters/cache/stats
Authorization: Required
Response: CacheStats JSON
Reload Cache
POST /api/filters/cache/reload
Authorization: Required
Response: { "message": "Cache reloaded successfully", "stats": CacheStats }
Check Domain
POST /api/filters/check
Authorization: Required
Body: { "domain": "example.com", "client_ip": "192.168.1.100" }
Response: { "blocked": true, "reason": "Wildcard pattern matched (trie)" }
Configuration
No additional configuration is needed for the optimizations. They are enabled by default.
For Redis integration (future), add to config.json:
```json
{
  "filter": {
    "distributed": true,
    "redis_url": "redis://localhost:6379",
    "local_cache_ttl": "5m"
  }
}
```