🔥 Duppix - Advanced Regex Engine for Dart

Pub Package Dart SDK Version License: MIT

Duppix is a comprehensive regex library that brings Oniguruma-compatible advanced features to Dart, including possessive quantifiers, atomic groups, named backreferences, recursive patterns, subroutine calls, and conditional patterns.

✨ Why Duppix?

Dart's built-in RegExp is powerful but lacks many advanced features that other regex engines provide. Duppix fills this gap by implementing a hybrid approach:

  • 🚀 Fast fallback: Simple patterns use Dart's optimized RegExp
  • 🎯 Advanced features: Complex patterns use our custom engine
  • 🔄 Full compatibility: Drop-in replacement for RegExp
  • 📚 Oniguruma compatible: Supports the regex syntax used by Ruby, plus many constructs familiar from PCRE

🆚 Feature Comparison

Feature                   Dart RegExp    Duppix   Example
Basic patterns            ✅             ✅       \d+, [a-z]*
Named groups              ✅             ✅       (?<name>\w+)
Backreferences            ⚠️ Limited     ✅       \1, \k<name>
Possessive quantifiers    ❌             ✅       \d++, .*+
Atomic groups             ❌             ✅       (?>...)
Recursive patterns        ❌             ✅       (?R), (?0)
Subroutine calls          ❌             ✅       (?1), (?&name)
Conditional patterns      ❌             ✅       (?(1)yes|no)
Variable lookbehind       ❌             ✅       (?<=\w{2,4})
Script runs               ❌             ✅       (?script_run:...)

🚀 Quick Start

Add Duppix to your pubspec.yaml:

dependencies:
  duppix: ^1.0.0

Basic Usage

import 'package:duppix/duppix.dart';

void main() {
  // Works just like RegExp for simple patterns
  final basic = DuppixRegex(r'\d+');
  print(basic.firstMatch('Hello 123')?.group); // "123"
  
  // But supports advanced features too!
  final advanced = DuppixRegex(r'(?<word>\w+)\s+\k<word>');
  final match = advanced.firstMatch('hello hello world');
  print(match?.namedGroup('word')); // "hello"
}

🎯 Advanced Features

Named Backreferences

// Match repeated words
final regex = DuppixRegex(r'(?<word>\w+)\s+\k<word>');
final match = regex.firstMatch('hello hello world');
print(match?.namedGroup('word')); // "hello"

// Case-insensitive backreferences  
final regex2 = DuppixRegex(r'(?<tag>\w+).*?</\k<tag>>', 
                          options: DUPPIX_OPTION_IGNORECASE);

Possessive Quantifiers (No Backtracking)

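// Possessive quantifiers never give back what they have consumed
final greedy = DuppixRegex(r'.*abc');      // Can backtrack, so it matches "xxabc"
final possessive = DuppixRegex(r'.*+abc'); // Cannot backtrack: .*+ also consumes "abc", so this never matches

// Useful for performance: \d++ fails fast instead of retrying shorter digit runs
final efficient = DuppixRegex(r'\d++[a-z]'); // Same matches as \d+[a-z], less backtracking on failure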

Atomic Groups

// Once the group has matched, the engine will not backtrack into it
final atomic = DuppixRegex(r'(?>.*?)end');
final match = atomic.firstMatch('start middle end'); // The lazy .*? is locked in as empty, so the match is just "end"
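
To make the "no backtracking" behavior of possessive quantifiers and atomic groups concrete, the sketch below imitates both with plain dart:core RegExp. It relies on the common lookahead-plus-backreference idiom (a lookahead is never re-entered once it has succeeded). This is only an illustration of the semantics, not how Duppix implements them, and the variable names are made up for the example.

```dart
// Greedy .* gives characters back so that "abc" can still match.
final greedy = RegExp(r'^.*abc$');
// Stand-in for ^.*+abc$ : the lookahead captures everything, the
// backreference consumes it, and nothing can be given back afterwards.
final possessiveLike = RegExp(r'^(?=(?<all>.*))\k<all>abc$');

// Ordinary group: after "a" fails to be followed by "c", "ab" is tried.
final plainGroup = RegExp(r'(?:a|ab)c');
// Stand-in for (?>a|ab)c : the first alternative that matches is final.
final atomicLike = RegExp(r'(?=(?<alt>a|ab))\k<alt>c');

void main() {
  print(greedy.hasMatch('xxabc'));         // true
  print(possessiveLike.hasMatch('xxabc')); // false
  print(plainGroup.hasMatch('abc'));       // true
  print(atomicLike.hasMatch('abc'));       // false
  print(atomicLike.hasMatch('ac'));        // true
}
```

The idiom costs an extra capture group per use, which is one reason dedicated syntax such as `.*+` and `(?>...)` is easier to read.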

Recursive Patterns

// Match balanced parentheses
final balanced = DuppixRegex(r'\((?:[^()]|(?R))*\)');
final match = balanced.firstMatch('(a(b(c)d)e)');
print(match?.group); // "(a(b(c)d)e)"

// Match nested structures
final nested = DuppixRegex(r'<(\w+)>(?:[^<>]|(?R))*</\1>');
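
For readers new to (?R), the following sketch shows in plain Dart what the balanced-parentheses pattern above is expected to find: the first substring that starts with "(" and closes at the same nesting depth. The function name firstBalanced is hypothetical and uses a depth counter rather than recursion, so it is an illustration of the result, not of the engine's algorithm.

```dart
/// Returns the first balanced "(...)" substring of [input], or null if
/// there is none. Intended to mirror what \((?:[^()]|(?R))*\) matches.
String? firstBalanced(String input) {
  for (var start = 0; start < input.length; start++) {
    if (input[start] != '(') continue;
    var depth = 0;
    for (var i = start; i < input.length; i++) {
      if (input[i] == '(') depth++;
      if (input[i] == ')') depth--;
      // Depth returns to zero exactly when the opening "(" is closed.
      if (depth == 0) return input.substring(start, i + 1);
    }
    // This "(" was never closed, so try the next candidate start.
  }
  return null;
}

void main() {
  print(firstBalanced('(a(b(c)d)e)')); // (a(b(c)d)e)
  print(firstBalanced('x ((a) y'));    // (a)
  print(firstBalanced('no parens'));   // null
}
```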

Subroutine Calls

// Define reusable patterns
final regex = DuppixRegex(r'(?<digit>\d)(?<letter>[a-z])(?&digit)(?&letter)');
final match = regex.firstMatch('1a1a');

// Numbered subroutine calls
final numbered = DuppixRegex(r'(\d{2})-(?1)-(?1)'); // Match XX-XX-XX format
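
A subroutine call re-runs a group's pattern, whereas a backreference repeats the text that the group captured. The sketch below shows the difference with plain dart:core RegExp by writing out what (\d{2})-(?1)-(?1) expands to by hand; the variable names are illustrative only.

```dart
// Hand-expanded equivalent of (\d{2})-(?1)-(?1): the sub-pattern \d{2}
// is applied three times, so each part may contain different digits.
final subroutineLike = RegExp(r'^(\d{2})-(?:\d{2})-(?:\d{2})$');

// Backreference version: parts two and three must repeat part one.
final backreference = RegExp(r'^(\d{2})-\1-\1$');

void main() {
  print(subroutineLike.hasMatch('12-34-56')); // true
  print(backreference.hasMatch('12-34-56'));  // false
  print(backreference.hasMatch('12-12-12'));  // true
}
```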

Conditional Patterns

// Match based on whether a group participated in the match
final conditional = DuppixRegex(r'(?<tag>#)?(?(<tag>)yes|no)');
// Matches "#yes" if the named group "tag" captured "#", otherwise "no"
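
A frequent use of conditionals is "require a closing delimiter only if the opening one was present", for example ^(\()?\d+(?(1)\))$ for a number with optional surrounding parentheses. As a rough illustration of what that means, the sketch below spells out the two cases as an alternation in plain dart:core RegExp. The pattern and the name conditionalLike are assumptions made for this example.

```dart
// Alternation that accepts the same strings as ^(\()?\d+(?(1)\))$ :
// either both parentheses are present, or neither is.
final conditionalLike = RegExp(r'^(?:\(\d+\)|\d+)$');

void main() {
  print(conditionalLike.hasMatch('(42)')); // true
  print(conditionalLike.hasMatch('42'));   // true
  print(conditionalLike.hasMatch('(42'));  // false
  print(conditionalLike.hasMatch('42)'));  // false
}
```

The alternation repeats the \d+ part in each branch; the conditional form avoids that duplication, which matters more as the shared part grows.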

Advanced Character Classes

// Script runs - ensure single Unicode script
final scriptRun = DuppixRegex(r'(?script_run:\w+)');

// Character class operations
final intersection = DuppixRegex(r'[a-z&&[^aeiou]]'); // Consonants only
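
To show what the class intersection [a-z&&[^aeiou]] selects, here is a small sketch that gets the same effect in plain dart:core RegExp with a negative lookahead. The helper name consonants is hypothetical.

```dart
// (?![aeiou])[a-z] accepts a lowercase ASCII letter only if it is not a
// vowel, which is the set described by [a-z&&[^aeiou]].
final consonant = RegExp(r'(?![aeiou])[a-z]');

/// Returns the lowercase consonants of [text], in order.
String consonants(String text) =>
    consonant.allMatches(text).map((m) => m.group(0)!).join();

void main() {
  print(consonants('education')); // dctn
}
```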

🛠️ Options & Configuration

// Configure regex behavior
final options = DuppixOptions(
  ignoreCase: true,
  multiline: true,
  singleline: false,
  unicode: true,
  findLongest: false,
  debug: false,
);

final regex = DuppixRegex(r'pattern', options: options);

// Or use flags (Oniguruma compatible)
final flagged = DuppixRegex(r'pattern', 
                          options: DUPPIX_OPTION_IGNORECASE | DUPPIX_OPTION_MULTILINE);

🔄 Migration from RegExp

Duppix is designed as a drop-in replacement for RegExp:

// Before (RegExp)
final oldRegex = RegExp(r'\d+');
final oldMatch = oldRegex.firstMatch('123');

// After (Duppix) - same API
final newRegex = DuppixRegex(r'\d+');
final newMatch = newRegex.firstMatch('123');

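// The familiar methods work the same way
print(newRegex.hasMatch('123'));
print(newRegex.allMatches('1 2 3').length);
// replaceAll lives on the regex here; with RegExp you would call 'a1b2c'.replaceAll(oldRegex, 'X')
print(newRegex.replaceAll('a1b2c', 'X'));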

📊 Performance

Duppix uses a smart hybrid approach:

  • Simple patterns → Dart's optimized RegExp (fastest)
  • Advanced patterns → Custom engine (feature-rich)
  • Automatic detection → No manual configuration needed

// This uses fast RegExp fallback
final simple = DuppixRegex(r'\d+');

// This uses custom engine (detected automatically)  
final advanced = DuppixRegex(r'\d++'); // Possessive quantifier
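
As a rough idea of how automatic detection could work, the sketch below scans a pattern string for syntax that Dart's RegExp does not accept. It is a hypothetical, deliberately conservative detector written for this README, not Duppix's actual logic; the names advancedSyntax and needsCustomEngine are assumptions.

```dart
// Hypothetical detector: looks for a few advanced constructs by their
// spelling. It can report false positives (for example the escaped
// plus in \++), which in a hybrid design would only cost speed.
final advancedSyntax = RegExp(
  r'[*+?}][+]'         // possessive quantifier: *+, ++, ?+, {n,m}+
  r'|\(\?>'            // atomic group
  r'|\(\?(?:R|\d+)\)'  // recursion or numbered subroutine call
  r'|\(\?&\w+\)'       // named subroutine call
  r'|\(\?\(',          // conditional
);

/// True if [pattern] appears to need more than Dart's built-in RegExp.
bool needsCustomEngine(String pattern) => advancedSyntax.hasMatch(pattern);

void main() {
  print(needsCustomEngine(r'\d+'));                  // false
  print(needsCustomEngine(r'\d++'));                 // true
  print(needsCustomEngine(r'\((?:[^()]|(?R))*\)')); // true
}
```

A real implementation would more likely decide while parsing the pattern, since a textual scan cannot tell an escaped or bracketed character from an operator.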

🧪 Testing

Run the comprehensive test suite:

dart test

Tests cover:

  • ✅ All basic RegExp functionality
  • ✅ Advanced Oniguruma features
  • ✅ Performance edge cases
  • ✅ Unicode support
  • ✅ Error handling
  • ✅ Legacy compatibility

📚 Examples

Email Validation with Named Groups

final emailRegex = DuppixRegex(
  r'(?<local>[a-zA-Z0-9._%+-]+)@(?<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})'
);
final match = emailRegex.firstMatch('user@example.com');
print('Local: ${match?.namedGroup('local')}');   // "user"
print('Domain: ${match?.namedGroup('domain')}'); // "example.com"

URL Path Extraction with Subroutines

final urlRegex = DuppixRegex(
  // (?<subdomain>...) is defined at its first use, then reused with (?&subdomain)
  r'(?<protocol>https?)://(?<domain>(?<subdomain>\w+)\.(?:(?&subdomain)\.)*(?<tld>\w+))'
  r'(?<path>/.*)?'
);

Balanced Brackets Parser

final brackets = DuppixRegex(r'\{(?:[^{}]|(?R))*\}');
final json = '{"key": {"nested": "value"}}';
print(brackets.firstMatch(json)?.group); // Full JSON object

HTML Tag Matching with Backreferences

final htmlTag = DuppixRegex(r'<(?<tag>\w+)>.*?</\k<tag>>');
final html = '<div>Content</div>';
print(htmlTag.firstMatch(html)?.namedGroup('tag')); // "div"

🔧 Implementation Status

✅ Completed Features

  • Core regex engine architecture
  • Pattern parser with full Oniguruma syntax
  • Hybrid fallback system
  • Basic quantifiers (*, +, ?, {n,m})
  • Character classes and ranges
  • Named and numbered capture groups
  • Possessive quantifiers (*+, ++, ?+)
  • Atomic groups ((?>...))
  • Lookahead/lookbehind assertions
  • Backreferences (\1, \k<name>)
  • Subroutine calls ((?1), (?&name))
  • Recursive patterns ((?R))
  • Conditional patterns framework
  • Comprehensive error handling
  • Full RegExp API compatibility

🚧 In Progress

  • Unicode property support (\p{Letter}, \p{Script=Latin})
  • Anchor improvements (^, $, \b, \B)
  • Performance optimizations
  • Additional Oniguruma features

📋 Roadmap

  • Variable-length lookbehind optimization
  • More Unicode features
  • JIT compilation for hot patterns
  • WASM acceleration
  • Additional language support

🤝 Contributing

We welcome contributions! Areas where help is needed:

  1. Unicode Properties - Implement \p{...} classes
  2. Performance - Optimize hot paths
  3. Documentation - More examples and guides
  4. Testing - Edge cases and real-world patterns
  5. Features - Additional Oniguruma compatibility

📄 License

MIT License - see LICENSE for details.

πŸ™ Acknowledgments

  • Oniguruma - The inspiration and syntax reference
  • Ruby - For pioneering advanced regex features
  • PCRE - For performance insights
  • Dart Team - For the excellent base RegExp implementation

Made with ❤️ for the Dart community

Duppix: Because your patterns deserve more power 🚀
