🔥 Duppix - Advanced Regex Engine for Dart
Duppix is a comprehensive regex library that brings Oniguruma-compatible advanced features to Dart, including possessive quantifiers, atomic groups, named backreferences, recursive patterns, and much more.
✨ Why Duppix?
Dart's built-in `RegExp` is powerful but lacks many advanced features that other regex engines provide. Duppix fills this gap with a hybrid approach:
- Fast fallback: simple patterns use Dart's optimized `RegExp`
- Advanced features: complex patterns use our custom engine
- Full compatibility: drop-in replacement for `RegExp`
- Oniguruma compatible: supports the same advanced syntax as Ruby and PHP (PCRE), among others
Feature Comparison

| Feature | Dart RegExp | Duppix | Example |
|---|---|---|---|
| Basic patterns | ✅ | ✅ | `\d+`, `[a-z]*` |
| Named groups | ✅ | ✅ | `(?<name>\w+)` |
| Backreferences | ⚠️ Limited | ✅ | `\1`, `\k<name>` |
| Possessive quantifiers | ❌ | ✅ | `\d++`, `.*+` |
| Atomic groups | ❌ | ✅ | `(?>...)` |
| Recursive patterns | ❌ | ✅ | `(?R)`, `(?0)` |
| Subroutine calls | ❌ | ✅ | `(?1)`, `(?&name)` |
| Conditional patterns | ❌ | ✅ | `(?(1)yes\|no)` |
| Variable lookbehind | ❌ | ✅ | `(?<=\w{2,4})` |
| Script runs | ❌ | ✅ | `(?script_run:...)` |
Quick Start
Add Duppix to your `pubspec.yaml`:
```yaml
dependencies:
  duppix: ^1.0.0
```
Basic Usage
```dart
import 'package:duppix/duppix.dart';

void main() {
  // Works just like RegExp for simple patterns
  final basic = DuppixRegex(r'\d+');
  print(basic.firstMatch('Hello 123')?.group(0)); // "123"

  // But supports advanced features too!
  final advanced = DuppixRegex(r'(?<word>\w+)\s+\k<word>');
  final match = advanced.firstMatch('hello hello world');
  print(match?.namedGroup('word')); // "hello"
}
```
🎯 Advanced Features
Named Backreferences
```dart
// Match repeated words
final regex = DuppixRegex(r'(?<word>\w+)\s+\k<word>');
final match = regex.firstMatch('hello hello world');
print(match?.namedGroup('word')); // "hello"

// Case-insensitive backreferences: also pairs <DIV>...</div>
final regex2 = DuppixRegex(r'<(?<tag>\w+)>.*?</\k<tag>>',
    options: DUPPIX_OPTION_IGNORECASE);
```
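For comparison, Dart's built-in `RegExp` follows JavaScript regex syntax, which already includes named groups and `\k<name>` backreferences, so these two particular patterns also run without Duppix. The sketch below uses only `dart:core`; the helper names `repeatedWord` and `matchingTag` are illustrative, and `caseSensitive: false` is the built-in counterpart to the ignore-case option used above.

```dart
// Plain dart:core only: no Duppix import is needed for this comparison.

/// Illustrative helper (hypothetical name): returns the repeated word in
/// [input], or null if there is none.
String? repeatedWord(String input) {
  final regex = RegExp(r'(?<word>\w+)\s+\k<word>');
  return regex.firstMatch(input)?.namedGroup('word');
}

/// Illustrative helper (hypothetical name): returns the name of the first
/// element whose closing tag repeats its opening tag, ignoring case, so
/// <DIV>...</div> counts as a pair.
String? matchingTag(String html) {
  final regex = RegExp(r'<(?<tag>\w+)>.*?</\k<tag>>', caseSensitive: false);
  return regex.firstMatch(html)?.namedGroup('tag');
}

void main() {
  print(repeatedWord('hello hello world')); // hello
  print(matchingTag('<DIV>Content</div>')); // DIV
}
```

Neither version checks word boundaries, so `the theory` also counts as a repeated word; wrap the groups in `\b` if that matters.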
Possessive Quantifiers (No Backtracking)
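```dart
// Possessive quantifiers never give back what they consumed
final greedy = DuppixRegex(r'.*abc'); // .* backtracks, so 'xxabc' matches
final possessive = DuppixRegex(r'.*+abc'); // .*+ also swallows 'abc', so this never matches

// Useful for performance: on input such as '12345' with no letter,
// \d++[a-z] fails at once instead of retrying shorter digit runs
final efficient = DuppixRegex(r'\d++[a-z]');
```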
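Dart's `RegExp` has no possessive quantifiers, but a common JavaScript workaround reproduces their behaviour, which makes it easy to check what the patterns above should do. Lookaheads in Dart, as in JavaScript, are atomic: once `(?=...)` succeeds, the engine never backtracks into it. Capturing inside the lookahead and then consuming the capture with a backreference therefore acts like a possessive quantifier, so `(?=(\d+))\1` behaves like `\d++`. This is a sketch of that idiom, not how Duppix implements possessive quantifiers; the variable names are illustrative.

```dart
// Illustrative names. Each pattern is plain RegExp.

// Analogue of `.*abc`: `.*` may give characters back, so `abc` can match.
final greedy = RegExp(r'.*abc');

// Analogue of `.*+abc`: the lookahead captures the rest of the line and the
// backreference consumes it without ever giving it back, so `abc` never
// gets a chance to match.
final possessiveLike = RegExp(r'(?=(.*))\1abc');

// Analogue of `\d++[a-z]`: the digit run is taken whole or not at all.
final digitsThenLetter = RegExp(r'(?=(\d+))\1[a-z]');

void main() {
  print(greedy.hasMatch('xxabc')); // true
  print(possessiveLike.hasMatch('xxabc')); // false
  print(digitsThenLetter.firstMatch('order 123a')?.group(0)); // 123a
}
```

The workaround adds a capture group, which shifts the numbering of any groups that follow it.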
Atomic Groups
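```dart
// Prevent backtracking within groups
final atomic = DuppixRegex(r'(?>.*?)end');
// The lazy .*? commits to the empty match inside the atomic group, so this
// finds only "end" (at index 13), not "start middle end"
final match = atomic.firstMatch('start middle end');
```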
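A plain-`RegExp` analogue makes the behaviour of the atomic-group example easier to see. In Dart, as in JavaScript, a lookahead is atomic, so `(?=(X))\1` behaves like `(?>X)`. Inside the lookahead, the lazy `.*?` first tries the empty string; that succeeds, and the lookahead is never retried with more characters. Assuming Duppix follows Oniguruma semantics for atomic groups, `(?>.*?)end` should likewise match only `end`. The variable name below is illustrative.

```dart
// Illustrative name. (?=(.*?))\1 stands in for the atomic group (?>.*?).
final atomicLazy = RegExp(r'(?=(.*?))\1end');

void main() {
  final match = atomicLazy.firstMatch('start middle end');
  // The group commits to '', so `end` must follow immediately; the first
  // position where it does is index 13.
  print(match?.group(0)); // end
  print(match?.start); // 13
}
```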
Recursive Patterns
```dart
// Match balanced parentheses
final balanced = DuppixRegex(r'\((?:[^()]|(?R))*\)');
final match = balanced.firstMatch('(a(b(c)d)e)');
print(match?.group(0)); // "(a(b(c)d)e)"

// Match nested structures
final nested = DuppixRegex(r'<(\w+)>(?:[^<>]|(?R))*</\1>');
```
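Dart's `RegExp` cannot recurse, so without Duppix the usual fallback for balanced delimiters is a small depth-counting scanner. The sketch below (`firstBalanced` is an illustrative name) returns the same substring as `\((?:[^()]|(?R))*\)` when given `(` and `)`: starting from the leftmost opening delimiter that is eventually closed, up to its matching close.

```dart
/// Illustrative helper (hypothetical name): returns the leftmost substring
/// that starts at an [open] delimiter and ends at its matching [close], or
/// null if no opening delimiter is ever closed. [open] and [close] are
/// single characters.
String? firstBalanced(String input, String open, String close) {
  for (var start = 0; start < input.length; start++) {
    if (input[start] != open) continue;
    var depth = 0;
    for (var i = start; i < input.length; i++) {
      if (input[i] == open) {
        depth++;
      } else if (input[i] == close) {
        depth--;
        // Back at depth 0: this close matches the open at `start`.
        if (depth == 0) return input.substring(start, i + 1);
      }
    }
    // Never closed from here; try the next opening delimiter, like a regex
    // engine moving on to the next start position.
  }
  return null;
}

void main() {
  print(firstBalanced('(a(b(c)d)e)', '(', ')')); // (a(b(c)d)e)
  print(firstBalanced('((x)', '(', ')')); // (x)
  print(firstBalanced('{"key": {"nested": "value"}}', '{', '}'));
}
```

Like the regex, it knows nothing about quoting, so braces inside JSON string values are counted too.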
Subroutine Calls
```dart
// Define reusable patterns
final regex = DuppixRegex(r'(?<digit>\d)(?<letter>[a-z])(?&digit)(?&letter)');
// Subroutine calls reuse each group's pattern, not its captured text,
// so this matches '1a1a' and also '1a2b'
final match = regex.firstMatch('1a1a');

// Numbered subroutine calls
final numbered = DuppixRegex(r'(\d{2})-(?1)-(?1)'); // Match XX-XX-XX format
```
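Subroutine calls and backreferences are easy to mix up: `(?1)` re-runs group 1's pattern, while `\1` re-matches group 1's captured text. For a simple, non-recursive group such as `(\d{2})`, a subroutine call is equivalent to inlining the group's pattern, which plain `RegExp` can express directly. The `^` and `$` anchors are added only to make whole-string checks, and the variable names are illustrative.

```dart
// Illustrative names.
// Like (\d{2})-(?1)-(?1): any two digits in each position.
final inlined = RegExp(r'^(\d{2})-\d{2}-\d{2}$');
// Backreference version: the same two digits repeated three times.
final backref = RegExp(r'^(\d{2})-\1-\1$');

void main() {
  print(inlined.hasMatch('12-34-56')); // true
  print(backref.hasMatch('12-34-56')); // false
  print(backref.hasMatch('12-12-12')); // true
}
```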
Conditional Patterns
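```dart
// Match based on conditions: require ")" only if the named group "open"
// captured an opening "("
final conditional = DuppixRegex(r'(?<open>\()?\d+(?(<open>)\))');
// Matches "(42)" and "42"; in "(42" only "42" matches, because a captured
// "(" would demand a closing ")"
```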
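Dart's `RegExp` has no conditionals, but a yes/no conditional on whether an optional group participated can usually be rewritten as an alternation with one branch per case. The sketch below uses a hypothetical pattern for digits that are either bare or wrapped in one pair of parentheses; the commented conditional form uses the Oniguruma-style syntax shown in this README.

```dart
// Hypothetical target: digits, bare or in one pair of parentheses.
// As a conditional:  (?<open>\()?\d+(?(<open>)\))
// Plain RegExp has no conditionals, so spell out both branches instead.
final optionallyParenthesized = RegExp(r'^(?:\(\d+\)|\d+)$');

void main() {
  for (final input in ['(42)', '42', '(42', '42)']) {
    print('$input: ${optionallyParenthesized.hasMatch(input)}');
  }
  // (42): true, 42: true, (42: false, 42): false
}
```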
Advanced Character Classes
```dart
// Script runs - ensure single Unicode script
final scriptRun = DuppixRegex(r'(?script_run:\w+)');

// Character class operations
final intersection = DuppixRegex(r'[a-z&&[^aeiou]]'); // Consonants only
```
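Without `&&`, the same consonant class can be written in plain `RegExp` by putting a negative lookahead in front of the range: `(?![aeiou])[a-z]` matches one character of `[a-z]` that is not a vowel, which is exactly the intersection `[a-z&&[^aeiou]]` describes. A minimal sketch; `consonant` is an illustrative name.

```dart
// Illustrative name. The lookahead rejects vowels, then [a-z] consumes one
// character: together they act like the intersection [a-z&&[^aeiou]].
final consonant = RegExp(r'(?![aeiou])[a-z]');

void main() {
  // Removing every consonant leaves only the vowels.
  print('education'.replaceAll(consonant, '')); // euaio
  print(consonant.hasMatch('a')); // false
  print(consonant.hasMatch('b')); // true
}
```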
🛠️ Options & Configuration
```dart
// Configure regex behavior
final options = DuppixOptions(
  ignoreCase: true,
  multiline: true,
  singleline: false,
  unicode: true,
  findLongest: false,
  debug: false,
);
final regex = DuppixRegex(r'pattern', options: options);

// Or use flags (Oniguruma compatible)
final flagged = DuppixRegex(r'pattern',
    options: DUPPIX_OPTION_IGNORECASE | DUPPIX_OPTION_MULTILINE);
```
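Take care when translating these options to or from Dart's `RegExp`, because "multiline" means different things in different engines. In Oniguruma, `ONIG_OPTION_MULTILINE` lets `.` match a newline (what Dart calls `dotAll`), while Dart's `multiLine` makes `^` and `$` match at line breaks. This README does not say which meaning `DuppixOptions.multiline` follows, so treat any mapping as something to verify. The sketch below shows the built-in flags only.

```dart
void main() {
  // multiLine: ^ and $ also match at line breaks.
  print(RegExp(r'^b$', multiLine: true).hasMatch('a\nb')); // true
  print(RegExp(r'^b$').hasMatch('a\nb')); // false

  // dotAll: . also matches a line terminator.
  print(RegExp(r'a.b', dotAll: true).hasMatch('a\nb')); // true
  print(RegExp(r'a.b').hasMatch('a\nb')); // false

  // caseSensitive: false is the built-in ignore-case flag.
  print(RegExp(r'pattern', caseSensitive: false).hasMatch('PATTERN')); // true
}
```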
Migration from RegExp
Duppix is designed as a drop-in replacement for `RegExp`:
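```dart
// Before (RegExp)
final oldRegex = RegExp(r'\d+');
final oldMatch = oldRegex.firstMatch('123');

// After (Duppix) - same API
final newRegex = DuppixRegex(r'\d+');
final newMatch = newRegex.firstMatch('123');

// The familiar matching methods work the same way
print(newRegex.hasMatch('123'));
print(newRegex.allMatches('1 2 3').length);
// Duppix also offers replaceAll directly; RegExp has no such method, so with
// RegExp you would write 'a1b2c'.replaceAll(oldRegex, 'X') instead
print(newRegex.replaceAll('a1b2c', 'X'));
```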
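When migrating, a quick sanity check is to compare Duppix results with the built-in engine on the same inputs. For reference, these are the results `RegExp` itself gives for the calls above; replacement goes through `String.replaceAll(Pattern, String)` because `RegExp` has no `replaceAll` method of its own.

```dart
void main() {
  final regex = RegExp(r'\d+');
  print(regex.firstMatch('123')?.group(0)); // 123
  print(regex.hasMatch('123')); // true
  print(regex.allMatches('1 2 3').length); // 3
  // Replacement lives on String, which accepts any Pattern.
  print('a1b2c'.replaceAll(regex, 'X')); // aXbXc
}
```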
Performance
Duppix uses a smart hybrid approach:
- Simple patterns → Dart's optimized `RegExp` (fastest)
- Advanced patterns → custom engine (feature-rich)
- Automatic detection → no manual configuration needed
```dart
// This uses fast RegExp fallback
final simple = DuppixRegex(r'\d+');

// This uses custom engine (detected automatically)
final advanced = DuppixRegex(r'\d++'); // Possessive quantifier
```
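The README does not document how the automatic detection works. As a rough illustration only, a detector could scan the pattern for syntax that Dart's `RegExp` does not support and route it accordingly. `usesAdvancedSyntax` and `engineFor` are hypothetical names, and the token list covers only the constructs shown in this README.

```dart
// Hypothetical detector for a hybrid engine (not Duppix's actual logic).
// A real implementation would tokenize the pattern: this plain scan can
// misfire on escaped characters such as `\++`.
final _advancedSyntax = RegExp(
  r'[*+?}]\+' // possessive quantifiers: *+ ++ ?+ {n,m}+
  r'|\(\?>' // atomic groups (?>...)
  r'|\(\?(?:R|\d|&)' // recursion (?R) (?0) and subroutine calls (?1) (?&name)
  r'|\(\?\(' // conditionals (?(1)yes|no)
  r'|&&' // character class intersection [a-z&&[^aeiou]]
  r'|\(\?script_run:', // script runs
);

/// Hypothetical helper: true if [pattern] needs the custom engine.
bool usesAdvancedSyntax(String pattern) => _advancedSyntax.hasMatch(pattern);

/// Hypothetical helper: names the engine a hybrid dispatcher would pick.
String engineFor(String pattern) =>
    usesAdvancedSyntax(pattern) ? 'custom engine' : 'RegExp';

void main() {
  print(engineFor(r'\d+')); // RegExp
  print(engineFor(r'\d++')); // custom engine
  print(engineFor(r'(?<word>\w+)\s+\k<word>')); // RegExp
  print(engineFor(r'\((?:[^()]|(?R))*\)')); // custom engine
}
```

Lazy quantifiers such as `\d+?` are left to `RegExp`, since only a `+` placed after a quantifier marks it as possessive.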
🧪 Testing
Run the comprehensive test suite:
```shell
dart test
```
Tests cover:
- ✅ All basic RegExp functionality
- ✅ Advanced Oniguruma features
- ✅ Performance edge cases
- ✅ Unicode support
- ✅ Error handling
- ✅ Legacy compatibility
Examples
Email Validation with Named Groups
```dart
final emailRegex = DuppixRegex(
  r'(?<local>[a-zA-Z0-9._%+-]+)@(?<domain>[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})',
);
final match = emailRegex.firstMatch('user@example.com');
print('Local: ${match?.namedGroup('local')}'); // "user"
print('Domain: ${match?.namedGroup('domain')}'); // "example.com"
```
URL Path Extraction with Subroutines
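```dart
final urlRegex = DuppixRegex(
  // Define "subdomain" once; the {0} quantifier (an Oniguruma idiom) keeps
  // the definition itself from matching, and (?&subdomain) reuses it
  r'(?<subdomain>\w+){0}'
  r'(?<protocol>https?)://(?<domain>(?:(?&subdomain)\.)*)(?<tld>\w+)(?<path>/.*)?',
);
```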
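For non-recursive pieces like `subdomain`, plain Dart can get the same reuse without subroutine calls by building the pattern from a shared string. A sketch under that assumption; `subdomain`, `urlRegex` and `parseUrl` are illustrative names, and `\w` keeps the same simplified host syntax as the pattern above (no hyphens).

```dart
// Illustrative names. The shared piece is spliced in by interpolation,
// which is what a non-recursive subroutine call amounts to.
const subdomain = r'\w+';

final urlRegex = RegExp(
  '(?<protocol>https?)://'
  '(?<domain>(?:$subdomain\\.)*)' // zero or more "label." prefixes
  '(?<tld>\\w+)'
  '(?<path>/.*)?',
);

/// Hypothetical helper: returns the named parts of [url], or an empty map.
Map<String, String?> parseUrl(String url) {
  final m = urlRegex.firstMatch(url);
  if (m == null) return {};
  return {
    for (final name in ['protocol', 'domain', 'tld', 'path'])
      name: m.namedGroup(name),
  };
}

void main() {
  print(parseUrl('https://www.example.com/docs/a'));
  // {protocol: https, domain: www.example., tld: com, path: /docs/a}
}
```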
Balanced Brackets Parser
```dart
final brackets = DuppixRegex(r'\{(?:[^{}]|(?R))*\}');
final json = '{"key": {"nested": "value"}}';
print(brackets.firstMatch(json)?.group(0)); // Full JSON object
```
HTML Tag Matching with Backreferences
```dart
final htmlTag = DuppixRegex(r'<(?<tag>\w+)>.*?</\k<tag>>');
final html = '<div>Content</div>';
print(htmlTag.firstMatch(html)?.namedGroup('tag')); // "div"
```
Implementation Status
✅ Completed Features
- Core regex engine architecture
- Pattern parser with full Oniguruma syntax
- Hybrid fallback system
- Basic quantifiers (*, +, ?, {n,m})
- Character classes and ranges
- Named and numbered capture groups
- Possessive quantifiers (*+, ++, ?+)
- Atomic groups ((?>...))
- Lookahead/lookbehind assertions
- Backreferences (\1, \k<name>)
- Subroutine calls ((?1), (?&name))
- Recursive patterns ((?R))
- Conditional patterns framework
- Comprehensive error handling
- Full RegExp API compatibility
In Progress
- Unicode property support (\p{Letter}, \p{Script=Latin})
- Anchor improvements (^, $, \b, \B)
- Performance optimizations
- Additional Oniguruma features
Roadmap
- Variable-length lookbehind optimization
- More Unicode features
- JIT compilation for hot patterns
- WASM acceleration
- Additional language support
🤝 Contributing
We welcome contributions! Areas where help is needed:
- Unicode Properties - Implement \p{...} classes
- Performance - Optimize hot paths
- Documentation - More examples and guides
- Testing - Edge cases and real-world patterns
- Features - Additional Oniguruma compatibility
License
MIT License - see LICENSE for details.
Acknowledgments
- Oniguruma - The inspiration and syntax reference
- Ruby - For pioneering advanced regex features
- PCRE - For performance insights
- Dart Team - For the excellent base RegExp implementation
Made with ❤️ for the Dart community
Duppix: Because your patterns deserve more power