parse method
Future<Map<String, Object>> parse({
  required Data scrapedData,
  required ScraperConfig scraperConfig,
  bool debug = false,
  ProxyAPIConfig? overrideProxyAPIConfig,
})
Main entry point for parsing scraped HTML data.
This method orchestrates the entire parsing process:
- Builds a parent-to-children relationship map from all parsers
- Identifies root parsers (those with '_root' as parent)
- Executes parsers in hierarchical order
- Applies transformations and cleaning to extracted data
- Returns the final structured data
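The first two steps above can be illustrated with a minimal sketch. It assumes each parser carries an ID and a list of parent IDs; `SketchParser`, `buildParentToChildrenMap`, and `rootParsersOf` are hypothetical names introduced here for illustration, not the library's own classes or helpers.

```dart
/// Hypothetical stand-in for the library's Parser class; only the two
/// fields needed for this illustration are modelled.
class SketchParser {
  const SketchParser(this.id, this.parents);

  final String id;
  final List<String> parents;
}

/// Groups parsers under every parent ID they declare, so that the children
/// of a given parser can be looked up by that parser's ID.
Map<String, List<SketchParser>> buildParentToChildrenMap(
  List<SketchParser> parsers,
) {
  final Map<String, List<SketchParser>> map = {};
  for (final parser in parsers) {
    for (final parent in parser.parents) {
      map.putIfAbsent(parent, () => <SketchParser>[]).add(parser);
    }
  }
  return map;
}

/// Root parsers are the ones grouped under the '_root' key; an empty list
/// is returned when no parser declares '_root' as a parent.
List<SketchParser> rootParsersOf(Map<String, List<SketchParser>> map) =>
    map['_root']?.toList() ?? <SketchParser>[];
```

Grouping once up front means each later lookup of a parser's children is a single map access rather than a scan over the full parser list.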
Parameters:
- scrapedData: The scraped HTML data to parse, as a Data object containing the url and the Document object.
- scraperConfig: Configuration containing parser definitions.
- debug: Enable debug logging for troubleshooting.
- overrideProxyAPIConfig: Custom proxy API configuration (overrides http parser requests).
Returns:
- Map containing extracted data with parser IDs as keys
Implementation
Future<Map<String, Object>> parse({
  required Data scrapedData,
  required ScraperConfig scraperConfig,
  bool debug = false,
  ProxyAPIConfig? overrideProxyAPIConfig,
}) async {
  /// Start performance monitoring
  final Stopwatch stopwatch = Stopwatch()..start();
  printLog('Parser: Using scraper config...', debug, color: LogColor.blue);

  /// Get all parsers from the configuration
  final List<Parser> allParsers = scraperConfig.parsers.toList();

  /// Build parent-to-children relationship map for hierarchical parsing
  final Map<String, List<Parser>> parentToChildren =
      _buildParentToChildrenMap(allParsers);

  /// Identify root parsers (those that start the parsing chain)
  final List<Parser> rootParsers = parentToChildren['_root']?.toList() ?? [];

  /// Initialize with the source URL
  extractedData['url'] = scrapedData.url;

  /// Execute the parsing hierarchy starting with root parsers
  final Map<String, Object> parsedData = await _distributeParsers(
    parentToChildren: parentToChildren,
    parsers: rootParsers,
    parentData: scrapedData,
    overrideProxyAPIConfig: overrideProxyAPIConfig,
    debug: debug,
  );

  /// Ensure URL is always present in the final result
  parsedData.putIfAbsent('url', () => scrapedData.url.toString());

  /// Log parsing performance
  stopwatch.stop();
  printLog(
    'Parsing took ${stopwatch.elapsedMilliseconds} ms.',
    debug,
    color: LogColor.green,
  );
  return parsedData;
}
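The body of `_distributeParsers` is not shown on this page, so the following is only a sketch of what "executing parsers in hierarchical order" might look like, under stated assumptions: parsers are identified by plain string IDs, each child receives its parent's extracted value as input, and a parser that extracts nothing ends its branch. `Extract`, `distribute`, and `parseSketch` are hypothetical names, and the sketch is kept synchronous for brevity, whereas the method above awaits its helper.

```dart
/// Hypothetical extraction callback: given a parser ID and the data
/// produced by its parent, returns the extracted value, or null when
/// nothing matched.
typedef Extract = Object? Function(String parserId, Object parentData);

/// Walks the parser tree depth-first. [children] maps a parent ID to the
/// IDs of the parsers that consume that parent's output.
Map<String, Object> distribute({
  required Map<String, List<String>> children,
  required List<String> parserIds,
  required Object parentData,
  required Extract extract,
}) {
  final Map<String, Object> result = {};
  for (final id in parserIds) {
    final Object? value = extract(id, parentData);
    if (value == null) continue; // Assumption: no match skips the subtree.
    result[id] = value;
    // Children receive the value just extracted as their parent data.
    result.addAll(distribute(
      children: children,
      parserIds: children[id] ?? const <String>[],
      parentData: value,
      extract: extract,
    ));
  }
  return result;
}

/// Mirrors the overall shape of the method above: start at '_root', then
/// make sure the source URL is present in the result.
Map<String, Object> parseSketch({
  required Uri url,
  required Object document,
  required Map<String, List<String>> children,
  required Extract extract,
}) {
  final Map<String, Object> result = distribute(
    children: children,
    parserIds: children['_root'] ?? const <String>[],
    parentData: document,
    extract: extract,
  );
  // Leaves any 'url' value produced by a parser untouched.
  result.putIfAbsent('url', () => url.toString());
  return result;
}
```

Because `putIfAbsent` only writes when the key is missing, a parser whose ID is `url` would take precedence over the source URL in this sketch, which matches the behaviour of the `putIfAbsent` call in the listing above.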