findScraperConfig function

ScraperConfig? findScraperConfig({
  required ScraperConfigMap scraperConfigMap,
  required Uri url,
})

Finds the appropriate scraper configuration for a given URL.

This function searches through the scraper configuration map to find a configuration that matches the URL's host and path patterns.

The matching process:

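  1. Compares the URL's host, case-insensitively, against each configuration key; a key matches if it equals the host or is a parent domain of it (for example, example.com matches shop.example.com)
  2. For matching hosts, checks each scraper configuration's path patterns, or only the configuration at index useNth when scraperConfigMap.useNth is set
  3. Matches path patterns using exact string matching or regex patterns
  4. Returns the first matching configuration found, or null if none matches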
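
The two matching rules above can be sketched in isolation. The host rule mirrors the comparison in the Implementation section (equality or a subdomain suffix, case-insensitive). The path rule is only an assumption, since _checkPathPatterns is not shown on this page: the sketch tries an exact match first and then treats the pattern as a regular expression. Both helper names, hostMatches and pathMatches, are hypothetical and not part of the library.

```dart
/// Hypothetical helper: true when [urlHost] equals [configHost] or is a
/// subdomain of it. Both sides are lowercased before comparing.
bool hostMatches(String configHost, String urlHost) {
  final h = configHost.toLowerCase();
  final u = urlHost.toLowerCase();
  // The leading dot prevents 'notexample.com' from matching 'example.com'.
  return u == h || u.endsWith('.$h');
}

/// Hypothetical helper: ASSUMED path rule (the real _checkPathPatterns
/// may differ). Exact string match first, then the pattern as a regex.
bool pathMatches(String pattern, String path) {
  if (path == pattern) return true;
  return RegExp(pattern).hasMatch(path);
}
```

Under these assumptions, hostMatches('example.com', 'shop.example.com') is true while hostMatches('example.com', 'notexample.com') is false, and pathMatches('/products', '/products/123') is true because an unanchored regex matches anywhere in the path.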

Parameters:

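  • scraperConfigMap: Holds configs, a map of domain names to lists of scraper configurations, and an optional useNth index that restricts matching to a single entry of each list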
  • url: The URL to find a configuration for

Returns: the first matching ScraperConfig, or null if no configuration matches.

Example:

final config = findScraperConfig(
  scraperConfigMap: {
    'example.com': [ScraperConfig(pathPatterns: ['/products'], ...)]
  },
  url: Uri.parse('https://example.com/products/123'),
);

Implementation

ScraperConfig? findScraperConfig({
  required ScraperConfigMap scraperConfigMap,
  required Uri url,
}) {
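  // Normalize the URL host once; hosts are compared case-insensitively.
  final urlHost = url.host.toLowerCase();
  for (final host in scraperConfigMap.configs.keys) {
    // Match the key exactly or as a parent domain of the URL host
    // (e.g. 'example.com' matches 'shop.example.com').
    final configHost = host.toLowerCase();
    final hostMatches =
        urlHost == configHost || urlHost.endsWith('.$configHost');
    if (!hostMatches) continue;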

    final list = scraperConfigMap.configs[host];
    if (list == null || list.isEmpty) continue;

    // if useNth provided, check bounds and only test that config
    final nth = scraperConfigMap.useNth;
    if (nth != null) {
      if (nth < 0 || nth >= list.length) continue;
      final hit = _checkPathPatterns(list[nth], url);
      if (hit != null) return hit;
      continue;
    }

    // otherwise, try each config until one matches
    for (final cfg in list) {
      final hit = _checkPathPatterns(cfg, url);
      if (hit != null) return hit;
    }
  }
  return null;
}
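
To make the useNth branch easier to follow, here is a minimal sketch of the per-host selection step, written generically so it runs without the package's types. The function name pickForHost and the matches callback are hypothetical stand-ins: matches plays the role of _checkPathPatterns returning a non-null hit. A null result here corresponds to the continue in the loop above, meaning the search moves on to the next host.

```dart
/// Hypothetical stand-in for the selection logic applied to one host's list.
/// Returns the chosen item, or null when this host yields no hit.
T? pickForHost<T>(List<T> list, int? useNth, bool Function(T) matches) {
  if (useNth != null) {
    // With useNth set, only that index is tested; out-of-range means no hit.
    if (useNth < 0 || useNth >= list.length) return null;
    final candidate = list[useNth];
    return matches(candidate) ? candidate : null;
  }
  // Without useNth, the first matching item in list order wins.
  for (final item in list) {
    if (matches(item)) return item;
  }
  return null;
}
```

One consequence worth noting: when useNth is set, a configuration at another index is never considered, even if it would match. For example, pickForHost(['/a', '/b'], 0, (p) => p == '/b') returns null, whereas passing null for useNth returns '/b'.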