scrape method
Future<Map<String, Object>> scrape({
  required Uri url,
  ScraperConfig? scraperConfig,
  ScraperConfigMap? scraperConfigMap,
  bool debug = false,
  String? html,
  Map<String, String>? overrideCookies,
  Map<String, String>? overrideHeaders,
  String? overrideUserAgent,
  ProxyAPIConfig? overrideProxyAPIConfig,
})
Performs complete web scraping, including HTML fetching and data extraction.
This is the main method that orchestrates the entire scraping process:
- If scraperConfig is provided, it is used directly.
- If scraperConfig is not provided, scraperConfigMap is searched for a scraper configuration matching the URL.
- If no scraper configuration can be found, a WebScraperError is thrown.
- Fetches the HTML content (unless a pre-fetched document is supplied via html).
- Parses the HTML using the configured parsers and returns the extracted data as a map.
Parameters:
url: The URL to scrape
scraperConfig: Scraper configuration for the URL
scraperConfigMap: Map of domain names to lists of scraper configurations
debug: Enable debug logging (default: false)
html: Pre-fetched HTML document (optional; avoids an HTTP request if provided)
overrideCookies: Custom cookies to include in HTTP requests; overrides cookies in the scraper config
overrideHeaders: Custom HTTP headers to include in requests; overrides headers in the scraper config
overrideUserAgent: Custom user agent string (overrides the scraper config setting)
overrideProxyAPIConfig: Custom proxy API configuration (overrides the scraper config setting for base requests and HTTP parser requests)
Returns:
- Map containing extracted data with parser IDs as keys
Throws:
- WebScraperError if URL is not supported or scraping fails
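A minimal usage sketch is shown below. The host class name (WebScraper), the config variable (exampleConfig), and the parser ID ('title') are illustrative assumptions, not part of this page's API; only the scrape parameters and WebScraperError come from the documentation above.

```dart
// Sketch of a typical call, assuming `webScraper` exposes the scrape method
// documented here and `exampleConfig` is a ScraperConfig built elsewhere
// for the target site (both names are illustrative).
try {
  final data = await webScraper.scrape(
    url: Uri.parse('https://example.com/page'),
    scraperConfig: exampleConfig,
    debug: true,
    overrideHeaders: {'Accept-Language': 'en-US'},
  );
  // Extracted values are keyed by parser ID ('title' is hypothetical).
  print(data['title']);
} on WebScraperError catch (e) {
  // Thrown when no configuration matches the URL or scraping fails.
  print('Scraping failed: $e');
}
```

Passing scraperConfig directly skips the domain lookup; to let the library pick a configuration per domain, pass scraperConfigMap instead.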
Implementation
Future<Map<String, Object>> scrape({
  required Uri url,
  ScraperConfig? scraperConfig,
  ScraperConfigMap? scraperConfigMap,
  bool debug = false,
  String? html,
  Map<String, String>? overrideCookies,
  Map<String, String>? overrideHeaders,
  String? overrideUserAgent,
  ProxyAPIConfig? overrideProxyAPIConfig,
}) async {
  // Find the appropriate scraper configuration for this URL.
  ScraperConfig? config;
  if (scraperConfig != null) {
    config = scraperConfig;
  } else if (scraperConfigMap != null) {
    config = findScraperConfig(
      url: url,
      scraperConfigMap: scraperConfigMap,
    );
  }
  if (config == null) {
    throw WebScraperError(
        'No scraper configuration provided, and this URL is not supported by scraperConfigMap');
  }

  // Fetch the HTML content using the Scraper class.
  Scraper scraping = Scraper();
  Data scrapedData = await scraping.scrape(
    url: url,
    html: html != null ? Document.html(html) : null,
    debug: debug,
    scraperConfig: config,
    overrideCookies: overrideCookies,
    overrideHeaders: overrideHeaders,
    overrideUserAgent: overrideUserAgent,
    overrideProxyAPIConfig: overrideProxyAPIConfig,
  );

  // Parse the fetched content using the WebParser class.
  WebParser webParser = WebParser();
  Map<String, Object> parsedData = await webParser.parse(
    scrapedData: scrapedData,
    scraperConfig: config,
    debug: debug,
    overrideProxyAPIConfig: overrideProxyAPIConfig,
  );
  return parsedData;
}