dart_web_scraper library
Dart Web Scraper Library
A comprehensive web scraping library for Dart that provides tools for extracting data from web pages using configurable parsers and transformations.
This library exports all the core components needed to build web scrapers:
- Core scraping and parsing classes
- Data models for configuration and results
- Utility functions and enums
- Debug and logging utilities
Classes
- CleanerRegistry
- Registry for managing custom cleaner functions by name.
- CropTransformationOptions
- Configuration options for trimming data from the beginning or end.
- Data
- Container for scraped data that includes both the source URL and extracted content.
- HttpParserOptions
- Configuration options for HTTP parser behavior.
- Parser
- Defines how to extract specific data from HTML or other sources.
- ParserOptions
- Configuration options specific to individual parser types.
- ProxyAPIConfig
- Configuration for routing HTTP requests through a proxy API.
- RegexReplaceTransformationOptions
- Configuration options for regex-based string replacement transformations.
- RegexTransformationOptions
- Configuration options for regex-based text transformations.
- ReplaceTransformationOptions
- Configuration options for string replacement transformations.
- Scraper
- Handles HTTP requests and HTML fetching for web scraping operations.
- ScraperConfig
- Configuration for targeting and scraping specific types of URLs.
- ScraperConfigMap
- SiblingParserOptions
- Configuration options for sibling parser behavior.
- StaticValueParserOptions
- Configuration options for static value parser behavior.
- StringBetweenParserOptions
- Configuration options for string between parser behavior.
- TableParserOptions
- Configuration options for table parser behavior.
- TransformationOptions
- Configuration for applying multiple transformations to extracted data.
- UrlCleaner
- Configuration for cleaning and normalizing URLs before making HTTP requests.
- UrlParamParserOptions
- Configuration options for urlParam parser behavior.
- WebParser
- Processes scraped HTML data using configurable parsers to extract structured information.
- WebScraper
- High-level web scraper that combines HTML fetching and data parsing.
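The name-keyed registry idea behind CleanerRegistry can be illustrated with a minimal sketch. The class `SimpleCleanerRegistry`, the `Cleaner` typedef, and the `register`/`apply` methods below are hypothetical stand-ins written for this example; they are not the library's actual API, whose signatures are not shown on this page.

```dart
// Hypothetical stand-in for illustration only; not the dart_web_scraper API.
typedef Cleaner = Object? Function(Object? value);

class SimpleCleanerRegistry {
  final Map<String, Cleaner> _cleaners = {};

  /// Registers [cleaner] under [name], replacing any earlier entry.
  void register(String name, Cleaner cleaner) {
    _cleaners[name] = cleaner;
  }

  /// Applies the cleaner registered under [name].
  /// Returns [value] unchanged when no cleaner with that name exists.
  Object? apply(String name, Object? value) {
    final cleaner = _cleaners[name];
    return cleaner == null ? value : cleaner(value);
  }
}

void main() {
  final registry = SimpleCleanerRegistry()
    ..register('trim', (v) => (v is String) ? v.trim() : v);

  print(registry.apply('trim', '  hello  '));
  print(registry.apply('missing', 42));
}
```

Looking cleaners up by name is what would let a configuration refer to a cleaner with a plain string instead of a function reference.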
Enums
- HttpMethod
- HTTP methods for making requests in HTTP parsers.
- HttpPayload
- Types of payload data for HTTP requests.
- HttpResponseType
- Types of HTTP responses that can be processed by HTTP parsers.
- LogColor
- Colors for debug logging output.
- ParserType
- Types of parsers available for data extraction.
- SiblingDirection
- Directions for sibling element extraction.
- TransformationType
- Types of transformations that can be applied to extracted data.
- UserAgentDevice
- Device types for user agent strings in HTTP requests.
Extensions
- IterableExtension on Iterable<T>
- Extension on Iterable to provide indexed search functionality.
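As a rough illustration of what an indexed search on an Iterable can look like, here is a minimal sketch. The extension name `IndexedSearch` and the method `firstMatchIndexed` are assumptions made for this example; the members that IterableExtension actually exposes are not listed on this page.

```dart
// Hypothetical extension for illustration; names are not taken from the library.
extension IndexedSearch<T> on Iterable<T> {
  /// Returns the first element for which [test] returns true.
  /// [test] receives each element together with its zero-based index.
  /// Returns null when no element matches.
  T? firstMatchIndexed(bool Function(int index, T element) test) {
    var index = 0;
    for (final element in this) {
      if (test(index, element)) return element;
      index++;
    }
    return null;
  }
}

void main() {
  final words = ['alpha', 'beta', 'gamma'];
  // Skip the first entry and find the first word starting with "g".
  print(words.firstMatchIndexed((i, w) => i > 0 && w.startsWith('g')));
}
```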
Functions
- cleanScraperConfigUrl(Uri url, UrlCleaner? cleaner) → Uri
- Cleans and normalizes a URL based on scraper configuration settings.
- dumpResponseToFile({required String html, required bool debug}) → void
- Web platform stub for dumping response to file.
- findScraperConfig({required ScraperConfigMap scraperConfigMap, required Uri url}) → ScraperConfig?
- Finds the appropriate scraper configuration for a given URL.
- inject(String name, Object data, Object input) → String
- Injects data into a string template using slot placeholders.
- printLog(String message, bool debug, {LogColor color = LogColor.reset}) → void
- Web platform implementation of debug logging.
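Two of the ideas above, URL cleaning and slot injection, can be sketched with the Dart core library alone. The helpers `keepQueryParams` and `fillSlot` are hypothetical and written for this example; they do not reproduce cleanScraperConfigUrl or inject. The whitelist-style cleaning rule and the `<name>` placeholder syntax are both assumptions, since this page does not document the fields of UrlCleaner or the slot format.

```dart
/// Hypothetical cleaner: keeps only the query parameters named in [keep].
/// Note: this sketch rebuilds the URI and drops any fragment or user info.
Uri keepQueryParams(Uri url, Set<String> keep) {
  final kept = Map.fromEntries(
    url.queryParameters.entries.where((e) => keep.contains(e.key)),
  );
  return Uri(
    scheme: url.scheme,
    host: url.host,
    port: url.hasPort ? url.port : null,
    path: url.path,
    // Passing null omits the query entirely when nothing is kept.
    queryParameters: kept.isEmpty ? null : kept,
  );
}

/// Hypothetical slot filler: replaces every `<name>` placeholder in
/// [template] with the string form of [value]. The angle-bracket syntax
/// is an assumption made for this example.
String fillSlot(String name, Object value, String template) =>
    template.replaceAll('<$name>', value.toString());

void main() {
  final raw = Uri.parse('https://example.com/item?id=7&utm_source=mail');
  print(keepQueryParams(raw, {'id'}));
  print(fillSlot('id', 7, 'https://example.com/api/items/<id>'));
}
```

Normalizing the URL before the request is made keeps tracking parameters from producing distinct URLs for the same page.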
Typedefs
Exceptions / Errors
- WebScraperError
- Custom exception class for web scraping operations.