dart_web_scraper library

Dart Web Scraper Library

A web scraping library for Dart that extracts data from web pages using configurable parsers and transformations.

This library exports all the core components needed to build web scrapers:

  • Core scraping and parsing classes
  • Data models for configuration and results
  • Utility functions and enums
  • Debug and logging utilities

Classes

CleanerRegistry
Registry for managing custom cleaner functions by name.
CropTransformationOptions
Configuration options for trimming data from the beginning or end.
Data
Container for scraped data that includes both the source URL and extracted content.
HttpParserOptions
Configuration options for HTTP parser behavior.
Parser
Defines how to extract specific data from HTML or other sources.
ParserOptions
Configuration options specific to individual parser types.
ProxyAPIConfig
Configuration for routing HTTP requests through a proxy API.
RegexReplaceTransformationOptions
Configuration options for regex-based string replacement transformations.
RegexTransformationOptions
Configuration options for regex-based text transformations.
ReplaceTransformationOptions
Configuration options for string replacement transformations.
Scraper
Handles HTTP requests and HTML fetching for web scraping operations.
ScraperConfig
Configuration for targeting and scraping specific types of URLs.
ScraperConfigMap
SiblingParserOptions
Configuration options for sibling parser behavior.
StaticValueParserOptions
Configuration options for static value parser behavior.
StringBetweenParserOptions
Configuration options for string between parser behavior.
TableParserOptions
Configuration options for table parser behavior.
TransformationOptions
Configuration for applying multiple transformations to extracted data.
UrlCleaner
Configuration for cleaning and normalizing URLs before making HTTP requests.
UrlParamParserOptions
Configuration options for urlParam parser behavior.
WebParser
Processes scraped HTML data using configurable parsers to extract structured information.
WebScraper
High-level web scraper that combines HTML fetching and data parsing.

Enums

HttpMethod
HTTP methods for making requests in HTTP parsers.
HttpPayload
Types of payload data for HTTP requests.
HttpResponseType
Types of HTTP responses that can be processed by HTTP parsers.
LogColor
Colors for debug logging output.
ParserType
Types of parsers available for data extraction.
SiblingDirection
Directions for sibling element extraction.
TransformationType
Types of transformations that can be applied to extracted data.
UserAgentDevice
Device types for user agent strings in HTTP requests.

Extensions

IterableExtension on Iterable<T>
Extension on Iterable to provide indexed search functionality.
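
The listing above does not name the extension's members, so the following is only a sketch of what an indexed search on an Iterable can look like. The extension name IndexedSearchSketch and the method firstWhereIndexedOrNull are hypothetical stand-ins, not the library's confirmed API.

```dart
// Hypothetical extension; the library's actual member names may differ.
extension IndexedSearchSketch<T> on Iterable<T> {
  /// Returns the first element for which [test] returns true, or null
  /// when no element matches. [test] receives the element's position.
  T? firstWhereIndexedOrNull(bool Function(int index, T element) test) {
    var index = 0;
    for (final element in this) {
      if (test(index, element)) return element;
      index++;
    }
    return null;
  }
}

void main() {
  final rows = ['header', 'alpha', 'beta'];
  // Skip index 0 and take the first row starting with "b".
  final match = rows.firstWhereIndexedOrNull(
      (index, row) => index > 0 && row.startsWith('b'));
  print(match); // beta
}
```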

Functions

cleanScraperConfigUrl(Uri url, UrlCleaner? cleaner) → Uri
Cleans and normalizes a URL based on scraper configuration settings.
dumpResponseToFile({required String html, required bool debug}) → void
Web platform stub for dumping response to file.
findScraperConfig({required ScraperConfigMap scraperConfigMap, required Uri url}) → ScraperConfig?
Finds the appropriate scraper configuration for a given URL.
inject(String name, Object data, Object input) → String
Injects data into a string template using slot placeholders.
printLog(String message, bool debug, {LogColor color = LogColor.reset}) → void
Web platform implementation of debug logging.
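
To illustrate the idea of slot placeholders mentioned for inject, here is a minimal sketch of template substitution. It is not the library's implementation: the helper name fillSlots, its parameters, and the <slot>name</slot> placeholder syntax are all assumptions made for this example.

```dart
// Hypothetical helper, not the library's inject(): replaces every
// <slot>key</slot> marker in [template] with the matching value.
// The <slot>...</slot> syntax is an assumption for this sketch.
String fillSlots(String template, Map<String, Object> values) {
  final slot = RegExp(r'<slot>(.*?)</slot>');
  return template.replaceAllMapped(slot, (match) {
    final key = match.group(1)!;
    // Leave unknown slots untouched so missing data is easy to spot.
    return values.containsKey(key) ? values[key].toString() : match.group(0)!;
  });
}

void main() {
  final url = fillSlots(
    'https://example.com/items/<slot>id</slot>?lang=<slot>lang</slot>',
    {'id': 42},
  );
  print(url); // https://example.com/items/42?lang=<slot>lang</slot>
}
```

Leaving unmatched slots in place, rather than replacing them with an empty string, is a design choice of this sketch that makes missing values visible in the output.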

Typedefs

CleanerFunction<T> = T Function(Data data, Map<String, Object> extractedData, bool debug)
Function signature for custom data cleaning functions.
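
A function matching the CleanerFunction signature might look like the sketch below. The typedef is copied from the listing above; the Data class here is a local stand-in whose field names (url, obj) are assumptions, and trimStrings is a hypothetical cleaner written for illustration. How a cleaner is registered with CleanerRegistry is not shown, since that API is not described here.

```dart
// Stand-in for the library's Data class; the field names are assumptions.
class Data {
  final Uri url;
  final Object obj;
  const Data(this.url, this.obj);
}

// Mirrors the typedef listed above.
typedef CleanerFunction<T> = T Function(
    Data data, Map<String, Object> extractedData, bool debug);

// Hypothetical cleaner: trims whitespace from every extracted string value
// and passes non-string values through unchanged.
Map<String, Object> trimStrings(
    Data data, Map<String, Object> extractedData, bool debug) {
  final cleaned = <String, Object>{};
  extractedData.forEach((key, value) {
    cleaned[key] = value is String ? value.trim() : value;
  });
  if (debug) {
    print('Cleaned ${cleaned.length} fields from ${data.url}');
  }
  return cleaned;
}

void main() {
  final CleanerFunction<Map<String, Object>> cleaner = trimStrings;
  final result = cleaner(
    Data(Uri.parse('https://example.com/item/1'), '<html></html>'),
    {'title': '  Widget  ', 'price': 42},
    false,
  );
  print(result); // {title: Widget, price: 42}
}
```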

Exceptions / Errors

WebScraperError
Custom exception class for web scraping operations.