MobileScraper class
A lightweight HTML scraper designed specifically for Flutter mobile apps. Works only on Android and iOS platforms.
Example usage:
final scraper = MobileScraper(
  url: 'https://example.com',
  config: ScraperConfig(timeout: Duration(seconds: 10)),
);
await scraper.load();
final results = scraper.queryAll(tag: 'h1');
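The example above ignores the `bool` that `load()` returns and never calls `dispose()`. A minimal sketch of a more defensive call pattern follows. It assumes that `load()` completes with `false` when the fetch fails, which this page does not state. Because the package import path is not shown here, the sketch is written against a hypothetical `HeadingSource` interface that copies the documented signatures, with a hypothetical `FakeSource` so it runs without a network call. In a real project the parameter type would be `MobileScraper`.

```dart
/// Hypothetical stand-in for the three MobileScraper members used below.
/// The signatures are copied from this page; the interface itself is not
/// part of the package.
abstract class HeadingSource {
  Future<bool> load({bool useCache = true});
  List<String> queryAll({required String tag, String? className, String? id});
  void dispose();
}

/// Loads the page, returns the text of its h1 elements, and always
/// releases the scraper's resources, even if loading throws.
Future<List<String>> fetchHeadings(HeadingSource scraper) async {
  try {
    // Assumption: load() completes with false when the fetch fails.
    final loaded = await scraper.load();
    if (!loaded) return const <String>[];
    return scraper.queryAll(tag: 'h1');
  } finally {
    scraper.dispose();
  }
}

/// Hypothetical in-memory fake so the sketch runs without a network call.
class FakeSource implements HeadingSource {
  FakeSource({required this.ok, this.headings = const []});

  final bool ok;
  final List<String> headings;
  bool disposed = false;

  @override
  Future<bool> load({bool useCache = true}) async => ok;

  @override
  List<String> queryAll({required String tag, String? className, String? id}) =>
      tag == 'h1' ? headings : const [];

  @override
  void dispose() => disposed = true;
}
```

Putting `dispose()` in a `finally` block means the scraper is released on every path, including a failed or cancelled load.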
Constructors
- MobileScraper.new({required String url, ScraperConfig config = ScraperConfig.defaultConfig}) - Creates a new MobileScraper instance with the given URL and optional configuration.
Properties
- config → ScraperConfig - Configuration options for the scraper (final)
- hashCode → int - The hash code for this object. (no setter, inherited)
- isLoaded → bool - Checks if HTML content has been loaded. (no setter)
- rawHtml → String? - Returns the raw HTML content. (no setter)
- runtimeType → Type - A representation of the runtime type of the object. (no setter, inherited)
- url → String - The URL to scrape (final)
Methods
- cancel() → void - Cancels any ongoing load operation
- dispose() → void - Disposes of resources used by the scraper
- estimateReadingTime({int wordsPerMinute = 200}) → Duration - Estimate reading time for the page content
- extractDescription() → String? - Extract only the description using smart detection
- extractEmails() → List<String> - Extract email addresses found on the page
- extractImages() → List<String> - Extract all images found on the page
- extractLinks() → List<String> - Extract all links found on the page
- extractPhoneNumbers() → List<String> - Extract phone numbers found on the page
- extractPrices() → List<String> - Extract prices found on the page (useful for e-commerce)
- extractSmartContent() → SmartContent - Extract all common content types automatically using smart extraction
- extractSpecificContent() → Map<String, List<String>> - Extract specific content types (headings, links, images, etc.)
- extractTitle() → String? - Extract only the page title using smart detection
- formatContent(ContentFormat format) → String - Format content according to the specified format
- getCleanContent({required String tag, String? className, String? id, ContentFormat format = ContentFormat.plainText}) → String - Get content with specific formatting for extracted elements
- getReadableContent() → String - Get readable content, similar to Readability.js, with ads, navigation, and other clutter removed
- getWordCount() → int - Get word count of the page content
- isCached() → Future<bool> - Check if this URL is cached
- load({bool useCache = true}) → Future<bool> - Loads HTML content from the specified URL with retry support and caching.
- noSuchMethod(Invocation invocation) → dynamic - Invoked when a nonexistent method or property is accessed. (inherited)
- query({required String tag, String? className, String? id}) → String? - Extracts the first text content from HTML tags matching the specified criteria.
- queryAll({required String tag, String? className, String? id}) → List<String> - Extracts all text content from HTML tags matching the specified criteria.
- queryWithRegex({required String pattern, int group = 1}) → List<String> - Extracts text content using a regular expression pattern.
- queryWithRegexFirst({required String pattern, int group = 1}) → String? - Extracts the first match using a regular expression pattern.
- removeFromCache() → Future<void> - Remove this URL from cache
- toMarkdown() → String - Convert the entire page content to Markdown format
- toPlainText() → String - Convert the entire page content to clean plain text
- toString() → String - A string representation of this object. (inherited)
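Two parameters in the list above are easier to read with a concrete picture: `group` on `queryWithRegex` and `wordsPerMinute` on `estimateReadingTime`. The sketch below is plain Dart and is not the package's implementation. The helper names `matchGroup` and `readingTime` are hypothetical, and rounding up to whole minutes is an assumption, since this page does not say how the library rounds.

```dart
/// Collects one capture group from every match of [pattern] in [text].
/// With group = 1 this returns the first parenthesised group of each match;
/// group = 0 would return each full match instead.
List<String> matchGroup(String text, String pattern, {int group = 1}) {
  return RegExp(pattern)
      .allMatches(text)
      .map((m) => m.group(group))
      .whereType<String>() // skip matches where the group did not participate
      .toList();
}

/// Estimates reading time from a word count.
/// Assumption: the result is rounded up to whole minutes.
Duration readingTime(int wordCount, {int wordsPerMinute = 200}) {
  if (wordCount <= 0 || wordsPerMinute <= 0) return Duration.zero;
  final minutes = (wordCount / wordsPerMinute).ceil();
  return Duration(minutes: minutes);
}
```

For instance, `matchGroup('a@x.io, b@y.io', r'(\w+)@')` collects the text before each `@`, and `readingTime(450)` gives three minutes at the default 200 words per minute.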
Operators
- operator ==(Object other) → bool - The equality operator. (inherited)
Static Methods
- clearAllCache() → Future<void> - Clear all cached content
- getCacheStats() → CacheStats - Get cache statistics