queryAll method
Extracts the text content of every HTML element that matches the specified tag and, optionally, class name and ID.
tag - The HTML tag to search for (e.g., 'h1', 'p', 'div').
className - Optional CSS class name to filter by.
id - Optional ID attribute to filter by.
Returns a list containing the cleaned text content of each matching element. Elements whose cleaned content is empty are omitted.
Throws ScraperNotInitializedException if no HTML content has been loaded. Throws ParseException if parsing fails.
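A call site might look roughly like the sketch below. It is not taken from the library: `DemoScraper`, its `load` method, and the simplified `queryAll` body are hypothetical stand-ins (the stand-in ignores `className` and `id`), included only so the example runs on its own and shows the named parameters and the "not loaded" exception described above.

```dart
// Hypothetical stand-ins so the example is self-contained; the real
// exception class and scraper class belong to the documented library.
class ScraperNotInitializedException implements Exception {}

class DemoScraper {
  String? _htmlContent;

  // Assumed loader; the real library may load HTML differently.
  void load(String html) => _htmlContent = html;

  // Same signature as the documented method. The body is a simplified
  // stand-in that applies only the tag filter.
  List<String> queryAll({required String tag, String? className, String? id}) {
    if (_htmlContent == null) throw ScraperNotInitializedException();
    final regex = RegExp('<$tag\\b[^>]*>(.*?)</$tag>',
        caseSensitive: false, dotAll: true);
    return [
      for (final m in regex.allMatches(_htmlContent!))
        if (m.group(1)!.trim().isNotEmpty) m.group(1)!.trim(),
    ];
  }
}

void main() {
  final scraper = DemoScraper();

  // Calling before any HTML is loaded triggers the documented exception.
  try {
    scraper.queryAll(tag: 'h1');
  } on ScraperNotInitializedException {
    print('load HTML first');
  }

  scraper.load('<h1>Title</h1><p>First</p><p> </p><p>Second</p>');
  // The whitespace-only <p> is skipped because its cleaned text is empty.
  print(scraper.queryAll(tag: 'p')); // prints [First, Second]
}
```

Catching ScraperNotInitializedException separately from ParseException lets a caller distinguish "nothing loaded yet" from "the loaded HTML could not be processed".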
Implementation
List<String> queryAll({
  required String tag,
  String? className,
  String? id,
}) {
  if (_htmlContent == null) {
    throw ScraperNotInitializedException();
  }
  try {
    List<String> results = [];
    String pattern = _buildTagPattern(tag, className: className, id: id);
    RegExp regex = RegExp(pattern, caseSensitive: false, dotAll: true);
    Iterable<RegExpMatch> matches = regex.allMatches(_htmlContent!);
    for (RegExpMatch match in matches) {
      String? content = match.group(1);
      if (content != null) {
        // Remove HTML tags from content and clean up whitespace
        String cleanContent = _cleanHtmlContent(content);
        if (cleanContent.isNotEmpty) {
          results.add(cleanContent);
        }
      }
    }
    return results;
  } catch (e) {
    throw ParseException(
        'Failed to parse HTML with tag pattern', _htmlContent, e);
  }
}
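The helpers `_buildTagPattern` and `_cleanHtmlContent` are not shown on this page. The sketch below is one plausible way such helpers could work, written as the hypothetical top-level functions `buildTagPattern` and `cleanHtmlContent`. It assumes double-quoted attributes and non-nested elements of the same tag, and it should not be read as the library's actual implementation.

```dart
// Hypothetical pattern builder; the real _buildTagPattern may differ.
// Assumes attributes are written with double quotes.
String buildTagPattern(String tag, {String? className, String? id}) {
  final t = RegExp.escape(tag);
  final buffer = StringBuffer('<$t')..write(r'\b');
  // Lookaheads let class and id appear in any order inside the open tag.
  if (className != null) {
    final c = RegExp.escape(className);
    buffer
      ..write(r'(?=[^>]*\sclass\s*=\s*"(?:[^"]*\s)?')
      ..write(c)
      ..write(r'(?:\s[^"]*)?")');
  }
  if (id != null) {
    final i = RegExp.escape(id);
    buffer
      ..write(r'(?=[^>]*\sid\s*=\s*"')
      ..write(i)
      ..write(r'")');
  }
  // Group 1 captures the inner HTML, which is what match.group(1) reads.
  buffer
    ..write(r'[^>]*>(.*?)</')
    ..write(t)
    ..write(r'\s*>');
  return buffer.toString();
}

// Hypothetical cleaner: strips nested tags and collapses whitespace.
String cleanHtmlContent(String html) {
  return html
      .replaceAll(RegExp(r'<[^>]*>'), '') // drop nested tags
      .replaceAll(RegExp(r'\s+'), ' ') // collapse whitespace runs
      .trim();
}

void main() {
  const html = '<p class="intro lead">Hello <b>world</b></p>'
      '<p>Plain</p>'
      '<p id="last" class="intro">  Bye  </p>';
  final regex = RegExp(buildTagPattern('p', className: 'intro'),
      caseSensitive: false, dotAll: true);
  final texts = [
    for (final m in regex.allMatches(html)) cleanHtmlContent(m.group(1)!),
  ];
  print(texts); // prints [Hello world, Bye]
}
```

A regular expression cannot track nesting, so a pattern like this stops at the first closing tag. For arbitrary real-world HTML, a dedicated HTML parser is generally the more robust choice.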