extractData method
Parses HTML content and extracts data using a CSS selector. Input longer than chunkSize is processed in chunks to limit memory use; shorter input goes through the standard parser.
html is the HTML content to parse.
selector is the CSS selector to use.
attribute is the attribute to extract (optional).
asText controls whether to extract the text content (default: true).
chunkSize is the size of each chunk to process (default: 1024 * 1024). The implementation compares this value against html.length, which counts UTF-16 code units rather than bytes.
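The implementation below compares chunkSize against html.length. In Dart, String.length reports UTF-16 code units, so the threshold only matches a byte count for pure-ASCII input. The following minimal sketch illustrates the difference; the helper names codeUnitLength and utf8ByteLength are hypothetical and are not part of the documented class.

```dart
import 'dart:convert';

/// Number of UTF-16 code units, which is what String.length reports.
int codeUnitLength(String s) => s.length;

/// Number of bytes the string occupies once encoded as UTF-8.
int utf8ByteLength(String s) => utf8.encode(s).length;

void main() {
  // One non-ASCII character (e with acute accent) inside a small fragment.
  const snippet = '<p>h\u00e9llo</p>';
  print(codeUnitLength(snippet)); // 12
  print(utf8ByteLength(snippet)); // 13
}
```

If a byte-based limit is what the caller needs, measuring the encoded length before choosing chunkSize is one option, at the cost of an extra pass over the input.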
Implementation
List<String> extractData({
  required String html,
  required String selector,
  String? attribute,
  bool asText = true,
  int chunkSize = 1024 * 1024, // 1MB chunks
}) {
  _logger.info(
    'Starting memory-efficient extraction with selector: $selector',
  );
  if (attribute != null) {
    _logger.info('Using attribute: $attribute');
  }

  // If the HTML is small enough, use the standard parser
  if (html.length <= chunkSize) {
    return _extractDataStandard(
      html: html,
      selector: selector,
      attribute: attribute,
      asText: asText,
    );
  }

  // For large HTML, use a chunking approach
  return _extractDataChunked(
    html: html,
    selector: selector,
    attribute: attribute,
    asText: asText,
    chunkSize: chunkSize,
  );
}
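The body of _extractDataStandard is not shown on this page. As a rough illustration of what a standard, non-chunked extraction could look like, here is a minimal sketch built on package:html. The function name extractStandardSketch is hypothetical, and so are two behaviors assumed here: that attribute takes precedence over asText, and that the element's outer HTML is returned when asText is false.

```dart
import 'package:html/parser.dart' as html_parser;

/// Hypothetical stand-in for _extractDataStandard; the real helper may differ.
List<String> extractStandardSketch({
  required String html,
  required String selector,
  String? attribute,
  bool asText = true,
}) {
  // Parse the whole document in one pass (no chunking).
  final document = html_parser.parse(html);
  final results = <String>[];

  for (final element in document.querySelectorAll(selector)) {
    if (attribute != null) {
      // Assumption: a requested attribute wins over asText.
      final value = element.attributes[attribute];
      if (value != null) results.add(value);
    } else if (asText) {
      results.add(element.text.trim());
    } else {
      // Assumption: fall back to the element's markup.
      results.add(element.outerHtml);
    }
  }
  return results;
}

void main() {
  const page =
      '<ul><li><a href="/a">First</a></li><li><a href="/b">Second</a></li></ul>';
  print(extractStandardSketch(html: page, selector: 'a')); // [First, Second]
  print(extractStandardSketch(
    html: page,
    selector: 'a',
    attribute: 'href',
  )); // [/a, /b]
}
```

The named parameters mirror those of extractData, so the sketch can be read side by side with the dispatch code above.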