extractStructuredData method
Parses HTML content and extracts structured data using CSS selectors. Input no longer than chunkSize is parsed in a single pass; larger input is processed in chunks to limit memory use.
html is the HTML content to parse.
selectors is a map of field names to CSS selectors.
attributes is an optional map of field names to the attribute to extract for that field.
chunkSize is the size of each chunk to process (default: 1024 * 1024). It is compared against html.length, which counts UTF-16 code units rather than bytes.
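To make the parameter shapes concrete, the following is a minimal sketch rather than the library's implementation. It assumes the package:html parser is available. The function name extractSketch is hypothetical, and so is the rule for assembling rows: the i-th match of each selector is treated as belonging to the i-th record, which the documentation above does not confirm.

```dart
import 'package:html/parser.dart' as html_parser;

/// Hypothetical stand-in showing how selectors and attributes could be used.
/// Not the documented method; the row-assembly rule below is an assumption.
List<Map<String, String>> extractSketch({
  required String html,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
}) {
  final document = html_parser.parse(html);

  // One list of values per field, in document order.
  final columns = <String, List<String>>{};
  for (final entry in selectors.entries) {
    // If an attribute is named for this field, read it; otherwise use text.
    final attribute = attributes?[entry.key];
    columns[entry.key] = [
      for (final element in document.querySelectorAll(entry.value))
        attribute == null
            ? element.text.trim()
            : element.attributes[attribute] ?? '',
    ];
  }

  // Assumption: the i-th match of every selector belongs to the i-th record.
  var rowCount = 0;
  for (final values in columns.values) {
    if (values.length > rowCount) rowCount = values.length;
  }
  return [
    for (var i = 0; i < rowCount; i++)
      {
        for (final entry in columns.entries)
          entry.key: i < entry.value.length ? entry.value[i] : '',
      },
  ];
}

void main() {
  const html = '''
<ul>
  <li><a class="title" href="/a">First</a></li>
  <li><a class="title" href="/b">Second</a></li>
</ul>''';

  final rows = extractSketch(
    html: html,
    selectors: {'title': 'a.title', 'link': 'a.title'},
    attributes: {'link': 'href'},
  );
  rows.forEach(print);
}
```

Here both fields use the same selector, and only link names an attribute, so title is taken from the element text while link is taken from href.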
Implementation
List<Map<String, String>> extractStructuredData({
  required String html,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
  int chunkSize = 1024 * 1024, // 1MB chunks
}) {
  _logger.info(
    'Starting memory-efficient structured extraction with selectors: $selectors',
  );
  if (attributes != null) {
    _logger.info('Using attributes: $attributes');
  }

  // If the HTML is small enough, use the standard parser
  if (html.length <= chunkSize) {
    return _extractStructuredDataStandard(
      html: html,
      selectors: selectors,
      attributes: attributes,
    );
  }

  // For large HTML, use a chunking approach
  return _extractStructuredDataChunked(
    html: html,
    selectors: selectors,
    attributes: attributes,
    chunkSize: chunkSize,
  );
}
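The body of _extractStructuredDataChunked is not shown here, so the following is only a sketch of one way a chunking step could work, not the library's method. The helper name splitAtTagBoundaries is hypothetical. It cuts the input into pieces of at most chunkSize UTF-16 code units and prefers to cut just after a '>' so that no tag is split in half.

```dart
/// Hypothetical helper: splits [html] into pieces of at most [chunkSize]
/// UTF-16 code units, cutting after a '>' where one exists in the window.
List<String> splitAtTagBoundaries(String html, int chunkSize) {
  if (chunkSize <= 0) {
    throw ArgumentError.value(chunkSize, 'chunkSize', 'must be positive');
  }
  final chunks = <String>[];
  var start = 0;
  while (start < html.length) {
    var end = start + chunkSize;
    if (end >= html.length) {
      // The remainder fits in one chunk.
      chunks.add(html.substring(start));
      break;
    }
    // Search backwards from the end of the window for the last '>'.
    final cut = html.lastIndexOf('>', end - 1);
    if (cut >= start) {
      end = cut + 1; // Cut just after the closing bracket of a tag.
    }
    // If no '>' was found in the window, fall back to a hard cut at `end`.
    chunks.add(html.substring(start, end));
    start = end;
  }
  return chunks;
}

void main() {
  final chunks = splitAtTagBoundaries('<a>x</a><b>y</b>', 6);
  chunks.forEach(print);
}
```

A split like this keeps individual tags intact, but an element whose opening and closing tags land in different chunks would not be matched by a selector in either chunk. A real chunked extractor has to handle that case somehow, for example by splitting only between complete records.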