extractStructuredData method

List<Map<String, String>> extractStructuredData({
  required String html,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
  int chunkSize = 1024 * 1024,
})

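Parses HTML content and extracts structured data using CSS selectors. Inputs no longer than chunkSize go through the standard parser; longer inputs are processed in chunks to keep memory use down.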

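html is the HTML content to parse.
selectors maps field names to CSS selectors.
attributes optionally maps field names to the attribute to extract for each field.
chunkSize is the size of each chunk to process and the threshold above which chunking is used (default: 1024 * 1024). The threshold check compares it with html.length, which counts UTF-16 code units, not bytes.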
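Because the implementation compares chunkSize with html.length, the threshold is expressed in UTF-16 code units. For ASCII-only markup that equals the UTF-8 byte count, but accented characters and emoji make the two measures diverge. The following minimal Dart sketch compares both; the helper names codeUnitLength and utf8ByteLength are hypothetical and not part of this library.

```dart
import 'dart:convert';

/// Hypothetical helper: the measure the `html.length <= chunkSize`
/// check actually uses (UTF-16 code units).
int codeUnitLength(String html) => html.length;

/// Hypothetical helper: the size of the same text once encoded as UTF-8,
/// in bytes.
int utf8ByteLength(String html) => utf8.encode(html).length;
```

For example, "<p>hello</p>" is 12 code units and 12 bytes, "<p>héllo</p>" is 12 code units but 13 bytes, and "<p>😀</p>" is 9 code units (the emoji is a surrogate pair) but 11 bytes.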

Implementation

List<Map<String, String>> extractStructuredData({
  required String html,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
  int chunkSize = 1024 * 1024, // 1024 * 1024 code units, compared against html.length
}) {
  _logger.info(
    'Starting memory-efficient structured extraction with selectors: $selectors',
  );
  if (attributes != null) {
    _logger.info('Using attributes: $attributes');
  }

  // If the HTML is small enough, use the standard parser
  if (html.length <= chunkSize) {
    return _extractStructuredDataStandard(
      html: html,
      selectors: selectors,
      attributes: attributes,
    );
  }

  // For large HTML, use a chunking approach
  return _extractStructuredDataChunked(
    html: html,
    selectors: selectors,
    attributes: attributes,
    chunkSize: chunkSize,
  );
}
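The chunked path delegates to _extractStructuredDataChunked, which is not shown on this page, so how it divides the input is not documented here. As an illustration only, one simple way to chunk markup without cutting a tag in half is to move each cut back to just after the last '>' inside the window. The sketch below assumes that approach; splitAtTagBoundaries is a hypothetical name, and the library's real helper may work quite differently.

```dart
/// Hypothetical sketch (not this library's implementation): split [html]
/// into pieces of at most [chunkSize] UTF-16 code units, moving each cut
/// back to just after the last '>' in the window so no tag is split.
List<String> splitAtTagBoundaries(String html, int chunkSize) {
  if (chunkSize <= 0) {
    throw ArgumentError.value(chunkSize, 'chunkSize', 'must be positive');
  }
  final chunks = <String>[];
  var start = 0;
  while (start < html.length) {
    var end = start + chunkSize;
    if (end >= html.length) {
      // The remainder fits in one chunk.
      chunks.add(html.substring(start));
      break;
    }
    // Search backward from end - 1 for the last '>' in [start, end).
    final lastClose = html.lastIndexOf('>', end - 1);
    if (lastClose >= start) {
      end = lastClose + 1; // cut just after the tag's closing '>'
    }
    // Otherwise no tag ends in this window, so fall back to a hard cut
    // (which, in this naive sketch, could even split a surrogate pair).
    chunks.add(html.substring(start, end));
    start = end;
  }
  return chunks;
}
```

With a chunk size of 10, "<p>a</p><p>b</p>" splits into "<p>a</p>" and "<p>b</p>". This naive approach has clear limits: an element whose content crosses a cut still ends up divided between two chunks, and a '>' inside an attribute value or text is mistaken for a tag end, so a real chunked extractor would need extra handling for elements that straddle chunk boundaries.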