extractStructuredData method - MemoryEfficientParser class - memory_efficient_parser library

extractStructuredData method

List<Map<String, String>> extractStructuredData({
  required String html,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
  int chunkSize = 1024 * 1024,
})

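Parses HTML content and extracts structured data using CSS selectors in a memory-efficient way. Input whose length is at most chunkSize is handled by the standard parser; longer input is processed in chunks.

html is the HTML content to parse.
selectors is a map of field names to CSS selectors.
attributes is an optional map of field names to the attributes to extract.
chunkSize is the size of each chunk to process (default: 1024 * 1024). It is compared against html.length, which in Dart counts UTF-16 code units rather than bytes.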
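To make the parameter shapes concrete, here is a minimal sketch of the arguments a caller might build. The sample HTML, the field names, and the reading of a null attribute value as "use the element's text" are assumptions for illustration. The call itself is left commented out because this page does not show how a MemoryEfficientParser is constructed.

```dart
// Sample input; any HTML string works here.
const html = '<ul><li class="item"><a href="/a">Alpha</a></li></ul>';

// Field name -> CSS selector used to locate the element for that field.
final selectors = <String, String>{
  'title': 'li.item a',
  'link': 'li.item a',
};

// Field name -> attribute to read from the matched element.
// Assumption: null (or a missing key) means "take the element's text".
final attributes = <String, String?>{
  'title': null,
  'link': 'href',
};

void main() {
  // Hypothetical call; `parser` is assumed to be a MemoryEfficientParser:
  // final List<Map<String, String>> rows = parser.extractStructuredData(
  //   html: html,
  //   selectors: selectors,
  //   attributes: attributes,
  // );

  // Every attribute key should name a field that also has a selector.
  print(selectors.keys.toSet().containsAll(attributes.keys)); // true
}
```

Keeping the two maps keyed by the same field names makes it easy to check that no attribute refers to a field without a selector.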

Implementation

List<Map<String, String>> extractStructuredData({
  required String html,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
  int chunkSize = 1024 * 1024, // 1MB chunks
}) {
  _logger.info(
    'Starting memory-efficient structured extraction with selectors: ${selectors.toString()}',
  );
  if (attributes != null) {
    _logger.info('Using attributes: ${attributes.toString()}');
  }

  // If the HTML is small enough, use the standard parser
  if (html.length <= chunkSize) {
    return _extractStructuredDataStandard(
      html: html,
      selectors: selectors,
      attributes: attributes,
    );
  }

  // For large HTML, use a chunking approach
  return _extractStructuredDataChunked(
    html: html,
    selectors: selectors,
    attributes: attributes,
    chunkSize: chunkSize,
  );
}
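The private helpers _extractStructuredDataStandard and _extractStructuredDataChunked are not shown on this page. The sketch below only illustrates the length check that selects between them, plus one naive way a string could be cut into chunkSize pieces. Both choosePath and splitIntoChunks are hypothetical names, and the fixed-size split is an assumption, not the library's actual chunking strategy.

```dart
/// Hypothetical helper: names the path the length check would select.
String choosePath(String html, {int chunkSize = 1024 * 1024}) =>
    html.length <= chunkSize ? 'standard' : 'chunked';

/// Hypothetical helper: naive fixed-size split, measured in UTF-16 code
/// units (the same unit as String.length).
List<String> splitIntoChunks(String html, int chunkSize) {
  if (chunkSize <= 0) {
    throw ArgumentError.value(chunkSize, 'chunkSize', 'must be positive');
  }
  final chunks = <String>[];
  for (var start = 0; start < html.length; start += chunkSize) {
    // Clamp the end index so the last chunk may be shorter.
    final end =
        start + chunkSize < html.length ? start + chunkSize : html.length;
    chunks.add(html.substring(start, end));
  }
  return chunks;
}

void main() {
  final html = '<p>${'x' * 20}</p>'; // 27 code units
  print(choosePath(html, chunkSize: 27)); // standard
  print(choosePath(html, chunkSize: 26)); // chunked
  print(splitIntoChunks(html, 10).map((c) => c.length).toList()); // [10, 10, 7]
}
```

A split like this can cut through the middle of a tag, so a real chunked parser has to reconcile elements that straddle chunk boundaries. How the library handles that is not visible from the code above.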