extractData method - MemoryEfficientParser class - memory_efficient_parser library

extractData method

List<String> extractData({
  required String html,
  required String selector,
  String? attribute,
  bool asText = true,
  int chunkSize = 1024 * 1024,
})

Parses HTML content and extracts data using CSS selectors in a memory-efficient way.

html is the HTML content to parse.
selector is the CSS selector to use.
attribute is the attribute to extract (optional).
asText is whether to extract the text content (default: true).
chunkSize is the size of each chunk to process (default: 1024 * 1024). The implementation compares it with html.length, so it is measured in UTF-16 code units rather than bytes.
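A minimal sketch of how the chunkSize threshold relates to input size, written in plain Dart with no dependency on the package. The helper fitsInOneChunk is a hypothetical name introduced here for illustration; it only mirrors the html.length <= chunkSize comparison shown in the implementation and is not part of MemoryEfficientParser.

```dart
import 'dart:convert';

/// Hypothetical helper (not part of the library): returns true when [html]
/// would be at or below the threshold, using the same comparison as the
/// implementation (`html.length <= chunkSize`).
bool fitsInOneChunk(String html, {int chunkSize = 1024 * 1024}) =>
    html.length <= chunkSize;

void main() {
  final ascii = '<p>hello</p>';
  final accented = '<p>héllo</p>';

  // String.length counts UTF-16 code units, not encoded bytes.
  print(ascii.length); // 12
  print(accented.length); // 12
  print(utf8.encode(accented).length); // 13

  // Inputs at or below the threshold fit in one chunk.
  print(fitsInOneChunk(ascii)); // true
  print(fitsInOneChunk(ascii, chunkSize: 4)); // false
}
```

Because the comparison uses String.length, a document containing many non-ASCII characters can occupy more bytes on disk than its chunkSize setting suggests.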

Implementation

List<String> extractData({
  required String html,
  required String selector,
  String? attribute,
  bool asText = true,
  int chunkSize = 1024 * 1024, // 1MB chunks
}) {
  _logger.info(
    'Starting memory-efficient extraction with selector: $selector',
  );
  if (attribute != null) {
    _logger.info('Using attribute: $attribute');
  }

  // If the HTML is small enough, use the standard parser
  if (html.length <= chunkSize) {
    return _extractDataStandard(
      html: html,
      selector: selector,
      attribute: attribute,
      asText: asText,
    );
  }

  // For large HTML, use a chunking approach
  return _extractDataChunked(
    html: html,
    selector: selector,
    attribute: attribute,
    asText: asText,
    chunkSize: chunkSize,
  );
}
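The dispatch above can be illustrated with a small, self-contained sketch. The bodies of _extractDataStandard and _extractDataChunked are not shown on this page, so nothing below reproduces them: choosePath and splitIntoChunks are hypothetical helpers, and the fixed-size split is only one naive way a large string could be divided, assumed here for illustration.

```dart
/// Hypothetical helper: names the branch the dispatch would take for [html],
/// using the same `html.length <= chunkSize` comparison as the listing.
String choosePath(String html, int chunkSize) =>
    html.length <= chunkSize ? 'standard' : 'chunked';

/// Hypothetical helper: splits [input] into consecutive pieces of at most
/// [chunkSize] UTF-16 code units. This is an assumption for illustration,
/// not the library's chunking strategy.
List<String> splitIntoChunks(String input, int chunkSize) {
  if (chunkSize <= 0) {
    throw ArgumentError.value(chunkSize, 'chunkSize', 'must be positive');
  }
  final chunks = <String>[];
  for (var start = 0; start < input.length; start += chunkSize) {
    final next = start + chunkSize;
    final end = next < input.length ? next : input.length;
    chunks.add(input.substring(start, end));
  }
  return chunks;
}

void main() {
  final html = '<ul><li>a</li><li>b</li></ul>'; // 29 code units

  print(choosePath(html, 64)); // standard
  print(choosePath(html, 29)); // standard (the comparison is inclusive)
  print(choosePath(html, 8)); // chunked

  // A naive split can cut through the middle of a tag, so a chunked
  // extractor has to deal with elements that span chunk boundaries.
  for (final chunk in splitIntoChunks(html, 8)) {
    print(chunk);
  }
}
```

Note that the listing does not validate chunkSize. With a value of zero or less, any non-empty input would take the chunked branch, so callers may want to pass only positive values.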