extractStructuredDataStream method
Extracts structured data from a URL, streaming the response in chunks to limit memory use.
url: the URL to fetch.
selectors: a map of field names to CSS selectors.
attributes: a map of field names to attributes to extract (optional).
headers: additional headers to send with the request (optional).
timeout: the timeout for the request, in milliseconds (optional).
retries: the number of retry attempts (optional).
ignoreRobotsTxt: whether to ignore robots.txt rules (default: false).
chunkSize: the size of each chunk to process, in bytes (default: 1024 * 1024).
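A call to this method might look like the following minimal sketch. The top-level function here is a hypothetical stand-in with the same named parameters, included only so the example runs on its own; in real code the method is called on whatever object exposes it. The URL, selectors, and yielded values are made up for illustration.

```dart
import 'dart:async';

// Hypothetical stand-in for the documented method (assumption: same named
// parameters; the real method belongs to a scraper object not shown here).
Stream<Map<String, String>> extractStructuredDataStream({
  required String url,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
  int chunkSize = 1024 * 1024,
}) async* {
  // Canned records standing in for data parsed from the page.
  yield {'title': 'First item', 'link': '/items/1'};
  yield {'title': 'Second item', 'link': '/items/2'};
}

// Consumes the stream record by record instead of building the full
// result set up front.
Future<List<String>> collectTitles() async {
  final titles = <String>[];
  await for (final item in extractStructuredDataStream(
    url: 'https://example.com/items',
    selectors: {'title': 'h2.title', 'link': 'a.more'},
    attributes: {'link': 'href'}, // read the href attribute for "link"
  )) {
    titles.add(item['title'] ?? '');
  }
  return titles;
}

Future<void> main() async {
  print(await collectTitles());
}
```

Because the return type is a Stream, callers can also use the usual stream operators (for example `take` or `where`) to stop early or filter records without waiting for the whole page to be processed.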
Implementation
Stream<Map<String, String>> extractStructuredDataStream({
  required String url,
  required Map<String, String> selectors,
  Map<String, String?>? attributes,
  Map<String, String>? headers,
  int? timeout,
  int? retries,
  bool ignoreRobotsTxt = false,
  int chunkSize = 1024 * 1024, // 1MB chunks
}) async* {
  final htmlStream = await fetchHtmlStream(
    url: url,
    headers: headers,
    timeout: timeout,
    retries: retries,
    ignoreRobotsTxt: ignoreRobotsTxt,
  );
  final dataStream = _streamingParser.extractStructuredDataStream(
    htmlStream: htmlStream,
    selectors: selectors,
    attributes: attributes,
    chunkSize: chunkSize,
  );
  await for (final item in dataStream) {
    yield item;
  }
}
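Since the implementation is an `async*` generator, its body (including the fetch) does not start until the returned stream is listened to. The sketch below illustrates that behavior with the same fetch, parse, and forward shape. `fakeFetch`, `fakeParse`, and `pipeline` are hypothetical stand-ins, not the library's functions.

```dart
import 'dart:async';

final log = <String>[];

// Hypothetical stand-in for the fetch step: records when it actually runs.
Future<Stream<String>> fakeFetch() async {
  log.add('fetch started');
  return Stream.fromIterable(['<li>a</li>', '<li>b</li>']);
}

// Hypothetical stand-in for the streaming parser: strips tags per chunk.
Stream<String> fakeParse(Stream<String> html) =>
    html.map((chunk) => chunk.replaceAll(RegExp(r'<[^>]+>'), ''));

// Same shape as the documented method: await the fetch, wrap the result
// in a parser stream, then forward each item to the caller.
Stream<String> pipeline() async* {
  final html = await fakeFetch();
  final data = fakeParse(html);
  await for (final item in data) {
    yield item;
  }
}

Future<void> main() async {
  final stream = pipeline(); // nothing has been fetched yet
  log.add('stream created');
  final items = await stream.toList(); // listening starts the generator
  log.add('items: $items');
  print(log);
}
```

In this sketch the log records "stream created" before "fetch started", so a caller that never listens never triggers a request.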