π DBSCAN Dart Implementation
A high-performance, pure-Dart implementation of the DBSCAN clustering algorithm with spatial optimization. This library works with any data type that implements the SpatialPoint
interface, requiring only id()
, distanceTo(SpatialPoint)
, dimension()
, and atDimension(int)
methods-making it versatile for n-D, geospatial, or custom distance metrics.
β¨ Features
- π KD-Tree & Grid Optimization - Dramatically improves performance on large datasets through spatial indexing
- π§© Flexible Point Interface - Cluster any data type by implementing a simple interface
- π Efficient Expansion Algorithm - Uses optimized seed-set expansion with duplicate tracking
- π‘οΈ Border Point Handling - Smart border point detection prevents misclassification as noise
- π Comprehensive Results - Returns both per-cluster point collections and per-point cluster assignments
Credits and Attribution
This package is heavily based on the work done in mojixcoder/dbscan, with optimizations and adaptations for Dart.
For those interested in understanding the DBSCAN algorithm in depth:
Getting started
Add the package to your pubspec.yaml
:
dart pub add dbscan_dart
Usage
import 'package:dbscan_dart/dbscan_dart.dart';
void main() {
// Create some sample points
final points = <SpatialPoint>[
LatLngPoint(pointId: PointId(1), lat: 1, lng: 1),
LatLngPoint(pointId: PointId(2), lat: 2, lng: 2),
LatLngPoint(pointId: PointId(3), lat: 3, lng: 3),
LatLngPoint(pointId: PointId(5), lat: 5, lng: 5),
LatLngPoint(pointId: PointId(1000), lat: 1000, lng: 1000),
LatLngPoint(pointId: PointId(1001), lat: 1001, lng: 1001),
LatLngPoint(pointId: PointId(2000), lat: 2000, lng: 2000),
];
// Set DBSCAN parameters
const eps = 5;
const minPoints = 2;
// Create a DBScan instance
final dbscan = DBScan(eps: eps, minPoints: minPoints);
// Run DBSCAN
final result = dbscan.run(points: points);
// Print clusters
print('### Clusters ###');
result.clusters.forEach((clusterId, cluster) {
print('Cluster $clusterId: $cluster');
});
// Print point labels
print('### Point Labels ###');
result.labels.forEach((pointId, label) {
print('ID(${pointId.value}): ClusterID(${label.value})');
});
// Check if a point is noise
final pointToCheck = PointId(2000);
if (result.labels[pointToCheck] == ClusterLabel.noise) {
print('Point with ID ${pointToCheck.value} is noise');
}
}
Example output:
### Clusters ###
Cluster 1: [LatLngPoint(id: 1, lat: 1, lng: 1), LatLngPoint(id: 2, lat: 2, lng: 2), LatLngPoint(id: 3, lat: 3, lng: 3), LatLngPoint(id: 5, lat: 5, lng: 5)]
Cluster 2: [LatLngPoint(id: 1000, lat: 1000, lng: 1000), LatLngPoint(id: 1001, lat: 1001, lng: 1001)]
################
### Point Labels ###
ID(1): ClusterID(1)
ID(2): ClusterID(1)
ID(3): ClusterID(1)
ID(5): ClusterID(1)
ID(1000): ClusterID(2)
ID(1001): ClusterID(2)
ID(2000): ClusterID(-1) // Noise
####################
Point with ID 2000 is noise
API Overview
SpatialPoint Interface
Your data type must implement:
abstract class SpatialPoint {
PointId id();
double distanceTo(SpatialPoint other);
int dimension();
double atDimension(int d);
}
LatLngPoint Implementation
The library includes a LatLngPoint
implementation for geographical points:
final point = LatLngPoint(pointId: 1, lat: 37.7749, lng: -122.4194);
DBScan Algorithm
The main clustering algorithm:
final dbscan = DBScan(eps: 5, minPoints: 4);
final result = dbscan.run(points: points);
Parameters:
eps
: Neighborhood radius (maximum distance between two points to be considered neighbors)minPoints
: Minimum number of points required to form a dense region
Result Structure
The result contains:
clusters
: Map of cluster IDs to lists of points in each clusterlabels
: Map of point IDs to their assigned cluster labels
Additional information
How DBSCAN Works
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups together points that are closely packed together, marking as outliers points that lie alone in low-density regions.
The algorithm works by:
- For each point, finding all points within distance
eps
- If a point has at least
minPoints
neighbors, it's a "core point" - All points within
eps
distance of a core point are in the same cluster - Points that are not core points but are within
eps
of a core point are "border points" - Points that are neither core nor border points are "noise points"
Choosing Parameters
-
eps: The maximum distance between two points for them to be considered neighbors
- Too small: Many small clusters or noise points
- Too large: Clusters merge together
- Tip: Try visualizing your data and estimating typical cluster sizes
-
minPoints: The minimum number of points required to form a dense region
- Rule of thumb: At least dimension + 1 (e.g., 3 for 2D data)
- Higher values create more significant clusters
- Lower values create more clusters with fewer points each
Performance Considerations
- The implementation uses a KD-Tree for efficient range queries
- For very large datasets, consider sampling or chunking your data
- The algorithm's time complexity is approximately O(n log n) in the average case
Libraries
- dbscan_dart
- DBSCAN clustering algorithm implementation in Dart.