jaroWinklerSimilarity<E> function
Find the Jaro-Winkler similarity index between two lists of items.
Parameters
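source and target are the two lists of items to compare.
threshold is the minimum Jaro similarity that the score must exceed before Winkler's increment is applied. The default is 0.7.
maxPrefixSize is the maximum prefix length to consider. If absent, the whole matching prefix is considered. Winkler's original formulation caps the prefix at 4 items; pass maxPrefixSize: 4 to get that behavior.
prefixScale is a constant scaling factor that controls how much the score is adjusted upwards for a common prefix. If absent, the smaller of 0.1 and 1 / max(n, m) is used, where n and m are the lengths of source and target. With a custom value, keep prefixScale times the considered prefix length at or below 1; otherwise the result can exceed 1.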
Details
The Jaro similarity index between two lists of items is a weighted sum of the proportion of matched items in each list and the proportion of matched items that are not transposed. Winkler's modification increases this measure when the two lists share a common prefix.
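As a compact restatement, the two measures can be written as below. The Jaro formula is the standard textbook definition, which jaroSimilarity is assumed (not confirmed here) to follow; the Winkler step matches the implementation on this page. The symbols are introduced for illustration: c is the number of matched items, \tau is half the number of matched items that are out of order, n and m are the list lengths, \ell is the considered prefix length, and p is the prefix scale.

```latex
\[
\mathrm{sim}_j = \frac{1}{3}\left(\frac{c}{n} + \frac{c}{m} + \frac{c - \tau}{c}\right)
\quad (\mathrm{sim}_j = 0 \text{ when } c = 0),
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell\, p\, (1 - \mathrm{sim}_j)
\]

% Worked example: MARTHA vs MARHTA, with n = m = 6, c = 6, \tau = 1
\[
\mathrm{sim}_j = \frac{1}{3}\left(1 + 1 + \frac{5}{6}\right) = \frac{17}{18} \approx 0.944,
\qquad
\mathrm{sim}_w = \frac{17}{18} + 3 \cdot 0.1 \cdot \frac{1}{18} \approx 0.961
\]
```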
See also: jaroSimilarity
If n is the length of source and m is the length of target,
Complexity: Time O(nm) | Space O(n+m)
Implementation
double jaroWinklerSimilarity<E>(
  List<E> source,
  List<E> target, {
  int? maxPrefixSize,
  double? prefixScale,
  double threshold = 0.7,
}) {
  double jaro = jaroSimilarity(source, target);
  if (jaro > threshold) {
    // maximum length to find prefix match
    int len = min(source.length, target.length);
    if (maxPrefixSize != null && len > maxPrefixSize) {
      len = maxPrefixSize;
    }
    // Find matching prefix
    int l = 0;
    while (l < len && source[l] == target[l]) {
      l++;
    }
    // Add Winkler bonus with jaro similarity index
    double p = prefixScale ?? min(0.1, 1 / max(source.length, target.length));
    jaro += l * p * (1 - jaro);
  }
  return jaro;
}
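To experiment with the function outside the library, the following is a minimal, self-contained sketch. The names jaroSketch and jaroWinklerSketch are hypothetical. jaroSketch is a textbook Jaro implementation standing in for jaroSimilarity, whose actual code is not shown on this page and may differ (for example in how it handles empty lists). jaroWinklerSketch mirrors the prefix-bonus logic above.

```dart
import 'dart:math';

/// Textbook Jaro similarity. A stand-in (assumption) for the library's
/// `jaroSimilarity`; the real implementation may differ in details.
double jaroSketch<E>(List<E> source, List<E> target) {
  final n = source.length, m = target.length;
  if (n == 0 && m == 0) return 1.0;
  if (n == 0 || m == 0) return 0.0;

  // Equal items count as a match only if they are at most `window` apart.
  final window = max(0, max(n, m) ~/ 2 - 1);
  final sourceMatched = List<bool>.filled(n, false);
  final targetMatched = List<bool>.filled(m, false);

  var matches = 0;
  for (var i = 0; i < n; i++) {
    final lo = max(0, i - window);
    final hi = min(m - 1, i + window);
    for (var j = lo; j <= hi; j++) {
      if (!targetMatched[j] && source[i] == target[j]) {
        sourceMatched[i] = true;
        targetMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches == 0) return 0.0;

  // Walk both matched sequences in order and count positions that differ.
  var outOfOrder = 0;
  var k = 0;
  for (var i = 0; i < n; i++) {
    if (!sourceMatched[i]) continue;
    while (!targetMatched[k]) {
      k++;
    }
    if (source[i] != target[k]) outOfOrder++;
    k++;
  }
  final transpositions = outOfOrder / 2;

  return (matches / n + matches / m + (matches - transpositions) / matches) / 3;
}

/// Mirrors the Winkler adjustment shown above, on top of `jaroSketch`.
double jaroWinklerSketch<E>(
  List<E> source,
  List<E> target, {
  int? maxPrefixSize,
  double? prefixScale,
  double threshold = 0.7,
}) {
  final score = jaroSketch(source, target);
  if (score <= threshold) return score;

  // Longest prefix that may be considered.
  var limit = min(source.length, target.length);
  if (maxPrefixSize != null && limit > maxPrefixSize) limit = maxPrefixSize;

  // Length of the common prefix, up to `limit`.
  var prefix = 0;
  while (prefix < limit && source[prefix] == target[prefix]) {
    prefix++;
  }

  final p = prefixScale ?? min(0.1, 1 / max(source.length, target.length));
  return score + prefix * p * (1 - score);
}
```

With this sketch, jaroWinklerSketch('MARTHA'.split(''), 'MARHTA'.split('')) evaluates to about 0.961, up from a Jaro score of about 0.944, because the two lists share the three-item prefix M, A, R.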