jaccardDistanceOf function
Returns the Jaccard distance between two strings.
Parameters
- source is the variant string
- target is the prototype string
- if ignoreCase is true, the character case will be ignored.
- if ignoreWhitespace is true, whitespace characters (space, tab, newline, etc.) will be ignored.
- if ignoreNumbers is true, numbers will be ignored.
- if alphaNumericOnly is true, only letters and digits will be matched.
- ngram is the size of a single item group. If n = 1, each item is considered separately. If n = 2, two consecutive items are grouped together and treated as one.
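To illustrate what the ngram parameter means, here is a minimal sketch of grouping consecutive items. The helper ngramsOf is hypothetical and not part of the library. It assumes overlapping (sliding-window) groups, and the library's own splitStringToSet may group items differently.

```dart
// Hypothetical helper, not part of the library: collects the distinct
// groups of `n` consecutive UTF-16 code units using a sliding window.
Set<String> ngramsOf(String text, int n) {
  final groups = <String>{};
  // Assumption: inputs shorter than `n` yield no groups at all.
  if (n < 1 || text.length < n) return groups;
  for (var i = 0; i + n <= text.length; i++) {
    groups.add(text.substring(i, i + n));
  }
  return groups;
}

void main() {
  print(ngramsOf('night', 1)); // {n, i, g, h, t}
  print(ngramsOf('night', 2)); // {ni, ig, gh, ht}
}
```

With n = 1 every character stands alone. With n = 2 each pair of neighbours becomes one item, so the comparison becomes sensitive to character order.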
Details
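Jaccard distance measures the number of distinct characters that are present in
one string but not in the other. It is calculated by subtracting the size of the
intersection of the source and target sets from the size of their union. When
ngram is greater than 1, the items compared are groups of characters rather than
single characters.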
See Also: tverskyIndex, jaccardIndex
Complexity: Time O(n log n) | Space O(n)
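The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the library's jaccardDistance: the name setDistance is hypothetical, and it assumes both inputs are reduced to sets of distinct items before comparison.

```dart
// Hypothetical sketch, not the library's implementation: computes
// |A ∪ B| - |A ∩ B| over the distinct items of each input.
int setDistance<T>(Iterable<T> source, Iterable<T> target) {
  final a = source.toSet();
  final b = target.toSet();
  return a.union(b).length - a.intersection(b).length;
}

void main() {
  // 'night' -> {n, i, g, h, t}; 'nacht' -> {n, a, c, h, t}
  // The union has 7 items and the intersection {n, h, t} has 3.
  print(setDistance('night'.codeUnits, 'nacht'.codeUnits)); // 4
}
```

The result counts the items that appear in exactly one of the two inputs, so identical inputs give 0.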
Implementation
int jaccardDistanceOf(
  String source,
  String target, {
  int ngram = 1,
  bool ignoreCase = false,
  bool ignoreWhitespace = false,
  bool ignoreNumbers = false,
  bool alphaNumericOnly = false,
}) {
  // Apply the same cleanup options to both inputs.
  source = cleanupString(
    source,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
  target = cleanupString(
    target,
    ignoreCase: ignoreCase,
    ignoreWhitespace: ignoreWhitespace,
    ignoreNumbers: ignoreNumbers,
    alphaNumericOnly: alphaNumericOnly,
  );
  if (ngram < 2) {
    // Any ngram below 2 compares individual UTF-16 code units.
    return jaccardDistance(
      source.codeUnits,
      target.codeUnits,
    );
  } else {
    // Otherwise compare groups of `ngram` consecutive items.
    return jaccardDistance(
      splitStringToSet(source, ngram),
      splitStringToSet(target, ngram),
    );
  }
}