crandas.string_metrics

String metrics for approximate string matching, string comparison, and fuzzy string searching.

crandas.string_metrics.edit_distance(left, right, distance_type='levenshtein', **type_opts)

Computes the edit distance between two string columns.

By default, the Levenshtein edit distance is computed using the function levenshtein_distance().

Parameters:
  • left (CSeries or str) – The first string column to compare, or a single string.

  • right (CSeries or str) – The second string column to compare, or a single string.

  • distance_type (str, optional) – The type of edit distance to compute. Currently, only 'levenshtein' is supported.

  • **type_opts – Additional keyword arguments specific to the selected edit distance type.

Returns:

CSeries with the edit distances.

Return type:

CSeries
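For intuition, the following plain-Python sketch mimics the behaviour described above on ordinary lists instead of crandas columns. It is not the crandas implementation. The names edit_distance_plain, levenshtein_plain and _DISTANCES are hypothetical, and two behaviours are assumptions: a plain str argument is compared against every row of the other column, and **type_opts is forwarded unchanged to the selected distance function (here, score_cutoff).

```python
from typing import Callable, Dict, List, Optional, Sequence, Union


def levenshtein_plain(a: str, b: str, score_cutoff: Optional[int] = None) -> int:
    # Two-row dynamic programme: prev[j] is the distance from the previous
    # prefix of `a` to b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    d = prev[-1]
    # Documented convention: distances above the cutoff become score_cutoff + 1.
    return d if score_cutoff is None else min(d, score_cutoff + 1)


# Hypothetical registry of supported types; only 'levenshtein', as in the docs.
_DISTANCES: Dict[str, Callable[..., int]] = {"levenshtein": levenshtein_plain}


def edit_distance_plain(left: Union[Sequence[str], str],
                        right: Union[Sequence[str], str],
                        distance_type: str = "levenshtein",
                        **type_opts) -> List[int]:
    if distance_type not in _DISTANCES:
        raise ValueError(f"unsupported distance_type: {distance_type!r}")
    fn = _DISTANCES[distance_type]
    if isinstance(left, str) and isinstance(right, str):
        raise TypeError("at least one of left/right must be a column")
    # Assumption: a plain str is broadcast, i.e. compared against every row.
    n = len(right) if isinstance(left, str) else len(left)
    lefts = [left] * n if isinstance(left, str) else list(left)
    rights = [right] * n if isinstance(right, str) else list(right)
    if len(lefts) != len(rights):
        raise ValueError("columns must have the same length")
    # Row-wise comparison; extra options (e.g. score_cutoff) are passed through.
    return [fn(a, b, **type_opts) for a, b in zip(lefts, rights)]
```

For example, edit_distance_plain(["kitten", "flaw"], ["sitting", "lawn"]) returns [3, 2], and edit_distance_plain(["sittin", "kitten"], "sitting", score_cutoff=1) returns [1, 2].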

crandas.string_metrics.levenshtein_distance(left, right, score_cutoff=None) → CSeries

Computes the Levenshtein edit distance between two string columns.

The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions required to transform one string into the other.
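As a plaintext illustration of this definition, the textbook Wagner-Fischer dynamic programme below computes the distance between two ordinary Python strings in O(len(a) · len(b)) time. It only illustrates the metric; it does not describe how crandas evaluates it, and the function name levenshtein is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (Wagner-Fischer DP)."""
    # prev[j] holds the distance between the current prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))  # "" -> b[:j] takes j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] -> "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
```

Keeping only two rows of the table reduces memory to O(len(b)) while producing the same result as the full table.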

Parameters:
  • left (CSeries or str) – The first string column to compare, or a single string.

  • right (CSeries or str) – The second string column to compare, or a single string.

  • score_cutoff (int, optional) – Maximum edit distance to consider. If the edit distance is larger than score_cutoff, score_cutoff + 1 is returned. If None, no cutoff is applied.

Returns:

CSeries with the edit distances (integers).

Return type:

CSeries
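The score_cutoff convention can be sketched in plain Python with the hypothetical function levenshtein_cutoff, which works on ordinary strings. Besides clipping the result to score_cutoff + 1, a plaintext implementation can stop early: the minimum of each dynamic-programming row never decreases from one row to the next, so once it exceeds the cutoff, the final distance must as well. Whether crandas uses the cutoff in a similar way is not stated here; the early exit is an assumption about a plaintext implementation only.

```python
from typing import Optional


def levenshtein_cutoff(a: str, b: str, score_cutoff: Optional[int] = None) -> int:
    """Levenshtein distance, reported as score_cutoff + 1 when it exceeds score_cutoff."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        # The row minimum is a lower bound on the final distance, so once it
        # exceeds the cutoff the remaining rows need not be computed.
        if score_cutoff is not None and min(curr) > score_cutoff:
            return score_cutoff + 1
        prev = curr
    d = prev[-1]
    return d if score_cutoff is None or d <= score_cutoff else score_cutoff + 1
```

For example, levenshtein_cutoff("kitten", "sitting", score_cutoff=1) returns 2, and levenshtein_cutoff("abcdef", "uvwxyz", score_cutoff=2) returns 3 after processing only three of the six rows, although the full distance is 6.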