Edit Distance Sample Clauses

Edit Distance. In S, concepts from Damereau-Levenshtein edit dis- tance (Damereau, 1964; Levenshtein, 1966) are ap- plied to model segmentation edit distance as two op- erations: substitutions and transpositions.4 These two operations represent full misses and near misses, respectively. Using these two operations, a new globally-optimal minimum edit distance is applied to a pair of sequences of sets of boundaries to model the sources of dissimilarity identified earlier.5 Near misses that are remedied by transposition are penalized as b PBs of error (where b is the number of boundaries transposed), as opposed to the 2b PBs of errors by which they would be penalized if they were considered to be two separate substitution op- erations. Transpositions can also be considered over n > 2 PBs (n-wise transpositions). This is useful if, for a specific task, near misses of up to n PBs are not to be penalized as full misses (default n = 2). The error represented by the two operations can also be scaled (i.e., weighted) from 1 PB each to a 4Beeferman et al. (1997, p. 42) briefly mention using an edit distance without transpositions, but discard it in favour of Pµ.‌ 5For multiple boundaries, an add/del operation is added, and transpositions are considered only within boundary types. fraction. The distance over which an n-wise trans- position occurred can also be used in conjunction with the scalar operation weighting so that a transpo- sition is weighted using the function in Equation 3. te(n, b) = b − (1/b)n−2 where n ≥ 2 and b > 0 (3) This transposition error function was chosen so that, in an n-wise transposition where n = 2 PBs and the number of boundaries transposed b = 2, the penalty would be 1 PB, and the maximum penalty as limn→∞ te(n) would be b PBs, or in this case 2 PBs (demonstrated later in Figure 5b).