• using struct::list::longestCommonSubsequence for diff/patch

    From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Fri Jul 18 09:10:25 2025
    From Newsgroup: comp.lang.tcl

    Out of curiosity I'm trying to use the `struct::list::longestCommonSubsequence` to create diff (which I call edits)/patch functions.

    Below is the code I'm using. There are two examples, both of which diff two lists to produce a "patch edits" list, and then take the first list and the patch edits in an attempt to reconstruct the second list. The first succeeds and the second fails.

    Clearly I'm doing something wrong. I feel confident that `edits` is correct since it really just uses the data from `struct::list::longestCommonSubsequence`; so I'm pretty sure my bug(s) is(are) in `patch`.

    #!/usr/bin/env tclsh9

    package require struct::list 1

    proc main {} {
    const FX_OLD [list the quick brown fox jumped over the lazy dogs]
    const FX_NEW [list the quick red fox hopped over the dogs today]
    _diff_patch FX_OLD $FX_OLD $FX_NEW

    const EE_OLD [list alpha pi xi gamma delta zeta theta \
    kappa rho tau lambda omicron nu]
    const NEW [list alpha beta gamma delta epsilon zeta eta theta iota \
    kappa lambda mu nu]
    _diff_patch EE_OLD $EE_OLD $NEW
    }

    proc _diff_patch {name old_lines new_lines} {
    puts "\n$name: [indexed_list $old_lines]"
    set edits [edits $old_lines $new_lines]
    puts "edits: $edits"
    set actual [patch $old_lines $edits]
    puts "expected: [indexed_list $new_lines]"
    puts "actual: [indexed_list $actual]\
    [expr {$actual eq $new_lines ? "OK" : "FAIL"}]"
    }

    proc indexed_list lst {
    set result [list]
    set i 0
    foreach x $lst {
    lappend result "$i:$x"
    incr i
    }
    join $result " "
    }

    proc edits {old_lines new_lines} {
    set edits [list]
    set lcs [::struct::list longestCommonSubsequence $old_lines $new_lines]
    foreach diff [::struct::list lcsInvert $lcs [llength $old_lines] \
    [llength $new_lines]] {
    lassign $diff action left right
    switch $action {
    added {
    foreach line [lrange $new_lines {*}$right] {
    lappend edits "+ $left $line"
    }
    }
    deleted {
    foreach line [lrange $old_lines {*}$left] {
    lappend edits "- $left"
    }
    }
    changed {
    foreach line [lrange $old_lines {*}$left] {
    lappend edits "- $left"
    }
    foreach line [lrange $new_lines {*}$right] {
    lappend edits "+ $left $line"
    }
    }
    }
    }
    return $edits
    }

    proc patch {lines edits} {
    foreach edit $edits {
    if {$edit eq ""} { continue }
    if {[scan $edit "%s %d %d%n" action index1 index2 i] > 0} {
    switch $action {
    "+" {
    set line [string range $edit [incr i] end]
    if {$index1 == $index2} {
    set lines [linsert $lines $index1 $line]
    } else {
    set lines [lreplace $lines $index1 $index2 $line]
    }
    }
    "-" { set lines [lreplace $lines $index1 $index2] }
    }
    } else {
    error "invalid patch edits line: '$edit'"
    }
    }
    return $lines
    }

    main
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Fri Jul 18 14:58:14 2025
    From Newsgroup: comp.lang.tcl

    * Mark Summerfield <m.n.summerfield@gmail.com>
    | Below is the code I'm using. There are two examples, both of which diff two
    | lists to produce a "patch edits" list, and then take the first list and the
    | patch edits in an attempt to reconstruct the second list. The first succeeds | and the second fails.

    | Clearly I'm doing something wrong. I feel confident that `edits` is correct
    | since it really just uses the data from
    | `struct::list::longestCommonSubsequence`; so I'm pretty sure my bug(s) is(are)
    | in `patch`.

    Since you are replacing 'lines' in [patch] in the edit-loop, the indices
    are wrong once the number of elements change due to an insertion or
    deletion. In your first example, you only ever insert/delete single
    elements (or equal-size chunks), but in the second example, the number
    of elements are different between old and new.

    I think you need to build the new list piecewise somehow or at least
    keep a copy of the original lines in order to apply the indices to it.

    Also I'm not sure whether it is ok to enter an operation more than once
    in [edit]. I.e.

    changed {1 2} {1 1}
    becomes
    {- 1 2} {- 1 2} {+ 1 2 beta}

    due to the 'foreach' iterating over {1 2}.

    R'
    --- Synchronet 3.21a-Linux NewsLink 1.2