• threads with package bug (probably timing error) on magicsplat9.01/02 and bawt 9.01

    From et99 <et99@rocketship1.me> to comp.lang.tcl on Wed Jul 16 20:22:30 2025
    From Newsgroup: comp.lang.tcl

    I have found a bug in Tcl/Tk 9.0.x that occurs on Windows when several threads do a [package require math] concurrently. I have tested several different distributions: magicsplat 9.0.1 and 9.0.2, a bawt 9.0.1 distro and tclkit, and, as a control, magicsplat 8.6.16 (which does not fail).

    My guess is that some kind of optimization was done in Tcl 9.0 with regard to the package command that has caused a race condition or perhaps a synchronization bug. I can get the failure with other packages besides math, but it happens more easily with the math package.

    I wrote up a ticket after 9.0.1 was out. I am posting this in the hope that someone who knows how to debug Tcl on Windows might have some insight or the ability to find the cause of the failure.

    With the bawt 9.0.1 tclkit, the failure instead results in an access violation crash, always on the same instruction. Unfortunately, I don't have any symbols, so I can't tell where in the Tcl source code the crash occurs.



    ----------------------- details (sorry for the length) ----------


    I have a test program and a windows batch script to run the test program in a loop.

    The paths of the test script and wish.exe need to be changed in the batch script before trying out the code. If anyone does want to try it, use the most recent addition to the ticket found here:

    https://core.tcl-lang.org/tcl/tktview/61c01e0edb

    The batch script is run from a cmd.exe window with one argument, 1-4, which selects which of my 4 distros to test. However, none of the paths are likely to be correct on another system.

    The resulting problem is that [package require math] fails with the message "cannot find package". It can occur rarely or often, depending on which distro is used. For example, magicsplat 9.0.2 got 163 failures in 43050 runs, or about 0.4% of the time. Calling the package require math a second time usually succeeds, however. Doing a package require of a non-existent package (e.g. foobar) first often makes the subsequent package require math succeed as well.
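The observation that a second call usually succeeds suggests a stopgap. The following is only a minimal sketch, assuming that a plain retry is acceptable until a fixed release is available; the proc name requireWithRetry and its attempts parameter are hypothetical and are not part of Tcl or of the test script in the ticket.

```tcl
# Hypothetical helper (not part of Tcl or the ticket): retry a failing
# [package require] a few times before giving up.
proc requireWithRetry {name {attempts 2}} {
    set lastError ""
    for {set i 1} {$i <= $attempts} {incr i} {
        # catch returns 0 on success; $result then holds the loaded version
        if {![catch {package require $name} result]} {
            return $result
        }
        set lastError $result
    }
    return -code error "package require $name failed $attempts times: $lastError"
}

# Usage inside a worker thread:
# requireWithRetry math
```

This only papers over the symptom; it does not address whatever makes the first call fail.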

    Failures occur more often with bawt 9.0.1, however: for example, 6 failures in only 31 runs (about 19%). I attribute this to a timing difference, since bawt uses //zipfs paths in the auto_path while magicsplat does not. On a much faster (2x) computer, running against magicsplat 9.0.1, the failure rate was about 5%.
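The failure rates follow directly from the run counts quoted above. A small sketch of the arithmetic (failureRate is a hypothetical helper name):

```tcl
# Failure rate in percent, formatted with two decimals.
proc failureRate {failures runs} {
    format %.2f [expr {100.0 * $failures / $runs}]
}

puts "magicsplat 9.0.2: [failureRate 163 43050]%"   ;# 0.38%
puts "bawt 9.0.1:       [failureRate 6 31]%"        ;# 19.35%
```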

    When examining the global variables in the failing thread, the key appears to be that auto_path does not include the path that an ifneeded should have set up, while it does appear in threads that do not fail.

    I cannot get it to fail if I create only one thread (besides the main thread) and so my test script creates 3 threads. All the test code does in each thread is issue a [package require math] inside of a catch to trigger the error. After the error some diagnostic code is run, to output the auto_path and the error message to a tk dialog box.

    If a thread does not fail, it simply does a vwait forever; the main thread checks for any failures (reported in a tsv shared variable) and exits quickly if there are none, so that the test can run again.

    If there is an error and the Tk message box is displayed, the process exits after 10 seconds without interaction and another run starts; the batch script counts the runs and errors. There is a "no" button that cancels the 10-second exit to allow inspection from the console.
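The overall shape of such a test can be sketched as below. This is NOT the script attached to the ticket: it assumes the Thread package is installed, leaves out the Tk dialog and the 10-second timer, and lets each worker exit instead of sitting in a vwait. The names runOnce and the tsv array report are hypothetical.

```tcl
# Sketch only: this is NOT the test script attached to the ticket.
# runOnce and the tsv array "report" are hypothetical names.
package require Thread

# Start $nthreads worker threads that each try to load $pkg at about the
# same time, and return the list of recorded failures (empty on success).
proc runOnce {pkg {nthreads 3}} {
    tsv::set report failures {}
    tsv::set report done 0

    # Script evaluated in each worker: the require sits inside a catch,
    # and a failure is recorded together with that thread's auto_path.
    set worker [list apply {{pkg} {
        if {[catch {package require $pkg} err]} {
            tsv::lappend report failures \
                [list [thread::id] $err $::auto_path]
        }
        tsv::incr report done
    }} $pkg]

    for {set i 0} {$i < $nthreads} {incr i} {
        thread::create $worker
    }

    # Main thread: poll the shared counter until every worker has reported.
    while {[tsv::get report done] < $nthreads} {
        after 50
    }
    return [tsv::get report failures]
}

set failures [runOnce math]
puts "failures: [llength $failures]"
```

Because the failure is intermittent, a single run proves little; a wrapper such as the batch script described above would have to repeat the run many times and count the failures.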

    In looking at the pkgIndex.tcl script for math, there is the line:

    package ifneeded math 1.2.6 [list source [file join $dir math.tcl]]

    and by instrumenting these 2 scripts (pkgIndex.tcl and math.tcl), I determined that this ifneeded was indeed executed in all 3 threads; however, the math.tcl script had not been sourced in the failing thread.
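For readers less familiar with the mechanism being instrumented here: package ifneeded only registers a load script, and package require later evaluates that script to actually load the package. A minimal single-threaded sketch, using a made-up package name demo and a made-up marker variable ::demoLoaded:

```tcl
# "demo" and ::demoLoaded are made-up names for illustration.
# Step 1: what a pkgIndex.tcl does; register a load script for a version.
package ifneeded demo 1.0 {
    set ::demoLoaded 1          ;# stands in for "source .../demo.tcl"
    package provide demo 1.0
}

# Nothing has been loaded yet; only the script is registered.
puts [info exists ::demoLoaded]   ;# 0

# Step 2: package require evaluates the registered script.
puts [package require demo]       ;# 1.0
puts [info exists ::demoLoaded]   ;# 1

# A package with no registered ifneeded script cannot be found.
puts [catch {package require no_such_package_xyz} msg]   ;# 1
```

In these terms, the failing thread got through step 1 but the load script of step 2 was never evaluated.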

    That agrees with the setup of auto_path which is done in math.tcl:

    variable home [file join [pwd] [file dirname [info script]]]
    if {[lsearch -exact $::auto_path $home] == -1} {
        lappend ::auto_path $home
    }

    So, the *mystery* is that when it fails, it appears as if the package database is common to all the threads and something is not being synchronized correctly, so that executing the ifneeded script in one thread makes another thread think the load has already been done.

    Anyhow, that's my guess, but I don't know if my thinking of a global common package database is actually correct.

    This is as far as I can get with this, since I don't know how to debug the package code on Windows, and I don't have a full distro, like magicsplat or bawt, on Linux.

    I created the ticket a while back, but there have been no additional entries on it (except by me). I don't know how serious this is for other users, but for me it means I can't reliably use threaded code in version 9.

    The latest test code and batch script are in the most recent addition to the ticket dated 2025-07-16.

    The test script is a barebones whittling down of my tasks module, which is where I first found the failure. I could not get a failure using any simpler coding examples. However, my test script only calls tcl/tk core code plus the 2 package calls.

    I would appreciate knowing if others have similar results and/or suggestions.


    -eric
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Harald Oehlmann <wortkarg3@yahoo.com> to comp.lang.tcl on Thu Jul 17 09:04:16 2025
    From Newsgroup: comp.lang.tcl

    On 17.07.2025 at 05:22, et99 wrote:
    > I have found a bug in tcl/tk 9.0x that occurs on windows when doing a
    > [package require math] inside of several threads concurrently. I have
    > tested several different distributions, a 9.01 and 9.02 magicsplat, a
    > bawt 9.01 distro and tclkit, and as a control, 8.6.16 magicsplat (which
    > does not fail).

    I am sorry that nobody picked it up.
    A symbol build and Visual Studio Debug->Attach to process is not so complicated...

    I have added my ideas to the ticket.

    Sometimes, it is necessary to wave with all hands to get attention, sorry.

    Thanks for all,
    Harald
  • From et99 <et99@rocketship1.me> to comp.lang.tcl on Thu Jul 17 12:52:31 2025
    From Newsgroup: comp.lang.tcl

    On 7/17/2025 12:04 AM, Harald Oehlmann wrote:
    > Sometimes, it is necessary to wave with all hands to get attention, sorry.

    Thanks, I didn't want to shout too loudly :) I have added some thoughts to the ticket in response to some of the postings by others.

    One question in my mind: Is there some portion of the package database kept in C code that is not visible to script code? And if so, is that global to all threads, which would require some sort of locking?

    That could explain why it takes more than one thread to see a problem. Obviously, if a package require math (or any other one that fails) were failing in a single-threaded program, people would have noticed long before now.

    I recall that years ago, doing a package require Tk inside multiple threads caused access violations on Linux (but not Windows). That was fixed when some additional locking was added. Perhaps this is a similar situation.
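On the part of the question that script code can answer: at script level, the package database belongs to each interpreter, and every thread runs its own interpreter. The sketch below shows that per-interpreter view using a child interpreter in a single thread (demo2 is a made-up package name). It shows nothing about process-wide state that may be kept in C underneath the package code, which is what the question is really asking about.

```tcl
# demo2 is a made-up package name for illustration.
set child [interp create]

# Register and load a package in the child interpreter only.
interp eval $child {
    package ifneeded demo2 1.0 {package provide demo2 1.0}
    package require demo2
}

puts [interp eval $child {package versions demo2}]   ;# 1.0
puts [llength [package versions demo2]]               ;# 0: the main interp never saw it

interp delete $child
```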



    -eric



  • From Harald Oehlmann <wortkarg3@yahoo.com> to comp.lang.tcl on Fri Jul 18 08:03:47 2025
    From Newsgroup: comp.lang.tcl

    On 17.07.2025 at 21:52, et99 wrote:
    > One question in my mind: Is there some portion of the package database
    > kept in C code that is not visible to script code? And if so, is that
    > global to all threads, which would require some sort of locking?

    Yes, package code is in C.
    Please only use the ticket to comment.
    We now have Mega-Wizard Sergey on the case. Please try to support him.

    Thanks for all,
    Harald

  • From et99 <et99@rocketship1.me> to comp.lang.tcl on Fri Jul 18 14:12:53 2025
    From Newsgroup: comp.lang.tcl

    On 7/17/2025 11:03 PM, Harald Oehlmann wrote:
    > Yes, package code is in C.
    > Please only use the ticket to comment.
    > We have now Mega-Wizard Sergey on the track. Please try to support him.

    Thanks for looking into this. And way to go Sergey!

    I see Sergey has closed the ticket with a ref count fix, which turned out to be in the file system area. I will await 9.0.3 and test again. My tasks module can give threads a real run for their money, as I have code that launches many dozens of threads, sometimes even hundreds, making use of tsv variables, mutexes, etc.

    I might also be one of only a few who run Tk GUI windows in each thread. That can really push the limits.

    Thanks again Harald and Sergey.

    -eric





  • From Harald Oehlmann <wortkarg3@yahoo.com> to comp.lang.tcl on Sat Jul 19 09:32:14 2025
    From Newsgroup: comp.lang.tcl

    On 18.07.2025 at 23:12, et99 wrote:
    > I see Sergey has closed the ticket with a ref count fix, which turned
    > out to be in the file system area. I will await 9.0.3 and test again.

    No, it would be great to test it now, sorry.
    If a bug is fixed, the original reporter should check it.

    Sorry,
    Harald
  • From et99 <et99@rocketship1.me> to comp.lang.tcl on Mon Jul 21 18:35:22 2025
    From Newsgroup: comp.lang.tcl

    On 7/19/2025 12:32 AM, Harald Oehlmann wrote:
    > No, it would be great to test it now, sorry.
    > If a bug is fixed, the original reporter should check it.

    Success!

    In order to test the new code, I needed to build a wish91.exe from the latest source and copy it into a magicsplat bin directory so it could load the math and Thread packages.

    Since I didn't know how to do that, I took a stab at getting ChatGPT to help. To my amazement and surprise, it knows how to build Tcl/Tk 9 and knows where to place all the files.

    It wrote scripts to download the latest source, build it into a temp directory, and copy the new .exe and DLLs into a copy of the magicsplat 9.0.2 distro.

    Testing the new code from Sergey was a huge success! No failures in over 13000 runs.

    Let me know if anyone here is interested in the batch scripts that ChatGPT wrote for me. I even got it to build with debug, and it helped me create a Visual Studio 2022 project file. I then started it in debug and was able to step through the C code.

    -eric

  • From Ralf Fassel <ralfixx@gmx.de> to comp.lang.tcl on Tue Jul 22 09:27:21 2025
    From Newsgroup: comp.lang.tcl

    * et99 <et99@rocketship1.me>
    | In order to test the new code, I needed to build a wish91.exe from
    | latest source and copy that into a magicsplat bin directory so it
    | could load the math and Thread packages.

    | Since I didn't know how to do that, I took a stab at getting chatgpt
    | to help. To my amazement and surprise, it knows how to build tcl/tk 9
    | and knows where to place all the files.

    [slightly getting OT]
    I wonder what ChatGPT would have done without the win/README file in the
    Tcl distribution which holds the relevant information ;-)

    R', "I'm too old for this *beep*" (Sgt Murtaugh)
  • From et99 <et99@rocketship1.me> to comp.lang.tcl on Tue Jul 22 01:08:34 2025
    From Newsgroup: comp.lang.tcl

    On 7/22/2025 12:27 AM, Ralf Fassel wrote:
    > [slightly getting OT]
    > I wonder what Chatgpt would have done without the win/README file in the
    > TCL distribution which holds the relevant information ;-)

    I started up wish in Visual Studio, hit continue, and with a console created a thread or two.

    It showed me how to see the active threads, but they weren't changing. It said:

    "Hit pause; only then will Visual Studio refresh the list and show any new threads (like those created by thread::create in Tcl)."

    I didn't tell it I was doing that; it had read my mind!

    After the pause, it stopped in some Win32 code. I asked it how to stop at some Tcl source code. It said to look on the stack, where you'll find something you can double-click on, like:

    tcl91.dll!Tcl_DoOneEvent()

    I bet it has read the entire tcl wiki too!

    And this was just with the free tier. Put this in a robot in a few years and it'll probably cut my meat for me at dinner too w/o even asking!

    -e


