• OOS approach revisited

    From zbigniew2011@zbigniew2011@gmail.com (LIT) to comp.lang.forth on Mon Jun 23 05:09:47 2025
    From Newsgroup: comp.lang.forth

    1 VARIABLE X
    2 VARIABLE Y
    3 VARIABLE Z

    : TEST1 1000 0 DO 10000 0 DO X @ Y @ + Z ! LOOP LOOP ; ok
    : TEST2 1000 0 DO 10000 0 DO X Y Z +> LOOP LOOP ; ok
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 121 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 71 ok

    : TEST1 1000 0 DO 10000 0 DO 1 X +! 1 Y +! X @ Y @ + Z ! LOOP LOOP ;
    ok
    : TEST2 1000 0 DO 10000 0 DO X ++ Y ++ X Y Z +> LOOP LOOP ; ok
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 217 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 132 ok

    '+>' moves the sum of two variables into the body of the third one.

    The results are rather promising, as one can see.
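
    For readers without the words at hand, here is a minimal high-level sketch of what '+>' and '++' appear to do, written in standard Forth rather than fig-Forth. It only illustrates the stack effects; the definitions that were actually timed are not shown in this post, and the lower-case x y z are stand-ins.

```forth
\ Hypothetical high-level equivalents, for illustration only.
: +> ( addr1 addr2 addr3 -- )  >r  @ swap @ +  r> ! ;  \ [addr3] = [addr1] + [addr2]
: ++ ( addr -- )  1 swap +! ;                           \ [addr] = [addr] + 1

variable x   variable y   variable z    \ standard VARIABLE takes no initial value
1 x !   2 y !   3 z !

x y z +>   z @ .    \ prints 3
x ++       x @ .    \ prints 2
```

    Defined as colon words like this they would presumably save nothing, since @ @ + ! still run underneath; any gain has to come from coding them as single primitives.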

    --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From dxf@dxforth@gmail.com to comp.lang.forth on Tue Jun 24 23:28:13 2025
    From Newsgroup: comp.lang.forth

    The savings come from rolling @ @ + ! into a single very specialized
    function. But what about the loading of X Y and retrieving of Z which
    are unavoidable in practice? Should that not be included in the test?

  • From zbigniew2011@zbigniew2011@gmail.com (LIT) to comp.lang.forth on Thu Jun 26 17:27:48 2025
    From Newsgroup: comp.lang.forth

    > The savings come from rolling @ @ + ! into a single very specialized
    > function. But what about the loading of X Y and retrieving of Z which
    > are unavoidable in practice? Should that not be included in the test?

    Let's find out then:

    1 VARIABLE X
    2 VARIABLE Y
    3 VARIABLE Z

    : TEST1 1000 0 DO 10000 0 DO
    I DUP X ! Y ! X @ Y @ + Z ! Z @ DROP
    LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO
    I DUP X ! Y ! X Y Z +> Z @ DROP
    LOOP LOOP ;
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 252 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 202 ok

    : TEST1 1000 0 DO 10000 0 DO
    I DUP X ! Y ! 1 X +! 1 Y +! X @ Y @ + Z ! Z @ DROP
    LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO
    I DUP X ! Y ! X ++ Y ++ X Y Z +> Z @ DROP
    LOOP LOOP ;
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 346 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 258 ok

    The difference is smaller - still it's significant.


    Another test - using the "drawing a box" example
    from "Thinking Forth" (and "simulated" LINE word):

    0 VARIABLE TOP
    0 VARIABLE LEFT
    0 VARIABLE BOTTOM
    0 VARIABLE RIGHT
    : LINE 2DROP 2DROP ;

    : BOX1 ( x1 y1 x2 y2) BOTTOM ! RIGHT ! TOP ! LEFT !
    LEFT @ TOP @ RIGHT @ TOP @ LINE
    RIGHT @ TOP @ RIGHT @ BOTTOM @ LINE
    RIGHT @ BOTTOM @ LEFT @ BOTTOM @ LINE
    LEFT @ BOTTOM @ LEFT @ TOP @ LINE ;

    : BOX2 ( x1 y1 x2 y2) BOTTOM ! RIGHT ! TOP ! LEFT !
    LEFT TOP RIGHT TOP LINE
    RIGHT TOP RIGHT BOTTOM LINE
    RIGHT BOTTOM LEFT BOTTOM LINE
    LEFT BOTTOM LEFT TOP LINE ;

    : TEST1 1000 0 DO 10000 0 DO I DUP 2DUP BOX1 LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO I DUP 2DUP BOX2 LOOP LOOP ;

    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 890 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 653 ok


    The difference is even more significant in the case
    of multiplication:

    1 VARIABLE X
    2 VARIABLE Y
    3 VARIABLE Z

    : TEST1 1000 0 DO 10000 0 DO
    I DUP X ! Y ! X @ Y @ * Z ! Z @ DROP
    LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO
    I DUP X ! Y ! X Y Z *> Z @ DROP
    LOOP LOOP ;
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 658 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 200 ok

    But this time the better implementation also has
    its impact: fig-Forth's '*' is inefficient,
    and I coded '*>' directly in ML, of course,
    simply using IMUL.
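
    To put the tick counts from this post in proportion, the relative saving can be worked out as (t1 - t2) * 100 / t1. The helper name saved% is made up for this sketch; it is standard Forth, and the result is truncated to a whole percent.

```forth
\ saved% is a hypothetical helper: percentage of TEST1's ticks that TEST2 saves.
: saved% ( t1 t2 -- percent )  over swap -  100 rot */ ;

252 202 saved% .   \ addition:              19
346 258 saved% .   \ increment + addition:  25
890 653 saved% .   \ box drawing:           26
658 200 saved% .   \ multiplication:        69
```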

    --
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Fri Jun 27 02:16:20 2025
    From Newsgroup: comp.lang.forth

    With so many DO..LOOPs involved, be careful not to
    measure more looping time than the multiplications.

    IIRC DO..LOOPs had been a hack for computers in the 60s.
    A rather ugly hack, born out of necessity, slow and
    often cumbersome to use. That it still persists in Forth
    half a century later speaks for Forth's progressiveness.

    --
  • From dxf@dxforth@gmail.com to comp.lang.forth on Fri Jun 27 13:39:53 2025
    From Newsgroup: comp.lang.forth

    On 27/06/2025 3:27 am, LIT wrote:
    > ...
    > Another test - using the "drawing a box" example
    > from "Thinking Forth" (and "simulated" LINE word):

    I believe Paul wrote a version using stack ops.

  • From dxf@dxforth@gmail.com to comp.lang.forth on Fri Jun 27 17:29:30 2025
    From Newsgroup: comp.lang.forth

    On 27/06/2025 12:16 pm, minforth wrote:
    > ...
    > IIRC DO..LOOPs had been a hack for computers in the 60s.
    > A rather ugly hack, born out of necessity, slow and
    > often cumbersome to use. That it still persists in Forth
    > half a century later speaks for Forth's progressiveness.

    Testing FOR NEXT on my DTC system showed a 15% speed increase over
    DO LOOP. Putting 5 NOOPs (each executes Forth's address interpreter)
    in the innermost loop brought it down to 6%. Not worth it IMO.

  • From minforth@minforth@gmx.net to comp.lang.forth on Fri Jun 27 11:49:11 2025
    From Newsgroup: comp.lang.forth

    On 27.06.2025 at 09:29, dxf wrote:
    > On 27/06/2025 12:16 pm, minforth wrote:
    >> ...
    >> IIRC DO..LOOPs had been a hack for computers in the 60s.
    >> A rather ugly hack, born out of necessity, slow and
    >> often cumbersome to use. That it still persists in Forth
    >> half a century later speaks for Forth's progressiveness.
    >
    > Testing FOR NEXT on my DTC system showed 15% speed increase over
    > DO LOOP. Putting 5 NOOPs (executes forth's address interpreter)
    > in the innermost loop brought it down to 6%. Not worth it IMO.


    It really depends on how counted loops are implemented.
    Most CPUs have operators for register-based count-down loops
    that are blazingly fast.

    If they can be used within Forth-based loop constructs
    I would expect a greater speed increase than what you measured.
  • From zbigniew2011@zbigniew2011@gmail.com (LIT) to comp.lang.forth on Fri Jun 27 16:55:14 2025
    From Newsgroup: comp.lang.forth

    > It really depends on how counted loops are implemented.
    > Most CPUs have operators for register-based count-down loops
    > that are blazingly fast.
    >
    > If they can be used within Forth-based loop constructs
    > I would expect a greater speed increase than what you measured.

    In that old fig-Forth it's rather short and simple:

            sqHeader '(LOOP)'
    XLOOP   dw   $ + 2
            mov  BX,1
    XLOO1:  add  [BP],BX
            mov  AX,[BP]
            sub  AX,[BP+2]
            xor  AX,BX
            js   BRAN1
            add  BP,4
            inc  SI
            inc  SI
            jmp  NEXT

    It doesn't look that bad. Can it be
    done even shorter?
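
    Read as arithmetic, the routine above adds the increment to the index, subtracts the limit, XORs the difference with the increment and branches back while the result is negative. A minimal sketch of that test in standard Forth follows. The name again? is invented here, and treating the code from XLOO1 onward as shared with (+LOOP), which would supply its own increment in BX, is an assumption based on the label rather than something shown in the listing.

```forth
\ again? is hypothetical: true when the loop would branch back.
\ index is the value after the increment has been added.
: again? ( index limit incr -- flag )  >r -  r> xor  0< ;

 5 10  1 again? .   \ -1 : index below limit, keep looping
10 10  1 again? .   \  0 : index reached limit, leave the loop
 0  0 -1 again? .   \ -1 : with a negative increment the loop still runs at index = limit
-1  0 -1 again? .   \  0 : and stops once index has dropped below it
```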

    --
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Fri Jun 27 20:15:16 2025
    From Newsgroup: comp.lang.forth

    In article <bc63996456fe967e5c66d17cbbeb21c2@www.novabbs.com>,
    LIT <zbigniew2011@gmail.com> wrote:
    > It doesn't look that bad. Can it be
    > done even shorter?

    My optimiser looks into the combination of DO and LOOP and
    transfers the return stack into registers after inlining
    everything. The result is near VFX performance.
    It is all experimental, but yes, there is much to be gained.

    Groetjes Albert
    --
    The Chinese government is satisfied with its military superiority over USA.
    The next 5 year plan has as primary goal to advance life expectancy
    over 80 years, like Western Europe.
  • From minforth@minforth@gmx.net to comp.lang.forth on Fri Jun 27 22:35:32 2025
    From Newsgroup: comp.lang.forth

    On 27.06.2025 at 20:15, albert@spenarnc.xs4all.nl wrote:
    > My optimiser looks into the combination of DO and LOOP,
    > transfers the returns stack into registers after inlining
    > everything. It is near vfx performance.
    > All experimental, but yes there is much to be gained.

    Must be tricky to do UNLOOP in a register-based loop. ;-)

  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Jun 28 12:03:24 2025
    From Newsgroup: comp.lang.forth

    On 28/06/2025 2:55 am, LIT wrote:
    > In that old fig-Forth it's rather short and simple:
    >
    >         sqHeader '(LOOP)'
    > XLOOP   dw   $ + 2
    >         mov  BX,1
    > XLOO1:  add  [BP],BX
    >         mov  AX,[BP]
    >         sub  AX,[BP+2]
    >         xor  AX,BX
    >         js   BRAN1
    >         add  BP,4
    >         inc  SI
    >         inc  SI
    >         jmp  NEXT
    >
    > It doesn't look that bad. Can it be
    > done even shorter?

    In fact 83-LOOP was designed to be faster (at the cost of a
    more complicated DO which isn't part of the loop).

    ; (+loop)  ( n -- )

    xploo:  pop  ax
            add  [bp],ax
            jno  bran
            add  si,cw

    ; UNLOOP  ( -- )

    unloo:  add  bp,cw*2
            nextt

    Loops have been one of the most studied aspects of Forth.

  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sat Jun 28 11:34:10 2025
    From Newsgroup: comp.lang.forth

    In article <mc8dkkFeh4uU1@mid.individual.net>,
    minforth <minforth@gmx.net> wrote:
    > Must be tricky to do UNLOOP in a register-based loop. ;-)

    Indeed. One of my test cases is:
    ---------------------------
    \ Interfering LEAVEs and EXITs.
    : (TESTL) DO
    DUP IF LEAVE ELSE UNLOOP EXIT THEN SWAP
    2DUP IF LEAVE ELSE UNLOOP EXIT THEN 2SWAP
    LOOP ROT ;
    : testL (TESTL) 2OVER ;
    SEE (TESTL)
    'testL SHOW-IT
    -------------------------------------------------
    The result after inlining is:

    ---------------------------------------------


    : testL
    0 >R SWAP >R >R DUP
    0BRANCH [ 20 , ] ( between ? UNLOOP )
    BRANCH [ B0 , ] ( between ? UNLOOP )
    BRANCH [ 18 , ] ( between ? SWAP ) UNLOOP
    BRANCH [ 98 , ] ( between ROT 2OVER ) SWAP 2DUP
    0BRANCH [ 20 , ] ( between ? UNLOOP )
    BRANCH [ 58 , ] ( between ? UNLOOP )
    BRANCH [ 18 , ] ( between ? 2SWAP ) UNLOOP
    BRANCH [ 40 , ] ( between ROT 2OVER ) 2SWAP 1 (+LOOP)
    0BRANCH [ -D8 , ] ( between >R DUP ) UNLOOP ROT 2OVER
    ;

    --------------------------------------------
    then after some peepholing the assembly looks like
    Report about return stack usage
    new report
    2 8 1 0
    1 9 1 1
    0 10 2 2

    LEA, BP'| XO| [BP] 4294967272 L,
    Q: MOVI, X| R| BX| 0 IL,
    Q: MOV, X| F| BX'| XO| [BP] 16 L,
    POP|X, DX|
    POP|X, AX|
    PUSH|X, DX|
    Q: MOV, X| F| AX'| XO| [BP] 8 L,
    POP|X, BX|
    Q: MOV, X| F| BX'| XO| [BP] 0 L,
    POP|X, AX|
    PUSH|X, AX|
    Q: TEST, X| AX'| R| AX|
    J|X, Z| Y| 17 (RL,)
    POP|X, DX|
    POP|X, BX|
    POP|X, AX|
    PUSH|X, BX|
    PUSH|X, DX|
    PUSH|X, AX|
    LEA, BP'| XO| [BP] 24 L,
    JMP, 27 (RL,)
    LEA, BP'| XO| [BP] 24 L,
    JMP, 16 (RL,)
    JMP, 4294967263 (RL,)
    LEA, BP'| XO| [BP] 24 L,
    JMP, 0 (RL,)
    POP|X, BX|
    POP|X, CX|
    POP|X, AX|
    POP|X, DX|
    PUSH|X, DX|
    PUSH|X, AX|
    PUSH|X, CX|
    PUSH|X, BX|
    PUSH|X, DX|
    PUSH|X, AX|

    You see that here the elimination of BP (return stack) has not succeeded.
    Three BRANCH/0BRANCH have disappeared though.

    Simple cases are more successful:
    ------------------------------------
    : test2aa 4 >R 2 >R 1 R> 3 R> ;
    'test2aa SHOW-IT
    : test2aa
    4 >R 2 >R 1 R> 3 R>
    ;
    Report about return stack usage
    new report
    1 8 1 1
    0 9 1 1

    PUSHI|X, 1 IL,
    PUSHI|X, 2 IL,
    PUSHI|X, 3 IL,
    QN: MOVI, X| R| AX| 4 IL,
    QN: MOVI, X| R| CX| 2 IL,
    PUSHI|X, 4 IL,
    ------------------------------------
    But the optimiser doesn't detect that moving into registers AX CX can
    be eliminated. (Only DSP, RSP and HIP - present in SP, BP and DI - are
    live at the end of a definition.)
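
    As a cross-check, the net effect of test2aa is simply to leave 1 2 3 4 on the data stack, which matches the four PUSHI instructions in the output above. A minimal sketch in standard Forth, using only the definition quoted in this post:

```forth
\ Same definition as above, lower-cased for a standard system.
: test2aa  4 >r  2 >r  1 r>  3 r> ;

test2aa  .s    \ shows 1 2 3 4 (the exact .S layout varies between systems)
```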





  • From Stephen Pelc@stephen@vfxforth.com to comp.lang.forth on Sat Jun 28 09:37:33 2025
    From Newsgroup: comp.lang.forth

    On 27 Jun 2025 at 22:35:32 CEST, "minforth" <minforth@gmx.net> wrote:

    > Must be tricky to do UNLOOP in a register-based loop. ;-)

    Here are the code generators for VFX x64 LOOP and UNLOOP.
    All the complexity is in the DO and ?DO code.

    : c_loop \ mrk> drbid -- ; compile code for LOOP ; SFP094
    c_shuffle reset-opt \ SFP097
    a[ INC r14 ]a use-a \ update index
    a[ INC r15 ]a use-a \ update limit-index-$8000.0000
    a[ JNO ]a <ares use-a \ resolve backward branch
    c_unloop \ remove DO ... LOOP state
    >RES \ resolve forward branch
    ;

    : c_unloop \ -- ; compile code for UNLOOP
    c_shuffle reset-opt
    a[ pop r14 \ restore old index
    pop r15 \ restore old index-limit-xorbit63
    pop rax \ discarded
    ]a use-a
    ;

    Stephen
    --
    Stephen Pelc, stephen@vfxforth.com
    Wodni & Pelc GmbH
    Vienna, Austria
    Tel: +44 (0)7803 903612, +34 649 662 974
    http://www.vfxforth.com/downloads/VfxCommunity/ - free VFX Forth downloads
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Jun 28 10:23:51 2025
    From Newsgroup: comp.lang.forth

    minforth <minforth@gmx.net> writes:
    > Most CPUs have operators for register-based count-down loops
    > that are blazingly fast.

    Which "operators" do you have in mind, and what do you mean by
    "blazingly fast"?

    Anyway, we have discussed this repeatedly, e.g., in
    <2022Feb13.231208@mips.complang.tuwien.ac.at> I wrote in reply to your
    posting <f4b89e0b-2ded-4b18-8dc1-bba6dcda47bbn@googlegroups.com>, and
    cited earlier discussions on the topic.

    |"minf...@arcor.de" <minforth@arcor.de> writes:
    |>[...]
    |>F.ex. match NEXT efficiently to x_86 processor LOOP instruction (counter in
    |>_CX register)
    |>and you'll happily count down from 5 to 1.
    |
    |Yes, but why would one do this? As we have established in an earlier
    |discussion (see below), the LOOP instruction is typically not faster
    |than a sequence of simpler instructions:
    |
    |<2018Jun6.184616@mips.complang.tuwien.ac.at>:
    ||minforth@arcor.de writes:
    ||>FOR..NEXT matches easily with the x86 LOOP instruction and ECX as counter.
    ||>Should do speedy enough. ;-)
    ||
    ||Have you measured it? I have
    ||<2017Mar14.183125@mips.complang.tuwien.ac.at>
    ||<2017Mar15.141411@mips.complang.tuwien.ac.at> and compared the
    ||following loops:
    ||
    ||.L5:                  .L5:
    ||  subq $1, %rax         loop .L5
    ||  jne .L5
    ||
    ||I found that for these loops Sandy Bridge, Haswell, and Skylake take
    ||~4 cycles per iteration using LOOP, and 1-2 cycles per iteration when
    ||using jne.
    |
    |<2018Jun7.141731@mips.complang.tuwien.ac.at>:
    ||cycles for 1000 iterations
    ||      K10     Excavator         Zen
    ||Phenom II Athlon X4 845 Ryzen 1600X
    ||     3021          1314        1051  loop
    ||     2020          1484        1051  sub; jne
    ||     2026          1489        1053  add; cmp; jne
    |
    |There is no performance advantage on modern AMD and Intel CPUs for the
    |instruction LOOP over a good implementation of the Forth word LOOP (as
    |in the third example).

    > If they can be used within Forth-based loop constructs
    > I would expect a greater speed increase than what you measured.

    You obviously ignore repeated refutations of your claims of superior
    performance for LOOP-instruction-based counted loops. Maybe you
    should implement and measure such a counted loop yourself and compare
    it to the LOOP word on SwiftForth and VFX Forth.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
    EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Sat Jun 28 14:26:08 2025
    From Newsgroup: comp.lang.forth

    In article <2025Jun28.122351@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:

    > You obviously ignore repeated refutations of your claims of superior
    > performance for LOOP-instruction-based counted loops. Maybe you
    > should implement and measure such a counted loop yourself and compare
    > it to the LOOP word on SwiftForth and VFX Forth.

    It is good to remember how these severely CISC instructions came about.
    Early Intel processors such as the 8086 were severely cramped in memory
    space. While the 16-bit 68000 had a generous 32-bit address space, they
    were restricted to 16 bits. They added segmented memory and 1-byte
    versions of two-byte instructions, until finally the 386 arrived.
    Compiler writers were not inclined to trade a moderate speed advantage
    (or even accept a loss) to save a few bytes.
    There is more to it.
    For example, the first thing I had to do in my optimiser was to expand
    1-byte instructions, so as not to explode the possibilities I had to
    consider.


    Groetjes Albert
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sat Jun 28 22:41:41 2025
    From Newsgroup: comp.lang.forth

    FOR/NEXT DO/LOOP wars have raged ever since Moore introduced the former.
    Here's a [cold] blast from the past...

    Minutes of the FIGGY BAR RT Conference.

    Date: 11/16/89 Time: 22:36EST

    <[Wil] W.BADEN1> DO/LOOP will never be as efficient as FOR/NEXT.
    <[Wil] W.BADEN1> FOR/NEXT will never be as effective as DO/LOOP.

    Date: 06/28/90 Time: 22:32EDT

    <[Wil] W.BADEN1> Chuck walked out of ANSI becuz it wudn't make
    FOR...NEXT required.
    <[Wil] W.BADEN1> That was 1988. And the operative word is "required."



  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Jun 28 16:04:07 2025
    From Newsgroup: comp.lang.forth

    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    > You obviously ignore repeated refutations of your claims of superior
    > performance for LOOP-instruction-based counted loops. Maybe you
    > should implement and measure such a counted loop yourself and compare
    > it to the LOOP word on SwiftForth and VFX Forth.

    Actually it is possible to implement the LOOP word such that it uses
    the LOOP instruction by modifying DO/?DO, I, and J to go along with
    them.

    E.g., VFX generates the following code for LOOP

    ( 0050A25D 49FFC6 ) INC R14
    ( 0050A260 49FFC7 ) INC R15
    ( 0050A263 71F8 ) JNO 0050A25D

    Here R14 contains the index (I), and R15 contains
    (index-limit) xor 2^63. This value is conditioned such that INC
    will set the overflow flag when index reaches limit.

    The benefit of the overflow-flag approach is only relevant for +LOOP.
    We will ignore that for now, but return to it later.
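
    The conditioning can be tried out in Forth itself. The sketch below is standard Forth and assumes two's-complement cells that wrap around on 1+; the names bias and iterations are invented for it. It counts how many increments pass before the biased counter steps from the largest positive value to the most negative one, which is the step on which INC would set the overflow flag.

```forth
-1 1 rshift      constant max-int    \ 0111...1
max-int invert   constant sign-bit   \ 1000...0, i.e. 2^63 on a 64-bit system

\ bias and iterations are hypothetical names.
: bias ( index limit -- biased )  - sign-bit xor ;

: iterations ( index limit -- n )
   bias 0                    ( biased n )
   begin
      1+ swap                ( n' biased )
      dup max-int = >r       \ is this the increment that overflows?
      1+ swap                ( biased' n' )
   r> until
   nip ;

0 10 iterations .    \ prints 10
3 10 iterations .    \ prints 7
```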

    Instead, you could keep limit-index in RCX. Then you could implement
    a VFX-style LOOP as follows:

    inc r14
    loop <target>

    The LOOP instruction will decrement RCX until it reaches 0, i.e.,
    until index equals limit.

    Ok, this still needs the additional INC instruction that you may want
    to avoid. So let's look at SwiftForth. Here LOOP compiles to

    4519C5 R14 INC 49FFC6
    4519C8 4519BC JNO 0F81EEFFFFFF

    So here R14 (not R15) contains (index-limit) xor 2^63. We will
    discuss I later.

    You could instead have limit-index in RCX, and then let the LOOP word
    generate

    loop <target>

    Look, Ma, LOOP implemented with LOOP!

    Now what about I? SwiftForth implements I by keeping limit xor 2^63
    in R15. Then I is R15+R14 (and SwiftForth's I is implemented to
    produce that computation).
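
    That the sum gives back the index can be checked with a few lines of standard Forth. Flipping the top bit is the same as adding 2^63 modulo 2^64, so the two flips cancel in the addition. All three names below are invented for this sketch, and the register names in the comments only tie them to the description above.

```forth
-1 1 rshift invert constant sign-bit   \ 1000...0

: biased-count ( index limit -- r14 )  - sign-bit xor ;   \ what R14 is said to hold
: biased-limit ( limit -- r15 )        sign-bit xor ;     \ what R15 is said to hold
: recover-i    ( r14 r15 -- index )    + ;                \ I = R15+R14

3 10 biased-count  10 biased-limit  recover-i .    \ prints 3
```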

    For our LOOP-instruction-based LOOP, we could keep limit in, say, R15.
    Then I is R15-RCX.

    Now what about +LOOP ? In the usual case the increment is a constant.
    For a positive increment, you can implement +LOOP as

    add rcx, increment
    jnc <target>

    For a negative increment, you can implement +LOOP as

    add rcx, increment
    jc <target>

    Only in the case of an increment that is unknown at compile time, you
    have to resort to something with jno, maybe

    mov r14, rcx
    add rcx, increment
    btc r14, 63
    add r14, increment
    jno <target>

    You can't have everything:-)

    Anyway, given that LOOP is slower than what SwiftForth uses for the
    LOOP word now on several CPUs and not faster on almost all others,
    nobody interested in performance will go there. But if there ever is
    a performance advantage to using the LOOP instruction, we have this
    option. So no need to resort to FOR ... NEXT even in that case.

    - anton
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Jun 28 17:46:49 2025
    From Newsgroup: comp.lang.forth

    minforth@gmx.net (minforth) writes:
    > IIRC DO..LOOPs had been a hack for computers in the 60s.
    > A rather ugly hack, born out of necessity, slow and
    > often cumbersome to use.

    Could you elaborate on "a hack for computers in the 60s"?

    And while DO has an obvious shortcoming (partially addressed by ?DO),
    I have found that variations on ?DO..LOOP are quite helpful in keeping
    the number of items on the data stack manageable. They mean that I
    don't have to deal with the index and limit in the loop body, and that
    they are also out of the way, so I don't have to think about them in
    the loop body. And when I need the loop index, "I" gives it to me,
    like an automatically-defined local. To gain these advantages, I
    often prefer to massage the start and limit values more than
    otherwise. E.g., to walk through a cell array in the forward
    direction, I usually prefer

    ( x y addr u ) cells bounds ?do ( x1 y1 )
    ... i @ ...
    1 cells +loop

    rather than

    ( x y addr u ) 0 ?do ( x1 y1 addr )
    ... dup i th @ ...
    loop
    drop

    because I then can access x1 and y1 in the loop without ADDR being in
    the way.
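
    A runnable version of the two styles, as a minimal sketch: BOUNDS and TH are common but not part of the standard, so they are defined here as assumed, and the array, the word names and the summing body are made up for the example.

```forth
\ A system that already has BOUNDS or TH will just report a redefinition.
: bounds ( addr u -- addr+u addr )  over + swap ;
: th     ( addr n -- addr' )        cells + ;

create data  10 , 20 , 30 , 40 ,

\ Address-stepping style: only the running sum sits on the data stack.
: sum-bounds ( addr u -- sum )
   0 rot rot  cells bounds ?do  i @ +  1 cells +loop ;

\ Index style: ADDR has to stay on the stack underneath the sum.
: sum-indexed ( addr u -- sum )
   0 swap  0 ?do  over i th @ +  loop  nip ;

data 4 sum-bounds  .    \ prints 100
data 4 sum-indexed .    \ prints 100
```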

    - anton
  • From sean@sean@conman.org to comp.lang.forth on Sat Jun 28 20:04:56 2025
    From Newsgroup: comp.lang.forth

    It was thus said that the Great dxf <dxforth@gmail.com> once stated:
    <[Wil] W.BADEN1> Chuck walked out of ANSI becuz it wudn't make
    FOR...NEXT required.
    <[Wil] W.BADEN1> That was 1988. And the operative word is "required."

    What is the difference between FOR/NEXT and DO/LOOP? Don't they do the
    same thing?

    -spc
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Sat Jun 28 21:01:48 2025
    From Newsgroup: comp.lang.forth

    sean@conman.org writes:
    What is the difference between FOR/NEXT and DO/LOOP? Don't they do the
    same thing?

    FOR ... NEXT on one system does not do the same thing as FOR ... NEXT
    on some other systems, and they all behave differently from DO ... LOOP.

    - anton
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From dxf@dxforth@gmail.com to comp.lang.forth on Sun Jun 29 13:04:39 2025
    From Newsgroup: comp.lang.forth

    On 29/06/2025 6:04 am, sean@conman.org wrote:
    It was thus said that the Great dxf <dxforth@gmail.com> once stated:
    <[Wil] W.BADEN1> Chuck walked out of ANSI becuz it wudn't make
    FOR...NEXT required.
    <[Wil] W.BADEN1> That was 1988. And the operative word is "required."

    What is the difference between FOR/NEXT and DO/LOOP? Don't they do the same thing?

    FOR NEXT was conceived as a bare-bones loop that down-counts to zero.
    DO LOOP is more flexible. While Moore was prepared to throw out the
    latter's features for a simpler implementation in microcode, it's fair
    to say most forthers were not.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Mon Jun 30 15:43:22 2025
    From Newsgroup: comp.lang.forth

    On 27-06-2025 04:16, minforth wrote:
    On Thu, 26 Jun 2025 17:27:48 +0000, LIT wrote:

    The savings come from rolling  @ @ + ! into a single very specialized
    function.  But what about the loading of X Y and retrieving of Z which
    are unavoidable in practice?  Should that not be included in the test?

    Let's find out then:

    1 VARIABLE X
    2 VARIABLE Y
    3 VARIABLE Z

    : TEST1 1000 0 DO 10000 0 DO
        I DUP X ! Y ! X @ Y @ + Z ! Z @ DROP
      LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO
        I DUP X ! Y ! X Y Z +> Z @ DROP
      LOOP LOOP ;
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 252 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 202 ok

    : TEST1 1000 0 DO 10000 0 DO
        I DUP X ! Y ! 1 X +! 1 Y +! X @ Y @ + Z ! Z @ DROP
      LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO
        I DUP X ! Y ! X ++ Y ++  X Y Z +> Z @ DROP
      LOOP LOOP ;
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 346 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 258 ok

    The difference is smaller - still it's significant.


    Another test - using the "drawing a box" example
    from "Thinking Forth" (and "simulated" LINE word):

    0 VARIABLE TOP
    0 VARIABLE LEFT
    0 VARIABLE BOTTOM
    0 VARIABLE RIGHT
    : LINE 2DROP 2DROP ;

    : BOX1 ( x1 y1 x2 y2) BOTTOM ! RIGHT ! TOP ! LEFT !
     LEFT  @ TOP    @ RIGHT @ TOP    @ LINE
     RIGHT @ TOP    @ RIGHT @ BOTTOM @ LINE
     RIGHT @ BOTTOM @ LEFT  @ BOTTOM @ LINE
     LEFT  @ BOTTOM @ LEFT  @ TOP    @ LINE ;

    : BOX2 ( x1 y1 x2 y2) BOTTOM ! RIGHT ! TOP ! LEFT !
     LEFT  TOP    RIGHT TOP    LINE
     RIGHT TOP    RIGHT BOTTOM LINE
     RIGHT BOTTOM LEFT  BOTTOM LINE
     LEFT  BOTTOM LEFT  TOP    LINE ;

    : TEST1 1000 0 DO 10000 0 DO  I DUP 2DUP BOX1  LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO  I DUP 2DUP BOX2  LOOP LOOP ;

    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 890 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 653 ok


    The difference is even more significant in case
    of multiplication:

    1 VARIABLE X
    2 VARIABLE Y
    3 VARIABLE Z

    : TEST1 1000 0 DO 10000 0 DO
        I DUP X ! Y ! X @ Y @ * Z ! Z @ DROP
      LOOP LOOP ;
    : TEST2 1000 0 DO 10000 0 DO
        I DUP X ! Y ! X Y Z *> Z @ DROP
      LOOP LOOP ;
    TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 658 ok
    TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 200 ok

    But this time the better implementation also has
    an impact; fig-Forth's '*' is inefficient,
    and I of course coded '*>' directly in ML,
    simply using IMUL.


    With so many DO..LOOPs involved, be careful not to
    measure more looping time than the multiplications.

    IIRC DO..LOOPs had been a hack for computers in the 60s.
    A rather ugly hack, born out of necessity, slow and
    often cumbersome to use. That it still persists in Forth
    half a century later speaks for Forth's progressiveness.

    --

    Not the most beautiful code, but enough stuff to test:

    : LINE 2DROP 2DROP ;
    \ include lib/anstools.4th

    0 [if]
    VARIABLE TOP
    VARIABLE LEFT
    VARIABLE BOTTOM
    VARIABLE RIGHT

    : BOX ( x1 y1 x2 y2) BOTTOM ! RIGHT ! TOP ! LEFT !
    LEFT @ TOP @ RIGHT @ TOP @ LINE
    RIGHT @ TOP @ RIGHT @ BOTTOM @ LINE
    RIGHT @ BOTTOM @ LEFT @ BOTTOM @ LINE
    LEFT @ BOTTOM @ LEFT @ TOP @ LINE ;
    [then]

    1 [if]
    aka r@ bottom
    aka r'@ top

    \ left right bottom top
    : BOX ( x1 y1 x2 y2)
    rot >r >r
    over over top tuck line
    dup top over bottom line
    over bottom tuck line
    r> over r> line
    ;
    [then]

    hide bottom
    hide top

    \ 1 2 4 8 box
    : TEST1 1000 0 DO 10000 0 DO I DUP 2DUP BOX LOOP LOOP ;
    test1

    \ 1 2 4 2 (TOS)
    \ 4 2 4 8 (TOS)
    \ 4 8 1 8 (TOS)
    \ 1 8 1 2 (TOS)
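
    The four comment lines above give the LINE arguments expected for
    1 2 4 8 BOX. A minimal sketch of how a BOX variant could be checked
    against them in standard Forth, assuming a stand-in LINE that records
    its arguments instead of drawing. SEEN, #SEEN and .SEEN are
    hypothetical names, not part of the code above; the variable version
    of BOX is used because it needs no 4tH-specific words.

    ```forth
    variable top  variable left  variable bottom  variable right

    create seen 16 cells allot     \ room for 4 lines of 4 coordinates
    variable #seen

    \ Stand-in for LINE: store x1 y1 x2 y2 in call order.
    : line ( x1 y1 x2 y2 -- )
      4 0 do  seen #seen @ 3 + i - cells + !  loop
      4 #seen +! ;

    : box ( x1 y1 x2 y2 -- )  bottom ! right ! top ! left !
      left  @ top    @ right @ top    @ line
      right @ top    @ right @ bottom @ line
      right @ bottom @ left  @ bottom @ line
      left  @ bottom @ left  @ top    @ line ;

    : .seen ( -- )  #seen @ 0 ?do  seen i cells + @ .  loop ;

    0 #seen !  1 2 4 8 box  .seen
    \ prints 1 2 4 2 4 2 4 8 4 8 1 8 1 8 1 2
    ```

    Any other BOX can be dropped in and compared against the same output.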

    Variable version: 0.950s
    Stack version: 0.848s

    Note: 4tH has *THREE* directly addressable Return Stack items. I used
    this to implement the Midpoint Circle algorithm in 4tH. So I could have
    made it even a bit more efficient.

    Note: 4tH optimizes variables by appending the VALUE C-behavior to
    (known) variables:

     6| branch 30  BOX
     7| to      2
     8| to      3  RIGHT
     9| to      0
    10| to      1  LEFT
    11| value   1  LEFT
    12| value   0
    13| value   3  RIGHT
    14| value   0
    15| call    0  LINE
    16| value   3  RIGHT
    17| value   0
    18| value   3  RIGHT
    19| value   2
    20| call    0  LINE
    21| value   3  RIGHT
    22| value   2
    23| value   1  LEFT
    24| value   2
    25| call    0  LINE
    26| value   1  LEFT
    27| value   2
    28| value   1  LEFT
    29| value   0
    30| branch  0  LINE

    Using variables is *not* deliberately slow.

    Hans Bezemer

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Hans Bezemer@the.beez.speaks@gmail.com to comp.lang.forth on Tue Jul 1 13:53:52 2025
    From Newsgroup: comp.lang.forth

    On 28-06-2025 19:46, Anton Ertl wrote:
    And while DO has an obvious shortcoming (partially addressed by ?DO),
    I have found that variations on ?DO..LOOP are quite helpful in keeping
    the number of items on the data stack manageable. They mean that I
    don't have to deal with the index and limit in the loop body, and that
    they are also out of the way, so I don't have to think about them in
    the loop body. And when I need the loop index, "I" gives it to me,
    like an automatically-defined local.

    Wow.. I learned this about 20 years ago from the creator of the FIG
    Forth editor. You find it in the "c" and "delete" commands.

    And yeah - you're completely right: it works like a "read-only" local.
    The TORS can be used as a "r/w" local - with the additional penalty of a
    R> >R pair (like 2OS comes with a SWAP SWAP penalty). BTW, knowing this
    gives you hints on how to organize your stacks.

    The DO..LOOP advantages - nah, not really. E.g. an "address" loop can be
    done like (a n = address count):

    OVER SWAP /ELEMENT * + >R
    BEGIN DUP R@ < WHILE ( ..) /ELEMENT + REPEAT R> DROP DROP

    No need for BOUNDS DO..LOOP ..
    FOR..NEXT is even easier:

    >R BEGIN R@ 0> WHILE ( ..) R> 1- >R REPEAT R> DROP

    So for a lot of applications, I don't really need DO..LOOP and its
    deeply flawed implementation. And since R@ and I are synonyms, I can
    even use I if I prefer I! :)

    Usually, if I think of it I could even do less DO..LOOP - but you know
    how it is when a pattern has entered your mind. It's like an ear worm.

    But anyways - that was the reason I defined R'@ and R"@ a long time ago.
    It still feels like cheating if I use them, but they're just synonyms of
    I' and J anyways - so they don't take up much space if you want to
    support them.

    As I've shown, in 4tH you can make the code much clearer by
    (temporarily) assigning synonyms to them.

    Now as I was playing with the idea there were a lot of people claiming
    "It makes no sense adding R'@ and R"@. Been there done that."

    But since I'm so easily convinced, I did it anyway. Can't say I
    regretted that, since it allows you to better balance the load between
    both stacks.

    Hans Bezemer

    P.S. If you want to do an equally misconceived FOR..NEXT as eForth has
    implemented, place the (..) payload directly after BEGIN. That will
    make it do 11 iterations when you only asked for 10.
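
    A minimal sketch of that difference, using a counter as the payload.
    HITS, COUNT-PRE and COUNT-POST are hypothetical names introduced for
    illustration; the loop shapes are the BEGIN..WHILE..REPEAT pattern
    shown earlier, with the payload placed after and before the test.

    ```forth
    variable hits

    \ Test before payload: the payload runs n times.
    : count-pre ( n -- )
      0 hits !
      >r begin r@ 0> while  1 hits +!  r> 1- >r  repeat  r> drop ;

    \ Payload directly after BEGIN: the payload runs n+1 times.
    : count-post ( n -- )
      0 hits !
      >r begin  1 hits +!  r@ 0> while  r> 1- >r  repeat  r> drop ;

    10 count-pre  hits @ .   \ prints 10
    10 count-post hits @ .   \ prints 11
    ```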



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From dxf@dxforth@gmail.com to comp.lang.forth on Thu Jul 3 13:59:16 2025
    From Newsgroup: comp.lang.forth

    On 1/07/2025 9:53 pm, Hans Bezemer wrote:
    On 28-06-2025 19:46, Anton Ertl wrote:
    And while DO has an obvious shortcoming (partially addressed by ?DO),
    I have found that variations on ?DO..LOOP are quite helpful in keeping
    the number of items on the data stack manageable.  They mean that I
    don't have to deal with the index and limit in the loop body, and that
    they are also out of the way, so I don't have to think about them in
    the loop body.  And when I need the loop index, "I" gives it to me,
    like an automatically-defined local.

    Wow.. I learned this about 20 years ago from the creator of the FIG
    Forth editor. You find it in the "c" and "delete" commands.

    And yeah - you're completely right: it works like a "read-only" local.
    The TORS can be used as a "r/w" local - with the additional penalty of a
    R> >R pair (like 2OS comes with a SWAP SWAP penalty). BTW, knowing this
    gives you hints on how to organize your stacks.

    The DO..LOOP advantages - nah, not really. E.g. an "address" loop can be
    done like (a n = address count):

      OVER SWAP /ELEMENT * + >R
      BEGIN DUP R@ < WHILE ( ..) /ELEMENT + REPEAT R> DROP DROP

    No need for BOUNDS DO..LOOP ..
    FOR..NEXT is even easier:

      >R BEGIN R@ 0> WHILE ( ..) R> 1- >R REPEAT R> DROP

    So for a lot of applications, I don't really need DO..LOOP and its
    deeply flawed implementation. And since R@ and I are synonyms, I can
    even use I if I prefer I! :)
    ...

    When I need a 'counted' loop, DO LOOP is always shorter/faster than a BEGIN REPEAT.
    I provide a couple of FOR NEXTs in the distribution for the curious; however, they provide
    no practical advantage (same footprint and, in the case of CP/M, even slower). So for
    me the issue was settled long ago. I imagine it's the same for most forthers, even if
    their circumstances mean they've opted differently.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Thu Jul 3 07:50:15 2025
    From Newsgroup: comp.lang.forth

    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    The DO..LOOP advantages - nah, not really. E.g. an "address" loop can be
    done like (a n = address count):

    OVER SWAP /ELEMENT * + >R
    BEGIN DUP R@ < WHILE ( ..) /ELEMENT + REPEAT R> DROP DROP

    No need for BOUNDS DO..LOOP ..

    Great. You can write a counted loop without a body with BEGIN ... WHILE
    ... REPEAT. I can write equivalent code shorter, and it will execute
    more efficiently:

    2DROP

    However, things become more interesting when the body does something,
    in particular when it needs access to additional data beyond just I,
    and maybe even modify some of these stack items. E.g., here's a
    definition from Gforth:

    : del-included-files ( addr u -- )
    included-files $@ cell MEM+DO
    I $@ 2over string-prefix? IF I 0 third $del THEN
    LOOP 2drop ;

    That's a relatively easy case, because the loop body modifies neither
    addr nor u. You can find documentation on most of the words used through
    <https://net2o.de/gforth/Word-Index.html>. INCLUDED-FILES is a
    variable.

    Another one is:

    : usage# ( nt -- n ) \ gforth-internal
    \G count usage of the word @var{nt}
    0 wheres $@ where-struct MEM+DO
    over i where-nt @ = -
    LOOP nip ;

    Here the body reads one additional stack item ( nt ) and modifies
    another one (the count). WHERES is a variable, WHERE-STRUCT is the
    size of an element of the array that WHERES points to.

    So for a lot of applications, I don't really need DO..LOOP and its
    deeply flawed implementation.

    If your implementation of DO..LOOP is deeply flawed, maybe you should
    replace it with a better one.

    And since R@ and I are synonyms

    They are not, not in the standard, and not on a number of systems.
    E.g., consider the following program:

    : foo 10 5 do cr i . r@ . loop ; foo

    On Gforth and iForth, the results of I and R@ are the same, but on lxf,
    sf64, and vfx64, they are not:

    lxf               sf64      vfx64
    I  R@             I  R@     I  R@
    5  2147483643     5  0      5  8388608
    6  2147483644     6  0      6  8388608
    7  2147483645     7  0      7  8388608
    8  2147483646     8  0      8  8388608
    9  2147483647     9  0      9  8388608
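
    The lxf column is consistent with a loop index that is kept in a
    biased form, so that reaching the limit shows up as a signed overflow.
    That reading is an assumption made here for illustration, not
    something stated about lxf above, and it does not explain the sf64 or
    vfx64 columns. A minimal sketch of the arithmetic, with BIASED as a
    hypothetical name:

    ```forth
    \ Hypothetical model: store (index - limit) offset by the smallest
    \ 32-bit signed value, so index = limit would overflow.
    : biased ( index limit -- x )  -  $80000000 + ;

    5 10 biased .   \ prints 2147483643
    9 10 biased .   \ prints 2147483647
    ```

    Whatever the actual representation, the practical point stands: inside
    DO ... LOOP, only I is guaranteed to deliver the index.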

    - anton
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From minforth@minforth@gmx.net (minforth) to comp.lang.forth on Thu Jul 3 19:33:30 2025
    From Newsgroup: comp.lang.forth

    On Sat, 28 Jun 2025 21:01:48 +0000, Anton Ertl wrote:

    sean@conman.org writes:
    What is the difference between FOR/NEXT and DO/LOOP? Don't they do the
    same thing?

    FOR ... NEXT on one system does not do the same thing as FOR ... NEXT
    on some other systems, and they all behave differently from DO ... LOOP.


    Correct. Here are variants with iterators that even run on gforth 0.7.9:

    \ ====== <n> FOR# .. #TIMES ==================================================
    \ original: machine code
    \ demo variant: slow Forth

    : _ITERATE \ end xt
    swap
    BEGIN dup 0>
    WHILE over execute 1-
    REPEAT 2drop ;

    : FOR# postpone [: ; IMMEDIATE

    : #TIMES postpone ;] postpone _iterate ; IMMEDIATE

    \ ====== <n> FOR .. N M .. NEXT ==============================================
    \ original: machine code with circular iteration control stack
    \ advantage: avoids UNLOOP and doesn't clutter rstack
    \ demo variant: slow Forth, indices on rstack like DO..LOOP, no advantage

    \ for gforth:
    : rpick rp@ swap 1+ cells + @ ;

    : N 2 rpick ; ( -- inner-index )
    : M 6 rpick ; ( -- outer-index )

    : _N-ITERATE \ end xt --
    swap >r 0 >r
    BEGIN 2r@ u> \ end n f
    WHILE dup execute r> 1+ >r
    REPEAT 2r> 2drop drop ;

    : FOR postpone [: ; IMMEDIATE

    : NEXT postpone ;] postpone _n-iterate ; IMMEDIATE

    \ ====== <array> <#el> <elsize> FOR{ .. N M .. }NEXT =========================
    \ original: machine code with circular iteration control stack
    \ advantage: avoids UNLOOP and doesn't clutter rstack
    \ demo variant: slow Forth, indices on rstack like DO..LOOP, no advantage

    : _ARR-ITERATE \ adr els step xt --
    >r dup >r * over + swap 2r> 2swap 2>r \ xt size -- s | r: end arr
    BEGIN 2r@ u>
    WHILE over execute dup r> + >r
    REPEAT 2r> 2drop 2drop ;

    : FOR{ postpone [: ; IMMEDIATE

    : }NEXT postpone ;] postpone _arr-iterate ; IMMEDIATE

    \ --- Tests

    : STARS
    20 FOR# '*' emit #TIMES ;
    : NUMBER1
    10 FOR n . NEXT ;
    : NUMBER2
    3 FOR cr 4 FOR n m 1+ * . NEXT NEXT ;
    : TYPE-CHARARRAY \ a u --
    dup >r pad swap move
    pad r> 1 FOR{ n c@ emit }NEXT ;

    CR STARS
    CR NUMBER1
    CR NUMBER2
    CR S" Fortune" TYPE-CHARARRAY

    --
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Gerry Jackson@do-not-use@swldwa.uk to comp.lang.forth on Mon Jul 7 07:54:19 2025
    From Newsgroup: comp.lang.forth

    On 03/07/2025 20:33, minforth wrote:
    On Sat, 28 Jun 2025 21:01:48 +0000, Anton Ertl wrote:

    sean@conman.org writes:
    What is the difference between FOR/NEXT and DO/LOOP?  Don't they do the
    same thing?

    FOR ... NEXT on one system does not do the same thing as FOR ... NEXT
    on some other systems, and they all behave differently from DO ... LOOP.


    Correct. Here are variants with iterators that even run on gforth 0.7.9:

    \ ====== <n> FOR# .. #TIMES ==================================================
    \ original: machine code
    \ demo variant: slow Forth

    : _ITERATE \ end xt
        swap
        BEGIN dup 0>
        WHILE over execute 1-
        REPEAT 2drop ;

    : FOR# postpone [: ; IMMEDIATE

    : #TIMES postpone ;] postpone _iterate ; IMMEDIATE

    \ ====== <n> FOR .. N M .. NEXT ==============================================

    I've found looping quotations useful but I like to include the quotation
    inside the loop e.g. (without the syntactic sugar and moving the
    iterator inside the quotation):

    : downcount begin [: dup 0> if dup . 1- then ;] over 0> while execute
    repeat 2drop ;
    10 downcount \ displays 10 9 8 7 6 5 4 3 2 1 ok

    Advantages are:
    1) The xt is not passed to the quotation and so doesn't get in the way.
    2) The xt is loaded as a literal when the quotation exits.
    3) The quotation can exit the loop early by EXITing with 0 on the stack.
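
    A minimal sketch of advantage 3, assuming a system with quotations
    ([: ... ;], as in Gforth) where EXIT inside a quotation leaves only the
    quotation. VISITED and DOWN-TO-5 are hypothetical names introduced for
    illustration; the loop shape is the one in DOWNCOUNT above.

    ```forth
    variable visited

    \ Count down from n, but stop early once the counter reaches 5.
    : down-to-5 ( n -- )
      0 visited !
      begin
        [: ( n -- n' )
           dup 5 = if  drop 0 exit  then   \ leave 0 so the WHILE test fails
           1 visited +!  1- ;]
        over 0>
      while execute repeat
      2drop ;

    10 down-to-5  visited @ .   \ prints 5
    ```

    Starting from 10, the quotation does its work for 10 9 8 7 6 and then
    ends the loop by returning 0 instead of 4.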
    --
    Gerry
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From minforth@minforth@gmx.net to comp.lang.forth on Mon Jul 7 10:46:24 2025
    From Newsgroup: comp.lang.forth

    Am 07.07.2025 um 08:54 schrieb Gerry Jackson:
    On 03/07/2025 20:33, minforth wrote:
    On Sat, 28 Jun 2025 21:01:48 +0000, Anton Ertl wrote:

    sean@conman.org writes:
    What is the difference between FOR/NEXT and DO/LOOP?  Don't they do the
    same thing?

    FOR ... NEXT on one system does not do the same thing as FOR ... NEXT
    on some other systems, and they all behave differently from DO ... LOOP.


    Correct. Here are variants with iterators that even run on gforth 0.7.9:

    \ ====== <n> FOR# .. #TIMES ==================================================
    \ original: machine code
    \ demo variant: slow Forth

    : _ITERATE \ end xt
         swap
         BEGIN dup 0>
         WHILE over execute 1-
         REPEAT 2drop ;

    : FOR# postpone [: ; IMMEDIATE

    : #TIMES postpone ;] postpone _iterate ; IMMEDIATE

    \ ====== <n> FOR .. N M .. NEXT ==============================================

    I've found looping quotations useful but I like to include the quotation
    inside the loop e.g. (without the syntactic sugar and moving the
    iterator inside the quotation):

    : downcount begin [: dup 0> if dup . 1- then ;] over 0> while execute
    repeat 2drop ;
    10 downcount \ displays 10 9 8 7 6 5 4 3 2 1  ok

    Advantages are:
    1) The xt is not passed to the quotation and so doesn't get in the way.
    2) The xt is loaded as a literal when the quotation exits.
    3) The quotation can exit the loop early by EXITing with 0 on the stack.

    That's the nice thing about Forth: you can mould it to your taste. :-)

    My initial motivation was that I wanted free access to the return
    stack, and to kick UNLOOP out. From my slow playhorse Forth
    (VM with address interpreter):

    \ ------ Template: n FOR .. NEXT and adr n elsize FOR> .. NEXT
    \ (negative elsize iterates backwards over array)
    \ Primitives:
    \ _ITERATE ( a n s xt -- ) M3 iterator (max. nesting depth 2)
    \ N ( -- n ) N! ( n -- ) inner index in single FOR..NEXT loop
    \ M ( -- n ) M! ( n -- ) outer index in nested FOR.FOR..NEXT.NEXT loops
    : FOR> postpone [: ; IMMEDIATE \ M3
    : FOR 0 postpone literal postpone swap 1 postpone literal
    postpone for> ; IMMEDIATE \ M3
    : NEXT postpone ;] postpone _iterate ; IMMEDIATE \ M3

    Demo session:

    +---------------------+
    ¦  Min3rd Core Forth  ¦
    +---------------------+
    1020048 bytes free
    # : T1 10 FOR n . NEXT ; ok
    # t1 0 1 2 3 4 5 6 7 8 9 ok
    : t4 10 FOR -111 >r n . r> drop NEXT ; ok
    # t4 0 1 2 3 4 5 6 7 8 9 ok
    # : T2 pad 5 cell FOR> n u. NEXT ; ok
    # : T3 pad 5 cell negate FOR> n u. NEXT ; ok
    # hex ok
    $ pad u. F7BD2F1C ok
    $ t2 F7BD2F1C F7BD2F20 F7BD2F24 F7BD2F28 F7BD2F2C ok
    $ t3 F7BD2F2C F7BD2F28 F7BD2F24 F7BD2F20 F7BD2F1C ok
    $ decimal ok
    # fload demo/for_next.m3
    Benchmarking 1000000 loops:
    DO..LOOP: 101.7 ms
    FOR..NEXT: 26.5 ms ok
    # : T5 3 FOR 7 >r 3 FOR 8 >r n . m . r> drop NEXT r> drop NEXT ; ok
    # t5 0 0 1 0 2 0 0 1 1 1 2 1 0 2 1 2 2 2 ok
    #

    --- Synchronet 3.21a-Linux NewsLink 1.2