'+>' moves the sum of two variables into the body of the third one.
1 VARIABLE X
2 VARIABLE Y
3 VARIABLE Z
: TEST1 1000 0 DO 10000 0 DO X @ Y @ + Z ! LOOP LOOP ; ok
: TEST2 1000 0 DO 10000 0 DO X Y Z +> LOOP LOOP ; ok
TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 121 ok
TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 71 ok
: TEST1 1000 0 DO 10000 0 DO 1 X +! 1 Y +! X @ Y @ + Z ! LOOP LOOP ;
ok
: TEST2 1000 0 DO 10000 0 DO X ++ Y ++ X Y Z +> LOOP LOOP ; ok
TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 217 ok
TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 132 ok
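For readers who want to try the idea on another system, here is a minimal high-level sketch of what `+>`, `*>` and `++` appear to do, judging from how they are used in the tests. These definitions are an assumption, written for an ANS-style Forth; the originals are coded in machine language and are not shown in the thread, so a sketch like this will not reproduce the timings.

```forth
\ Hypothetical high-level equivalents (ANS Forth); the originals are machine code.
: +> ( a1 a2 a3 -- )  >R  @ SWAP @ +  R> ! ;   \ a3 := [a1] + [a2]
: *> ( a1 a2 a3 -- )  >R  @ SWAP @ *  R> ! ;   \ a3 := [a1] * [a2]
: ++ ( a -- )  1 SWAP +! ;                      \ [a] := [a] + 1

VARIABLE X  VARIABLE Y  VARIABLE Z   \ ANS style; fig-Forth would use "1 VARIABLE X"
1 X !  2 Y !
X ++  Y ++        \ X = 2, Y = 3
X Y Z +>  Z @ .   \ prints 5
X Y Z *>  Z @ .   \ prints 6
```

The point of the exercise is that the variable addresses go straight to one word, so the inner interpreter runs once instead of four times.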
The results are rather promising, as one can see.
> The savings come from rolling @ @ + ! into a single very specialized
> function. But what about the loading of X Y and retrieving of Z which
> are unavoidable in practice? Should that not be included in the test?
Let's find out then:
1 VARIABLE X
2 VARIABLE Y
3 VARIABLE Z
: TEST1 1000 0 DO 10000 0 DO
I DUP X ! Y ! X @ Y @ + Z ! Z @ DROP
LOOP LOOP ;
: TEST2 1000 0 DO 10000 0 DO
I DUP X ! Y ! X Y Z +> Z @ DROP
LOOP LOOP ;
TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 252 ok
TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 202 ok
: TEST1 1000 0 DO 10000 0 DO
I DUP X ! Y ! 1 X +! 1 Y +! X @ Y @ + Z ! Z @ DROP
LOOP LOOP ;
: TEST2 1000 0 DO 10000 0 DO
I DUP X ! Y ! X ++ Y ++ X Y Z +> Z @ DROP
LOOP LOOP ;
TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 346 ok
TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 258 ok
The difference is smaller, but it's still significant.
Another test - using the "drawing a box" example
from "Thinking Forth" (and "simulated" LINE word):
0 VARIABLE TOP
0 VARIABLE LEFT
0 VARIABLE BOTTOM
0 VARIABLE RIGHT
: LINE 2DROP 2DROP ;
: BOX1 ( x1 y1 x2 y2) BOTTOM ! RIGHT ! TOP ! LEFT !
LEFT @ TOP @ RIGHT @ TOP @ LINE
RIGHT @ TOP @ RIGHT @ BOTTOM @ LINE
RIGHT @ BOTTOM @ LEFT @ BOTTOM @ LINE
LEFT @ BOTTOM @ LEFT @ TOP @ LINE ;
: BOX2 ( x1 y1 x2 y2) BOTTOM ! RIGHT ! TOP ! LEFT !
LEFT TOP RIGHT TOP LINE
RIGHT TOP RIGHT BOTTOM LINE
RIGHT BOTTOM LEFT BOTTOM LINE
LEFT BOTTOM LEFT TOP LINE ;
: TEST1 1000 0 DO 10000 0 DO I DUP 2DUP BOX1 LOOP LOOP ;
: TEST2 1000 0 DO 10000 0 DO I DUP 2DUP BOX2 LOOP LOOP ;
TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 890 ok
TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 653 ok
The difference is even more significant in case
of multiplication:
1 VARIABLE X
2 VARIABLE Y
3 VARIABLE Z
: TEST1 1000 0 DO 10000 0 DO
I DUP X ! Y ! X @ Y @ * Z ! Z @ DROP
LOOP LOOP ;
: TEST2 1000 0 DO 10000 0 DO
I DUP X ! Y ! X Y Z *> Z @ DROP
LOOP LOOP ;
TICKS TEST1 TICKS 2SWAP DMINUS D+ D. 658 ok
TICKS TEST2 TICKS 2SWAP DMINUS D+ D. 200 ok
This time, though, the better implementation also has
its impact: fig-Forth's '*' is inefficient,
and I coded '*>' directly in machine language,
simply using IMUL.
IIRC DO..LOOPs had been a hack for computers in the 60s.
A rather ugly hack, born out of necessity, slow and
often cumbersome to use. That it still persists in Forth
half a century later speaks for Forth's progressiveness.
On 27/06/2025 12:16 pm, minforth wrote:
> ...
> IIRC DO..LOOPs had been a hack for computers in the 60s.
> A rather ugly hack, born out of necessity, slow and
> often cumbersome to use. That it still persists in Forth
> half a century later speaks for Forth's progressiveness.
Testing FOR NEXT on my DTC system showed a 15% speed increase over
DO LOOP. Putting 5 NOOPs (each executes Forth's address interpreter)
in the innermost loop brought it down to 6%. Not worth it IMO.
It really depends on how counted loops are implemented.
Most CPUs have operators for register-based count-down loops
that are blazingly fast.
If they can be used within Forth-based loop constructs
I would expect a greater speed increase than what you measured.
In that old fig-Forth it's rather short and simple:
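        sqHeader '(LOOP)'
XLOOP   dw      $ + 2
        mov     BX,1            ; increment for plain LOOP
XLOO1:  add     [BP],BX         ; index += increment (index lives on the return stack)
        mov     AX,[BP]
        sub     AX,[BP+2]       ; AX = index - limit
        xor     AX,BX           ; fold in the sign of the increment
        js      BRAN1           ; sign set: not done yet, take the branch back
        add     BP,4            ; done: drop index and limit
        inc     SI              ; step the instruction pointer over the branch offset
        inc     SI
        jmp     NEXT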
It doesn't look that bad. Can it be
done even shorter?
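The `sub` / `xor` / `js` sequence in that listing is the whole termination test, and it may be easier to see in Forth. The word below is a hypothetical name, not part of fig-Forth; it is a sketch of the same computation, assuming an ANS system where a true flag is -1.

```forth
\ Hypothetical helper mirroring the sub / xor / js test in (LOOP).
\ True while the loop should branch back, false when it is finished.
: CONTINUE? ( index limit incr -- flag )
  >R -          \ index - limit
  R> XOR        \ fold in the sign of the increment
  0< ;          \ sign bit set: keep looping

5 10 1 CONTINUE? .    \ prints -1 : index below limit, keep going
10 10 1 CONTINUE? .   \ prints 0  : index reached limit, leave the loop
```

With a negative increment the sign is inverted by the XOR, so the same three instructions serve both counting directions.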
In article <bc63996456fe967e5c66d17cbbeb21c2@www.novabbs.com>,
LIT <zbigniew2011@gmail.com> wrote:
> [...]
> In that old fig-Forth it's rather short and simple:
> [...]
> It doesn't look that bad. Can it be
> done even shorter?
My optimiser looks at the combination of DO and LOOP and,
after inlining everything, moves the return stack items into
registers. It is near VFX performance.
All experimental, but yes, there is much to be gained.
On 27.06.2025 at 20:15, albert@spenarnc.xs4all.nl wrote:
> [...]
> My optimiser looks into the combination of DO and LOOP,
> transfers the return stack into registers after inlining
> everything. It is near vfx performance.
> All experimental, but yes there is much to be gained.
Must be tricky to do UNLOOP in a register-based loop. ;-)
minforth <minforth@gmx.net> writes:
> Most CPUs have operators for register-based count-down loops
> that are blazingly fast.

Which "operators" do you have in mind, and what do you mean by
"blazingly fast"?

Anyway, we have discussed this repeatedly, e.g., in
<2022Feb13.231208@mips.complang.tuwien.ac.at> I wrote in reply to your
posting <f4b89e0b-2ded-4b18-8dc1-bba6dcda47bbn@googlegroups.com>, and
cited earlier discussions in the topic.

|"minf...@arcor.de" <minforth@arcor.de> writes:
|[...]
|> F.ex. match NEXT efficiently to x_86 processor LOOP instruction (counter in
|> _CX register)
|> and you'll happily count down from 5 to 1.
|
|Yes, but why would one do this? As we have established in an earlier
|discussion (see below), the LOOP instruction is typically not faster
|than a sequence of simpler instructions:
|
|<2018Jun6.184616@mips.complang.tuwien.ac.at>:
||minforth@arcor.de writes:
||> FOR..NEXT matches easily with the x86 LOOP instruction and ECX as counter.
||> Should do speedy enough. ;-)
||
||Have you measured it? I have
||<2017Mar14.183125@mips.complang.tuwien.ac.at>
||<2017Mar15.141411@mips.complang.tuwien.ac.at> and compared the
||following loops:
||
||.L5:                        .L5:
||        subq $1, %rax               loop .L5
||        jne .L5
||
||I found that for these loops Sandy Bridge, Haswell, and Skylake take
||~4 cycles per iteration using LOOP, and 1-2 cycles per iteration when
||using jne.
|
|<2018Jun7.141731@mips.complang.tuwien.ac.at>:
||cycles for 1000 iterations
||      K10      Excavator        Zen
||Phenom II  Athlon X4 845  Ryzen 1600X
||     3021           1314         1051  loop
||     2020           1484         1051  sub; jne
||     2026           1489         1053  add; cmp; jne
|
|There is no performance advantage on modern AMD and Intel CPUs for the
|instruction LOOP over a good implementation of the Forth word LOOP (as
|in the third example).

> If they can be used within Forth-based loop constructs
> I would expect a greater speed increase than what you measured.

You obviously ignore repeated refutations of your claims of superior
performance for LOOP-instruction-based counted loops. Maybe you
should implement and measure such a counted loop yourself and compare
it to the LOOP word on SwiftForth and VFX Forth.

- anton
It was thus said that the Great dxf <dxforth@gmail.com> once stated:
> <[Wil] W.BADEN1> Chuck walked out of ANSI becuz it wudn't make
> FOR...NEXT required.
> <[Wil] W.BADEN1> That was 1988. And the operative word is "required."
What is the difference between FOR/NEXT and DO/LOOP? Don't they do the
same thing?
On Thu, 26 Jun 2025 17:27:48 +0000, LIT wrote:
> [...]
> The difference is even more significant in case
> of multiplication:
> [...]
With so many DO..LOOPs involved, be careful not to
measure more looping time than the multiplications.
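One way to act on that caution is to time the bare loops as well and subtract them. The following is only a sketch for gforth: `UTIME` ( -- d ) is gforth-specific and stands in for the `TICKS` word used in the thread, and `BASELINE` and `ELAPSED` are hypothetical names introduced here.

```forth
\ Sketch for gforth: UTIME ( -- d ) returns microseconds and is gforth-specific.
VARIABLE X  VARIABLE Y  VARIABLE Z

: BASELINE ( -- )  1000 0 DO 10000 0 DO  LOOP LOOP ;   \ the loops alone
: TEST1    ( -- )  1000 0 DO 10000 0 DO
                     I DUP X ! Y !  X @ Y @ * Z !  Z @ DROP
                   LOOP LOOP ;

: ELAPSED ( xt -- d )   \ run xt, return the elapsed microseconds as a double
  UTIME 2>R  EXECUTE  UTIME 2R> D- ;

' TEST1 ELAPSED  ' BASELINE ELAPSED  D-  D.   \ time net of the loop overhead
```

The same subtraction applied to both TEST1 and TEST2 would show how much of the reported ratio belongs to the loop bodies rather than to DO..LOOP itself.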
On 28-06-2025 19:46, Anton Ertl wrote:
> And while DO has an obvious shortcoming (partially addressed by ?DO),
> I have found that variations on ?DO..LOOP are quite helpful in keeping
> the number of items on the data stack manageable. They mean that I
> don't have to deal with the index and limit in the loop body, and that
> they are also out of the way, so I don't have to think about them in
> the loop body. And when I need the loop index, "I" gives it to me,
> like an automatically-defined local.
Wow.. I learned this about 20 years ago from the creator of the FIG Forth editor. You find it in the "c" and "delete" commands.
And yeah - you're completely right: it works like a "read-only" local.
The TORS can be used as a "r/w" local - with the additional penalty of a R> >R pair (like 2OS comes with a SWAP SWAP penalty). BTW, knowing this gives you hints on how to organize your stacks.
The DO..LOOP advantages - nah, not really. E.g. an "address" loop can be done like (a n = address count):
OVER SWAP /ELEMENT * + >R
BEGIN DUP R@ < WHILE ( ..) /ELEMENT + REPEAT R> DROP DROP
No need for BOUNDS DO..LOOP ..
FOR..NEXT is even easier:
>R BEGIN R@ 0> WHILE ( ..) R> 1- >R REPEAT R> DROP
So for a lot of applications, I don't really need DO..LOOP and its deeply flawed implementation. And since R@ and I are synonyms, I can even use I if I prefer I! :)
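To make the address loop concrete, here is a small runnable sketch of the same pattern with a payload filled in. `/ELEMENT`, `DATA` and `.ARRAY` are hypothetical names chosen for the demo (the post leaves `/ELEMENT` undefined), and an ANS system is assumed.

```forth
1 CELLS CONSTANT /ELEMENT          \ element size; hypothetical value for the demo
CREATE DATA  10 , 20 , 30 ,        \ sample array of three cells

: .ARRAY ( a n -- )
  OVER SWAP /ELEMENT * + >R        \ R: end address, data stack: current address
  BEGIN DUP R@ < WHILE             \ U< would be the safer comparison for addresses
    DUP @ .                        \ payload: print the current element
    /ELEMENT +
  REPEAT R> DROP DROP ;

DATA 3 .ARRAY   \ prints 10 20 30
```

The current address stays on the data stack and the end address on the return stack, so the payload can reach the latter with R@.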
Usually, if I think of it I could even do less DO..LOOP - but you know
how it is when a pattern has entered your mind. It's like an ear worm.
But anyways - that was the reason I defined R'@ and R"@ a long time ago.
It still feels like cheating if I use them, but they're just synonyms of
I' and J anyways - so they don't take up much space if you want to
support them.
As I've shown, in 4tH you can make the code much clearer by
(temporarily) assigning synonyms to them.
Now as I was playing with the idea there were a lot of people claiming
"It makes no sense adding R'@ and R"@. Been there done that."
But since I'm so easily convinced, I did it anyway. Can't say I
regretted that, since it allows you to better balance the load between
both stacks.
Hans Bezemer
P.S. If you want to do an equally misconceived FOR..NEXT as eForth has
implemented, place the (..) payload directly behind BEGIN. That will
make it do 11 iterations when you only asked for 10.
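The off-by-one in that P.S. is easy to check by counting. The sketch below uses the BEGIN..WHILE..REPEAT emulation from the post with a counter as payload; `HITS`, `TEST-FIRST` and `BODY-FIRST` are hypothetical names, and an ANS system is assumed.

```forth
VARIABLE HITS   \ counts how often the payload runs

: TEST-FIRST ( n -- )   \ payload after WHILE: n iterations
  0 HITS !
  >R BEGIN R@ 0> WHILE  1 HITS +!  R> 1- >R REPEAT R> DROP ;

: BODY-FIRST ( n -- )   \ payload directly behind BEGIN: n+1 iterations
  0 HITS !
  >R BEGIN  1 HITS +!  R@ 0> WHILE  R> 1- >R REPEAT R> DROP ;

10 TEST-FIRST HITS @ .   \ prints 10
10 BODY-FIRST HITS @ .   \ prints 11
```

In the second variant the payload still runs once with the counter at 0 before the test ends the loop, which is where the extra iteration comes from.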
sean@conman.org writes:
> What is the difference between FOR/NEXT and DO/LOOP? Don't they do the
> same thing?
FOR ... NEXT on one system does not do the same thing as FOR ... NEXT
on some other systems, and they all behave differently from DO ... LOOP.
On Sat, 28 Jun 2025 21:01:48 +0000, Anton Ertl wrote:
> FOR ... NEXT on one system does not do the same thing as FOR ... NEXT
> on some other systems, and they all behave different from DO ... LOOP.
Correct. Here are variants with iterators that even run on gforth 0.7.9:
\ ====== <n> FOR# .. #TIMES ==================================================
\ original: machine code
\ demo variant: slow Forth
: _ITERATE \ end xt
swap
BEGIN dup 0>
WHILE over execute 1-
REPEAT 2drop ;
: FOR# postpone [: ; IMMEDIATE
: #TIMES postpone ;] postpone _iterate ; IMMEDIATE
\ ====== <n> FOR .. N M .. NEXT ==============================================
On 03/07/2025 20:33, minforth wrote:
> Correct. Here are variants with iterators that even run on gforth 0.7.9:
> [...]
I've found looping quotations useful but I like to include the quotation inside the loop e.g. (without the syntactic sugar and moving the
iterator inside the quotation):
: downcount begin [: dup 0> if dup . 1- then ;] over 0> while execute
repeat 2drop ;
10 downcount \ displays 10 9 8 7 6 5 4 3 2 1 ok
Advantages are:
1) The xt is not passed to the quotation and so doesn't get in the way.
2) The xt is loaded as a literal when the quotation exits.
3) The quotation can exit the loop early by EXITing with 0 on the stack.
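Advantage 3 can be sketched with a small variation of `downcount`. The word name `count-until-7` is hypothetical, and the example assumes a system with quotations (`[:` and `;]`) such as a recent gforth. The quotation leaves 0 on the stack and EXITs as soon as it meets a multiple of 7, which makes the `over 0>` test fail on the next pass.

```forth
: count-until-7 ( n -- )
  begin
    [: ( n -- n' )
       dup 7 mod 0= if drop 0 exit then   \ early exit: leave 0 for the loop test
       dup . 1- ;]
    over 0> while execute
  repeat 2drop ;

20 count-until-7   \ prints 20 19 18 17 16 15
```

Since the loop test runs before the quotation, a start value of 0 or less prints nothing.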