From Newsgroup: comp.lang.forth
I recently improved #S with a separate loop for when the high cell of
the input number is 0:
: #s ( ud -- 0 0 ) \ core number-sign-s
    dup if
        begin
            #
        dup 0= until
    then
    drop begin
        base @ u/mod swap digit hold
    dup 0= until
    0 ;
This gives a nice speedup for fillseq.4th (see
<2025Nov22.185430@mips.complang.tuwien.ac.at>). I have now
special-cased the second loop for base #10:
: #s ( ud -- 0 0 ) \ core number-sign-s
    \G Used between @code{<<#} and @code{#>}. Prepend all digits of
    \G @var{ud} to the pictured numeric output string. @code{#s} will
    \G convert at least one digit. Therefore, if @var{ud} is 0,
    \G @code{#s} will prepend a ``0'' to the pictured numeric output
    \G string.
    dup if
        begin
            #
        dup 0= until
    then
    drop
    base @ #10 = if
        begin
            #10 u/mod swap '0' + hold
        dup 0= until
    else
        begin
            base @ u/mod swap digit hold
        dup 0= until
    then
    0 ;
This provides another nice speedup (see below).
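
A plausible reason why the single-cell loop is cheaper: # has to
divide a double-cell number by the base, and in standard Forth that
takes two UM/MOD steps per digit, while the second loop needs one
single-cell division. The following is only a minimal sketch of such a
double-by-single division; the name ud/mod-sketch is hypothetical, and
this is not claimed to be how Gforth actually implements #.

```forth
\ Hypothetical helper for illustration, not Gforth's definition.
\ Divides an unsigned double by an unsigned single, producing a
\ single-cell remainder and a double-cell quotient.
: ud/mod-sketch ( ud1 u -- urem ud2 )
    >r 0 r@ um/mod   \ divide the high cell: ( lo rem-hi quot-hi )
    r> swap >r       \ ( lo rem-hi u ) R: quot-hi
    um/mod           \ divide rem-hi:lo by u: ( urem quot-lo )
    r> ;             \ ( urem quot-lo quot-hi )

\ Usage: 1234 as a double, divided by 10
\ 1234 0 10 ud/mod-sketch   \ leaves 4 123 0 on the stack
```

When the high cell is already 0, the first UM/MOD in this sketch does
no useful work, which is what the second loop in #S avoids.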
I have also tried using a special primitive #10u/mod, but on Rocket
Lake it caused a slowdown. gcc selected code that uses multiplication
instead of division, and it replaced the mod part not with a
multiplication and a subtraction, but with several instructions, so
the end result consumes more instructions. And on CPUs like Rocket
Lake with fast division, it also consumes more cycles. Given that
recent AMD CPUs also have fast division, I removed #10u/mod again. My
guess is that gcc generated this code for Skylake and earlier Intel
CPUs where division was slow.
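
For illustration, here is what division by 10 through multiplication,
with the remainder computed by multiplication and subtraction, looks
like when written in Forth. This is a sketch under assumptions, not
the code gcc generated: it assumes 64-bit cells, the name
10u/mod-sketch is hypothetical, and the constant is ceil(2^67/10).

```forth
\ Hypothetical word for illustration; assumes 64-bit cells.
: 10u/mod-sketch ( u -- urem uquot )
    dup $CCCCCCCCCCCCCCCD um* nip  \ high cell of u * ceil(2^67/10)
    3 rshift                       \ uquot = (u * constant) >> 67
    tuck #10 * -                   \ urem = u - 10 * uquot
    swap ;

\ Usage: 1234 10u/mod-sketch   \ leaves 4 123 on the stack
```

The stack effect matches the use of u/mod in the #10 loop above
(remainder below quotient), so the two are interchangeable there on a
64-bit system.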
      old #S      #S opt1      #S opt2        worse
    one loop    two loops   + #10 loop   + #10u/mod
 3245_981222  2690_088360  2422_977895  2492_586635 cycles
11679_661274  9813_132978  8564_869788  8909_131947 instructions
 1391_034028  1204_585688  1086_707686  1086_667791 branches
    1_521428     1_520834     1_516859     1_515857 branch-misses
         0.4          3.3          0.4          0.4 % tma_backend_bound
         3.9          3.9          3.5          3.5 % tma_bad_speculation
        24.6         19.5         25.4         25.8 % tma_frontend_bound
        71.1         73.3         70.7         70.4 % tma_retiring
- anton
--
M. Anton Ertl
http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs:
http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard:
https://forth-standard.org/
EuroForth 2025 CFP:
http://www.euroforth.org/ef25/cfp.html
EuroForth 2025 registration:
https://euro.theforth.net/