1) 16-bit displacements are _important_. Pretty well *all* microprocessors use 16-bit displacements, rather than anything shorter like 12 bits.
Even
though this meant, in most cases, they had to give up indexing.
2) Index registers were hailed as a great advancement in computers
when they were added to them, as they allowed avoiding self-modifying
code.
Of course, if one just has base registers, one can still have
a special base register, used for array accessing, where the base
address of a segment has had the array offset added to it through
separate arithmetic instructions.
Array accesses are common, and not needing extra instructions for them is therefore beneficial.
3) At least one major microprocessor manufacturer, Motorola, did have base-index addressing with 16-bit displacements, starting with
the 68020.
(3) may not be much of an argument, but it seems to me that (1) and (2)
can reasonably be considered fairly strong arguments. But what about the drawbacks?
The use case
for Ra+Rb+13..16 bit is extremely limited.
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
On Sat, 26 Jul 2025 08:48:34 +0000, Thomas Koenig wrote:
The use case
for Ra+Rb+13..16 bit is extremely limited.
Maybe so. It covers the case where multiple small arrays are located
in the same kind of 64K-byte segment as simple variables.
But what if arrays are larger than 64K in size? Well, in that case,
I've included Array Mode in the standard form of a memory address.
This is a kind of indirect addressing that uses a table of array
addresses in memory to supply the address to which the index
register contents are added.
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
You may not find the arguments strong.
My goal was to provide the instruction set for a very powerful computer;
so I included this addressing mode so as not to lack a feature that
clearly benefited performance that other computers had.
Base plus index plus displacement saves an instruction or two.
Wait.
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
This makes negative sense. Memory accesses, even from L1 cache, are
very expensive these days.
I know that. But computers these days do have such a thing as cache, and giving the array pointer table higher priority in the cache, because it's expected to be used a lot, is doable.
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
If you want this to actually be useful, do as Mitch has done, and go a
full 32-bit constant.
John Savard <quadibloc@invalid.invalid> schrieb:
1) 16-bit displacements are _important_. Pretty well *all* microprocessors use 16-bit displacements, rather than anything shorter like 12 bits.
That is a bit of an exaggeration - SPARC has 13-bit constants, RISC-V
has 12-bit constants.
Even
though this meant, in most cases, they had to give up indexing.
SPARC has both Ra+Rb and Ra+immediate, but not combined. The use
case for Ra+Rb+13..16 bit is extremely limited.
2) Index registers were hailed as a great advancement in computers
when they were added to them, as they allowed avoiding self-modifying
code.
GP registers were an even greater achievement.
Of course, if one just has base registers, one can still have
a special base register, used for array accessing, where the base
address of a segment has had the array offset added to it through
separate arithmetic instructions.
The usual method is Ra+Rb. Mitch also has Ra+Rb<<n+32-bit or Ra+Rb<<n+64-bit, with n from 0 to 3. This is useful for addressing
global data encoded in the constant, which a 12 to 16 bit-offset
is not.
Array accesses are common, and not needing extra instructions for them is
therefore beneficial.
Yes, and indexing without offset can do that particular job just
fine.
3) At least one major microprocessor manufacturer, Motorola, did have
base-index addressing with 16-bit displacements, starting with
the 68020.
Mitch recently explained that they had microarchitectural reasons.
(3) may not be much of an argument, but it seems to me that (1) and (2)
can reasonably be considered fairly strong arguments. But what about the
drawbacks?
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
On 7/26/2025 3:48 AM, Thomas Koenig wrote:
John Savard <quadibloc@invalid.invalid> schrieb:
1) 16-bit displacements are _important_. Pretty well *all*
microprocessors
use 16-bit displacements, rather than anything shorter like 12 bits.
That is a bit of an exaggeration - SPARC has 13-bit constants, RISC-V
has 12-bit constants.
Or, XG2/XG3, and SH-5: 10-bits.
12 bits would be plenty, if it were scaled.
Unscaled displacements effectively lose 2 or 3 bits of range for no real benefit.
Though, for 32-bit instructions, 12 bits is a sensible size with 5 bit register fields. Whereas 10 bits makes more sense for 6-bit registers.
Going much smaller than this will result in a sharp increase in miss
rate though.
So, for example, 6 or 7 displacement bits would be too few.
As can be noted, the "best case" for scaled displacements was seemingly
9 bits unsigned, 10 bits signed.
Where, in terms of hit-rate: 9u>9s, but 10u<10s; since a slight majority
of those that miss the 9u range were also negative.
If using unscaled displacements (like RISC-V) the needed range is closer
to 13 bits.
One possible downside of scaled displacements is that they can't
directly encode a misaligned load or store. But this is rare.
In my more recent ISA designs (namely XG3), I made it so the Disp33 encodings can also encode an unscaled displacement.
In this case, there is effectively a 1 bit scale selector:
0: Use the element size as the scale;
1: Use a Byte scale.
The need for misaligned displacements being rare enough that needing to jumbo-encode them isn't all that much of an issue.
As noted, the only available displacement sizes here are 10 and 33 bits.
10 covers the vast majority of load/store;
33 covers everything else.
There doesn't seem to be much practical need for larger displacements
than 33 in the general case. As I see it, it is acceptable to have any larger displacement addressing decay to using general purpose ALU instructions.
In my case, there is generally no absolute addressing mode.
On 64 bit machines, absolute addressing no longer makes as much sense.
...
Even
though this meant, in most cases, they had to give up indexing.
SPARC has both Ra+Rb and Ra+immediate, but not combined. The use
case for Ra+Rb+13..16 bit is extremely limited.
Yeah.
Better IMHO to treat [Base+Disp] and [Base+Index*Sc] as two mutually exclusive cases in terms of the encoding scheme.
The gains of full [Base+Index*Sc+Disp] aren't really worth the cost;
mostly because cases where this is applicable tend to be statistically infrequent.
Even if one used larger encodings for these, still debatable if the
gains are worth the added implementation cost.
On the other side, things like array accesses are common enough that in
many cases, RISC-V effectively shoots itself in the foot by lacking [Rb+Ri*Sc] addressing.
Well, and while not quite as big of a deal in terms of static
instruction counts, this addressing mode tends to have a very high probability of being used inside loops; so has an oversized impact on performance.
Though, there was some murmuring that at least some people in RISC-V
land are considering adding it.
2) Index registers were hailed as a great advancement in computers
when they were added to them, as they allowed avoiding self-modifying
code.
GP registers were an even greater achievement.
Yeah.
Of course, if one just has base registers, one can still have
a special base register, used for array accessing, where the base
address of a segment has had the array offset added to it through
separate arithmetic instructions.
The usual method is Ra+Rb. Mitch also has Ra+Rb<<n+32-bit or
Ra+Rb<<n+64-bit, with n from 0 to 3. This is useful for addressing
global data encoded in the constant, which a 12 to 16 bit-offset
is not.
A 16-bit offset can often be useful for globals if using a Global
Pointer, but not useful for PC-rel (and not worth it for general Ld/St
ops).
But, this is niche, and can effectively be limited in scope only to 32
and 64 bit items with a hard-coded base register.
This can then access the first 256K or 512K of ".data", which is
typically sufficient for global variables (though, any commonly-accessed globals may need to be promoted to being in ".data" regardless of
whether or not they are initialized).
Though, even with separate compilation, this is probably something a
linker could figure out (just that "common data" could be put either in ".data" or ".bss", rather than only in ".bss").
I would not recommend Disp16 for normal 32-bit Ld/St ops as this wastes
too much encoding space.
Array accesses are common, and not needing extra instructions for
them is
therefore beneficial.
Yes, and indexing without offset can do that particular job just
fine.
Agreed.
Where:
[Rb+Ri*Sc]
Is fairly common.
But:
[Rb+Ri*sc+Disp]
Is much less commonly needed IME.
3) At least one major microprocessor manufacturer, Motorola, did have
base-index addressing with 16-bit displacements, starting with the 68020.
Mitch recently explained that they had microarchitectural reasons.
The M68K line is probably not a great design reference, as it has such wonk.
(3) may not be much of an argument, but it seems to me that (1) and (2)
can reasonably be considered fairly strong arguments. But what about the
drawbacks?
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
Agreed.
On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:
Wait.
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
This makes negative sense. Memory accesses, even from L1 cache, are
very expensive these days.
I know that. But computers these days do have such a thing as cache,
and giving the array pointer table higher priority in the cache, because
it's expected to be used a lot, is doable.
Doable how, in such a way that both memory accesses are as fast as a
single one with the obvious scheme?
But leaving that aside for a moment: You also need cycles and
instructions to set up that table. Is that less than loading the base address of an array into a register?
12 bits would be plenty, if it were scaled.
Unscaled displacements effectively lose 2 or 3 bits of range for no real benefit.
On 2025-07-27 12:18 a.m., BGB wrote:
On 7/26/2025 3:48 AM, Thomas Koenig wrote:
John Savard <quadibloc@invalid.invalid> schrieb:
1) 16-bit displacements are _important_. Pretty well *all*
microprocessors
use 16-bit displacements, rather than anything shorter like 12 bits.
That is a bit of an exaggeration - SPARC has 13-bit constants, RISC-V
has 12-bit constants.
Or, XG2/XG3, and SH-5: 10-bits.
12 bits would be plenty, if it were scaled.
Unscaled displacements effectively lose 2 or 3 bits of range for no
real benefit.
Though, for 32-bit instructions, 12 bits is a sensible size with 5 bit
register fields. Whereas 10 bits makes more sense for 6-bit registers.
Going much smaller than this will result in a sharp increase in miss
rate though.
So, for example, 6 or 7 displacement bits would be too few.
As can be noted, the "best case" for scaled displacements was
seemingly 9 bits unsigned, 10 bits signed.
Where, in terms of hit-rate: 9u>9s, but 10u<10s; since a slight
majority of those that miss the 9u range were also negative.
If using unscaled displacements (like RISC-V) the needed range is
closer to 13 bits.
One possible downside of scaled displacements is that they can't
directly encode a misaligned load or store. But this is rare.
In my more recent ISA designs (namely XG3), I made it so the Disp33
encodings can also encode an unscaled displacement.
In this case, there is effectively a 1 bit scale selector:
0: Use the element size as the scale;
1: Use a Byte scale.
The need for misaligned displacements being rare enough that needing
to jumbo-encode them isn't all that much of an issue.
As noted, the only available displacement sizes here are 10 and 33 bits.
10 covers the vast majority of load/store;
33 covers everything else.
There doesn't seem to be much practical need for larger displacements
than 33 in the general case. As I see it, it is acceptable to have any
larger displacement addressing decay to using general purpose ALU
instructions.
In my case, there is generally no absolute addressing mode.
On 64 bit machines, absolute addressing no longer makes as much sense.
...
Even
though this meant, in most cases, they had to give up indexing.
SPARC has both Ra+Rb and Ra+immediate, but not combined. The use
case for Ra+Rb+13..16 bit is extremely limited.
Yeah.
Better IMHO to treat [Base+Disp] and [Base+Index*Sc] as two mutually
exclusive cases in terms of the encoding scheme.
The gains of full [Base+Index*Sc+Disp] isn't really worth the cost;
mostly because cases where this is applicable tend to be statistically
infrequent.
Even if one used larger encodings for these, still debatable if the
gains are worth the added implementation cost.
On the other side, things like array accesses are common enough that
in many cases, RISC-V effectively shoots itself in the foot by lacking
[Rb+Ri*Sc] addressing.
Well, and while not quite as big of a deal in terms of static
instruction counts, this addressing mode tends to have a very high
probability of being used inside loops; so has an oversized impact on
performance.
Though, there was some murmuring that at least some people in RISC-V
land are considering adding it.
2) Index registers were hailed as a great advancement in computers
when they were added to them, as they allowed avoiding self-modifying
code.
GP registers were an even greater achievement.
Yeah.
Of course, if one just has base registers, one can still have
a special base register, used for array accessing, where the base
address of a segment has had the array offset added to it through
separate arithmetic instructions.
The usual method is Ra+Rb. Mitch also has Ra+Rb<<n+32-bit or
Ra+Rb<<n+64-bit, with n from 0 to 3. This is useful for addressing
global data encoded in the constant, which a 12 to 16 bit-offset
is not.
A 16-bit offset can often be useful for globals if using a Global
Pointer, but not useful for PC-rel (and not worth it for general Ld/St
ops).
But, this is niche, and can effectively be limited in scope only to 32
and 64 bit items with a hard-coded base register.
This can then access the first 256K or 512K of ".data", which is
typically sufficient for global variables (though, any commonly-
accessed globals may need to be promoted to being in ".data"
regardless of whether or not they are initialized).
Though, even with separate compilation, this is probably something a
linker could figure out (just that "common data" could be put either
in ".data" or ".bss", rather than only in ".bss").
I would not recommend Disp16 for normal 32-bit Ld/St ops as this
wastes too much encoding space.
Array accesses are common, and not needing extra instructions for
them is
therefore beneficial.
Yes, and indexing without offset can do that particular job just
fine.
Agreed.
Where:
[Rb+Ri*Sc]
Is fairly common.
But:
[Rb+Ri*sc+Disp]
Is much less commonly needed IME.
Although much less commonly needed there is [Rb+Ri*sc+Disp24] in my
current design as the scaled index takes another instruction word and
24-bit would be wasted otherwise.
I think it depends somewhat on the instruction encoding what is
efficient. Given the number of transistors available today, even less
commonly needed functionality could be considered.
3) At least one major microprocessor manufacturer, Motorola, did have
base-index addressing with 16-bit displacements, starting with the 68020.
Mitch recently explained that they had microarchitectural reasons.
The M68K line is probably not a great design reference, as it has such wonk.
I think the 68k is not that bad a reference design to study because it
shows both what to do and what not to do in a single design.
Understanding drawbacks can help a lot.
I learned a lot from studying the 6502 and asking why did they do that?
(3) may not be much of an argument, but it seems to me that (1) and (2)
can reasonably be considered fairly strong arguments. But what about the
drawbacks?
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
Agreed.
On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:
Do you really want to have an extra memory access to access an array
element, instead of loading the base address of your array into a
register?
Obviously, loading the starting address of the array into a register is preferred.
There are seven registers used as base registers with 16-bit displacements, and another seven registers used as base registers with 12-bit displacements.
So, in a normal program that uses only one each of the former group of base registers for its code and data segments respectively, there are enough registers to handle twelve arrays.
Array Mode is only for use when it's needed, because a program is either dealing with a lot of large arrays, or if it is under extreme register pressure in addition to dealing with some large arrays.
On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:
If you want this to actually be useful, do as Mitch has done, and go a
full 32-bit constant.
Well, I would actually need to use 64 bits if I wanted to include
full-size memory addresses in instructions.
I think that is too long.
John Savard <quadibloc@invalid.invalid> schrieb:
On Sat, 26 Jul 2025 13:34:19 +0000, Thomas Koenig wrote:
If you want this to actually be useful, do as Mitch has done, and go a
full 32-bit constant.
Well, I would actually need to use 64 bits if I wanted to include
full-size memory addresses in instructions.
OK, Mitch has this.
I think that is too long.
Too long for what? It's simpler than any of the alternatives, and
even x86_64 (which you are aiming to surpass in complexity, it seems)
has 64-bit constants.
On Sun, 27 Jul 2025 07:15:02 +0000, Thomas Koenig wrote:
Doable how, in such a way that both memory accesses are as fast as a
single one with the obvious scheme?
Cache is storage inside the processor chip. So are registers. Cache will
be slower, but not by as much as a normal memory access.
But leaving that aside for a moment: You also need cycles and
instructions to set up that table. Is that less than loading the base
address of an array into a register?
No, but it only happens once at the beginning.
As I've noted, I agree that putting the start address in a register is better. When you can do that. When there are enough registers available.
But what if you have more arrays than you have registers?
On Sat, 26 Jul 2025 23:18:58 -0500, BGB wrote:
12 bits would be plenty, if it were scaled.
Unscaled displacements effectively lose 2 or 3 bits of range for no real
benefit.
I had tried, in a few places, to allow the values in index registers to
be scaled. As this isn't a common feature in CPU designs, though, I didn't use the opcode space to indicate this scaling in most instructions.
But scaling *displacements* is an idea that had not even occurred to me.
There is _one_ important benefit of not scaling the displacements which is made very evident by certain other characteristics of the Concertina II architecture.
The System/360 had 12-bit displacements that were not scaled. As a result, when writing code for the System/360, it was known that each base register that was used would provide coverage of 4,096 bytes of memory. No more,
no less, regardless of what type of data you referenced. So you knew when
to allocate another base register with the address plus 4,096 in it if needed.
In the Concertina II, I actually take the 32 registers, and in addition to taking the last seven of a group of eight as base registers for use with 16-bit displacements, I go with a different group of eight registers for
use with 12-bit displacements, and a different one still for 20-bit displacements.
Because there is no value in being only able to access the first 4,096
bytes of a 65,536 byte region of memory. So instead of having addressing modes that do that silly thing, I have more base registers available; if
you run out of 65,536-byte regions of memory to allocate, at least you can also allocate additional 4,096-byte regions of memory, and that might be useful.
John Savard
On 7/27/2025 12:10 AM, Robert Finch wrote:
On 2025-07-27 12:18 a.m., BGB wrote:
On 7/26/2025 3:48 AM, Thomas Koenig wrote:
John Savard <quadibloc@invalid.invalid> schrieb:
1) 16-bit displacements are _important_. Pretty well *all*
microprocessors
use 16-bit displacements, rather than anything shorter like 12 bits.
That is a bit of an exaggeration - SPARC has 13-bit constants, RISC-V
has 12-bit constants.
Or, XG2/XG3, and SH-5: 10-bits.
12 bits would be plenty, if it were scaled.
Unscaled displacements effectively lose 2 or 3 bits of range for no
real benefit.
Though, for 32-bit instructions, 12 bits is a sensible size with 5
bit register fields. Whereas 10 bits makes more sense for 6-bit
registers.
Going much smaller than this will result in a sharp increase in miss
rate though.
So, for example, 6 or 7 displacement bits would be too few.
As can be noted, the "best case" for scaled displacements was
seemingly 9 bits unsigned, 10 bits signed.
Where, in terms of hit-rate: 9u>9s, but 10u<10s; since a slight
majority of those that miss the 9u range were also negative.
If using unscaled displacements (like RISC-V) the needed range is
closer to 13 bits.
One possible downside of scaled displacements is that they can't
directly encode a misaligned load or store. But this is rare.
In my more recent ISA designs (namely XG3), I made it so the Disp33
encodings can also encode an unscaled displacement.
In this case, there is effectively a 1 bit scale selector:
0: Use the element size as the scale;
1: Use a Byte scale.
The need for misaligned displacements being rare enough that needing
to jumbo-encode them isn't all that much of an issue.
As noted, the only available displacement sizes here are 10 and 33 bits.
10 covers the vast majority of load/store;
33 covers everything else.
There doesn't seem to be much practical need for larger displacements
than 33 in the general case. As I see it, it is acceptable to have
any larger displacement addressing decay to using general purpose ALU
instructions.
In my case, there is generally no absolute addressing mode.
On 64 bit machines, absolute addressing no longer makes as much
sense.
...
Even
though this meant, in most cases, they had to give up indexing.
SPARC has both Ra+Rb and Ra+immediate, but not combined. The use
case for Ra+Rb+13..16 bit is extremely limited.
Yeah.
Better IMHO to treat [Base+Disp] and [Base+Index*Sc] as two mutually
exclusive cases in terms of the encoding scheme.
The gains of full [Base+Index*Sc+Disp] isn't really worth the cost;
mostly because cases where this is applicable tend to be
statistically infrequent.
Even if one used larger encodings for these, still debatable if the
gains are worth the added implementation cost.
On the other side, things like array accesses are common enough that
in many cases, RISC-V effectively shoots itself in the foot by
lacking [Rb+Ri*Sc] addressing.
Well, and while not quite as big of a deal in terms of static
instruction counts, this addressing mode tends to have a very high
probability of being used inside loops; so has an oversized impact on
performance.
Though, there was some murmuring that at least some people in RISC-V
land are considering adding it.
2) Index registers were hailed as a great advancement in computers
when they were added to them, as they allowed avoiding self-modifying
code.
GP registers were an even greater achievement.
Yeah.
Of course, if one just has base registers, one can still have
a special base register, used for array accessing, where the base
address of a segment has had the array offset added to it through
separate arithmetic instructions.
The usual method is Ra+Rb. Mitch also has Ra+Rb<<n+32-bit or
Ra+Rb<<n+64-bit, with n from 0 to 3. This is useful for addressing
global data encoded in the constant, which a 12 to 16 bit-offset
is not.
A 16-bit offset can often be useful for globals if using a Global
Pointer, but not useful for PC-rel (and not worth it for general Ld/
St ops).
But, this is niche, and can effectively be limited in scope only to
32 and 64 bit items with a hard-coded base register.
This can then access the first 256K or 512K of ".data", which is
typically sufficient for global variables (though, any commonly-
accessed globals may need to be promoted to being in ".data"
regardless of whether or not they are initialized).
Though, even with separate compilation, this is probably something a
linker could figure out (just that "common data" could be put either
in ".data" or ".bss", rather than only in ".bss").
I would not recommend Disp16 for normal 32-bit Ld/St ops as this
wastes too much encoding space.
Array accesses are common, and not needing extra instructions for
them is
therefore beneficial.
Yes, and indexing without offset can do that particular job just
fine.
Agreed.
Where:
[Rb+Ri*Sc]
Is fairly common.
But:
[Rb+Ri*sc+Disp]
Is much less commonly needed IME.
Although much less commonly needed there is [Rb+Ri*sc+Disp24] in my
current design as the scaled index takes another instruction word and
24-bit would be wasted otherwise.
I think it depends somewhat on the instruction encoding what is
efficient. Given the number of transistors available today, even less
commonly needed functionality could be considered.
I had experimented with similar before, although it was in the form of:
[Rb+Ri*Sc+Disp13]
With a 64-bit jumbo encoded form.
Tried to get my compiler to use it in a few cases:
Array on the stack, where the array isn't already in a register;
Array inside struct;
...
But, usage frequency was at best fairly low.
3) At least one major microprocessor manufacturer, Motorola, did have
base-index addressing with 16-bit displacements, starting with the 68020.
Mitch recently explained that they had microarchitectural reasons.
The M68K line is probably not a great design reference, as it has such wonk.
I think the 68k is not that bad a reference design to study because it
shows both what to do and what not to do in a single design.
Understanding drawbacks can help a lot.
I learned a lot from studying the 6502 and asking why did they do that?
OK.
I guess it can make sense for comparing tradeoffs, but, say, "M68K
did it, therefore it is a good idea" is a little weak.
Then again, M68K could be considered as part of a family that seems to
have largely taken design inspiration from the PDP-11.
Well, along with the MSP430 and SuperH.
Then looked around, and noted something.
SH was designed as a 32-bit redesign of the Hitachi H8 microcontroller;
The H8 design was itself partly based on the PDP-11.
Implying that some of the design similarities might not be coincidence.
Well, and by implication, there is sort of an indirect evolutionary through-line connecting my ISA design efforts back to the PDP-11.
Looking:
PDP-11:
zzzz-yyy-mmm-yyy-nnn
Regs: R0..R7 (16b), R6=SP, R7=PC
Condition Codes
Bcc, Disp8
H8:
zzzz-zzzz-mmmm-nnnn
Regs: R0..R7 (16b), R7=SP
Can address High/Low Bytes;
Paired registers
Condition Codes
Bcc, Disp8
SH (and BJX1):
zzzz-nnnn-mmmm-zzzz
Regs: R0..R15 (32b), R15=SP
S and T bits
BT/BF, Disp8
XG1 and XG2
XG1
16b: zzzz-zzzz-nnnn-mmmm
32b: 111p-zwzz-nnnn-mmmm zzzz-qnmo-oooo-zzzz
XG2:
NMOp-zwzz-nnnn-mmmm zzzz-qnmo-oooo-zzzz
Regs (XG1/XG2):
R0..R31 (64b), R15=SP
R32..R63 (XG1-XGPR and XG2)
T bit
BT/BF, Disp8 | Disp20
XG3
zzzzz-oooooo-mmmmmm-zzzz-nnnnnn-yyyypp
Regs:
R0..R31
R0..R3: ZR, LR, SP, GP
R32..R63 (F0..F31)
T bit (optional)
BT/BF, Disp23 (optional)
XG3 was reworked to resemble RISC-V,
but does not break an evolutionary path per se.
Some instructions have similar names, but differ in terms of behavior:
PDP-11: JSR / RTS: Link goes on Stack.
H8: JSR / RTS: Link goes on Stack.
SH: JSR / RTS: Link goes in PR (register)
BSR, Disp12 (+/- 4K)
XG1/XG2:
JSR / RTS: Link goes in LR (register)
BSR, Disp20 (+/- 1MB)
BSR, Disp23 (XG2, +/- 8MB)
XG3:
JSR / RTS: Pseudo-Instructions over JALR
Link goes in LR
BSR, Disp23 (+/- 16MB)
Its BSR is hard-wired to LR, unlike RV's JAL.
XG3 sort of has an identity crisis between the PDP-derived elements and
the RV/MIPS derived elements; along with two wildly different ASM syntax styles. Though, admittedly, BGBCC had still mostly been using the previous AT&T style ASM syntax, eg:
MOV.L (R18), R10
The M68K is part of a similar family, also borrowing design elements
from the PDP-11. The M68K mostly had its own wonk as well, like separate Address and Data registers.
So, M68K:
zzzz-mmm-yyy-yyy-nnn
Regs:
D0..D7
A0..A7
A7=SP
Condition Codes
JSR / RTS: Link goes on Stack.
Bcc, Disp8/Disp16/Disp32
BRA/BSR, Disp8/Disp16/Disp32
Escape-Coded Scheme, Disp==0 escapes to Disp16, ...
In PDP-11 and M68K, instructions may encode an addressing mode for each register, allowing Reg/Mem and Mem/Mem operation.
M68K seemingly renamed MOV to MOVE.
Looks like ASM examples lack the '%' on register names.
Seemingly, '%' prefixed register names were mostly a GAS thing?...
Contrast, H8, SH, and my BJX2 ISA went over to being Load/Store.
PDP-11, H8, SH, and M68K have auto-increment addressing.
Though, I had dropped auto-increment in BJX2.
Had also been dropped in some of my intermediate subsets (*1).
*1: B32V and BSR1 represented the transition between SH and what would become BJX2.
B32V was basically an SH variant with nearly all non-essential features removed. BSR1 had reorganized the encoding space, and (after 32-bit encodings were added) became the core of BJX2.
Something like:
MOV.L (R8)+, R4 //auto-increment loads
Still existing in these later ISA variants as pseudo-instructions.
Though, auto-increment notation is currently unusable for the RISC-V or
XG3 targets. Could re-add the pseudo instructions if it mattered though.
...
(3) may not be much of an argument, but it seems to me that (1) and (2)
can reasonably be considered fairly strong arguments. But what about the
drawbacks?
I have not seen a strong argument for Ra+Rb+16 bit in what you wrote
above.
Agreed.
But scaling *displacements* is an idea that had not even occurred to me.
One possible downside of scaled displacements is that they can't
directly encode a misaligned load or store. But this is rare.
BGB wrote:
One possible downside of scaled displacements is that they can't
directly encode a misaligned load or store. But this is rare.
Scaled displacements are also restricted in what they can address for
RIP-rel addressing if the RIP has alignment restrictions
(instructions aligned on 16 or 32 bits).
One possible downside of scaled displacements is that they can't directly encode a misaligned load or store. But this is rare.
BGB wrote:
One possible downside of scaled displacements is that they can't
directly encode a misaligned load or store. But this is rare.
Scaled displacements are also restricted in what they can address for
RIP-rel addressing if the RIP has alignment restrictions
(instructions aligned on 16 or 32 bits).
I think it would be in keeping with John's tradition to "not choose" and instead use one bit to indicate whether the displacement should be
scaled.
One possible downside of scaled displacements is that they can't directly
encode a misaligned load or store. But this is rare.
Much more frequent is misaligned pointers to aligned data, which are
used quite commonly in dynamically-typed languages when you use
a non-zero tag for boxed objects.
I think it would be in keeping with John's tradition to "not choose" and instead use one bit to indicate whether the displacement should
be scaled.
My current design fuses a max of one memory op into instructions instead
of having a load followed by the instruction (or an instruction followed
by a store). Address modes available without adding instruction words are
Rn, (Rn), (Rn)+, -(Rn). After that 32-bit instruction words are added to support 32 and 64-bit displacements or addresses.
The instructions with the extra displacement words are larger but there
are fewer instructions to execute.
LOAD Rd,[Rb+Disp16]
ADD Rd,Ra,Rd
Requiring two instruction words, and executing as two instructions, gets replaced with:
ADD Rd,Ra,[Rb+Disp32]
Which also takes two instruction words, but only one instruction.
Immediate operands and memory operands are routed according to two two-bit routing fields. I may be able to compress this to a single three-bit field.
Typical instruction encoding:
ADD: oooooo ss xx ii ww mmm rrrrr rrrrr ddddd
oooooo: is the opcode
ss: is the operation size
xx: is two extra opcode bits
ii: indicates which register field represents an immediate value
ww: indicates which register field is a memory operand
mmm: is the addressing mode, similar to a 68k
rrrrr: source register spec (or 4+ bit immediate)
ddddd: destination register spec (or 4+ bit immediate)
A 36-bit opcode would work great, allowing operand sign control.
On 7/27/2025 3:50 AM, Robert Finch wrote:
big snip
First, thanks for posting this. I don't recall you posting much about
your design. Can you talk about its goals, why you are doing it, its status, etc.?
Specific comments below
My current design fuses a max of one memory op into instructions
instead of having a load followed by the instruction (or an
instruction followed by a store). Address mode available without
adding instruction words are Rn, (Rn), (Rn)+, -(Rn). After that 32-bit
instruction words are added to support 32 and 64-bit displacements or
addresses.
The combined mem-op instructions used to be popular, but since the RISC revolution they are now out of fashion. Their advantage is, as you state, often eliminating an instruction. The disadvantages include that they preclude scheduling the load earlier in the instruction stream. Do you "crack" the instruction into two micro-ops in the decode stage? What
drove your decision to "buck" the trend? I am not saying you are wrong;
I just want to understand your reasoning.
The instructions with the extra displacement words are larger but
there are fewer instructions to execute.
LOAD Rd,[Rb+Disp16]
ADD Rd,Ra,Rd
Requiring two instruction words, and executing as two instructions,
gets replaced with:
ADD Rd,Ra,[Rb+Disp32]
Which also takes two instruction words, but only one instruction.
Immediate operands and memory operands are routed according to two
two-bit routing fields. I may be able to compress this to a single
three-bit field.
Yes. For example, unless you are doing some special case thing, having both registers be immediates probably doesn't make sense. And unless
you are going to allow two memory references in one instruction, that combination doesn't make sense.
Typical instruction encoding:
ADD: oooooo ss xx ii ww mmm rrrrr rrrrr ddddd
oooooo: is the opcode
ss: is the operation size
xx: is two extra opcode bits
ii: indicates which register field represents an immediate value
ww: indicates which register field is a memory operand
mmm: is the addressing mode, similar to a 68k
rrrrr: source register spec (or 4+ bit immediate)
ddddd: destination register spec (or 4+ bit immediate)
A 36-bit opcode would work great, allowing operand sign control.
I got some sign control using the two XX bits. For some operations that is all that is needed. Inverting the sign on the destination and one of