Forum: War Ensemble BBS

Tough Decisions

From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Jul 28 03:48:54 2025

From Newsgroup: comp.arch

Adding the new header type which was indicated by only five leading
bits, thus consuming 1/32nd of the opcode space, to provide for
fourteen-way superscalar operation has proved to lead to an overly
severe shortage of available opcode space.
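
The 1/32 figure is plain prefix arithmetic. A minimal sketch, assuming header types are distinguished within a 32-bit word; the function name is made up for illustration and is not part of the design:

```python
from fractions import Fraction

WORD_BITS = 32  # assumption: the prefix sits in a 32-bit word


def prefix_fraction(prefix_bits: int) -> Fraction:
    """Fraction of an opcode space claimed by one fixed prefix of this length."""
    if prefix_bits < 0:
        raise ValueError("prefix length must be non-negative")
    return Fraction(1, 2 ** prefix_bits)


# A 5-bit prefix claims 1/32 of the space: 2**27 of the 2**32 encodings.
share = prefix_fraction(5)
encodings = int(share * 2 ** WORD_BITS)
print(share, encodings)
```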

Thus, I've removed that header type, and I've used the opcode space
freed up to allow two other header types to contribute to the feature
of allowing pseudo-immediates to be found in a saved previous block
instead of the current block - which feature reduces the need to pad
a block to make everything fit. One of those two header types is the zero-overhead header, which definitely makes this feature more useful.

The zero-overhead header now can also call for augmented short instruction
type instructions to follow it, which allows the use of 15-bit short instructions instead of the more restrictive 14-bit short instructions
more often.

As well, this made room for a P bit in some operate instructions - which,
of course, was needed for that feature to work at all, the P bit in
the instruction being what indicates when that feature is used.

Also, a mistake in the diagrams for the 64-bit block-aware long
instructions was corrected.

So progress is being made; adding the 14-way superscalar header caused
some fat to be taken away, but now that opcode space is put back where
it will do the most good, I hope.

John Savard
--- Synchronet 3.21a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Jul 28 04:39:56 2025

From Newsgroup: comp.arch

John Savard <quadibloc@invalid.invalid> schrieb:

The zero-overhead header now can also call for augmented short instruction type instructions to follow it, which allows the use of 15-bit short instructions instead of the more restrictive 14-bit short instructions
more often.

A question regarding your various header types.

In order to avoid NOP hell, these have to be filled with the
same instruction length (say 15 bits, like you said above),
is that correct?

You don't have a compiler (obviously), but have you looked at actual
code generated by compilers (for example on godbolt, which is very
good) and checked which instruction sequence would fit which header?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Jul 28 10:05:31 2025

From Newsgroup: comp.arch

On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

In order to avoid NOP hell, these have to be filled with the same
instruction length (say 15 bits, like you said above),
is that correct?

A block is 256 bits in length, and normally, in the absence of a
header, consists of eight 32-bit instructions. Blocks need, basically,
to be filled with 32-bit instructions to avoid NOPs.

To improve the compactness of code, I have made provision for short instructions; these are nominally 16 bits long, as they're packed
two to a 32-bit word. They can, in reality, be 14 bits long or 15 bits
long, depending on the overhead of indicating them, or 17 bits long or
even longer if some additional bits are provided in the header.
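
The sizes above follow from simple subtraction. A rough sketch, assuming nothing beyond the 256-bit block and the two-per-word packing described here; how the real encoding spends the leftover bits is not stated in this post, and the names are illustrative only:

```python
BLOCK_BITS = 256
WORD_BITS = 32


def words_per_block() -> int:
    """Number of 32-bit instruction slots in one block."""
    return BLOCK_BITS // WORD_BITS


def pair_overhead_bits(short_bits: int) -> int:
    """Bits of a 32-bit word left over after packing two short instructions.

    Positive: bits the word can spend on marking the pair.
    Negative: bits that would have to come from somewhere else,
    such as a header.
    """
    return WORD_BITS - 2 * short_bits


print(words_per_block())
for size in (14, 15, 16, 17):
    print(size, pair_overhead_bits(size))
```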

Some headers provide for variable-length instructions. Those headers,
instead of just being either 32 bits or 64 bits long, can also be
48 bits long.

The feature I mentioned in the post to which you are replying,
for avoiding NOP issues, has to do with data, specifically immediate
values in instructions, rather than the instructions themselves.

In order to keep length decoding of instructions simple - or allow code
that consists only of 32-bit instructions - but allow immediate values
of any length, what I had done was come up with a scheme that does the following:

In a register-to-register operate instruction, a source register
specification can be replaced with a 5-bit pointer to data; it points
to one of the bytes in a 256-bit block of instructions.

This feature requires a header for use; in the simplest case, a header
may contain a 3-bit field that specifies how many 32-bit instruction words
are to be left unused at the end of a block.

So I was avoiding NOP issues by saving old instruction blocks, and allowing
a pointer to a "pseudo-immediate" to point into a saved block instead of
the current block. Yes, that will require flexibility on the part of the compiler; more work on its part, and it will be tricky to code. I think
it's possible in the current state of the art, though.
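
To make the pointer mechanism concrete, here is a minimal sketch of how such a lookup might behave. It assumes big-endian byte order and that the reserved words sit at the very end of the block; neither detail is given in this post. The `use_saved` flag stands in for the P bit mentioned at the start of the thread, and all function names are hypothetical:

```python
BLOCK_BYTES = 32  # one 256-bit block
WORD_BYTES = 4    # one 32-bit instruction slot


def data_area_start(reserved_words: int) -> int:
    """Byte offset where the reserved area begins, from the 3-bit header field."""
    if not 0 <= reserved_words <= 7:
        raise ValueError("field is 3 bits wide")
    return BLOCK_BYTES - reserved_words * WORD_BYTES


def fetch_pseudo_immediate(current: bytes, saved: bytes, pointer: int,
                           length: int, use_saved: bool) -> int:
    """Read `length` bytes at byte offset `pointer` of the selected block."""
    if len(current) != BLOCK_BYTES or len(saved) != BLOCK_BYTES:
        raise ValueError("blocks must be exactly 32 bytes")
    if not 0 <= pointer < BLOCK_BYTES:
        raise ValueError("pointer must fit in 5 bits")
    if length <= 0 or pointer + length > BLOCK_BYTES:
        raise ValueError("immediate must lie inside the block")
    block = saved if use_saved else current
    # Assumption: big-endian; the byte order is not specified in the post.
    return int.from_bytes(block[pointer:pointer + length], "big")


# A saved block whose last slot holds a 32-bit value, read from the next block.
saved = bytes(28) + (0x12345678).to_bytes(4, "big")
current = bytes(32)
start = data_area_start(1)
value = fetch_pseudo_immediate(current, saved, start, 4, use_saved=True)
print(hex(value))
```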

You don't have a compiler (obviously), but have you looked at actual
code generated by compilers (for example on godbolt, which is very good)
and checked which instruction sequence would fit which header?

Not yet, I have to admit.

John Savard

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Jul 28 07:41:59 2025

From Newsgroup: comp.arch

On 7/28/2025 3:05 AM, John Savard wrote:

On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

In order to avoid NOP hell, these have to be filled with the same
instruction length (say 15 bits, like you said above),
is that correct?

A block is 256 bits in length, and normally, in the absence of a
header, consists of eight 32-bit instructions. Blocks need, basically,
to be filled with 32-bit instructions to avoid NOPs.

To improve the compactness of code, I have made provision for short instructions; these are nominally 16 bits long, as they're packed
two to a 32-bit word. They can, in reality, be 14 bits long or 15 bits
long, depending on the overhead of indicating them, or 17 bits long or
even longer if some additional bits are provided in the header.

Some headers provide for variable-length instructions. Those headers,
instead of just being either 32 bits or 64 bits long, can also be
48 bits long.

The feature I mentioned in the post to which you are replying,
for avoiding NOP issues, has to do with data, specifically immediate
values in instructions, rather than the instructions themselves.

In order to keep length decoding of instructions simple - or allow code
that consists only of 32-bit instructions - but allow immediate values
of any length, what I had done was come up with a scheme that does the following:

In a register-to-register operate instruction, a source register specification can be replaced with a 5-bit pointer to data; it points
to one of the bytes in a 256-bit block of instructions.

This feature requires a header for use; in the simplest case, a header
may contain a 3-bit field that specifies how many 32-bit instruction words are to be left unused at the end of a block.

So I was avoiding NOP issues by saving old instruction blocks, and allowing
a pointer to a "pseudo-immediate" to point into a saved block instead of
the current block. Yes, that will require flexibility on the part of the compiler; more work on its part, and it will be tricky to code. I think
it's possible in the current state of the art, though.

You don't have a compiler (obviously), but have you looked at actual
code generated by compilers (for example on godbolt, which is very good)
and checked which instruction sequence would fit which header?

Not yet, I have to admit.

Another alternative is to pick some "typical" problems that you would
expect to take a few dozen instructions to code, and write the
"assembler" code by hand for them in your proposed architecture. That
would give you an "upper limit" on density.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From BGB@cr88192@gmail.com to comp.arch on Mon Jul 28 12:47:24 2025

From Newsgroup: comp.arch

On 7/28/2025 5:05 AM, John Savard wrote:

On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

In order to avoid NOP hell, these have to be filled with the same
instruction length (say 15 bits, like you said above),
is that correct?

A block is 256 bits in length, and normally, in the absence of a
header, consists of eight 32-bit instructions. Blocks need, basically,
to be filled with 32-bit instructions to avoid NOPs.

To improve the compactness of code, I have made provision for short instructions; these are nominally 16 bits long, as they're packed
two to a 32-bit word. They can, in reality, be 14 bits long or 15 bits
long, depending on the overhead of indicating them, or 17 bits long or
even longer if some additional bits are provided in the header.

One possibility could be, rather than going the 16-bit route, having a
more limited set of "split" instructions, which encode two instructions
in a single 32-bit unit, albeit limited in which instructions can co-execute.

These could potentially be encoded while still retaining 32-bit
alignment, though would naturally be more limited than full 16-bit
encodings.

If I were to do similar in XG3, one possibility might be:
yyyyyyyyyyyyy-xxxxxxxxxxxxx-z11100
Giving 13 bits of encoding space (with an extra bit for X).

Where, in this case, can assume that predicated jumbo-prefixes are not a
thing (the prefixes could always be assumed unconditional).
This leaves 11100 and 11101 as unused blocks.
With 11110 being the normal XG3 jumbo-prefix space.
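
A quick sketch of how such a word could be split into its fields, assuming the pattern above is written most significant bit first (Y in bits 31..19, X in bits 18..6, z in bit 5, the 11100 tag in bits 4..0). That bit ordering and the helper names are assumptions for illustration:

```python
SPLIT_TAG = 0b11100  # low five bits marking a split word (per the pattern above)


def pack_split(y: int, x: int, z: int) -> int:
    """Build a 32-bit split word from two 13-bit halves and the z bit."""
    if not (0 <= y < 1 << 13 and 0 <= x < 1 << 13 and z in (0, 1)):
        raise ValueError("field out of range")
    return (y << 19) | (x << 6) | (z << 5) | SPLIT_TAG


def unpack_split(word: int):
    """Return (y, x, z) for a split word, or None if the tag does not match."""
    if not 0 <= word < 1 << 32:
        raise ValueError("expected a 32-bit word")
    if word & 0x1F != SPLIT_TAG:
        return None
    z = (word >> 5) & 1
    x = (word >> 6) & 0x1FFF
    y = (word >> 19) & 0x1FFF
    return y, x, z


word = pack_split(0x1ABC, 0x0123, 1)
print(hex(word), unpack_split(word))
```

The widths add up to 13 + 13 + 1 + 5 = 32 bits, so the round trip loses nothing.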

Possibly (z is assumed 0 in Y):
0-mmmmmnnnnn000: MOV Rm, Rn
0-mmmmmnnnnn001: ADD Rm, Rn
0-iiiiinnnnn010: MOV Imm5s, Rn
0-iiiiinnnnn011: ADD Imm5s, Rn

0-00mmm00nnn101: SUB Rm3, Rn3
0-00mmm01nnn101: XOR Rm3, Rn3
0-00mmm10nnn101: AND Rm3, Rn3
0-00mmm11nnn101: OR Rm3, Rn3

0-01mmm00nnn101: SUBW Rm3, Rn3 //SUBS.L Rn3, Rm3, Rn3
0-01mmm01nnn101: ADDW Rm3, Rn3 //ADDS.L Rn3, Rm3, Rn3
0-01mmm10nnn101: - Rm3, Rn3
0-01mmm11nnn101: - Rm3, Rn3

0-10mmm00nnn101: SLL Rm3, Rn3 //SHLD.Q Rn3, Rm3, Rn3
0-10mmm01nnn101: SLA Rm3, Rn3 //SHAD.Q Rn3, Rm3, Rn3
0-10mmm10nnn101: SRL Rm3, Rn3 //SHLR.Q Rn3, Rm3, Rn3
0-10mmm11nnn101: SRA Rm3, Rn3 //SHAR.Q Rn3, Rm3, Rn3

X position only:
1-0ddddnnnnn010: MOV.L (SP, Disp4), Rn
1-1ddddnnnnn010: MOV.Q (SP, Disp4), Rn
1-0ddddnnnnn011: MOV.L Rn, (SP, Disp4)
1-1ddddnnnnn011: MOV.Q Rn, (SP, Disp4)
1-ddmmm00nnn100: MOV.L (Rm3, Disp2), Rn3
1-ddmmm01nnn100: MOV.Q (Rm3, Disp2), Rn3
1-ddmmm10nnn100: MOV.L Rn3, (Rm3, Disp2)
1-ddmmm11nnn100: MOV.Q Rn3, (Rm3, Disp2)

Where, probably register encodings could go into several groups:
5-bit, only for a few instructions.
3-bit, R8..R15 (typical)

When using the RV ABI, this could map to the same register space as the
RISC-V 'C' extension.
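
As a rough illustration of the z=0 forms listed above, a decoder for one 13-bit half might look like this. It assumes the bit patterns are written most significant bit first, that Imm5s is a signed 5-bit value, and that 3-bit register fields map to R8..R15; the function names are hypothetical:

```python
# Three-bit-register ALU groups, selected by the top two bits of the half.
ALU3 = {
    0b00: ("SUB", "XOR", "AND", "OR"),
    0b01: ("SUBW", "ADDW", None, None),  # last two entries are unassigned ("-")
    0b10: ("SLL", "SLA", "SRL", "SRA"),
}


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_half(bits: int) -> str:
    """Decode one 13-bit half with z=0; returns '?' for patterns not listed."""
    if not 0 <= bits < 1 << 13:
        raise ValueError("expected a 13-bit field")
    op = bits & 0b111
    hi5 = (bits >> 8) & 0x1F
    rn = (bits >> 3) & 0x1F
    if op in (0b000, 0b001):              # MOV/ADD Rm, Rn with 5-bit registers
        return f"{('MOV', 'ADD')[op]} R{hi5}, R{rn}"
    if op in (0b010, 0b011):              # MOV/ADD Imm5s, Rn
        return f"{('MOV', 'ADD')[op - 2]} {sign_extend(hi5, 5)}, R{rn}"
    if op == 0b101:                       # 3-bit register forms
        group = (bits >> 11) & 0b11
        rm3 = (bits >> 8) & 0b111
        sub = (bits >> 6) & 0b11
        rn3 = (bits >> 3) & 0b111
        name = ALU3.get(group, (None,) * 4)[sub]
        if name is not None:
            return f"{name} R{8 + rm3}, R{8 + rn3}"
    return "?"


print(decode_half((3 << 8) | (2 << 3) | 0b001))
print(decode_half((0b010 << 8) | (0b01 << 6) | (0b001 << 3) | 0b101))
```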

Would skip instructions that are not "super common". For example, RV-C
gives 16-bit encodings for some instructions which are unlikely to be
used sufficiently often to justify having 16-bit encodings.

Whereas being able to compact things like MOV and ADD instructions,
or MOV+LD, MOV+ST, ..., would likely be a higher priority.

Worth it? Debatable.

Also, unlike more traditional 16/32, this doesn't have the same level of precedent (so harder to say if it is unencumbered). AFAIK/IIRC, about
the only ISA that I am aware of ATM that really went this sort of
direction was Qualcomm Hexagon.

In theory, this could regain some of what was "lost" by XG3 being
incompatible with the RV-C encodings. Albeit, XG3 still seemingly gets
better code density than RV64GC despite the lack of 16-bit instructions,
so I didn't see it as a huge loss.

Though, in theory this scheme is a lot more limited than the 'C'
encodings, so the code-size savings would likely be smaller.

Some headers provide for variable-length instructions. Those headers,
instead of just being either 32 bits or 64 bits long, can also be
48 bits long.

The feature I mentioned in the post to which you are replying,
for avoiding NOP issues, has to do with data, specifically immediate
values in instructions, rather than the instructions themselves.

In order to keep length decoding of instructions simple - or allow code
that consists only of 32-bit instructions - but allow immediate values
of any length, what I had done was come up with a scheme that does the following:

In a register-to-register operate instruction, a source register specification can be replaced with a 5-bit pointer to data; it points
to one of the bytes in a 256-bit block of instructions.

This feature requires a header for use; in the simplest case, a header
may contain a 3-bit field that specifies how many 32-bit instruction words are to be left unused at the end of a block.

So I was avoiding NOP issues by saving old instruction blocks, and allowing
a pointer to a "pseudo-immediate" to point into a saved block instead of
the current block. Yes, that will require flexibility on the part of the compiler; more work on its part, and it will be tricky to code. I think
it's possible in the current state of the art, though.

You don't have a compiler (obviously), but have you looked at actual
code generated by compilers (for example on godbolt, which is very good)
and checked which instruction sequence would fit which header?

Not yet, I have to admit.

I had compiler first.

It was when I started working on my own compiler that it quickly became apparent all the ways the SuperH ISA was lacking.

But, at the time, didn't just abandon my existing effort for RISC-V.

Whether or not all the years I burnt on all this has been worthwhile, is debatable...


From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Jul 29 05:12:10 2025

From Newsgroup: comp.arch

John Savard <quadibloc@invalid.invalid> schrieb:

On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

In order to avoid NOP hell, these have to be filled with the same
instruction length (say 15 bits, like you said above),
is that correct?

A block is 256 bits in length, and normally, in the absence of a
header, consists of eight 32-bit instructions. Blocks need, basically,
to be filled with 32-bit instructions to avoid NOPs.

That sounds straightforward.

To improve the compactness of code, I have made provision for short instructions; these are nominally 16 bits long, as they're packed
two to a 32-bit word.

Still straightforward.

They can, in reality, be 14 bits long or 15 bits
long, depending on the overhead of indicating them, or 17 bits long or
even longer if some additional bits are provided in the header.

Some headers provide for variable-length instructions. Those headers,
instead of just being either 32 bits or 64 bits long, can also be
48 bits long.

Please provide an example of an assembly sequence, how this
would look like.

The feature I mentioned in the post to which you are replying,
for avoiding NOP issues, has to do with data, specifically immediate
values in instructions, rather than the instructions themselves.

Data in the instruction stream have to be considered part
of the instruction for any reasonable purposes.

I'm afraid I do not understand in the absence of an example.
Could you code up an example of a simple function that uses
these features?

In order to keep length decoding of instructions simple - or allow code
that consists only of 32-bit instructions - but allow immediate values
of any length, what I had done was come up with a scheme that does the following:

In a register-to-register operate instruction, a source register specification can be replaced with a 5-bit pointer to data; it points
to one of the bytes in a 256-bit block of instructions.

This feature requires a header for use; in the simplest case, a header
may contain a 3-bit field that specifies how many 32-bit instruction words are to be left unused at the end of a block.

So I was avoiding NOP issues by saving old instruction blocks, and allowing
a pointer to a "pseudo-immediate" to point into a saved block instead of
the current block. Yes, that will require flexibility on the part of the compiler; more work on its part, and it will be tricky to code. I think
it's possible in the current state of the art, though.

OMG.

The current state of the art is very much different from what you
describe.

You don't have a compiler (obviously), but have you looked at actual
code generated by compilers (for example on godbolt, which is very good)
and checked which instruction sequence would fit which header?

Not yet, I have to admit.

This is an essential part of defining an architecture...
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From John Savard@quadibloc@invalid.invalid to comp.arch on Tue Jul 29 22:03:39 2025

From Newsgroup: comp.arch

On Tue, 29 Jul 2025 05:12:10 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

Some headers provide for variable-length instructions. Those headers,
instead of just being either 32 bits or 64 bits long, can also be 48
bits long.

Please provide an example of an assembly sequence, how this would look
like.

It's not clear to me what you would expect to see.

In Assembly, there would just be a directive telling the assembler
what instruction set to expect,

ISET 2

followed by a program that would hardly look any different from,
say, System/360 assembler.

The feature I mentioned in the post to which you are replying,
for avoiding NOP issues, has to do with data, specifically immediate
values in instructions, rather than the instructions themselves.

Data in the instruction stream have to be considered part of the
instruction for any reasonable purposes.

I'm afraid I do not understand in the absence of an example. Could you
code up an example of a simple function that uses these features?

In assembler, pseudo-immediates just look like normal immediates. Normally,
the programmer is not required to keep track of where blocks begin and
end.

However, I'll try to provide an example in a different way.

Normally, a block with a pseudo-immediate looks like this:

(2) I I I#2 I I M2

(2) is a header that says "reserve the last two 32-bit instruction
slots for immediate values or stuff".

I is an instruction.

I#2 is an instruction that uses a 64-bit immediate value.

M2 is the immediate value which is two instruction slots or
64 bits long.

Now, suppose we're at the start of an instruction block, and we have
15 ordinary instructions coming, followed by one that takes a
32-bit immediate value.

Then

I I I I I I I I
(1) I I I I I I I

doesn't work, the I#1 that needs the 32-bit immediate for which
space is being reserved is in the next block after.

But if we do this

I I I I I I I I
(1) I I I I I I M1
I I#1^ I I I I I I

where I#1^ is an instruction that uses a single-slot pseudo-immediate but, as the ^ indicates, takes it from the previous block,
then it all goes together with no wasted space.
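
The slot counts in these diagrams can be checked mechanically. A small sketch, assuming every header shown occupies one 32-bit slot and that Mn occupies n slots; the notation is taken from the diagrams, while the helper names are invented here:

```python
import re

SLOTS_PER_BLOCK = 8  # 256-bit block divided into 32-bit slots


def slot_count(layout: str) -> int:
    """Total 32-bit slots used by a layout such as '(2) I I I#2 I I M2'."""
    total = 0
    for token in layout.split():
        data = re.fullmatch(r"M(\d+)", token)
        # Mn is n slots of data; headers and instructions are assumed one slot.
        total += int(data.group(1)) if data else 1
    return total


def reserved_slots(layout: str) -> int:
    """Slots the leading header reserves; 0 if there is no header or it is '()'."""
    header = re.match(r"\((\d*)\)", layout.strip())
    if header is None or header.group(1) == "":
        return 0
    return int(header.group(1))


def data_slots(layout: str) -> int:
    """Slots actually occupied by immediate data in the layout."""
    return sum(int(n) for n in re.findall(r"\bM(\d+)\b", layout))


for line in ("(2) I I I#2 I I M2",
             "I I I I I I I I",
             "(1) I I I I I I M1",
             "I I#1^ I I I I I I"):
    print(line, slot_count(line), reserved_slots(line), data_slots(line))
```

Each of the four layouts comes to exactly eight slots, and in each one the header's reserved count matches the data it carries.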

John Savard

From John Savard@quadibloc@invalid.invalid to comp.arch on Tue Jul 29 22:22:26 2025

From Newsgroup: comp.arch

On Tue, 29 Jul 2025 22:03:39 +0000, John Savard wrote:

My examples didn't reproduce correctly.

Then

I I I I I I I I

(1) I I I I I I I

doesn't work, the I#1 that needs the 32-bit immediate for which space is being reserved is in the next block after.

But if we do this

I I I I I I I I

(1) I I I I I I M1

I I#1^ I I I I I I

where I#1^ is an instruction that uses a single-slot pseudo-immediate but, as the ^ indicates, takes it from the previous block,
then it all goes together with no wasted space.

was what I meant.

If we use I. to stand for an instruction that is a branch target,
then a case where a NOP is still needed can be shown. Let's say
in the example above, the instruction immediately preceding the one
with the 32-bit immediate is a branch target.

Then the previous block which contains the value for the
pseudo-immediate isn't guaranteed to be read, so we have to do this
instead:

I I I I I I I I

() I I I I I I I.
(1) I#1 I I I I I M1

with () being a 32-bit do-nothing header, that being cheaper than
a NOP.
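
The constraint can be stated as a check an assembler might run over each block. This is only a sketch of one reading of the rule: a ^ instruction is treated as unsafe if it, or any instruction before it in the same block, is a branch target, since control could then arrive without the previous block having been read. The trailing-dot notation follows the I. convention above; the function name is hypothetical:

```python
def unsafe_previous_block_uses(layout: str) -> list[int]:
    """Positions of '^' instructions reachable from a branch target in the block.

    Tokens ending in '.' are branch targets; tokens containing '^' take
    their pseudo-immediate from the previous (saved) block.
    """
    unsafe = []
    seen_target = False
    for index, token in enumerate(layout.split()):
        if token.endswith("."):
            seen_target = True
        if "^" in token and seen_target:
            unsafe.append(index)
    return unsafe


print(unsafe_previous_block_uses("I I#1^ I I I I I I"))   # no branch target
print(unsafe_previous_block_uses("I. I#1^ I I I I I I"))  # target just before
```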

John Savard

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Tue Jul 29 23:20:45 2025

From Newsgroup: comp.arch

John Savard <quadibloc@invalid.invalid> wrote:

On Tue, 29 Jul 2025 05:12:10 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

Some headers provide for variable-length instructions. Those headers,
instead of just being either 32 bits or 64 bits long, can also be 48
bits long.

Please provide an example of an assembly sequence, how this would look
like.

It's not clear to me what you would expect to see.

In Assembly, there would just be a directive telling the assembler
what instruction set to expect,

ISET 2

followed by a program that would hardly look any different from,
say, System/360 assembler.

The feature I mentioned in the post to which you are replying,
for avoiding NOP issues, has to do with data, specifically immediate
values in instructions, rather than the instructions themselves.

Data in the instruction stream have to be considered part of the
instruction for any reasonable purposes.

I'm afraid I do not understand in the absence of an example. Could you
code up an example of a simple function that uses these features?

In assembler, pseudo-immediates just look like normal immediates. Normally, the programmer is not required to keep track of where blocks begin and
end.

However, I'll try to provide an example in a different way.

Normally, a block with a pseudo-immediate looks like this:

(2) I I I#2 I I M2

(2) is a header that says "reserve the last two 32-bit instruction
slots for immediate values or stuff".

I is an instruction.

I#2 is an instruction that uses a 64-bit immediate value.

M2 is the immediate value which is two instruction slots or
64 bits long.

Do you only allow 32-byte aligned branch targets? Otherwise what
happens when somebody jumps into middle of a block?
--
Waldek Hebisch

From John Savard@quadibloc@invalid.invalid to comp.arch on Wed Jul 30 03:11:45 2025

From Newsgroup: comp.arch

On Tue, 29 Jul 2025 23:20:45 +0000, Waldek Hebisch wrote:

Do you only allow 32-byte aligned branch targets? Otherwise what
happens when somebody jumps into middle of a block?

Branches are allowed to any 16-bit aligned location.

Whenever a branch is taken, however, the whole block in which the
target is located is fetched, in order that the instructions in the
block can be correctly decoded.
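
In address terms, that amounts to splitting a branch target into a block base and a position within the block. A minimal sketch, assuming byte addressing and blocks aligned on 32-byte boundaries; the function name is illustrative:

```python
BLOCK_BYTES = 32  # 256-bit block


def locate(target: int) -> tuple[int, int]:
    """Split a branch target into (block base address, 16-bit unit index)."""
    if target < 0 or target % 2 != 0:
        raise ValueError("branch targets must be non-negative and 16-bit aligned")
    base = target & ~(BLOCK_BYTES - 1)   # clear the low five address bits
    halfword = (target - base) // 2      # 0..15 within the block
    return base, halfword


base, halfword = locate(0x1046)
print(hex(base), halfword)
```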

John Savard


From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Wed Jul 30 05:02:11 2025

From Newsgroup: comp.arch

John Savard <quadibloc@invalid.invalid> schrieb:

On Tue, 29 Jul 2025 05:12:10 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

Some headers provide for variable-length instructions. Those headers,
instead of just being either 32 bits or 64 bits long, can also be 48
bits long.

Please provide an example of an assembly sequence, how this would look
like.

It's not clear to me what you would expect to see.

I would like to see an assembly sequence which efficiently uses
your non-32-bit and non-16-bit encodings in a block, including
all necessary NOPs.

And I mean an _actual_ assembly sequence. Pick some well-known
algorithm and implement it, then see if it will fit your block
structure.

In Assembly, there would just be a directive telling the assembler
what instruction set to expect,

ISET 2

You (the compiler) have to take care of block limits, surely?
Or is the assembler supposed to figure out something that fits?
Then, a single instruction emitted which requires an instruction
size that is not supported by that block will cause the block size
to change. I assume that, if any of your 15-bit (or whatever)
instructions do not work for block size reasons, you have
different fallback instructions.

By the way, your scheme of encoding constants in different blocks
is going to land whoever writes an assembler for you in fixup
and relocation hell.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From John Savard@quadibloc@invalid.invalid to comp.arch on Wed Jul 30 09:34:13 2025

From Newsgroup: comp.arch

On Wed, 30 Jul 2025 03:11:45 +0000, John Savard wrote:

Whenever a branch is taken, however, the whole block in which the target
is located is fetched, in order that the instructions in the block can
be correctly decoded.

I hadn't been thinking much in terms of implementations with a memory bus
width of less than 256 bits. In that case, one doesn't want to fetch
more than is needed from memory.

Actually, one has to at least fetch the first 32 bits of the block, to see
if they're an instruction or (part of) a header. If it's a header, the
whole header or group of headers at the front needs to be fetched and processed.

After that, only the instruction branched to needs to be fetched, not the
ones before it, since the header gives all the information needed to decode
all the instructions in the block in parallel.
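
For a narrow memory bus, the fetch order described here could be sketched as follows. The number of header words is passed in as a parameter because how it is determined from the first word is not described in this post; the function and parameter names are hypothetical:

```python
WORDS_PER_BLOCK = 8       # 32-bit words in a 256-bit block
HALFWORDS_PER_BLOCK = 16  # 16-bit units in a block


def words_to_fetch(target_halfword: int, header_words: int,
                   insn_halfwords: int = 2) -> list[int]:
    """32-bit word indices within the block to fetch after a taken branch.

    Word 0 is always read to see whether the block starts with a header;
    then any remaining header words; then only the word(s) holding the
    target instruction.
    """
    end = target_halfword + insn_halfwords  # exclusive, in 16-bit units
    if target_halfword < 0 or insn_halfwords <= 0 or end > HALFWORDS_PER_BLOCK:
        raise ValueError("instruction must lie inside the block")
    if not 0 <= header_words <= WORDS_PER_BLOCK:
        raise ValueError("header cannot exceed the block")
    needed = {0}
    needed.update(range(header_words))
    needed.update(range(target_halfword // 2, (end + 1) // 2))
    return sorted(needed)


print(words_to_fetch(10, header_words=1))
print(words_to_fetch(3, header_words=0))
```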

John Savard
