Forum: War Ensemble BBS

Pseudo-Immediates as Part of the Instruction

From John Savard@quadibloc@invalid.invalid to comp.arch on Fri Aug 1 15:11:49 2025

From Newsgroup: comp.arch

I couldn't locate a post I finally felt I was ready to respond to, which
was in reply to one of my posts about Concertina II, which said that immediates ought to be properly considered part of the instruction.

Well, in nearly all computer architectures, immediates _are_ part of the instruction, and quite obviously so.

But what Concertina II has are *pseudo* immediates. That is, they're not really immediates, but they pretend to be.

What does this mean? What could this mean?

Well, in my register-to-register operate instruction, associated with each _source_ register field, there's a bit which, if set, says that the five
bits in the field aren't a register specifier, but a pointer to a constant.

A constant that's addressed by an instruction isn't an immediate; it's a constant. So why do I even call these constants "pseudo-immediates" then?

Well, that pointer - five bits long - is an awfully short pointer. Where
does it point?

Instructions are fetched in blocks that are 256 bits long. One of the
things this allows for is for the block to begin with a header that
specifies that a certain number of 32-bit instruction slots at the end of
the current block are to be skipped over in the sequence of instructions
to be executed; this space can be used for constants.

So although the constant is fetched in response to a pointer, and thus is
not an immediate, the constant is located directly in the instruction
stream. This is particularly true in implementations where the memory bus
is 256 bits wide, and a block of instructions is fetched in a single
memory read.

So the pseudo-immediate value is not part of the _instruction_ in the conventional sense, but if you think of the 256-bit block as being the
"real" instruction for a VLIW architecture, it's part of *that*.

Think of the Itanium: the 128-bit thingy is one thing, and each of the
41-bit thingies that make it up, along with the 5-bit header, is another
thing.

The 5-bit header is part of the 128-bit thingy without being part of any
of the 41-bit thingies. That is the limbo in which my pseudo-immediates
are found. Data? Or a field in the instruction? It can be either one, depending on whether you define each individual 32-bit instruction as an instruction, or the 256-bit block as the "real" instruction the
architecture executes.
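
For readers who want to check the Itanium arithmetic, here is a minimal sketch. It assumes the usual bundle layout (a 5-bit template in the low bits, followed by three 41-bit slots); the name `split_bundle` is made up for illustration.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOTS_PER_BUNDLE = 3


def split_bundle(bundle: int):
    """Split a 128-bit bundle into (template, [slot0, slot1, slot2])."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = []
    for i in range(SLOTS_PER_BUNDLE):
        shift = TEMPLATE_BITS + i * SLOT_BITS
        slots.append((bundle >> shift) & ((1 << SLOT_BITS) - 1))
    return template, slots


# 5 + 3 * 41 = 128: the template belongs to the bundle, not to any one slot.
assert TEMPLATE_BITS + SLOTS_PER_BUNDLE * SLOT_BITS == 128
```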

John Savard
--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Fri Aug 1 16:52:34 2025

On Fri, 01 Aug 2025 15:11:49 +0000, John Savard wrote:

The 5-bit header is part of the 128-bit thingy without being part of any
of the 41-bit thingies. That is the limbo in which my pseudo-immediates
are found. Data? Or a field in the instruction? It can be either one, depending on whether you define each individual 32-bit instruction as an instruction, or the 256-bit block as the "real" instruction the
architecture executes.

...and if you think that's crazy, in some of the earliest iterations of
the Concertina II design, I implemented instructions longer than 32 bits
by having a six-bit pointer in an instruction to the rest of the
instruction.

Which, I suppose, argues against the view that pseudo-immediates are not
part of the instruction, since that which definitely is part of the instruction can be pointed to in the same way.

I stopped doing that because I felt it involved too much overhead.

John Savard

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Fri Aug 1 18:08:17 2025

John Savard <quadibloc@invalid.invalid> schrieb:

I couldn't locate a post I finally felt I was ready to respond to, which
was in reply to one of my posts about Concertina II, which said that immediates ought to be properly considered part of the instruction.

That was probably mine.

Well, in nearly all computer architectures, immediates _are_ part of the instruction, and quite obviously so.

But what Concertina II has are *pseudo* immediates. That is, they're not really immediates, but they pretend to be.

What does this mean? What could this mean?

Well, in my register-to-register operate instruction, associated with each _source_ register field, there's a bit which, if set, says that the five bits in the field aren't a register specifier, but a pointer to a constant.

A constant that's addressed by an instruction isn't an immediate; it's a constant. So why do I even call these constants "pseudo-immediates" then?

Well, that pointer - five bits long - is an awfully short pointer. Where does it point?

Question: Do the pointers point to the same block only, or also
to other blocks? With 5 bits, you could address others as well.
Can you give an example of their use, including the block headers?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From John Savard@quadibloc@invalid.invalid to comp.arch on Fri Aug 1 21:04:11 2025

On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:

Question: Do the pointers point to the same block only, or also to other blocks? With 5 bits, you could address others as well. Can you give an example of their use, including the block headers?

Actually, no, 5 bits are only enough to point within the same block.
That's because it's a byte pointer, as it can be used to point to any type
of constant, including single byte constants.
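
The reasoning here is just arithmetic: 2^5 = 32 bytes = 256 bits, so a 5-bit byte pointer spans exactly one block. A minimal sketch of the lookup follows. The function name, the big-endian byte order and the rule that a constant may not run past the end of the block are all assumptions made for illustration, not something stated in the thread.

```python
BLOCK_BYTES = 32      # one 256-bit block
POINTER_BITS = 5      # width of the pseudo-immediate pointer


def fetch_pseudo_immediate(block: bytes, pointer: int, size: int) -> int:
    """Read a `size`-byte constant at byte offset `pointer` of the block."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block is exactly 32 bytes")
    if not 0 <= pointer < (1 << POINTER_BITS):
        raise ValueError("pointer must fit in 5 bits")
    if pointer + size > BLOCK_BYTES:
        raise ValueError("constant would run past the end of the block")
    # Byte order is an assumption of this sketch.
    return int.from_bytes(block[pointer:pointer + size], "big")


# A 5-bit byte pointer reaches every byte of one block and nothing more.
assert (1 << POINTER_BITS) == BLOCK_BYTES
```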

This is despite the fact that I do have an instruction format for
conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).

However, they _can_ point to another block, by means of a sixth bit that
some instructions have... but when this happens, it does not trigger an
extra fetch from memory. Instead, the data is retrieved from a copy of an earlier block in the instruction stream that's saved in a special
register... so as to reduce potential NOP-style problems.
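
A rough model of that sixth bit, purely as a sketch: the class name, the choice of bit 5 as the flag and its polarity are assumptions, and the real design may differ. The point it illustrates is that the read is served from a saved copy of the earlier block, so no extra memory fetch is involved.

```python
BLOCK_BYTES = 32
PREVIOUS_BLOCK_BIT = 0x20   # assumed position of the "sixth bit"


class ConstantSource:
    """Current block plus a saved copy of the block fetched before it."""

    def __init__(self, first_block: bytes):
        self._check(first_block)
        self.current = first_block
        self.saved = None            # nothing saved until a second block arrives

    @staticmethod
    def _check(block: bytes) -> None:
        if len(block) != BLOCK_BYTES:
            raise ValueError("a block is exactly 32 bytes")

    def advance(self, next_block: bytes) -> None:
        """Move to the next block, keeping the old one in the special register."""
        self._check(next_block)
        self.saved = self.current
        self.current = next_block

    def read(self, pointer6: int, size: int) -> bytes:
        """Resolve a 6-bit pointer: flag bit selects the block, low 5 bits the byte."""
        if not 0 <= pointer6 < 64:
            raise ValueError("pointer must fit in 6 bits")
        offset = pointer6 & 0x1F
        block = self.saved if pointer6 & PREVIOUS_BLOCK_BIT else self.current
        if block is None:
            raise LookupError("no earlier block has been saved")
        if offset + size > BLOCK_BYTES:
            raise ValueError("constant would run past the end of the block")
        return block[offset:offset + size]
```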

John Savard

From Robert Finch@robfi680@gmail.com to comp.arch on Fri Aug 1 21:03:17 2025

On 2025-08-01 5:04 p.m., John Savard wrote:

On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:

Question: Do the pointers point to the same block only, or also to other
blocks? With 5 bits, you could address others as well. Can you give an
example of their use, including the block headers?

Actually, no, 5 bits are only enough to point within the same block.
That's because it's a byte pointer, as it can be used to point to any type
of constant, including single byte constants.

This is despite the fact that I do have an instruction format for conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).

However, they _can_ point to another block, by means of a sixth bit that
some instructions have... but when this happens, it does not trigger an
extra fetch from memory. Instead, the data is retrieved from a copy of an earlier block in the instruction stream that's saved in a special
register... so as to reduce potential NOP-style problems.

John Savard

I tried something similar to this but without block headers and it
worked okay. But there were a couple of issues. One was the last
instruction in cache line could not have an immediate. Or instructions
had to stop before the end of the cache line to accommodate immediates.
This resulted in some wasted space. There would sometimes be a 32-bit
hole between the last instruction and the first immediate. I used a
four-bit index and a 32-bit word size for both instructions and
immediates. Four bits was enough for a 512-bit cache line. IIRC the
wasted space was about 5%.
It made the assembler more complex. I had immediates being positioned
from the far end of the cache line down (like a stack) towards the instructions which began at the lower end. The assembler had to be able
to keep track of where things were on the cache line and the assembler
was not built to handle that.
Also, it made reading listings more difficult as constants were in the
middle of sequences of instructions.
Sometimes constants could be shared, but this turned out to be not
possible in many cases as the assembler needed to emit relocation
records for some constants and it could not handle having two or more instructions pointing to the same constant.
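
The packing described above can be sketched as follows. This is only an illustration of the idea (instructions grow up from the low end of a 16-word line, immediates grow down from the high end, and a 4-bit index links them); `pack_line` and its data layout are invented here and are not taken from the actual assembler.

```python
LINE_WORDS = 16   # 512-bit cache line / 32-bit words, so a 4-bit index covers it


def pack_line(instructions):
    """Pack (mnemonic, immediate_or_None) pairs into one cache line.

    Returns (line, consumed). `line` has 16 entries, with None marking an
    unused word; `consumed` is the number of instructions that fit.
    """
    line = [None] * LINE_WORDS
    low, high = 0, LINE_WORDS - 1     # next instruction word / next immediate word
    consumed = 0
    for mnemonic, imm in instructions:
        needed = 1 if imm is None else 2
        if high - low + 1 < needed:   # not enough free words left in this line
            break
        if imm is None:
            line[low] = (mnemonic, None)
        else:
            line[high] = ("imm", imm)
            line[low] = (mnemonic, high)   # instruction carries the 4-bit index
            high -= 1
        low += 1
        consumed += 1
    return line, consumed
```

If one word is free and the next instruction needs an immediate, the loop stops and that word stays empty, which is the 32-bit hole mentioned above.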


From Lawrence D'Oliveiro@ldo@nz.invalid to comp.arch on Sat Aug 2 03:21:56 2025

On Fri, 1 Aug 2025 15:11:49 -0000 (UTC), John Savard wrote:

Well, that pointer - five bits long - is an awfully short pointer. Where
does it point?

Instructions are fetched in blocks that are 256 bits long. One of the
things this allows for is for the block to begin with a header that
specifies that a certain number of 32-bit instruction slots at the end
of the current block are to be skipped over in the sequence of
instructions to be executed; this space can be used for constants.

Just add a couple of modifier bits: one is the indirect bit, indicating
that the location referenced contains the address of the value, not the
value itself, and another “page zero” bit, which indicates that the location is not in the current block, but in another block at a fixed
address ...

... and I start having PDP-8 flashbacks.

From John Savard@quadibloc@invalid.invalid to comp.arch on Sat Aug 2 03:22:41 2025

On Fri, 01 Aug 2025 21:03:17 -0400, Robert Finch wrote:

I tried something similar to this but without block headers and it
worked okay. But there were a couple of issues. One was the last
instruction in cache line could not have an immediate. Or instructions
had to stop before the end of the cache line to accommodate immediates.
This resulted in some wasted space.

This is interesting. I've tried to keep things simple by making everything explicit.

Also, it made reading listings more difficult as constants were in the
middle of sequences of instructions.

I don't plan on structuring my assembly language that way. It might make reading _core dumps_ more difficult, but pseudo-immediate values would
appear in the assembler source within the instruction just like
conventional immediates.

John Savard


From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Aug 2 09:12:17 2025

John Savard <quadibloc@invalid.invalid> schrieb:

On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:

Question: Do the pointers point to the same block only, or also to other
blocks? With 5 bits, you could address others as well. Can you give an
example of their use, including the block headers?

Actually, no, 5 bits are only enough to point within the same block.
That's because it's a byte pointer, as it can be used to point to any type
of constant, including single byte constants.

This is despite the fact that I do have an instruction format for conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).

Is there a reason for that? On the face of it, having both makes
no sense.

But even so: Having a single, let's say, 32-bit immediate would require
a 32-bit header and a 32-bit constant, so 64 bits used instead of
directly encoding a 32-bit constant.

However, they _can_ point to another block, by means of a sixth bit that some instructions have...

Try writing an assembler and disassembler for what you have. I have
written this for Mitch's ISA, and it turned out to be very difficult
already. Your method, I would guess, would be much more difficult.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From John Savard@quadibloc@invalid.invalid to comp.arch on Sat Aug 2 18:57:43 2025

On Sat, 02 Aug 2025 09:12:17 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

This is despite the fact that I do have an instruction format for
conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).

Is there a reason for that? On the face of it, having both makes no
sense.

The option of having a pseudo-immediate pointer instead of a register specification is baked into the format of the operate instructions.
Removing it for some variable types would be messy.

But even so: Having a single, let's say, 32-bit immediate would require a 32-bit header and a 32-bit constant, so 64 bits used instead of directly encoding a 32-bit constant.

And avoiding that for eight and sixteen bit constants is the reason for conventional immediates for them, despite the duplication. (Try fitting
the other sizes of immediate into a 32-bit instruction.)

But I'm sneaky. Since this situation dismayed me all along with
Concertina II, I have what I call a "zero-overhead header". In the first instruction slot of a block, one may have a Type I header, which is a two-address operate instruction which *also* supplies a three-bit _decode_ field, reserving slots for pseudo-immediates.

Since operate instructions are the most common type of instruction, if one
can re-arrange instructions a little, one might be able to have these pseudo-immediates *without* the crushing burden of a 32-bit overhead!

John Savard

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Aug 2 19:23:01 2025

John Savard <quadibloc@invalid.invalid> schrieb:

Since operate instructions are the most common type of instruction, if one can re-arrange instructions a little, one might be able to have these pseudo-immediates *without* the crushing burden of a 32-bit overhead!

I read "one might" as "never will".

You still haven't shown a single piece of code with your header
scheme, I presume because it is too difficult even for you, the
author of the ISA.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From John Savard@quadibloc@invalid.invalid to comp.arch on Sun Aug 3 05:30:34 2025

On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:

You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.

I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic
essentials of how it works, I fail to realize how making the extra effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.

John Savard

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Aug 3 11:25:51 2025

John Savard <quadibloc@invalid.invalid> schrieb:

On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:

You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.

I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic essentials of how it works, I fail to realize how making the extra effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.

It is not how something appears in a diagram, it is how an actual
algorithm is transformed into efficient machine language (I would
have said assembly language, but you put a massive barrier between
the two with your block structure).

You wrote, upthread, that you have never done so. My current
assumption is that you chose not to do it because this would
be too complicated for you, the inventor of this ISA, let alone
anybody else.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sun Aug 3 12:50:05 2025

On 8/2/2025 2:12 AM, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:

Question: Do the pointers point to the same block only, or also to other
blocks? With 5 bits, you could address others as well. Can you give an
example of their use, including the block headers?

Actually, no, 5 bits are only enough to point within the same block.
That's because it's a byte pointer, as it can be used to point to any type
of constant, including single byte constants.

This is despite the fact that I do have an instruction format for
conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).

Is there a reason for that? On the face of it, having both makes
no sense.

But even so: Having a single, let's say, 32-bit immediate would require
a 32-bit header and a 32-bit constant, so 64 bits used instead of
directly encoding a 32-bit constant.

Yup. And as Robert Finch pointed out, what if the instruction that
needs the constant is the last instruction in the block?

However, they _can_ point to another block, by means of a sixth bit that
some instructions have...

But using this capability isn't a solution, as it adds 32 bits to the
block, which pushes the last instruction in that block into the current
block, which pushes the instruction that needs the immediate into the
next block and forces the extra nop anyway.

Try writing an assembler and disassembler for what you have. I have
written this for Mitch's ISA, and it turned out to be very difficult
already.

I am curious as to what features you found difficult?

Your method, I would guess, would be much more difficult.

Agreed!
--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sun Aug 3 13:03:21 2025

On 8/2/2025 10:30 PM, John Savard wrote:

On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:

You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.

I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic essentials of how it works, I fail to realize how making the extra effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.

I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real
programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Sun Aug 3 22:36:32 2025

Stephen Fuld wrote:

On 8/2/2025 10:30 PM, John Savard wrote:

On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:

You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.

I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic
essentials of how it works, I fail to realize how making the extra effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.

I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write actual code for a real program.

That is always a required step, but still not enough.
I.e., when I first got the Itanium architecture manual (long before any CPUs/systems were available) I sat down and wrote some (to me)
interesting kernels, like medium-sized arbitrary precision math, up to a kbit or two, using carry-save in-register storage.
That persuaded me that it was possible for the Itanium to do these kinds of calculations very fast indeed, but the architecture was still a
memorable failure.
Being fit for a number of hand-written asm kernels does not a generally
useful cpu make.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sun Aug 3 14:28:11 2025

On 8/3/2025 1:36 PM, Terje Mathisen wrote:

Stephen Fuld wrote:

On 8/2/2025 10:30 PM, John Savard wrote:

On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:

You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.

I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic
essentials of how it works, I fail to realize how making the extra
effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.

I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real
programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not
be in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.

That is always a required step, but still not enough.

I.e., when I first got the Itanium architecture manual (long before any CPUs/systems were available) I sat down and wrote some (to me)
interesting kernels, like medium-sized arbitrary precision math, up to a kbit or two, using carry-save in-register storage.

That persuaded me that it was possible for the Itanium to do these kinds
of calculations very fast indeed, but the architecture was still a
memorable failure.

Being fit for a number of hand-written asm kernels does not a generally useful cpu make.

I absolutely agree, though John seems reluctant to do even that despite Thomas's and my suggestions.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From John Savard@quadibloc@invalid.invalid to comp.arch on Sun Aug 3 22:14:51 2025

On Sun, 03 Aug 2025 12:50:05 -0700, Stephen Fuld wrote:

Yup. And as Robert Finch pointed out, what if the instruction that
needs the constant is the last instruction in the block?

The first thing one could do is precede that instruction by a NOP.

In Concertina II, the preferred way to achieve the same effect is to use a do-nothing header, because that wouldn't consume a whole cycle like a NOP might.

But I thought of that, and added a feature where instructions can
(provided a recent branch hadn't taken place) indicate that they're using
a saved copy of the preceding block, instead of the current block, for the constant.

Oh, I see you noticed that:

However, they _can_ point to another block, by means of a sixth bit
that some instructions have...

But using this capability isn't a solution, as it adds 32 bits to the
block, which pushes the last instruction in that block into the current block, which pushes the instruction that needs the immediate into the
next block and forces the extra nop anyway.

That isn't quite how it would work out.

Current issue...

I I I I I I I I#

When I fix it, to put the value in the current block, it pushes the
problem instruction to the next one,

(1) I I I I I I M1
I I#

so pointing to the previous block *does* solve the problem.

I# - instruction wanting to use a 32-bit pseudo-immediate constant
(1) - header that reserves one instruction slot at the end of the current block for a constant
M1 - constant value that's one 32-bit instruction slot long
I - plain 32-bit instruction

So, yes, it works just fine.
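
The two diagrams above can be reproduced with a small sketch. It assumes eight 32-bit slots per block, a header that occupies the first slot, and a "previous block" flag in bit 5 of a 6-bit pointer; the flag position and the helper name `build_block` are illustrative assumptions only.

```python
SLOTS_PER_BLOCK = 8    # 256-bit block / 32-bit slots
SLOT_BYTES = 4
PREVIOUS_BLOCK_BIT = 0x20   # assumed encoding of the "sixth bit"


def build_block(instrs, reserved_slots):
    """Build one block with a header reserving slots at its end.

    Returns (block, leftover): the slot labels of the block and the
    instructions that did not fit and move on to the next block.
    """
    room = SLOTS_PER_BLOCK - 1 - reserved_slots     # the header takes one slot
    block = [f"({reserved_slots})"] + instrs[:room] + ["M1"] * reserved_slots
    return block, instrs[room:]


# Seven plain instructions followed by the one that wants a constant.
stream = ["I"] * 7 + ["I#"]
block0, rest = build_block(stream, reserved_slots=1)

# I# lands in the next block, so it refers back to M1 in the saved copy.
const_offset = block0.index("M1") * SLOT_BYTES
pointer6 = PREVIOUS_BLOCK_BIT | const_offset
```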

John Savard

From John Savard@quadibloc@invalid.invalid to comp.arch on Sun Aug 3 22:43:04 2025

On Sat, 02 Aug 2025 18:57:43 +0000, John Savard wrote:

But I'm sneaky. Since this situation dismayed me all along with
Concertina II, I have what I call a "zero-overhead header". In the first instruction slot of a block, one may have a Type I header, which is a two-address operate instruction which *also* supplies a three-bit
_decode_ field, reserving slots for pseudo-immediates.

It had also provided a few extra bits to allow some other things to be
done without overhead.

Recently, I mistakenly thought I had the opportunity to add one extra bit
to this instruction, to give me the chance to point to a 35-bit
instruction without the overhead of a full 32-bit header. I thought that
might be too good to be true, though, so I did make preparations to revert
the change.

Well, indeed I did find the extra opcode space was not available. But I decided not to revert, but to correct things as they now were, because one result of the changes I had made was that the opcode range containing
operate instructions was now neater - and this applied to some other categories of instructions as well.

And even though I wasn't able to modify the Type I header as I had wished,
I ended up figuring out another attainable way of achieving my objective. Since both forms of the Augmented Short Instruction format of 32-bit instructions provided versions of the operate instructions with longer opcodes, I really didn't need to provide them in the Alternate 32-bit Instructions as well. So I took those out, and provided a stripped-down limited form of the memory-to-register operate instruction (which is what
the 35-bit instructions were, but without being stripped down) within the Alternate 32-bit Instructions... so now this capability is provided, at
least to an extent, by the Type I header.

John Savard

From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Sun Aug 3 19:23:50 2025

Being fit for a number of hand-written asm kernels does not a generally useful cpu make.

Beside bignums, other "kernels" worth trying might be something like
a simple balanced binary tree, including some operation that
requires recursion, like counting the number of leaves.

And of course, trying to get a compiler to generate code vaguely similar
to the ASM you wrote by hand is always a good test, tho it may take
more effort.

Stefan

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Sun Aug 3 18:11:28 2025

On 8/3/2025 3:14 PM, John Savard wrote:

On Sun, 03 Aug 2025 12:50:05 -0700, Stephen Fuld wrote:

Yup. And as Robert Finch pointed out, what if the instruction that
needs the constant is the last instruction in the block?

The first thing one could do is precede that instruction by a NOP.

In Concertina II, the preferred way to achieve the same effect is to use a do-nothing header, because that wouldn't consume a whole cycle like a NOP might.

But I thought of that, and added a feature where instructions can
(provided a recent branch hadn't taken place) indicate that they're using
a saved copy of the preceding block, instead of the current block, for the constant.

Oh, I see you noticed that:

However, they _can_ point to another block, by means of a sixth bit
that some instructions have...

But using this capability isn't a solution, as it adds 32 bits to the
block, which pushes the last instruction in that block into the current
block, which pushes the instruction that needs the immediate into the
next block and forces the extra nop anyway.

That isn't quite how it would work out.

Current issue...

I I I I I I I I#

When I fix it, to put the value in the current block, it pushes the
problem instruction to the next one,

(1) I I I I I I M1
I I#

so pointing to the previous block *does* solve the problem.

OK, I see what you are saying.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)

From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Aug 4 04:07:41 2025

On Sun, 03 Aug 2025 13:03:21 -0700, Stephen Fuld wrote:

I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.

While I'm not prepared to go to the trouble of creating a fleshed-out
example, a very short and trivial example will still indicate what my
goals are.

X = Y * 2.78 + Z

On a typical RISC architecture, this would involve instructions like this:

load 18, Y
load 19, K#0001
fmul 18, 18, 19
load 19, Z
fadd 18, 18, 19
fsto 18, X

Six instructions, each 32 bits long.

On the IBM System/360, though, it would be something like

le 12, Y
me 12, K#0001
ae 12, Z
ste 12, X

All four instructions are memory-reference instructions, so they're also
32 bits long.

How would I do this on Concertina II?

Well, since the sequence has to start with a memory-reference, I can't use
the zero-overhead header (Type I). Instead, a Type XI header is in order;
that specifies a decode field, so that space can be reserved for a pseudo-immediate, and instruction slots can be indicated as containing
instructions from the alternate instruction set.

Then the instructions can be

lf 6,y
mfr 6,#2.78
af 6,z
stf 6,x

with the instruction "af" coming from the alternate 32-bit instruction set.

The other tricky precondition that must be met is to store z in a data
region that is only 4,096 bytes or less in size, prefaced with

USING *,23

or another register from 17 to 23 could be used as the base register, so
that it is addressed with a 12-bit displacement. (Also, register 6, from
the first eight registers, is used to do the arithmetic to meet the limitations of the "add floating" memory to register operate instruction
in the alternate instruction set.)

Because it uses a pseudo-immediate, which gets fetched along with the instruction stream, where the 360 uses a constant, it has an advantage
over the 360. On the other hand, while the actual code is the same length, there's also the 32-bit overhead of the header.
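
The size comparison can be tallied with a short sketch. The instruction counts come from the three listings above; the assumption that 2.78 occupies 32 bits, and the dictionary layout itself, are mine and only there to make the bookkeeping explicit.

```python
WORD_BITS = 32
CONSTANT_BITS = 32   # assumption: 2.78 is held as a 32-bit float

# Instruction counts taken from the three listings above.
sequences = {
    "typical RISC":  {"instructions": 6, "header_words": 0, "constant_in_stream": False},
    "System/360":    {"instructions": 4, "header_words": 0, "constant_in_stream": False},
    "Concertina II": {"instructions": 4, "header_words": 1, "constant_in_stream": True},
}


def code_bits(seq) -> int:
    """Bits of instructions plus any block header."""
    return (seq["instructions"] + seq["header_words"]) * WORD_BITS


def stream_bits(seq) -> int:
    """Bits occupied in the instruction stream, including an in-block constant."""
    extra = CONSTANT_BITS if seq["constant_in_stream"] else 0
    return code_bits(seq) + extra


for name, seq in sequences.items():
    print(f"{name:14s} code={code_bits(seq):3d} bits  in-stream={stream_bits(seq):3d} bits")
```

Under these assumptions the Concertina II version fills six of the eight slots in a block, while the other two versions still need the constant stored somewhere else in memory.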

John Savard

--- Synchronet 3.21a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Aug 4 05:52:31 2025

From Newsgroup: comp.arch

On 2025-08-03, Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:

On 8/2/2025 2:12 AM, Thomas Koenig wrote:

Try writing an assembler and disassembler for what you have. I have
written this for Mitch's ISA, and it turned out to be very difficult
already.

I am curious as to what features you found difficult?

A few things.

First, I wrote this as a port of GNU binutils. binutils internals
are not very well documented. You do not do ELF stuff directly,
but rather you have to interface with BFD, which then does the
ELF stuff. And this interface is hairy, to say the least.

Second, there are very many instructions with the same name,
but with different flags doing different things. Things like

add r1,r2,#Imm16 ! Different major opcode from the rest
add r1,r2,r3
add r1,-r2,r3
add r1,r2,-r3
add r1,-r2,-r3
add r1,r2,#Imm5
add r1,r2,#Imm32
add r1,r2,#Imm64

(the list is not complete, and each variant has its own combination
of flags) make things complex to begin with. Syntactically,
a 16-bit integer looks like a 32-bit integer, but a 16-bit
integer should be selected for size reasons.
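
The size-selection step can be sketched in a few lines of Python. This is only an illustration, not how the binutils port does it: the form names mirror the #Imm5/#Imm16/#Imm32/#Imm64 operands listed above, and treating every form as a signed two's-complement field is an assumption.

```python
# Hypothetical immediate forms, narrowest first. Treating each one as a
# signed two's-complement field is an assumption made for illustration.
IMM_FORMS = [("Imm5", 5), ("Imm16", 16), ("Imm32", 32), ("Imm64", 64)]


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed integer of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def pick_imm_form(value: int) -> str:
    """Return the name of the narrowest form that can hold value."""
    for name, bits in IMM_FORMS:
        if fits_signed(value, bits):
            return name
    raise ValueError(f"{value} does not fit in 64 bits")
```

The assembler sees the same token for 1000 and for 100000; only the value decides that the first can use the 16-bit form while the second needs the 32-bit one.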

There are also 47 different operand types at latest count, which
makes writing an assembler/disassembler somewhat error-prone.
(I think the complexity for the assembler works well for a user,
I find My 66000 assembly very easy to read and write).

But the most difficult part was getting the relocations and
fixups right, also for things like a (8,16,32 or 64-bit)
jump table instruction, and there the main problem was
a) getting things straight in my head and b) interfacing
with BFD, see above.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Aug 4 16:56:13 2025

From Newsgroup: comp.arch

John Savard <quadibloc@invalid.invalid> schrieb:

On Sun, 03 Aug 2025 13:03:21 -0700, Stephen Fuld wrote:

I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real
programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.

While I'm not prepared to go to the trouble of creating a fleshed-out example, a very short and trivial example will still indicate what my
goals are.

X = Y * 2.78 + Z

On a typical RISC architecture, this would involve instructions like this:

load 18, Y
load 19, K#0001
fmul 18, 18, 19
load 19, Z
fadd 18, 18, 19
fsto X

If all the variables were in BSS.

My 66000 with its compiler:

double foo (double y, double z)
{
return y*2.78 + z;
}

yields

foo: ; @foo
; %bb.0:
fmac r1,r1,#0x40063D70A3D70A3D,r2
ret

One instruction for the arithmetic, one for the function return.
Here's the disassembly:

0000000000000000 <foo>:
0: 3021e040 fmac r1,r1,#0x4006337003370033,r2
4: 03370033
8: 40063370
c: 6be00000 ret

Six instructions, each 32 bits long.

On the IBM System/360, though, it would be something like

le 12, Y
me 12, K#0001
ae 12, Z
ste 12, x

With gcc -O2 -m31, on godbolt:

foo:
larl %r5,.L3
madb %f2,%f0,.L4-.L3(%r5)
ldr %f0,%f2
br %r14
.L3:
.L4:
.long 1074150768
.long -1546188227
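
For anyone who wants to check the constant, a minimal Python sketch using only the standard struct module shows that the 64-bit immediate in the compiler output and the two .long values in the gcc listing are the same IEEE 754 double for 2.78. The variable names are made up for the example.

```python
import struct

# Big-endian IEEE 754 double for 2.78, viewed as one 64-bit integer.
bits = struct.unpack(">Q", struct.pack(">d", 2.78))[0]

# The same eight bytes split into high and low 32-bit words, each read
# as a signed integer, which is how a decimal .long operand shows them.
hi, lo = struct.unpack(">ii", struct.pack(">Q", bits))

print(f"{bits:#018x}")  # 0x40063d70a3d70a3d
print(hi, lo)           # 1074150768 -1546188227
```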

All four instructions are memory-reference instructions, so they're also
32 bits long.

How would I do this on Concertina II?

Well, since the sequence has to start with a memory-reference, I can't use
the zero-overhead header (Type I). Instead, a Type XI header is in order;
that specifies a decode field, so that space can be reserved for a
pseudo-immediate, and instruction slots can be indicated as containing
instructions from the alternate instruction set.

Then the instructions can be

lf 6,y
mfr 6,#2.78
af 6,z
stf 6,x

with the instruction "af" coming from the alternate 32-bit instruction set.

The other tricky precondition that must be met is to store z in a data region that is only 4,096 bytes or less in size, prefaced with

USING *,23

or another register from 17 to 23 could be used as the base register, so that it is addressed with a 12-bit displacement.

Using USING is just horrible, and this makes it worse. Where would
you need store this, in an executable page? Newer architectures
have read, write and execute bits on their page tables for a very
good reason.

And... would you like to have a stack in your architecture?

(Also, register 6, from
the first eight registers, is used to do the arithmetic to meet the limitations of the "add floating" memory to register operate instruction
in the alternate instruction set.)

Because it uses a pseudo-immediate, which gets fetched along with the instruction stream, where the 360 uses a constant, it has an advantage
over the 360. On the other hand, while the actual code is the same length, there's also the 32-bit overhead of the header.

Where is the advantage over putting a constant directly in
the instruction stream?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Tue Aug 5 02:10:40 2025

From Newsgroup: comp.arch

On Mon, 04 Aug 2025 16:56:13 +0000, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

The other tricky precondition that must be met is to store z in a data
region that is only 4,096 bytes or less in size, prefaced with

USING *,23

or another register from 17 to 23 could be used as the base register,
so that it is addressed with a 12-bit displacement.

Using USING is just horrible, and this makes it worse. Where would you
need store this, in an executable page? Newer architectures have read,
write and execute bits on their page tables for a very good reason.

Never fear. The virtual memory subsystem will indeed mark the DSECTs as writeable but not executable, and the CSECTs as executable but not
writeable. These operations being privileged, they don't take place in the user code.

And... would you like to have a stack in your architecture?

No. One always has to worry about stacks overflowing. The System/360 got
along just fine without a stack, faking one whenever the need arose.

To me, having stacks is just asking for trouble; they're a disaster
waiting to happen and a blatant security hole.

Because it uses a pseudo-immediate, which gets fetched along with the
instruction stream, where the 360 uses a constant, it has an advantage
over the 360. On the other hand, while the actual code is the same
length, there's also the 32-bit overhead of the header.

Where is the advantage over putting a constant directly in the
instruction stream?

One would need a different instruction format for each length of variable.

I'm trying to have either all instructions 32 bits long, or, if a limited
variation in instruction length is allowed, to have the header indicate
where every instruction begins, so all instructions may decode in parallel.

John Savard
--- Synchronet 3.21a-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Aug 4 20:37:04 2025

From Newsgroup: comp.arch

On 8/4/2025 9:56 AM, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

On Sun, 03 Aug 2025 13:03:21 -0700, Stephen Fuld wrote:

I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real
programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.

While I'm not prepared to go to the trouble of creating a fleshed-out
example, a very short and trivial example will still indicate what my
goals are.

X = Y * 2.78 + Z

On a typical RISC architecture, this would involve instructions like this:

load 18, Y
load 19, K#0001
fmul 18, 18, 19
load 19, Z
fadd 18, 18, 19
fsto X

If all the variables were in BSS.

My 66000 with its compiler:

double foo (double y, double z)
{
return y*2.78 + z;
}

yields

foo: ; @foo
; %bb.0:
fmac r1,r1,#0x40063D70A3D70A3D,r2
ret

One instruction for the arithmetic, one for the function return.
Here's the disassembly:

0000000000000000 <foo>:
0: 3021e040 fmac r1,r1,#0x4006337003370033,r2
4: 03370033
8: 40063370
c: 6be00000 ret

Six instructions, each 32 bits long.

On the IBM System/360, though, it would be something like

le 12, Y
me 12, K#0001
ae 12, Z
ste 12, x

With gcc -O2 -m31, on godbolt:

foo:
larl %r5,.L3
madb %f2,%f0,.L4-.L3(%r5)
ldr %f0,%f2
br %r14
.L3:
.L4:
.long 1074150768
.long -1546188227

All four instructions are memory-reference instructions, so they're also
32 bits long.

How would I do this on Concertina II?

Well, since the sequence has to start with a memory-reference, I can't use
the zero-overhead header (Type I). Instead, a Type XI header is in order;
that specifies a decode field, so that space can be reserved for a
pseudo-immediate, and instruction slots can be indicated as containing
instructions from the alternate instruction set.

Then the instructions can be

lf 6,y
mfr 6,#2.78
af 6,z
stf 6,x

with the instruction "af" coming from the alternate 32-bit instruction set.

So, if I got this right, four instructions plus 2 32 bit words, one for
the constant and one for the header required by the constant.
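
That tally can be written out as a small Python sketch. The word counts come from the listings quoted above. Counting only the 32-bit words that sit in the instruction stream, and leaving out constants that live in a separate data area (such as K#0001), is a simplifying assumption.

```python
WORD_BITS = 32

# 32-bit words in the instruction stream for each version of
# X = Y * 2.78 + Z. Constants kept in a data area are not counted.
sequences = {
    "typical RISC": {"instructions": 6, "header": 0, "inline_constant": 0},
    "System/360": {"instructions": 4, "header": 0, "inline_constant": 0},
    "Concertina II": {"instructions": 4, "header": 1, "inline_constant": 1},
}


def stream_bits(words: dict) -> int:
    """Total bits occupied in the instruction stream."""
    return sum(words.values()) * WORD_BITS


for name, words in sequences.items():
    print(f"{name}: {stream_bits(words)} bits")
```

On this accounting the Concertina II sequence takes the same 192 bits as the six-instruction RISC version, and 64 bits more than the System/360 version, which in turn still has to fetch its constant from data memory.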

The other tricky precondition that must be met is to store z in a data
region that is only 4,096 bytes or less in size, prefaced with

USING *,23

or another register from 17 to 23 could be used as the base register, so
that it is addressed with a 12-bit displacement.

This shows why one should use more "complete" examples rather than
single statements for ISA comparisons. John showed the series of
instructions for the single source line as if it were pulled from the
middle of some program. But you showed, since you wanted something
actually compileable, a function/subroutine. This allowed you to assume
that all the inputs were already in registers, whereas John had to
include the instructions to load the values from memory and store the
result. If you had to do that, it would add two load instructions (for
Y and Z), and the store for the results. But the MY 66000 has the
advantage of the FMA, so, eliminating the return instruction, four instructions plus 32 bits for the constant. Of course, a more extensive example might show what inputs were already in registers, etc. If the
inputs were already in registers, then the MY 66000's instruction count
goes down, but the Concertina's doesn't.

So, an apples to apples comparison gives the advantage to the MY 66000, primarily due to the FMA instruction and not requiring a header for the
inline immediate. But I still maintain a more "complete" example is
really needed.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Aug 5 04:56:21 2025

From Newsgroup: comp.arch

John Savard <quadibloc@invalid.invalid> schrieb:

And... would you like to have a stack in your architecture?

No.

OK. I think that is the final nail in the coffin, I will
henceforth stop reading (and writing) about your architecture.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Tue Aug 5 16:26:52 2025

From Newsgroup: comp.arch

On Mon, 04 Aug 2025 20:37:04 -0700, Stephen Fuld wrote:

But I still maintain a more "complete" example is
really needed.

That may be. But now that the one most ardently seeking that has
identified my ISA as being dead to him, I'm not going to rush.

John Savard
--- Synchronet 3.21a-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Tue Aug 5 09:51:11 2025

From Newsgroup: comp.arch

On 8/4/2025 9:56 PM, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

And... would you like to have a stack in your architecture?

No.

OK. I think that is the final nail in the coffin, I will
henceforth stop reading (and writing) about your architecture.

While I agree that having at least push and pop instructions would be beneficial, I hardly think that is the most "bizarre" and less than
useful aspect of John's architecture. After all, both of those
instructions can be accomplished by two "standard" instructions, a store
and an add (for push) and a load and subtract (for pop). Interchange
the add and the subtract if you want the stack to grow in the other
direction.
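
A toy Python model makes the two-instruction equivalence concrete. The Machine class and its fields are invented for the illustration. The stack here grows upward and the pointer names the next free slot, so pop has to subtract before it loads, and there are no overflow or underflow checks.

```python
class Machine:
    """Toy machine: word-addressed memory plus one register used as a stack pointer."""

    def __init__(self, size: int = 16) -> None:
        self.mem = [0] * size
        self.sp = 0  # next free slot; the stack grows toward higher addresses

    def push(self, value: int) -> None:
        self.mem[self.sp] = value  # store
        self.sp += 1               # add

    def pop(self) -> int:
        self.sp -= 1               # subtract
        return self.mem[self.sp]   # load


m = Machine()
m.push(10)
m.push(20)
top = m.pop()     # 20
below = m.pop()   # 10
```

Swapping the += and -= (and starting sp at the top of memory) gives the downward-growing variant.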

Of course, you are free to stop contributing on this topic, but I, for
one, will miss your contributions.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21a-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Tue Aug 5 18:23:36 2025

From Newsgroup: comp.arch

On 8/5/2025 11:51 AM, Stephen Fuld wrote:

On 8/4/2025 9:56 PM, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

And... would you like to have a stack in your architecture?

No.

OK. I think that is the final nail in the coffin, I will
henceforth stop reading (and writing) about your architecture.

While I agree that having at least push and pop instructions would be beneficial, I hardly think that is the most "bizarre" and less than
useful aspect of John's architecture. After all, both of those instructions can be accomplished by two "standard" instructions, a store
and an add (for push) and a load and subtract (for pop). Interchange
the add and the subtract if you want the stack to grow in the other direction.

Of course, you are free to stop contributing on this topic, but I, for
one, will miss your contributions.

The lack of dedicated PUSH/POP instructions IME has relatively little
direct impact on the usability of an ISA. Either way, one is likely to
need stack-frame adjustment, in which case PUSH/POP don't tend to offer
much over normal Load/Store instructions.

That said, a lot of John's other ideas come off to me like straight up absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.

--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Tue Aug 5 23:49:08 2025

From Newsgroup: comp.arch

On Tue, 05 Aug 2025 09:51:11 -0700, Stephen Fuld wrote:

While I agree that having at least push and pop instructions would be beneficial,

And I have now added exactly that to the architecture - as I note in the
new thread titled "By Popular Demand".

But subroutine calls still don't use them.

I've also added another requested feature while I was at it; allowing the
use of a 64-bit displacement without a base register but with an index.

John Savard
--- Synchronet 3.21a-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Wed Aug 6 05:32:41 2025

From Newsgroup: comp.arch

Stephen Fuld <sfuld@alumni.cmu.edu.invalid> schrieb:

On 8/4/2025 9:56 PM, Thomas Koenig wrote:

John Savard <quadibloc@invalid.invalid> schrieb:

And... would you like to have a stack in your architecture?

No.

OK. I think that is the final nail in the coffin, I will
henceforth stop reading (and writing) about your architecture.

While I agree that having at least push and pop instructions would be beneficial, I hardly think that is the most "bizarre" and less than
useful aspect of John's architecture. After all, both of those
instructions can be accomplished by two "standard" instructions, a store
and an add (for push) and a load and subtract (for pop). Interchange
the add and the subtract if you want the stack to grow in the other direction.

What I meant was that, the way he described his addressing modes,
he was not considering a stack at all, even implemented by
the usual RISC method (which is better than push/pop, see the
special hoops that AMD64 has to jump through to fuse several
push or pop instructions into one - IIRC, it costs them a cycle
of pipeline length).

And stacks _are_ extremely efficient, as everybody except one
person knows, because they save memory and improve cache locality.

Of course, you are free to stop contributing on this topic, but I, for
one, will miss your contributions.

Hm, thanks. Maybe I'll look into it again.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.
--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Sun Aug 10 18:07:59 2025

From Newsgroup: comp.arch

On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:

That said, a lot of John's other ideas come off to me like straight up absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.

While I think that not being able to be put to use isn't really one of the faults of the Concertina II ISA, the block structure, especially at its current level of complexity, is going to come across as quite weird to
many, and I don't yet see any hope of achieving a drastic simplification
in that area.

Each of the sixteen block types serves one or another functionality which
I see as necessary to give this ISA the breadth of application that I have
as my goal.

But I have introduced "scaled displacements" back in, allowing the
augmented short instruction mode instruction set to be more powerful.

John Savard

--- Synchronet 3.21a-Linux NewsLink 1.2

From BGB@cr88192@gmail.com to comp.arch on Sun Aug 10 18:59:29 2025

From Newsgroup: comp.arch

On 8/10/2025 1:07 PM, John Savard wrote:

On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:

That said, a lot of John's other ideas come off to me like straight up
absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.

While I think that not being able to be put to use isn't really one of the faults of the Concertina II ISA, the block structure, especially at its current level of complexity, is going to come across as quite weird to
many, and I don't yet see any hope of achieving a drastic simplification
in that area.

OK.

I judge things here by a few criteria:
Could be affordably implemented in hardware;
Would be usable and useful;
Mostly makes sense in terms of relative cost/benefit tradeoffs.

I am a little more pessimistic on things that I don't really feel
satisfy the above constraints.

For comparison, RISC-V mostly satisfies the above, although:
Many of the extensions are weaker on these points;
Some of the encodings, and the 'C' extension in general,
are badly dog chewed.

Then again, my ISA has potentially ended up with an excess of niche-case format converter instructions and similar.

Each of the sixteen block types serves one or another functionality which
I see as necessary to give this ISA the breadth of application that I have
as my goal.

Many make it work with plain 32-bit or 16/32 encodings.

Granted, I have ended up with more:
16/32/64/96, depending on ISA.
XG1, 16/32/64/96
XG2, 32/64/96
XG3, 32/64/96 (32/64 for RV ops)
RV, 16/32/(48)/64

Apparently, Huawei and similar have some 48-bit encodings defined for
RV64. In my sensibilities, 48-bit only makes sense if one is already
committed to 16 bit ops, but given how quickly they burnt through the
encoding space; practically the 48-bit space would just end up being a space-saving subset of the 64-bit space (in my experimental attempt to
deal with the 48-bit encodings, they were unpacked temporarily into the
64-bit encoding space).

Basically, they burnt through most of the 48-bit encoding space with a
handful of Imm32 and a few Disp32 ops. If it were me I would have gone
for Imm24 ops and had a little more encoding space left over.

Did experimentally mock up a 48-bit scheme that did basically extend the
32-bit space to have Imm24 (adding 12 bits to each Imm/Disp for all the
Imm12/Disp12 ops), but it was a little dog chewed. Could potentially
lead to alternate encodings for Imm32 constant load and Disp32 branch (by
adding 12 bits to LUI and JAL).

One can argue though, which would they rather have:
Pretty much all of the 32-bit immediate forms extended to 24 bits;
Or, 32-bits immediate values,
but only for a very limited range of ops.

Though, I suspect for general use, extending the whole ISA to 24 bits
might be "better" for average case code density (with 64-bit encodings
for cases when one needs Imm32).

Then again, I am on the fence about 48 bit encodings in general:
Helps code density;
Hurts performance for a cheap core;
Say, if one doesn't want to spend the cost of dealing with superscalar
for misaligned instructions and 16 bit ops (doing so would add
significant resource cost).

I did experiment with adding the C extension to BGBCC, and RV64GC+Jumbo
can seemingly get decent code density.

Granted, both are mostly similar here, both using 5-bit register fields.
Though, XG1 16-bit ops mostly have access to 16 registers;
And, RV-C ops mostly are a mix of 8 and 32 registers.

Did experiment with a pair encoding for XG3 (X3C), which doesn't match
either XG1 or RV64GC+Jumbo in terms of code density. But not too far off.

At the moment (Doom ".text" size, static-linked C library):
XG1: 275K
XG2: 290K
RV64GC+Jumbo: 295K (vs 350K RV+Jumbo, or 370K RV64GC)
XG3+X3C: 305K (vs 320K)

Granted, XG3 isn't designed for maximum code density, rather performance
and being able to merge with RV64G.

It is unclear if the improvement in code density (of X3C) would be worth
the added decoder cost (and doesn't fit in with the existing decoder
paths for XG1 or RVC; so would need something new/wacky to deal with it).

Though, could deal with it (in the core) in a similar way to how I dealt
with 48-bit ops, namely unpacking it to a 64-bit form (two instructions)
after fetch.

In theory, XG3 should be able to match XG2 code density as there isn't
really anything that XG2 has that XG3 lacks that would significantly
effect code density. XG3 did drop the 2RI-Imm10 ops, but these had
largely become redundant. So, the main difference is likely related to
BGBCC itself, which is mostly treating XG3 as an extension of its RV64G
mode (which "suffers" slightly by having less usable callee save
registers in the ABI, and fewer register arguments; but had on/off
considered tweaking the ABI here).

Though, if XG3 did match XG2 code density, X3C could potentially also
reduce it to 275K.

But, could just focus more on RV64GC here, as I sorta already needed it,
and recently found/fixed a bug in the decoder in my CPU core that was
stopping the 'C' extension from working (so now it seems to work).

Though, to recap (X3C):
X3C packs a 13 and 14 bit instruction together into a 32 bit word;
Which serves a similar purpose to RVC;
Though only allows instruction pairs which can safely co-execute.
Instructions encode:
MOV/ADD Rm5, Rn5
LI/ADD/ADDW Imm5s, Rn5
SUB/ADDW/ADDWU/AND/OR/XOR Rm3, Rn3
SLL/SRL/SRA Rm3, Rn3
SLLW/SRLW/SLAW/SRAW Rm3, Rn3
SLL/SRL/SRA Imm3, Rn3
SLLW/SRLW/SLAW/SRAW Imm3, Rn3
And, for the 14-bit case:
LD/SD/LW/SW Rn5, Disp5(SP)
LD/SD/LW/SW Rn3, Disp2(Rm3)
LB/LBU/LH/LHU Rn3, 0(Rm3)
SB/SH Rn3, 0(Rm3)
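
The bit budget of the pairing is easy to check: 13 + 14 = 27 bits, which leaves 5 bits of the 32-bit word to mark it as a pair. A minimal Python sketch of packing and unpacking follows. The field order and the tag value are placeholders chosen for the example, not the actual X3C encoding.

```python
TAG_BITS, A_BITS, B_BITS = 5, 13, 14  # 5 + 13 + 14 == 32
PAIR_TAG = 0b11111                    # placeholder tag, not the real encoding


def pack_pair(op_a: int, op_b: int) -> int:
    """Pack a 13-bit and a 14-bit instruction into one 32-bit word."""
    if not 0 <= op_a < (1 << A_BITS):
        raise ValueError("op_a does not fit in 13 bits")
    if not 0 <= op_b < (1 << B_BITS):
        raise ValueError("op_b does not fit in 14 bits")
    return PAIR_TAG | (op_a << TAG_BITS) | (op_b << (TAG_BITS + A_BITS))


def unpack_pair(word: int) -> tuple[int, int]:
    """Recover the two packed instructions from a 32-bit word."""
    op_a = (word >> TAG_BITS) & ((1 << A_BITS) - 1)
    op_b = (word >> (TAG_BITS + A_BITS)) & ((1 << B_BITS) - 1)
    return op_a, op_b
```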

X3C was put into a hole in the encoding space that previously held the
PrWEX space (in XG1/XG2), but PrWEX is N/A in XG3. The WEX space is N/A
(used for RV encodings, and the large-constant instruction was replaced
with the XG3's Jumbo Prefix). Granted, the scope of X3C is more limited
than that of RV-C.

But I have introduced "scaled displacements" back in, allowing the
augmented short instruction mode instruction set to be more powerful.

OK.

Yeah, scaled displacements make sense.

Ironically, another one of my complaints about RVC is that while they
saved bits in the displacements, rather than doing something sane like changing scale based on type, they bit-sliced the displacements based on
type in a way that means it effectively has unique displacement
encodings for:
LW, Disp(SP)
SW, Disp(SP)
LD, Disp(SP)
SD, Disp(SP)
LW, Disp(Reg3)
SW, Disp(Reg3)
LD, Disp(Reg3)
SD, Disp(Reg3)
Which is, groan...

Would have been better, say, if all the encodings just sorta had Rd/Rs2
in the same spot and then not had separate Load/Store encoding.
IMHO, having Rd and Rs2 in the same location is a lesser evil than
having twice as many displacement types.

And, also adjusting scale is a lesser evil than separate bit slicing for
each type.
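
As a rough illustration of what scaling by type buys, here is a Python sketch. The 5-bit unsigned field width is an assumption picked for the example, not taken from RVC, XG3, or Concertina II.

```python
def effective_address(base: int, disp_field: int, access_bytes: int) -> int:
    """Address for an unsigned displacement field scaled by the access size."""
    return base + disp_field * access_bytes


def reach(field_bits: int, access_bytes: int) -> int:
    """Largest byte offset reachable with a scaled unsigned field."""
    return ((1 << field_bits) - 1) * access_bytes


# The same hypothetical 5-bit field covers a different range per type,
# with a single decode rule: multiply the field by the access size.
word_reach = reach(5, 4)    # 124 bytes for 32-bit accesses
dword_reach = reach(5, 8)   # 248 bytes for 64-bit accesses
```

The decoder only needs the field and the access size; no per-type shuffling of displacement bits is involved.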

Though, it does lead to the partial irony that despite XG3 having a
longer listing than RV64G, when I wrote a VM that did both RV64 and XG3,
the XG3 decoder is smaller, due partly to "less dog chew".

The decoder is bigger in the Verilog core, but this is mostly because
XG1/2/3 all use a shared decoder. An XG3 exclusive decoder would be smaller.

Though, maybe moot if one is also going to need a RISC-V decoder, unless
I make a purely XG3 target that doesn't use any of the RV encodings.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Aug 11 10:27:08 2025

From Newsgroup: comp.arch

On 8/10/2025 11:07 AM, John Savard wrote:

On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:

That said, a lot of John's other ideas come off to me like straight up
absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.

While I think that not being able to be put to use isn't really one of the faults of the Concertina II ISA,

I am not sure what you are saying here. Is it that while you agree that
at least some features cannot be put to use, but that isn't the fault of
the ISA, or that the fault of not being able to be put to use doesn't
exist in the ISA?

the block structure, especially at its
current level of complexity, is going to come across as quite weird to
many, and I don't yet see any hope of achieving a drastic simplification
in that area.

Each of the sixteen block types serves one or another functionality which
I see as necessary to give this ISA the breadth of application that I have
as my goal.

While I agree that they meet your goals (at least as I understand them),
I think that you have two problems.

Your goals, even if you meet them aren't particularly useful, e.g. being "nearly" plug compatible with S/360

There are *far* simpler ways to accomplish what most people really want
to do.
--
- Stephen Fuld
(e-mail address disguised to prevent spam)
--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Aug 11 18:20:05 2025

From Newsgroup: comp.arch

On Mon, 11 Aug 2025 10:27:08 -0700, Stephen Fuld wrote:

On 8/10/2025 11:07 AM, John Savard wrote:

On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:

That said, a lot of John's other ideas come off to me like straight up
absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.

While I think that not being able to be put to use isn't really one of
the faults of the Concertina II ISA,

I am not sure what you are saying here. Is it that while you agree that
at least some features cannot be put to use, but that isn't the fault of
the ISA, or that the fault of not being able to be put to use doesn't
exist in the ISA?

What I was trying to say was that while the Concertina II ISA no doubt has many flaws, not being able to crank out useful work is, in my opinion, not
one of them.

On the other hand, driving insane those who attempt to program it or write compilers for it must be admitted to be an obstacle to making use of a
given CPU, and so I must admit to its usability being limited in that
manner.

John Savard
--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Aug 11 18:33:14 2025

From Newsgroup: comp.arch

On Mon, 11 Aug 2025 10:27:08 -0700, Stephen Fuld wrote:

Your goals, even if you meet them aren't particularly useful, e.g. being "nearly" plug compatible with S/360

There are *far* simpler ways to accomplish what most people really want
to do.

Being plug-compatible with System/360 is not among the goals of my ISA.
The term "plug-compatible" refers to... _plugs_, as one might guess.
Nothing in my ISA talks about stuff like USB ports, Centronics parallel ports... or the kind of port IBM used to connect a 1403 printer to a System/360 computer.

There are certainly far simpler ways to run System/360 code correctly.
One can just set a mode bit to enter System/360 emulation, for example.

What I'm doing with the Type V header is to provide a way to imitate the behavior of a System/360 program after code conversion. So one could write
a special FORTRAN compiler to generate code using this header to allow a FORTRAN program running on the Concertina II to deliver the same results
as on a System/360.

And this isn't simple because it's buried deep down in the instruction set
as an _afterthought_ within an ISA which is primarily designed to do the
same sort of work as one might do with an x86-64 chip or a PowerPC chip or
a SPARC chip even. And secondarily designed to be capable of
implementations which shine at whatever the TMS20C6000 shines at, or even whatever, if anything, the Itanium was good for.

It may not, however, be lost on implementors that a full implementation of
the Type V header stuff ends up putting the needed circuitry on the die to *provide* a very nice System/360 emulation or implementation, which they
might offer as an added feature not defined in the Concertina II specification.

John Savard

--- Synchronet 3.21a-Linux NewsLink 1.2

From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Aug 11 19:16:06 2025

From Newsgroup: comp.arch

On Mon, 11 Aug 2025 18:33:14 +0000, John Savard wrote:

implementations which shine at whatever the TMS20C6000 shines at, or

Oops, the TMS320C6000.

John Savard
--- Synchronet 3.21a-Linux NewsLink 1.2
