The 5-bit header is part of the 128-bit thingy without being part of any
of the 41-bit thingies. That is the limbo in which my pseudo-immediates
are found. Data? Or a field in the instruction? It can be either one, depending on whether you define each individual 32-bit instruction as an instruction, or the 256-bit block as the "real" instruction the
architecture executes.
I couldn't locate the post that I finally felt ready to respond to. It
was a reply to one of my posts about Concertina II, and it said that immediates ought properly to be considered part of the instruction.
Well, in nearly all computer architectures, immediates _are_ part of the instruction, and quite obviously so.
But what Concertina II has are *pseudo* immediates. That is, they're not really immediates, but they pretend to be.
What does this mean? What could this mean?
Well, in my register-to-register operate instruction, associated with each _source_ register field, there's a bit which, if set, says that the five bits in the field aren't a register specifier, but a pointer to a constant.
A constant that's addressed by an instruction isn't an immediate; it's a constant. So why do I even call these constants "pseudo-immediates" then?
Well, that pointer - five bits long - is an awfully short pointer. Where does it point?
Question: Do the pointers point to the same block only, or also to other blocks? With 5 bits, you could address others as well. Can you give an example of their use, including the block headers?
On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:
Question: Do the pointers point to the same block only, or also to other
blocks? With 5 bits, you could address others as well. Can you give an
example of their use, including the block headers?
Actually, no, 5 bits are only enough to point within the same block.
That's because it's a byte pointer, as it can be used to point to any type
of constant, including single byte constants.
This is despite the fact that I do have an instruction format for conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).
However, they _can_ point to another block, by means of a sixth bit that
some instructions have... but when this happens, it does not trigger an
extra fetch from memory. Instead, the data is retrieved from a copy of an earlier block in the instruction stream that's saved in a special
register... so as to reduce potential NOP-style problems.
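A minimal sketch of how such a pointer might be resolved, assuming the block is held as 32 raw bytes. The function name, the `use_saved` flag standing in for the "sixth bit", and the bounds checks are all hypothetical illustrations, not part of the Concertina II definition. It mainly shows why 5 bits cover exactly one block: 2^5 = 32 bytes = 256 bits.

```python
BLOCK_BYTES = 32  # a 256-bit block is 32 bytes, so a 5-bit byte pointer spans exactly one block


def fetch_pseudo_immediate(current_block, saved_block, pointer, size, use_saved=False):
    """Return `size` bytes of constant data addressed by a 5-bit byte pointer.

    `use_saved` models the optional sixth bit (an assumption of this sketch):
    when set, the constant is read from a saved copy of the earlier block
    instead of the current one, so no extra memory fetch is involved.
    """
    if not 0 <= pointer < BLOCK_BYTES:
        raise ValueError("pointer must fit in 5 bits")
    if pointer + size > BLOCK_BYTES:
        raise ValueError("constant would run past the end of the block")
    source = saved_block if use_saved else current_block
    if len(source) != BLOCK_BYTES:
        raise ValueError("a block must be exactly 32 bytes")
    return bytes(source[pointer:pointer + size])


current = bytes(range(32))
saved = bytes(range(100, 132))
print(fetch_pseudo_immediate(current, saved, 28, 4).hex())                  # last slot of current block
print(fetch_pseudo_immediate(current, saved, 28, 4, use_saved=True).hex())  # same offset, saved block
```

Alignment rules and byte order are deliberately left out, since the posts above do not state them.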
John Savard
Well, that pointer - five bits long - is an awfully short pointer. Where
does it point?
Instructions are fetched in blocks that are 256 bits long. One of the
things this allows for is for the block to begin with a header that
specifies that a certain number of 32-bit instruction slots at the end
of the current block are to be skipped over in the sequence of
instructions to be executed; this space can be used for constants.
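The layout described here can be sketched roughly as follows, assuming eight 32-bit slots per 256-bit block. The helper `split_block` and its `reserved_slots` parameter are hypothetical; where the count actually lives inside the header is not stated in the thread, so it is passed in directly.

```python
SLOT_BYTES = 4        # one 32-bit instruction slot
SLOTS_PER_BLOCK = 8   # 256 bits / 32 bits


def split_block(block, reserved_slots):
    """Split a 32-byte block into (executed slots, constant slots).

    `reserved_slots` is the number of trailing slots the header marks as
    skipped. The executed list still includes slot 0, which holds the
    header itself in this sketch.
    """
    if len(block) != SLOT_BYTES * SLOTS_PER_BLOCK:
        raise ValueError("a block must be exactly 32 bytes")
    if not 0 <= reserved_slots < SLOTS_PER_BLOCK:
        raise ValueError("cannot reserve every slot in the block")
    slots = [block[i:i + SLOT_BYTES] for i in range(0, len(block), SLOT_BYTES)]
    executed = SLOTS_PER_BLOCK - reserved_slots
    return slots[:executed], slots[executed:]


code, constants = split_block(bytes(range(32)), reserved_slots=2)
print(len(code), len(constants))  # 6 2
```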
I tried something similar to this but without block headers and it
worked okay. But there were a couple of issues. One was the last
instruction in cache line could not have an immediate. Or instructions
had to stop before the end of the cache line to accommodate immediates.
This resulted in some wasted space.
Also, it made reading listings more difficult as constants were in the
middle of sequences of instructions.
John Savard <quadibloc@invalid.invalid> schrieb:
This is despite the fact that I do have an instruction format for
conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).
Is there a reason for that? On the face of it, having both makes no
sense.
But even so: Having a single, let's say, 32-bit immediate would require a 32-bit header and a 32-bit constant, so 64 bits used instead of directly encoding a 32-bit constant.
Since operate instructions are the most common type of instruction, if one can re-arrange instructions a little, one might be able to have these pseudo-immediates *without* the crushing burden of a 32-bit overhead!
You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.
On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:
You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.
I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic essentials of how it works, I fail to realize how making the extra effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.
John Savard <quadibloc@invalid.invalid> schrieb:
On Fri, 01 Aug 2025 18:08:17 +0000, Thomas Koenig wrote:
Question: Do the pointers point to the same block only, or also to other
blocks? With 5 bits, you could address others as well. Can you give an
example of their use, including the block headers?
Actually, no, 5 bits are only enough to point within the same block.
That's because it's a byte pointer, as it can be used to point to any type
of constant, including single byte constants.
This is despite the fact that I do have an instruction format for
conventional style byte immediates (and I've just squeezed in one for
16-bit immediates as well).
Is there a reason for that? On the face of it, having both makes
no sense.
But even so: Having a single, let's say, 32-bit immediate would require
a 32-bit header and a 32-bit constant, so 64 bits used instead of
directly encoding a 32-bit constant.
However, they _can_ point to another block, by means of a sixth bit that
some instructions have...
Try writing an assembler and disassembler for what you have. I have
written this for Mitch's ISA, and it turned out to be very difficult
already.
On 8/2/2025 10:30 PM, John Savard wrote:
On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:
You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.
I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic
essentials of how it works, I fail to realize how making the extra effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.
I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write actual code for a real program.
Stephen Fuld wrote:
On 8/2/2025 10:30 PM, John Savard wrote:
On Sat, 02 Aug 2025 19:23:01 +0000, Thomas Koenig wrote:
You still haven't shown a single piece of code with your header scheme,
I presume because it is too difficult even for you, the author of the
ISA.
I can understand how you might feel that way, but if my block structure
isn't understandable when illustrated by diagrams showing the basic
essentials of how it works, I fail to realize how making the extra
effort
to smother that information in a mass of irrelevant detail is going to
make it any clearer to you.
I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real
programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not
be in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.
That is always a required step, but still not enough.
I.e., when I first got the Itanium architecture manual (long before any CPUs/systems were available) I sat down and wrote some (to me)
interesting kernels, like medium-sized arbitrary precision math, up to a kbit or two, using carry-save in-register storage.
That persuaded me that it was possible for the Itanium to do these kinds
of calculations very fast indeed, but the architecture was still a
memorable failure.
Being fit for a number of hand-written asm kernels does not a generally useful cpu make.
Yup. And as Robert Finch pointed out, what if the instruction that
needs the constant is the last instruction in the block?
However, they _can_ point to another block, by means of a sixth bit
that some instructions have...
But using this capability isn't a solution, as it adds 32 bits to the
block, which pushes the last instruction in that block into the current block, which pushes the instruction that needs the immediate into the
next block and forces the extra nop anyway.
But I'm sneaky. Since this situation dismayed me all along with
Concertina II, I have what I call a "zero-overhead header". In the first instruction slot of a block, one may have a Type I header, which is a two-address operate instruction which *also* supplies a three-bit
_decode_ field, reserving slots for pseudo-immediates.
On Sun, 03 Aug 2025 12:50:05 -0700, Stephen Fuld wrote:
Yup. And as Robert Finch pointed out, what if the instruction that
needs the constant is the last instruction in the block?
The first thing one could do is precede that instruction by a NOP.
In Concertina II, the preferred way to achieve the same effect is to use a do-nothing header, because that wouldn't consume a whole cycle like a NOP might.
But I thought of that, and added a feature where instructions can
(provided a recent branch hadn't taken place) indicate that they're using
a saved copy of the preceding block, instead of the current block, for the constant.
Oh, I see you noticed that:
However, they _can_ point to another block, by means of a sixth bit
that some instructions have...
But using this capability isn't a solution, as it adds 32 bits to the
block, which pushes the last instruction in that block into the current
block, which pushes the instruction that needs the immediate into the
next block and forces the extra nop anyway.
That isn't quite how it would work out.
Current issue...
I I I I I I I I#
When I fix it, to put the value in the current block, it pushes the
problem instruction to the next one,
(1) I I I I I I M1
I I#
so pointing to the previous block *does* solve the problem.
On 8/2/2025 2:12 AM, Thomas Koenig wrote:
Try writing an assembler and disassembler for what you have. I have
written this for Mitch's ISA, and it turned out to be very difficult
already.
I am curious as to what features you found difficult?
On Sun, 03 Aug 2025 13:03:21 -0700, Stephen Fuld wrote:
I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real
programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.
While I'm not prepared to go to the trouble of creating a fleshed-out example, a very short and trivial example will still indicate what my
goals are.
X = Y * 2.78 + Z
On a typical RISC architecture, this would involve instructions like this:
load 18, Y
load 19, K#0001
fmul 18, 18, 19
load 19, Z
fadd 18, 18, 19
fsto 18, X
Six instructions, each 32 bits long.
On the IBM System/360, though, it would be something like
le 12, Y
me 12, K#0001
ae 12, Z
ste 12, x
All four instructions are memory-reference instructions, so they're also
32 bits long.
How would I do this on Concertina II?
Well, since the sequence has to start with a memory-reference, I can't use the zero-overhead header (Type I). Instead, a Type XI header is in order; that specifies a decode field, so that space can be reserved for a pseudo-immediate, and instruction slots can be indicated as containing
instructions from the alternate instruction set.
Then the instructions can be
lf 6,y
mfr 6,#2.78
af 6,z
stf 6,x
with the instruction "af" coming from the alternate 32-bit instruction set.
The other tricky precondition that must be met is to store z in a data region that is only 4,096 bytes or less in size, prefaced with
USING *,23
or another register from 17 to 23 could be used as the base register, so that it is addressed with a 12-bit displacement.
(Also, register 6, from
the first eight registers, is used to do the arithmetic to meet the limitations of the "add floating" memory to register operate instruction
in the alternate instruction set.)
Because it uses a pseudo-immediate, which gets fetched along with the instruction stream, where the 360 uses a constant, it has an advantage
over the 360. On the other hand, while the actual code is the same length, there's also the 32-bit overhead of the header.
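The code-size comparison above can be tallied with a short sketch. It counts only the instruction and header words named in the post (six, four, and four plus one header, all 32 bits); the memory constant K#0001 and the in-block pseudo-immediate are not counted, because their sizes are not stated. The helper name `code_bits` is an assumption for illustration.

```python
INSTRUCTION_BITS = 32
HEADER_BITS = 32


def code_bits(instructions, headers=0):
    """Total bits of instruction stream, excluding any constant data."""
    return instructions * INSTRUCTION_BITS + headers * HEADER_BITS


sizes = {
    "typical RISC": code_bits(6),                 # load, load, fmul, load, fadd, fsto
    "System/360": code_bits(4),                   # le, me, ae, ste
    "Concertina II": code_bits(4, headers=1),     # lf, mfr, af, stf plus a Type XI header
}
for name, bits in sizes.items():
    print(f"{name}: {bits} bits")
# typical RISC: 192 bits
# System/360: 128 bits
# Concertina II: 160 bits
```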
John Savard <quadibloc@invalid.invalid> schrieb:
The other tricky precondition that must be met is to store z in a data
region that is only 4,096 bytes or less in size, prefaced with
USING *,23
or another register from 17 to 23 could be used as the base register,
so that it is addressed with a 12-bit displacement.
Using USING is just horrible, and this makes it worse. Where would you
need to store this, in an executable page? Newer architectures have read,
write and execute bits on their page tables for a very good reason.
And... would you like to have a stack in your architecture?
Because it uses a pseudo-immediate, which gets fetched along with the
instruction stream, where the 360 uses a constant, it has an advantage
over the 360. On the other hand, while the actual code is the same
length, there's also the 32-bit overhead of the header.
Where is the advantage over putting a constant directly in the
instruction stream?
John Savard <quadibloc@invalid.invalid> schrieb:
On Sun, 03 Aug 2025 13:03:21 -0700, Stephen Fuld wrote:
I suspect that the purpose of Thomas's suggestion wasn't to make the
design clearer to him, but to force you to discover/think about the
utility and ease of use of some of the features you propose *in real
programs* . If a typical programmer can't figure out how to use some
CPU feature, it probably won't be used, and thus probably should not be
in the architecture. The best way to learn about what features are
useful is to try to use them! and the best way to do that is to write
actual code for a real program.
While I'm not prepared to go to the trouble of creating a fleshed-out
example, a very short and trivial example will still indicate what my
goals are.
X = Y * 2.78 + Z
On a typical RISC architecture, this would involve instructions like this:
load 18, Y
load 19, K#0001
fmul 18, 18, 19
load 19, Z
fadd 18, 18, 19
fsto X
If all the variables were in BSS.
My 66000 with its compiler:
double foo (double y, double z)
{
return y*2.78 + z;
}
yields
foo: ; @foo
; %bb.0:
fmac r1,r1,#0x40063D70A3D70A3D,r2
ret
One instruction for the arithmetic, one for the function return.
Here's the disassembly:
0000000000000000 <foo>:
0: 3021e040 fmac r1,r1,#0x4006337003370033,r2
4: 03370033
8: 40063370
c: 6be00000 ret
Six instructions, each 32 bits long.
On the IBM System/360, though, it would be something like
le 12, Y
me 12, K#0001
ae 12, Z
ste 12, x
With gcc -O2 -m31, on godbolt:
foo:
larl %r5,.L3
madb %f2,%f0,.L4-.L3(%r5)
ldr %f0,%f2
br %r14
.L3:
.L4:
.long 1074150768
.long -1546188227
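As a cross-check, the two `.long` values in the gcc listing are simply the high and low 32-bit halves of the IEEE 754 double for 2.78, printed as signed integers, and together they match the immediate in the compiler output `fmac r1,r1,#0x40063D70A3D70A3D,r2`. A small sketch using only the Python standard library (the helper names are mine):

```python
import struct


def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def as_signed32(word):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return word - (1 << 32) if word & 0x80000000 else word


bits = double_bits(2.78)
high, low = bits >> 32, bits & 0xFFFFFFFF
print(hex(bits))                             # 0x40063d70a3d70a3d
print(as_signed32(high), as_signed32(low))   # 1074150768 -1546188227
```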
All four instructions are memory-reference instructions, so they're also
32 bits long.
How would I do this on Concertina II?
Well, since the sequence has to start with a memory-reference, I can't use the zero-overhead header (Type I). Instead, a Type XI header is in order;
that specifies a decode field, so that space can be reserved for a pseudo-immediate, and instruction slots can be indicated as containing
instructions from the alternate instruction set.
Then the instructions can be
lf 6,y
mfr 6,#2.78
af 6,z
stf 6,x
with the instruction "af" coming from the alternate 32-bit instruction set.
The other tricky precondition that must be met is to store z in a data
region that is only 4,096 bytes or less in size, prefaced with
USING *,23
or another register from 17 to 23 could be used as the base register, so
that it is addressed with a 12-bit displacement.
And... would you like to have a stack in your architecture?
No.
But I still maintain a more "complete" example is
really needed.
John Savard <quadibloc@invalid.invalid> schrieb:
And... would you like to have a stack in your architecture?
No.
OK. I think that is the final nail in the coffin, I will
henceforth stop reading (and writing) about your architecture.
On 8/4/2025 9:56 PM, Thomas Koenig wrote:
John Savard <quadibloc@invalid.invalid> schrieb:
And... would you like to have a stack in your architecture?
No.
OK. I think that is the final nail in the coffin, I will
henceforth stop reading (and writing) about your architecture.
While I agree that having at least push and pop instructions would be beneficial, I hardly think that is the most "bizarre" and less than
useful aspect of John's architecture. After all, both of those instructions can be accomplished by two "standard" instructions, a store
and an add (for push) and a load and subtract (for pop). Interchange
the add and the subtract if you want the stack to grow in the other direction.
Of course, you are free to stop contributing on this topic, but I, for
one, will miss your contributions.
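The two-instruction push and pop described above can be modelled with a toy machine. This is only a sketch: the `Machine` class, the word-addressed memory, and the choice of an upward-growing stack whose pointer names the next free word are assumptions made for illustration.

```python
class Machine:
    """Toy word-addressed memory with a stack pointer register."""

    def __init__(self, words=16):
        self.mem = [0] * words
        self.sp = 0  # points at the next free word; the stack grows upward

    def push(self, value):
        self.mem[self.sp] = value  # store
        self.sp += 1               # add

    def pop(self):
        self.sp -= 1               # subtract
        return self.mem[self.sp]   # load


m = Machine()
m.push(10)
m.push(20)
print(m.pop(), m.pop(), m.sp)  # 20 10 0
```

Swapping the add and the subtract (and starting `sp` at the top of memory) gives a stack that grows downward instead.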
On 8/4/2025 9:56 PM, Thomas Koenig wrote:
John Savard <quadibloc@invalid.invalid> schrieb:
And... would you like to have a stack in your architecture?
No.
OK. I think that is the final nail in the coffin, I will
henceforth stop reading (and writing) about your architecture.
While I agree that having at least push and pop instructions would be beneficial, I hardly think that is the most "bizarre" and less than
useful aspect of John's architecture. After all, both of those
instructions can be accomplished by two "standard" instructions, a store
and an add (for push) and a load and subtract (for pop). Interchange
the add and the subtract if you want the stack to grow in the other direction.
Of course, you are free to stop contributing on this topic, but I, for
one, will miss your contributions.
That said, a lot of John's other ideas come off to me like straight up absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.
On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:
That said, a lot of John's other ideas come off to me like straight up
absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.
While I think that not being able to be put to use isn't really one of the faults of the Concertina II ISA, the block structure, especially at its current level of complexity, is going to come across as quite weird to
many, and I don't yet see any hope of achieving a drastic simplification
in that area.
Each of the sixteen block types serves one or another functionality which
I see as necessary to give this ISA the breadth of application that I have
as my goal.
But I have introduced "scaled displacements" back in, allowing the
augmented short instruction mode instruction set to be more powerful.
John Savard
On 8/10/2025 11:07 AM, John Savard wrote:
On Tue, 05 Aug 2025 18:23:36 -0500, BGB wrote:
That said, a lot of John's other ideas come off to me like straight up
absurdity. So, I wouldn't hold up much hope personally for it to turn
into much usable.
While I think that not being able to be put to use isn't really one of
the faults of the Concertina II ISA,
I am not sure what you are saying here. Is it that, while you agree that
at least some features cannot be put to use, that isn't the fault of
the ISA, or that the fault of not being able to be put to use doesn't
exist in the ISA?
Your goals, even if you meet them, aren't particularly useful, e.g. being "nearly" plug compatible with S/360
There are *far* simpler ways to accomplish what most people really want
to do.
implementations which shine at whatever the TMS320C6000 shines at, or