• Tough Decisions

    From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Jul 28 03:48:54 2025
    From Newsgroup: comp.arch

    Adding the new header type which was indicated by only five leading
    bits, thus consuming 1/32nd of the opcode space, to provide for
    fourteen-way superscalar operation has proved to lead to an overly
    severe shortage of available opcode space.

    Thus, I've removed that header type, and I've used the opcode space
    freed up to allow two other header types to contribute to the feature
    of allowing pseudo-immediates to be found in a saved previous block
    instead of the current block - which feature reduces the need to pad
    a block to make everything fit. One of those two header types is the
    zero-overhead header, which definitely makes this feature more useful.

    The zero-overhead header can now also call for augmented short
    instructions to follow it, which allows 15-bit short instructions to be
    used more often in place of the more restrictive 14-bit short instructions.

    As well, this made room for a P bit in some operate instructions - which,
    of course, was needed for that feature to work at all, the P bit in
    the instruction being what indicates when that feature is used.

    Also, a mistake in the diagrams for the 64-bit block-aware long
    instructions was corrected.

    So progress is being made; adding the 14-way superscalar header caused
    some fat to be taken away, but now that opcode space is put back where
    it will do the most good, I hope.

    John Savard
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Jul 28 04:39:56 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> schrieb:

    The zero-overhead header now can also call for augmented short instruction type instructions to follow it, which allows the use of 15-bit short instructions instead of the more restrictive 14-bit short instructions
    more often.

    A question regarding your various header types.

    In order to avoid NOP hell, these have to be filled with the
    same instruction length (say 15 bits, like you said above),
    is that correct?

    You don't have a compiler (obviously), but have you looked at actual
    code generated by compilers (for example on godbolt, which is very
    good) and checked which instruction sequence would fit which header?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Mon Jul 28 10:05:31 2025
    From Newsgroup: comp.arch

    On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    In order to avoid NOP hell, these have to be filled with the same
    instruction length (say 15 bits, like you said above),
    is that correct?

    A block is 256 bits in length, and normally, in the absence of a
    header, consists of eight 32-bit instructions. Blocks need, basically,
    to be filled with 32-bit instructions to avoid NOPs.

    To improve the compactness of code, I have made provision for short
    instructions; these are nominally 16 bits long, as they're packed
    two to a 32-bit word. They can, in reality, be 14 bits long or 15 bits
    long, depending on the overhead of indicating them, or 17 bits long or
    even longer if some additional bits are provided in the header.
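
    The widths quoted above are consistent with simple arithmetic on a
    32-bit word shared by two short instructions. The sketch below is only
    an illustration: it assumes the "overhead" is a count of marker bits
    taken out of the word and that header-supplied bits are added per pair.
    Neither assumption nor the function name comes from the post.

```python
WORD_BITS = 32  # one instruction slot, as stated in the post


def short_instruction_bits(marker_bits_in_word, extra_bits_from_header=0):
    """Bits available to each of the two short instructions sharing a word.

    Assumption: marker bits are removed from the 32-bit word, any bits the
    header supplies are added back, and the total is split evenly.
    """
    usable = WORD_BITS - marker_bits_in_word + extra_bits_from_header
    if usable % 2:
        raise ValueError("the two halves would not be the same width")
    return usable // 2


print(short_instruction_bits(4))     # 14
print(short_instruction_bits(2))     # 15
print(short_instruction_bits(0, 2))  # 17
```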

    Some headers provide for variable-length instructions. Those headers,
    instead of just being either 32 bits or 64 bits long, can also be
    48 bits long.

    The feature I mentioned in the post to which you are replying,
    for avoiding NOP issues, has to do with data, specifically immediate
    values in instructions, rather than the instructions themselves.

    In order to keep length decoding of instructions simple - or allow code
    that consists only of 32-bit instructions - but allow immediate values
    of any length, what I had done was come up with a scheme that does the
    following:

    In a register-to-register operate instruction, a source register
    specification can be replaced with a 5-bit pointer to data; it points
    to one of the bytes in a 256-bit block of instructions.

    This feature requires a header for use; in the simplest case, a header
    may contain a 3-bit field that specifies how many 32-bit instruction words
    are to be left unused at the end of a block.

    So I was avoiding NOP issues by saving old instruction blocks, and allowing
    a pointer to a "pseudo-immediate" to point into a saved block instead of
    the current block. Yes, that will require flexibility on the part of the
    compiler; more work on its part, and it will be tricky to code. I think
    it's possible in the current state of the art, though.
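
    As a rough model of the pointer scheme described above: a 5-bit value
    selects one of the 32 bytes of a 256-bit block, and the header's 3-bit
    field says how many 32-bit words at the end of the block hold data
    rather than instructions. Everything else in this sketch is assumed for
    illustration, including the function name, the big-endian byte order,
    the way the immediate's size is passed in, and the check that the
    pointer lands inside the reserved area.

```python
BLOCK_BYTES = 32  # a 256-bit block
WORD_BYTES = 4    # one 32-bit instruction slot


def read_pseudo_immediate(block, pointer, size_bytes, reserved_words):
    """Return the immediate that a 5-bit byte pointer selects in a block.

    block          -- the 32 bytes of the instruction block
    pointer        -- 5-bit byte offset taken from the source-register field
    size_bytes     -- width of the immediate (assumed known from the opcode)
    reserved_words -- value of the header's 3-bit "unused words" field
    """
    if len(block) != BLOCK_BYTES:
        raise ValueError("a block is exactly 32 bytes")
    if not 0 <= pointer < BLOCK_BYTES:
        raise ValueError("pointer must fit in 5 bits")
    if not 0 <= reserved_words < 8:
        raise ValueError("reserved_words must fit in 3 bits")
    data_start = BLOCK_BYTES - reserved_words * WORD_BYTES
    # Assumed rule: the immediate must lie wholly inside the reserved words.
    if pointer < data_start or pointer + size_bytes > BLOCK_BYTES:
        raise ValueError("pointer does not land in the reserved area")
    # Byte order is an assumption; the post does not state it.
    return int.from_bytes(block[pointer:pointer + size_bytes], "big")


# Six slots of header and instructions (zeroed here), then a 64-bit value.
block = bytes(24) + (0x0123456789ABCDEF).to_bytes(8, "big")
print(hex(read_pseudo_immediate(block, 24, 8, reserved_words=2)))
```

    The saved-previous-block variant would apply the same lookup to the
    retained copy of the earlier block instead of the current one.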

    You don't have a compiler (obviously), but have you looked at actual
    code generated by compilers (for example on godbolt, which is very good)
    and checked which instruction sequence would fit which header?

    Not yet, I have to admit.

    John Savard
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Jul 28 07:41:59 2025
    From Newsgroup: comp.arch

    On 7/28/2025 3:05 AM, John Savard wrote:
    On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    In order to avoid NOP hell, these have to be filled with the same
    instruction length (say 15 bits, like you said above),
    is that correct?

    A block is 256 bits in length, and normally, in the absence of a
    header, consists of eight 32-bit instructions. Blocks need, basically,
    to be filled with 32-bit instructions to avoid NOPs.

    To improve the compactness of code, I have made provision for short instructions; these are nominally 16 bits long, as they're packed
    two to a 32-bit word. They can, in reality, be 14 bits long or 15 bits
    long, depending on the overhead of indicating them, or 17 bits long or
    even longer if some additional bits are provided in the header.

    Some headers provide for variable-length instructions. Those headers,
    instead of just being either 32 bits or 64 bits long, can also be
    48 bits long.

    The feature I mentioned in the post to which you are replying,
    for avoiding NOP issues, has to do with data, specifically immediate
    values in instructions, rather than the instructions themselves.

    In order to keep length decoding of instructions simple - or allow code
    that consists only of 32-bit instructions - but allow immediate values
    of any length, what I had done was come up with a scheme that does the following:

    In a register-to-register operate instruction, a source register specification can be replaced with a 5-bit pointer to data; it points
    to one of the bytes in a 256-bit block of instructions.

    This feature requires a header for use; in the simplest case, a header
    may contain a 3-bit field that specifies how many 32-bit instruction words are to be left unused at the end of a block.

    So I was avoiding NOP issues by saving old instruction blocks, and allowing
    a pointer to a "pseudo-immediate" to point into a saved block instead of
    the current block. Yes, that will require flexibility on the part of the compiler; more work on its part, and it will be tricky to code. I think
    it's possible in the current state of the art, though.

    You don't have a compiler (obviously), but have you looked at actual
    code generated by compilers (for example on godbolt, which is very good)
    and checked which instruction sequence would fit which header?

    Not yet, I have to admit.


    Another alternative is to pick some "typical" problems that you would
    expect to take a few dozen instructions to code, and write the
    "assembler" code by hand for them in your proposed architecture. That
    would give you an "upper limit" on density.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
  • From BGB@cr88192@gmail.com to comp.arch on Mon Jul 28 12:47:24 2025
    From Newsgroup: comp.arch

    On 7/28/2025 5:05 AM, John Savard wrote:
    On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    In order to avoid NOP hell, these have to be filled with the same
    instruction length (say 15 bits, like you said above),
    is that correct?

    A block is 256 bits in length, and normally, in the absence of a
    header, consists of eight 32-bit instructions. Blocks need, basically,
    to be filled with 32-bit instructions to avoid NOPs.

    To improve the compactness of code, I have made provision for short instructions; these are nominally 16 bits long, as they're packed
    two to a 32-bit word. They can, in reality, be 14 bits long or 15 bits
    long, depending on the overhead of indicating them, or 17 bits long or
    even longer if some additional bits are provided in the header.


    One possibility could be, rather than going the 16-bit route, having a
    more limited set of "split" instructions, which encode two instructions
    in a single 32-bit unit, albeit limited in which instructions can
    co-execute.

    These could potentially be encoded while still retaining 32-bit
    alignment, though would naturally be more limited than full 16-bit
    encodings.

    If I were to do similar in XG3, one possibility might be:
    yyyyyyyyyyyyy-xxxxxxxxxxxxx-z11100
    Giving 13 bits of encoding space (with an extra bit for X).

    Where, in this case, can assume that predicated jumbo-prefixes are not a
    thing (the prefixes could always be assumed unconditional).
    This leaves 11100 and 11101 as unused blocks.
    With 11110 being the normal XG3 jumbo-prefix space.
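
    To make the bit layout concrete, here is a small sketch that packs and
    unpacks a word of the form yyyyyyyyyyyyy-xxxxxxxxxxxxx-z11100. It
    assumes the string is written most-significant bit first, so the tag
    sits in bits 4..0, z in bit 5, x in bits 18..6 and y in bits 31..19.
    The function names are made up for the example.

```python
SPLIT_TAG = 0b11100  # low five bits marking the proposed split form


def pack_split_word(y, x, z):
    """Build a 32-bit word laid out as y[13] x[13] z[1] tag[5], MSB first."""
    return (y & 0x1FFF) << 19 | (x & 0x1FFF) << 6 | (z & 1) << 5 | SPLIT_TAG


def unpack_split_word(word):
    """Return (y, x, z), or None if the word does not carry the split tag."""
    if not 0 <= word < 1 << 32:
        raise ValueError("expected a 32-bit value")
    if word & 0x1F != SPLIT_TAG:
        return None
    z = (word >> 5) & 0x1
    x = (word >> 6) & 0x1FFF
    y = (word >> 19) & 0x1FFF
    return y, x, z


word = pack_split_word(0x1ABC, 0x0123, 1)
print(unpack_split_word(word) == (0x1ABC, 0x0123, 1))  # True
```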



    Possibly (z is assumed 0 in Y):
    0-mmmmmnnnnn000: MOV Rm, Rn
    0-mmmmmnnnnn001: ADD Rm, Rn
    0-iiiiinnnnn010: MOV Imm5s, Rn
    0-iiiiinnnnn011: ADD Imm5s, Rn

    0-00mmm00nnn101: SUB Rm3, Rn3
    0-00mmm01nnn101: XOR Rm3, Rn3
    0-00mmm10nnn101: AND Rm3, Rn3
    0-00mmm11nnn101: OR Rm3, Rn3

    0-01mmm00nnn101: SUBW Rm3, Rn3 //SUBS.L Rn3, Rm3, Rn3
    0-01mmm01nnn101: ADDW Rm3, Rn3 //ADDS.L Rn3, Rm3, Rn3
    0-01mmm10nnn101: - Rm3, Rn3
    0-01mmm11nnn101: - Rm3, Rn3

    0-10mmm00nnn101: SLL Rm3, Rn3 //SHLD.Q Rn3, Rm3, Rn3
    0-10mmm01nnn101: SLA Rm3, Rn3 //SHAD.Q Rn3, Rm3, Rn3
    0-10mmm10nnn101: SRL Rm3, Rn3 //SHLR.Q Rn3, Rm3, Rn3
    0-10mmm11nnn101: SRA Rm3, Rn3 //SHAR.Q Rn3, Rm3, Rn3

    X position only:
    1-0ddddnnnnn010: MOV.L (SP, Disp4), Rn
    1-1ddddnnnnn010: MOV.Q (SP, Disp4), Rn
    1-0ddddnnnnn011: MOV.L Rn, (SP, Disp4)
    1-1ddddnnnnn011: MOV.Q Rn, (SP, Disp4)
    1-ddmmm00nnn100: MOV.L (Rm3, Disp2), Rn3
    1-ddmmm01nnn100: MOV.Q (Rm3, Disp2), Rn3
    1-ddmmm10nnn100: MOV.L Rn3, (Rm3, Disp2)
    1-ddmmm11nnn100: MOV.Q Rn3, (Rm3, Disp2)


    Where, probably register encodings could go into several groups:
    5-bit, only for a few instructions.
    3-bit, R8..R15 (typical)
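
    The z=0 rows of the table above can be turned into a toy disassembler
    for one 13-bit half. This is a sketch under assumptions: Imm5s is
    taken to be a 5-bit two's-complement value, 3-bit register fields are
    mapped onto R8..R15 by adding 8, and the load/store rows (z=1, X
    position only) are left out. The names are illustrative only.

```python
# Three-bit-register operations, indexed by bits 12..11 and then bits 7..6.
# None marks the two rows the post leaves unassigned.
THREE_BIT_OPS = {
    0b00: ("SUB", "XOR", "AND", "OR"),
    0b01: ("SUBW", "ADDW", None, None),
    0b10: ("SLL", "SLA", "SRL", "SRA"),
}


def sign_extend5(value):
    """Interpret a 5-bit field as two's complement (assumed meaning of Imm5s)."""
    return value - 32 if value & 0x10 else value


def decode_half(half):
    """Disassemble a 13-bit half using only the z=0 rows of the table."""
    low3 = half & 0b111
    hi5 = (half >> 8) & 0x1F   # mmmmm or iiiii
    mid5 = (half >> 3) & 0x1F  # nnnnn
    if low3 == 0b000:
        return f"MOV R{hi5}, R{mid5}"
    if low3 == 0b001:
        return f"ADD R{hi5}, R{mid5}"
    if low3 == 0b010:
        return f"MOV {sign_extend5(hi5)}, R{mid5}"
    if low3 == 0b011:
        return f"ADD {sign_extend5(hi5)}, R{mid5}"
    if low3 == 0b101:
        group = (half >> 11) & 0b11
        sub = (half >> 6) & 0b11
        rm = 8 + ((half >> 8) & 0b111)  # assumed mapping onto R8..R15
        rn = 8 + ((half >> 3) & 0b111)
        name = THREE_BIT_OPS.get(group, (None,) * 4)[sub]
        if name is not None:
            return f"{name} R{rm}, R{rn}"
    return None  # not assigned in the proposal


print(decode_half(0b00011_00101_000))    # MOV R3, R5
print(decode_half(0b00_001_01_010_101))  # XOR R9, R10
```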

    When using the RV ABI, this could map to the same register space as the
    RISC-V 'C' extension.

    Would skip instructions that are not "super common". For example, RV-C
    gives 16-bit encodings for some instructions which are unlikely to be
    used sufficiently often to justify having 16-bit encodings.

    Whereas being able to compact things like MOV and ADD instructions,
    or MOV+LD, MOV+ST, ... would likely be a higher priority.


    Worth it? Debatable.

    Also, unlike more traditional 16/32, this doesn't have the same level of
    precedent (so harder to say if it is unencumbered). AFAIK/IIRC, about
    the only ISA that I am aware of ATM that really went in this sort of
    direction was Qualcomm Hexagon.


    In theory, this could regain some of what was "lost" by XG3 being
    incompatible with the RV-C encodings. Albeit, XG3 still seemingly gets
    better code density than RV64GC despite the lack of 16-bit instructions,
    so I didn't see it as a huge loss.

    Though, in theory this scheme is a lot more limited than the 'C'
    encodings, so the code-size savings would likely be smaller.


    Some headers provide for variable-length instructions. Those headers,
    instead of just being either 32 bits or 64 bits long, can also be
    48 bits long.

    The feature I mentioned in the post to which you are replying,
    for avoiding NOP issues, has to do with data, specifically immediate
    values in instructions, rather than the instructions themselves.

    In order to keep length decoding of instructions simple - or allow code
    that consists only of 32-bit instructions - but allow immediate values
    of any length, what I had done was come up with a scheme that does the following:

    In a register-to-register operate instruction, a source register specification can be replaced with a 5-bit pointer to data; it points
    to one of the bytes in a 256-bit block of instructions.

    This feature requires a header for use; in the simplest case, a header
    may contain a 3-bit field that specifies how many 32-bit instruction words are to be left unused at the end of a block.

    So I was avoiding NOP issues by saving old instruction blocks, and allowing
    a pointer to a "pseudo-immediate" to point into a saved block instead of
    the current block. Yes, that will require flexibility on the part of the compiler; more work on its part, and it will be tricky to code. I think
    it's possible in the current state of the art, though.

    You don't have a compiler (obviously), but have you looked at actual
    code generated by compilers (for example on godbolt, which is very good)
    and checked which instruction sequence would fit which header?

    Not yet, I have to admit.


    I had a compiler first.

    It was when I started working on my own compiler that it quickly became
    apparent all the ways the SuperH ISA was lacking.

    But, at the time, I didn't just abandon my existing effort for RISC-V.

    Whether or not all the years I burnt on all this have been worthwhile is
    debatable...

  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Tue Jul 29 05:12:10 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> schrieb:
    On Mon, 28 Jul 2025 04:39:56 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    In order to avoid NOP hell, these have to be filled with the same
    instruction length (say 15 bits, like you said above),
    is that correct?

    A block is 256 bits in length, and normally, in the absence of a
    header, consists of eight 32-bit instructions. Blocks need, basically,
    to be filled with 32-bit instructions to avoid NOPs.

    That sounds straightforward.

    To improve the compactness of code, I have made provision for short instructions; these are nominally 16 bits long, as they're packed
    two to a 32-bit word.

    Still straightforward.

    They can, in reality, be 14 bits long or 15 bits
    long, depending on the overhead of indicating them, or 17 bits long or
    even longer if some additional bits are provided in the header.


    Some headers provide for variable-length instructions. Those headers,
    instead of just being either 32 bits or 64 bits long, can also be
    48 bits long.

    Please provide an example of an assembly sequence showing how this
    would look.

    The feature I mentioned in the post to which you are replying,
    for avoiding NOP issues, has to do with data, specifically immediate
    values in instructions, rather than the instructions themselves.

    Data in the instruction stream have to be considered part
    of the instruction for any reasonable purposes.

    I'm afraid I do not understand in the absence of an example.
    Could you code up an example of a simple function that uses
    these features?

    In order to keep length decoding of instructions simple - or allow code
    that consists only of 32-bit instructions - but allow immediate values
    of any length, what I had done was come up with a scheme that does the following:

    In a register-to-register operate instruction, a source register specification can be replaced with a 5-bit pointer to data; it points
    to one of the bytes in a 256-bit block of instructions.

    This feature requires a header for use; in the simplest case, a header
    may contain a 3-bit field that specifies how many 32-bit instruction words are to be left unused at the end of a block.

    So I was avoiding NOP issues by saving old instruction blocks, and allowing
    a pointer to a "pseudo-immediate" to point into a saved block instead of
    the current block. Yes, that will require flexibility on the part of the compiler; more work on its part, and it will be tricky to code. I think
    it's possible in the current state of the art, though.

    OMG.

    The current state of the art is very much different from what you
    describe.


    You don't have a compiler (obviously), but have you looked at actual
    code generated by compilers (for example on godbolt, which is very good)
    and checked which instruction sequence would fit which header?

    Not yet, I have to admit.

    This is an essential part of defining an architecture...
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Tue Jul 29 22:03:39 2025
    From Newsgroup: comp.arch

    On Tue, 29 Jul 2025 05:12:10 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    Some headers provide for variable-length instructions. Those headers,
    instead of just being either 32 bits or 64 bits long, can also be 48
    bits long.

    Please provide an example of an assembly sequence showing how this
    would look.

    It's not clear to me what you would expect to see.

    In Assembly, there would just be a directive telling the assembler
    what instruction set to expect,

    ISET 2

    followed by a program that would hardly look any different from,
    say, System/360 assembler.

    The feature I mentioned in the post to which you are replying,
    for avoiding NOP issues, has to do with data, specifically immediate
    values in instructions, rather than the instructions themselves.

    Data in the instruction stream have to be considered part of the
    instruction for any reasonable purposes.

    I'm afraid I do not understand in the absence of an example. Could you
    code up an example of a simple function that uses these features?

    In assembler, pseudo-immediates just look like normal immediates. Normally,
    the programmer is not required to keep track of where blocks begin and
    end.

    However, I'll try to provide an example in a different way.

    Normally, a block with a pseudo-immediate looks like this:

    (2) I I I#2 I I M2

    (2) is a header that says "reserve the last two 32-bit instruction
    slots for immediate values or stuff".

    I is an instruction.

    I#2 is an instruction that uses a 64-bit immediate value.

    M2 is the immediate value which is two instruction slots or
    64 bits long.
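
    The slot accounting in this notation can be checked mechanically. The
    sketch below counts 32-bit slots in a line such as "(2) I I I#2 I I M2"
    and confirms that it fills exactly eight slots and that the header
    reserves as many slots as the immediates occupy. It assumes every
    header shown takes a single 32-bit slot; the function name is
    hypothetical.

```python
import re

SLOTS_PER_BLOCK = 8  # eight 32-bit slots in a 256-bit block


def check_block(text):
    """True if the block fills 8 slots and the reservation matches the data."""
    used = 0
    reserved = 0         # slots the header says to set aside
    immediate_slots = 0  # slots actually occupied by immediate data
    for token in text.split():
        header = re.fullmatch(r"\((\d+)\)", token)
        data = re.fullmatch(r"M(\d+)", token)
        if header:
            reserved = int(header.group(1))
            used += 1  # assumes a 32-bit header
        elif data:
            immediate_slots += int(data.group(1))
            used += int(data.group(1))
        elif re.fullmatch(r"I(#\d+)?", token):
            used += 1
        else:
            raise ValueError(f"unrecognised token: {token!r}")
    return used == SLOTS_PER_BLOCK and reserved == immediate_slots


print(check_block("(2) I I I#2 I I M2"))  # True
print(check_block("I I I I I I I I"))     # True
```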

    Now, suppose we're at the start of an instruction block, and we have
    15 ordinary instructions coming, followed by one that takes a
    32-bit immediate value.

    Then

    I I I I I I I I
    (1) I I I I I I I

    doesn't work, the I#1 that needs the 32-bit immediate for which
    space is being reserved is in the next block after.

    But if we do this

    I I I I I I I I
    (1) I I I I I I M1
    I I#1^ I I I I I I

    where I#1^ is an instruction that uses a single instruction slot
    pseudo-immediate, but as the ^ indicates, from the previous block,
    then it all goes together with no wasted space.

    John Savard
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Tue Jul 29 22:22:26 2025
    From Newsgroup: comp.arch

    On Tue, 29 Jul 2025 22:03:39 +0000, John Savard wrote:

    My examples didn't reproduce correctly.

    Then


    I I I I I I I I

    (1) I I I I I I I

    doesn't work, the I#1 that needs the 32-bit immediate for which space is
    being reserved is in the next block after.

    But if we do this

    I I I I I I I I

    (1) I I I I I I M1

    I I#1^ I I I I I I

    where I#1^ is an instruction that uses a single instruction slot
    pseudo-immediate, but as the ^ indicates, from the previous block,
    then it all goes together with no wasted space.

    was what I meant.

    If we use I. to stand for an instruction that is a branch target,
    then a case where a NOP is still needed can be shown. Let's say
    in the example above, the instruction immediately preceding the one
    with the 32-bit immediate is a branch target.

    Then the previous block which contains the value for the
    pseudo-immediate isn't guaranteed to be read, so we have to do this
    instead:

    I I I I I I I I

    () I I I I I I I.
    (1) I#1 I I I I I M1

    with () being a 32-bit do-nothing header, that being cheaper than
    a NOP.
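
    One way to read the rule implied here: an instruction may take its
    pseudo-immediate from the previous block only if control cannot enter
    the current block at or before that instruction. The sketch below
    applies that reading to one block in the notation used above, where
    "I." marks a branch target and a trailing "^" marks a previous-block
    reference. The rule as coded is an interpretation of the post, not a
    statement of the actual design, and the function name is made up.

```python
def previous_block_reference_is_safe(block):
    """Check one block written in the thread's notation.

    Returns False if an instruction marked '^' is a branch target itself or
    follows a branch target in the same block, since the previous block is
    then not guaranteed to have been read.
    """
    entered_by_branch = False
    for token in block.split():
        if not token.startswith("I"):
            continue  # headers and immediate data
        if "." in token:
            entered_by_branch = True
        if token.endswith("^") and entered_by_branch:
            return False
    return True


print(previous_block_reference_is_safe("I I#1^ I I I I I I"))   # True
print(previous_block_reference_is_safe("I. I#1^ I I I I I I"))  # False
```

    A block packer built on this check would fall back to the layout with
    the do-nothing header whenever it returns False.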

    John Savard
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.arch on Tue Jul 29 23:20:45 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> wrote:
    On Tue, 29 Jul 2025 05:12:10 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    Some headers provide for variable-length instructions. Those headers,
    instead of just being either 32 bits or 64 bits long, can also be 48
    bits long.

    Please provide an example of an assembly sequence showing how this
    would look.

    It's not clear to me what you would expect to see.

    In Assembly, there would just be a directive telling the assembler
    what instruction set to expect,

    ISET 2

    followed by a program that would hardly look any different from,
    say, System/360 assembler.

    The feature I mentioned in the post to which you are replying,
    for avoiding NOP issues, has to do with data, specifically immediate
    values in instructions, rather than the instructions themselves.

    Data in the instruction stream have to be considered part of the
    instruction for any reasonable purposes.

    I'm afraid I do not understand in the absence of an example. Could you
    code up an example of a simple function that uses these features?

    In assembler, pseudo-immediates just look like normal immediates. Normally, the programmer is not required to keep track of where blocks begin and
    end.

    However, I'll try to provide an example in a different way.

    Normally, a block with a pseudo-immediate looks like this:

    (2) I I I#2 I I M2

    (2) is a header that says "reserve the last two 32-bit instruction
    slots for immediate values or stuff".

    I is an instruction.

    I#2 is an instruction that uses a 64-bit immediate value.

    M2 is the immediate value which is two instruction slots or
    64 bits long.

    Do you only allow 32-byte aligned branch targets? Otherwise what
    happens when somebody jumps into middle of a block?
    --
    Waldek Hebisch
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Wed Jul 30 03:11:45 2025
    From Newsgroup: comp.arch

    On Tue, 29 Jul 2025 23:20:45 +0000, Waldek Hebisch wrote:

    Do you only allow 32-byte aligned branch targets? Otherwise what
    happens when somebody jumps into middle of a block?

    Branches are allowed to any 16-bit aligned location.

    Whenever a branch is taken, however, the whole block in which the
    target is located is fetched, in order that the instructions in the
    block can be correctly decoded.
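
    In address terms, that amounts to splitting a branch target into the
    32-byte block that must be fetched and a 16-bit position inside it. A
    minimal sketch, assuming byte addresses and blocks aligned on 32-byte
    boundaries (the function name is invented for the example):

```python
BLOCK_BYTES = 32  # a 256-bit block


def split_branch_target(address):
    """Return (block base address, 16-bit unit index within the block)."""
    if address % 2:
        raise ValueError("branch targets are 16-bit aligned")
    block_base = address & ~(BLOCK_BYTES - 1)
    halfword_index = (address - block_base) // 2
    return block_base, halfword_index


base, index = split_branch_target(0x103E)
print(hex(base), index)  # 0x1020 15
```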

    John Savard

  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Wed Jul 30 05:02:11 2025
    From Newsgroup: comp.arch

    John Savard <quadibloc@invalid.invalid> schrieb:
    On Tue, 29 Jul 2025 05:12:10 +0000, Thomas Koenig wrote:
    John Savard <quadibloc@invalid.invalid> schrieb:

    Some headers provide for variable-length instructions. Those headers,
    instead of just being either 32 bits or 64 bits long, can also be 48
    bits long.

    Please provide an example of an assembly sequence showing how this
    would look.

    It's not clear to me what you would expect to see.

    I would like to see an assembly sequence which efficiently uses
    your non-32-bit and non-16-bit encodings in a block, including
    all necessary NOPs.

    And I mean an _actual_ assembly sequence. Pick some well-known
    algorithm and implement it, then see if it will fit your block
    structure.

    In Assembly, there would just be a directive telling the assembler
    what instruction set to expect,

    ISET 2

    You (the compiler) have to take care of block limits, surely?
    Or is the assembler supposed to figure out something that fits?
    Then, a single instruction emitted which requires an instruction
    size that is not supported by that block will cause the block size
    to change. I assume that, if any of your 15-bit (or whatever)
    instructions do not work for block size reasons, you have
    different fallback instructions.

    By the way, your scheme of encoding constants in different blocks
    is going to land whoever writes an assembler for you in fixup
    and relocation hell.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Wed Jul 30 09:34:13 2025
    From Newsgroup: comp.arch

    On Wed, 30 Jul 2025 03:11:45 +0000, John Savard wrote:

    Whenever a branch is taken, however, the whole block in which the target
    is located is fetched, in order that the instructions in the block can
    be correctly decoded.

    I hadn't been thinking much in terms of implementations with a memory bus
    width of less than 256 bits. In that case, one doesn't want to fetch
    more than is needed from memory.

    Actually, one has to at least fetch the first 32 bits of the block, to see
    if they're an instruction or (part of) a header. If it's a header, the
    whole header or group of headers at the front needs to be fetched and processed.

    After that, only the instruction branched to needs to be fetched, not the
    ones before it, since the header gives all the information needed to decode
    all the instructions in the block in parallel.
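
    For a narrow memory bus, the fetch sequence described above could be
    modelled as a list of 32-bit word indices within the block. In this
    sketch the total header length is passed in as a parameter, although
    in hardware it would only be known after the first word has been
    read. That simplification and the function name are assumptions.

```python
WORD_BITS = 32
WORDS_PER_BLOCK = 8


def words_to_fetch(target_byte_offset, header_bits):
    """Word indices (0-7) to read when branching into a block.

    target_byte_offset -- offset of the branch target within the block
    header_bits        -- total length of the header(s) at the front, or 0
    """
    if not 0 <= target_byte_offset < WORDS_PER_BLOCK * 4:
        raise ValueError("offset must lie inside the 32-byte block")
    needed = {0}  # always read the first word to see whether it is a header
    header_words = -(-header_bits // WORD_BITS)  # round up to whole words
    needed.update(range(header_words))
    needed.add(target_byte_offset // 4)  # the word holding the target
    return sorted(needed)


print(words_to_fetch(20, 0))   # [0, 5]
print(words_to_fetch(20, 48))  # [0, 1, 5]
```

    Words after the target would then be fetched in order as execution
    continues through the block.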

    John Savard