• What I did on my summer vacation

    From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Thu Aug 21 20:49:51 2025
    From Newsgroup: comp.arch


    Greetings everyone !

    Since Google closed down comp.arch on Google Groups, I had been using
    Real World Technologies as a portal. About 8 weeks ago it crashed for
    the first time, then a couple weeks later it crashed a second time,
    apparently terminally, or Dave Kanter's interest has waned ...

    With help from Terje and SFuld, we have located this portal, and this
    is my first attempt at posting here.

    Anyone familiar with my comp.arch record over the years understands
    that I participate "a lot"; probably more than is good for my interests,
    but it is energy I seem to have on a continuous basis. My unanticipated
    down time gave that energy a chance to work on stuff I had been
    neglecting for quite some time--that is, the non-ISA parts of my
    architecture. {{I should probably learn something from this down time
    and my productivity}}

    My 66000 ISA is in "pretty good shape" having almost no changes over
    the last 6 months, with only specification clarifications. So it was time
    to work on the non-ISA parts.
    ----------------------------
    First up was/is the System Binary Interface, which for the most part is
    "just like" the Application Binary Interface, except that it uses
    supervisor call SVC and supervisor return SVR instead of CALL, CALX,
    and RET. This method gives uniform behavior across the 4 privilege
    levels {HyperVisor, Host OS, Guest OS, and Application}.

    I decided to integrate exceptions, checks, and interrupts with SVC since
    they all involve a privileged transfer of control, and a dispatcher.
    Instead of having something like an <Interrupt> vector table, I decided,
    under EricP's tutelage, to use a software dispatcher. This allows the
    vector table to be positioned anywhere in memory, on any boundary, and
    be of any size; such that each Guest OS, Host OS, and HyperVisor can have
    its own table organized any way it pleases. Control arrives at the
    dispatcher with a re-entrant Thread State and register file, with R0
    holding "why" and enough temporary registers for dispatch to perform
    its duties immediately.

    {It ends up that even Linux "signals" can use this means with very
    slight modification to the software dispatcher--it merely has to
    be cognizant that signal privilege == thread waiting privilege and
    thus "save the non-preserved registers".}

    Dispatch extracts R0<38:32>, compares this to the size of the table,
    and if it is within the table, CALX's the entry point in the table.
    This performs an ABI control transfer to the required "handler".
    Upon return, Dispatcher performs SVR to return control whence it came.
    The normal path through Dispatcher is 7 instructions.
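
    {As a reading aid, a minimal C sketch of that dispatch flow. The names
    (why, vector_table, table_size) are made up for illustration; the real
    dispatcher is those ~7 My 66000 instructions operating on R0 and the
    table directly.}

        #include <stdint.h>

        typedef void (*handler_t)(uint64_t why);

        extern handler_t vector_table[];   /* anywhere in memory, any size */
        extern uint64_t  table_size;       /* number of entries            */

        void dispatcher(uint64_t r0)       /* R0 arrives holding "why"     */
        {
            uint64_t index = (r0 >> 32) & 0x7F;  /* extract R0<38:32>      */
            if (index < table_size)
                vector_table[index](r0);  /* CALX: ABI call to the handler */
            /* SVR returns control whence it came; HW first checks for
               pending higher-priority interrupts (softIRQs run first).    */
        }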

    In My 66000 Architecture, SVR also checks pending interrupts of higher
    priority than where SVR is going; thus, softIRQ's are popped off the
    deferred call list and processed before control is delivered to lower
    priority levels.
    ----------------------------
    Next up was the System Programming model: I modeled Chip Resources after
    PCIe Peripherals. {{I had to use the term Peripheral because, with SR-IOV
    and MR-IOV--with physical Functions, virtual Functions, and base Functions,
    and with Bus:Device.Function being turned into a routing code--none of
    those terms made sense, and they required too many words to describe.
    So, I use the term Peripheral for anything that performs an I/O service
    on behalf of the system.}}

    My 66000 uses nested paging, with Application and Guest OS using Level-1
    translation while Host OS and HyperVisor use Level-2 translation.

    My 66000 translation projects a 64-bit virtual address space into a
    66-bit universal address space with {DRAM, Configuration, MM I/O, and
    ROM} spaces.

    Since My 66000 comes out of reset with the MMU turned on, boot software
    accesses virtual Configuration space, which is mapped to {Chip, DRAM,
    and PCIe} configuration spaces. Resources are identified by Type 0
    PCIe Configuration headers, and are programmed the "obvious" way,
    (later) assigning a page of MM I/O address space to each Resource.

    Chip Configuration headers have the Built-In Self-Test (BIST) control
    port. Chip resources use BIST to clear and initialize their internal
    stores for normal operation. Prior to writing to BIST, these resources
    can be read using the diagnostic port and dumped as desired. BIST is
    assumed to "take some time" so BOOT SW might cause most Chip resources
    to BIST while it goes about getting DRAM up and running.

    In all cases: Control Registers exist--it is only whether SW can access
    them that is in question. A control register that does not exist reads
    as 0 and discards any write, while a control register that does exist
    absorbs the write and returns the last write or the last HW update.
    Configuration control registers are accessible in <physical> configuration
    space. The BAR registers in particular are used to assign MM I/O addresses
    to the rest of the control registers not addressable in configuration space.
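
    {A toy C model of those access rules, purely for illustration:}

        #include <stdbool.h>
        #include <stdint.h>

        /* A control register always "exists" architecturally; whether SW
           can usefully access it is the only question. */
        typedef struct {
            bool     implemented;   /* present in this implementation?       */
            uint64_t value;         /* last SW write or last HW update       */
        } control_reg;

        static uint64_t cr_read(const control_reg *cr)
        {
            return cr->implemented ? cr->value : 0;  /* absent CRs read as 0 */
        }

        static void cr_write(control_reg *cr, uint64_t v)
        {
            if (cr->implemented)    /* absent CRs silently discard writes    */
                cr->value = v;
        }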

    Chip resources {Cores, on-Die Interconnect, {L3, DRAM}, {HostBridge,
    I/O MMU, PCIe Segmenter}} have the first 32 DoubleWords of the
    assigned MM I/O space defined as a "file" containing R0..R31. In all
    cases:
    R0 contains the Voltage and Frequency control terms of the resource,
    R1..R27 contain any general purpose control registers of the resource,
    R28..R30 contain the debug port,
    R31 contains the Performance Counter port.
    The remaining 480 DoubleWords are defined by the resource itself
    (or not).
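
    {A C sketch of that 4 KB per-resource MM I/O page, with illustrative
    field names:}

        #include <assert.h>
        #include <stdint.h>

        typedef struct {
            volatile uint64_t vf_control;          /* R0: voltage/frequency  */
            volatile uint64_t general[27];         /* R1..R27: general CRs   */
            volatile uint64_t diag_addr;           /* R28: debug "address"   */
            volatile uint64_t diag_data;           /* R29: debug "data"      */
            volatile uint64_t diag_other;          /* R30: debug "other"     */
            volatile uint64_t perf_port;           /* R31: perf counter port */
            volatile uint64_t device_defined[480]; /* rest of the 512 DWs    */
        } resource_regs;

        /* 512 DoubleWords of 8 bytes == one 4096-byte page */
        static_assert(sizeof(resource_regs) == 4096, "one page per resource");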

    Because My 66000 ISA has memory instructions that "touch" multiple
    memory locations, these instructions take on special significance
    when using the debug and performance counter ports. Single memory
    instructions access the control registers themselves, while multi-
    memory instructions access "through" the port to the registers
    the port controls.

    For example: each resource has 8 performance counters and 1 control
    register (R31) governing that port.
    a STB Rd,[R31] writes a selection into the PC selectors
    a STD Rd,[R31] writes 8 selections into the PC selectors
    a LDB Rd,[R31] reads a selection from the PC selectors
    a LDD Rd,[R31] reads 8 selections from the PC selectors
    while:
    a LDM Rd,Rd+7,[R31] reads 8 Performance Counters,
    a STM Rd,Rd+7,[R31] writes 8 Performance Counters,
    a MS #0,[R31],#64 clears 8 Performance Counters.

    The Diagnostic port provides access to storage within the resource.
    R28 is roughly the "address" control register
    R29 is roughly the "data" control register
    R30 is roughly the "other" control register
    For a Core; one can access the following components from this port:
    ICache Tag
    ICache Data
    ICache TLB
    DCache Tag
    DCache Data
    DCache TLB
    Level-1 Miss Buffer
    L2Cache Tag
    L2Cache Data
    L2Cache TLB
    L2Cache MMU
    Level-2 Miss Buffer

    Accesses through this port come in single-memory and multi-memory
    flavors. Accessing these control registers as single memory actions
    allows raw access to the data and associated ECC. Reads tell you
    what HW has stored, writes allow SW to write "bad" ECC, should it
    so choose. Multi-memory accesses allow SW to read or write cache
    line sized chunks. The Core tags are configured so that every line
    has a state where the line neither hits nor participates in set
    allocation (when a line needs to be allocated on miss or replacement).
    So, a single bad line in a 16 KB 4-way set-associative cache loses
    64 bytes, and one set becomes 3-way set associative.
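
    {An illustrative C use of the address/data pair of the debug port; the
    encoding of the address word (which array, which set/way/index) is a
    made-up placeholder here, not the actual layout:}

        #include <stdint.h>

        #define R28_DIAG_ADDR  (28 * 8)  /* byte offsets into the R0..R31 file */
        #define R29_DIAG_DATA  (29 * 8)

        /* Read one raw entry (data plus its ECC as stored) from a selected
           array, e.g. the ICache Tag, via a single-memory access. */
        static uint64_t diag_read_raw(volatile uint8_t *mmio,
                                      uint64_t array_sel, uint64_t index)
        {
            volatile uint64_t *addr = (volatile uint64_t *)(mmio + R28_DIAG_ADDR);
            volatile uint64_t *data = (volatile uint64_t *)(mmio + R29_DIAG_DATA);

            *addr = (array_sel << 56) | index;  /* hypothetical selector layout */
            return *data;
        }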
    ----------------------------
    By using the fact that cores come out of reset with the MMU turned on,
    and with BOOT ROM supplying the translation tables, I was able to arrange
    that all resources come out of reset with all control register flip-
    flops = 0, except for Core[0].Hypervisor_Context.v = 1.

    Core[0] I$, D$, and L2$ come out of reset in the "allocated" state,
    so Boot SW has a small amount of memory from which to find DRAM,
    configure, initialize, tune the pin interface, and clear; so that
    one can proceed to walk and configure the PCIe trees of peripherals.
    ----------------------------
    Guest OS can configure its translation tables to emit {Configuration
    and MM I/O} space accesses. Now that these are so easy to recognize:
    Host OS and HyperVisor have the ability to translate Guest Physical {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral. All we really want is a) the "routing" code
    of the physical counterpart of the virtual Function, and b) whether
    the access is to be allowed (valid & present). Here, the routing code
    contains the PCIe physical Segment, whether the access is physical
    or virtual, and whether the routing code uses {Bus, Device, *},
    {Bus, *, *} or {*, *, *}. The rest is PCIe transport engines.
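
    {For readers, the information such a PTE carries can be pictured as
    below; the field widths and packing here are NOT the real encoding,
    just a summary of the list above:}

        #include <stdbool.h>
        #include <stdint.h>

        enum routing_granularity {
            ROUTE_BUS_DEV_FN,   /* {Bus, Device, *} */
            ROUTE_BUS_ANY,      /* {Bus, *, *}      */
            ROUTE_ANY           /* {*, *, *}        */
        };

        struct mmio_route {
            bool     allowed;        /* valid & present                     */
            bool     physical;       /* physical vs virtual Function access */
            uint16_t pcie_segment;   /* physical PCIe Segment               */
            enum routing_granularity gran;
        };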

    Anyway: School is back in session !

    Mitch
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Michael S@already5chosen@yahoo.com to comp.arch on Fri Aug 22 01:20:40 2025
    From Newsgroup: comp.arch

    On Thu, 21 Aug 2025 20:49:51 GMT
    MitchAlsup <user5857@newsgrouper.org.invalid> wrote:

    Greetings everyone !

    Since Google closed down comp.ach on google groups, I had been using
    Real World Technologies as a portal. About 8 weeks ago it crashed for
    the first time, then a couple weeks later it crashed a second time, apparently terminally, or Dave Kanter's interest has waned ...


    Mitch, you are mixing up a Usenet portal with a conventional Internet forum.

    RWT is a forum. It didn't crash in recent months. David Kanter didn't
    lose interest. Not that he has a whole lot of interest, but that is
    another story. He is interested enough to pay the bill for hosting and
    that is the most important thing as far as participants are concerned.
    I don't know why you stopped posting there. Would guess that you forgot
    the address.
    BTW, https://www.realworldtech.com/forum/?roomid=1

    For Usenet you were using i2pn2.org server, probably via www.novabbs.com
    web portal created with Rocksolid Light software. The server and portal
    were maintained by Retro Guy (Thom). On 2025-04-26 Thom passed away from
    pancreatic cancer. His Usenet server and his portal continued to work
    without maintenance until late July. But eventually they stopped.
    So it goes.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From John Savard@quadibloc@invalid.invalid to comp.arch on Thu Aug 21 23:15:52 2025
    From Newsgroup: comp.arch

    On Fri, 22 Aug 2025 01:20:40 +0300, Michael S wrote:

    For Usenet you were using i2pn2.org server, probably via www.novabbs.com
    web portal created with Rocksolid Light software. The server and portal
    were maintained by Retro Guy (Thom). 2025-04-26 Thom passed away from pancreatic cancer. His Usenet server and his portal continued to work
    without maintenance until late July. But eventually they stopped.
    So it goes.

    I knew that novaBBS recently stopped working, but I had no idea of the
    tragic reason why this was the case. Thank you.

    John Savard

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Aug 22 01:39:38 2025
    From Newsgroup: comp.arch

    On 8/21/2025 3:49 PM, MitchAlsup wrote:

    Greetings everyone !

    Since Google closed down comp.ach on google groups, I had been using
    Real World Technologies as a portal. About 8 weeks ago it crashed for
    the first time, then a couple weeks later it crashed a second time, apparently terminally, or Dave Kanter's interest has waned ...

    With help from Terje and SFuld, we have located this portal, and this
    is my first attempt at posting here.

    Anyone familiar with my comp.arch record over the years, understands
    that I participate "a lot"; probably more than it good for my interests,
    but it is energy I seem to have on a continuous basis. My unanticipated
    down time gave my energy a time to work on stuff that I had been neglect-
    ing for quite some time--that is the non-ISA parts of my architecture.
    {{I should probably learn something for this down time and my productivity}}

    My 66000 ISA is in "pretty good shape" having almost no changes over
    the last 6 months, with only specification clarifications. So it was time
    to work on the non-ISA parts.


    Good that you are back.

    For a while I had mostly been using 'news.eternal-september.org' via Thunderbird...

    Had once been using a different usenet server ('albasani'), but it
    seemingly stopped working around 5 years ago. Seems like the server
    still exists, but I don't see any messages more recent than 2020.




    I haven't had all that many recent core ISA changes either.

    There was BITMOV a few months ago.
    Then an instruction for packed FP8 vectors.
    Had looked at adding a Binary16 Horizontal Add instr,
    but too expensive for now.

    Found and fixed a few bugs mostly related to RISC-V and RV-C.

    BGBCC now supports RV-C.
    Determined that RV64GC + Jumbo Extensions is fairly competitive in terms
    of code density.
    XG3 is still getting worse code density than its predecessors.

    Where, RV64GC+Jx has instruction sizes: 16/32/64
    With optional additional sizes: 48 and 96.
    LI Imm32, Rn5 //48-bit
    LI Imm64, Rn //96-bit

    For now (apart from LI and SHORI) there is not much else for 48-bit
    encodings; I would need to choose between Huawei, Qualcomm, or custom
    encodings for the rest (the 48-bit encoding space didn't exactly go very
    far with this stuff...). My encoding scheme works differently in that it
    basically shoehorns a subset of the 64-bit jumbo encodings into the
    48-bit space, mostly encoding Imm24/Disp24 ops, and optionally
    synthesizing Imm17s forms of some 3R instructions. It is ugly, but does
    well for code density.


    I did experiment with a pair encoding for XG3, where a pair of "compact" instructions could be encoded into a 32-bit instruction word. Gains were
    very modest, not enough to change the ranking or to make a strong reason
    to have it.

    Despite being weaker on the code density front, it seems to do well for performance.
    XG2 and XG3 fight for the top in terms of speed.
    XG1 and RV64GC+Jx seem to be the top 2 for code density.

    But both XG1 and RV64GC are worse for performance as my CPU core doesn't
    favor 16-bit ops. Note that Jx does offer a notable code density and performance improvement over RV64GC by itself.

    For XG2 vs XG3 performance, there seems to be a discrepancy between
    emulator and CPU core which is faster. Emulator says XG2, CPU core says XG3.


    Otherwise:
    Was messing around a little bit in the CPU core trying to get the "use
    traps to emulate proper IEEE-754" support working.



    Partly debating formally moving FPSR from GBR[63:48] to SP[63:48].

    The current location (in GBR) results in FPSR getting stomped by GBR
    reloads; which also isn't ideal. Though, if I did move it, might still
    want some way to keep it being saved along with GP/GBR in the ABI
    (though, debatable as to whether it is better to have the FENV_ACCESS
    stuff as global or dynamically bound, traditionally global is assumed,
    with dynamic scoped FENV as a bit of an oddity, even if it makes more
    sense IMHO).

    Well, or give it its own CR, but then would still need to add additional
    logic to save/restore it (which is a non-zero added cost).

    Though, information online is a bit ambiguous as to the expected scoping behavior of FENV_ACCESS (with some stuff implying that the scoping
    behavior depends on where the pragma is used).

    Relocating it to the HOBs of SP would make sense if I want to assume
    that it is global by default (within a given thread). Which is,
    possibly, sane; and code doesn't usually stomp SP. Would need to do
    something extra if I want dynamic scoping rather than global (but,
    as-is, it isn't terribly useful if it typically just gets stomped anyway).


    Well, for now, I have gone and added it under a flag in my Verilog code
    (to await a more final decision). Relatively little impact on the Verilog
    code either way (and it doesn't appear to have much effect on resource cost).

    Note that RISC-V accesses it via a CSR (and with the bits in a different order), but this is orthogonal.

    Yes, I am well aware that this sort of wonk is kinda ugly...

    There is always a non-zero risk that code will notice or care about this
    stuff (say, computes a difference between two stack-derived pointers and notices that they are wildly far apart because FPU flags differ).

    Arguably a greater risk for RISC-V though which uses plain ALU ops;
    though one possible workaround could be making the HOBs of SP (and GP)
    always read as 0 in RV mode.
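
    A small C sketch of the hazard and a software workaround: if FPSR lives
    in SP[63:48], two stack-derived pointers captured under different FP
    flags differ in their high-order bits, so code that cares has to strip
    the tag bits before doing pointer arithmetic. The 48-bit split below is
    just the assumption from above.

        #include <stdint.h>

        #define PTR_TAG_SHIFT 48
        #define PTR_ADDR_MASK ((1ULL << PTR_TAG_SHIFT) - 1)

        /* Difference of two stack-derived pointers, ignoring FPSR bits. */
        static int64_t stack_ptr_diff(const void *a, const void *b)
        {
            uint64_t ua = (uint64_t)(uintptr_t)a & PTR_ADDR_MASK;
            uint64_t ub = (uint64_t)(uintptr_t)b & PTR_ADDR_MASK;
            return (int64_t)(ua - ub);
        }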


    Otherwise, had also been working slightly on BGBCC support for my very informal FP-SIMD extension for RISC-V (there may be reason to use it;
    beyond its existence as an implementation quirk).

    Will need to deal with a few things to make things play well with
    supporting IEEE semantics. Like, ideally, "FADD.S" should support doing IEEE-754 properly. But, the logic for this currently only exists in the
    main FPU, which (ideally) means needing some way to know that it is a
    scalar FPU operation at decode time.


    Can note that the DYN rounding mode is currently scalar only, which,
    along with the flags updates means, possibly:
    RNE/RTZ: FADD.S still gives non-IEEE behavior (FPSR still ignored);
    DYN: FADD.S goes through main FPU, gives IEEE behavior via traps if flag
    is set;
    ...

    Or, IOW, rounding mode in FADD.S/FMUL.S/... instructions:
    000: RNE, SIMD Unit, implicitly DAZ/FTZ
    001: RTZ, SIMD Unit, implicitly DAZ/FTZ
    01z: Main FPU
    100: Main FPU
    101: SIMD 128-bit RTZ, DAZ/FTZ
    110: SIMD 128-bit RNE, DAZ/FTZ
    111: DYN, Main FPU

    Where DYN is used if FENV_ACCESS is enabled in BGBCC; and DYN is the
    default rounding mode in GCC. Assumed to be always scalar...

    Can note that opt-in makes more sense, as otherwise one would need to
    have trap handlers in place before one could safely use any
    floating-point operations (whereas the non-IEEE mode never traps).

    ...




    ----------------------------
    First up was/is the System Binary Interface: which for the most part is
    "just like" Application Binary Interaface, except that it uses supervisor call SVC and supervisor return SVR instead of CALL, CALX, and RET. This method gives uniform across the 4 privilege levels {HyperVisor, Host OS, Guest OS, and Application}.

    I decided to integrate exception, check, and interrupts with SVC since
    they all involve a privilege transfer of control, and a dispatcher.
    Instead of having something like <Interrupt> vector table, I decided,
    under EricP's tutelage, to use a software dispatcher. This allows the
    vector table to be positioned anywhere in memory, on any boundary, and
    be of any size; such that each Guest OS, Host OS, HyperVisor can have
    their own table organized anyway they please. Control arrives with a re-entrant Thread State and register file at the dispatcher with R0
    holding "why" and enough temporary registers for dispatch to perform
    its duties immediately.


    Hmm.

    Admittedly still seems less complicated (from a hardware design POV) to
    just do everything with software traps; even if not the best option for performance.


    <snip>

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Aug 22 14:57:56 2025
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:

    Greetings everyone !


    Chip resources {Cores, on-Die Interconnect, {L3, DRAM}, {HostBridge,
    I/O MMU, PCIe Segmenter}} have the first 32 DoubleWords of the
    assigned MM I/O space defined as a "file" containing R0..R31. In all
    cases:
    R0 contains the Voltage and Frequency control terms of the resource,
    R1..R27 contains any general purpose control registers of resource.
    R28..R30 contains the debug port,
    R31 contains the Performance Counter port.
    The remaining 480 DoubleWords are defined by the resource itself
    (or not).

    I'd allow for regions larger than 4096 bytes. It's not uncommon
    for specialized on-board DMA engines to require 20 bits of
    address space to define the complete set of device resources,
    even for on-chip devices (A DMA engine may support a large number
    of "ring" structures, for example, and one might group the
    ring configuration registers into 4k regions (so they can be assigned
    to a guest in a SRIOV-type device)).

    I've seen devices with dozens of performance registers (both
    direct-access and indirect-access).



    Because My 66000 ISA has memory instructions that "touch" multiple
    memory locations, these instructions take on special significance
    when using the debug and performance counter ports. Single memory
    instructions access the control registers themselves, while multi-
    memory instructions access "through" the port to the registers
    the port controls.

    That level of indirection may cause difficulties when virtualizing
    a device.


    For example: each resource has 8 performance counters and 1 control
    register (R31) governing that port.
    a STB Rd,[R31] writes a selection into the PC selectors
    a STD Rd,[R31] writes 8 selections into the PC selectors
    a LDB Rd,[R31] reads a selection from a PC selectors
    a LDD Rd,[R31] reads 8 selections from the PC selectors
    while:
    a LDM Rd,Rd+7,[R31] reads 8 Performance Counters,
    a STM Rd,Rd+7,[R31] writes 8 Performance Counters,
    a MS #0,[R31],#64 clears 8 Performance Counters.

    The Diagnostic port provides access to storage within the resource.
    R28 is roughly the "address" control register
    R29 is roughly the "data" control register
    R30 is roughly the "other" control register
    For a Core; one can access the following components from this port:
    ICache Tag
    ICache Data
    ICache TLB
    DCache Tag
    DCache Data
    DCache TLB
    Level-1 Miss Buffer
    L2Cache Tag
    L2Cache Data
    L2Cache TLB
    L2Cache MMU
    Level-2 Miss Buffer

    Accesses through this port come in single-memory and multi-memory
    flavors. Accessing these control registers as single memory actions
    allows raw access to the data and associated ECC. Reads tell you
    what HW has stored, writes allow SW to write "bad" ECC, should it
    so choose. Multi-memory accesses allow SW to read or write cache
    line sized chunks. The Core tags are configured so that every line
    has a state where this line neither hits nor participates in set
    allocation (when a line needs allocated on miss or replacement.)
    So, a single bad line in a 16KB cache 4-way set looses 64-bytes
    and one line becomes 3-way set associative.
    ----------------------------

    The KISS principle applies.

    By using the fact that cores come out of reset with MMU turned on,
    and BOOT ROM supplying the translation tables, I was able to achieve
    that all resources come out of reset with all control register flip-
    flops = 0, except for Core[0].Hypervisor_Context.v = 1.

    Where is the ROM? Modern SoCs have an on-board ROM, which
    cannot be changed without a re-spin and new tapeout. That
    ROM needs to be rock-solid and provide just enough capability
    to securely load a trusted blob from a programmable device
    (e.g. SPI flash device).

    I'm really leery about the idea of starting with the MMU enabled;
    I don't see any advantage to doing that.


    Core[0] I$, D$, and L2$ come out of reset in the "allocated" state,
    so Boot SW has a small amount of memory from which to find DRAM,
    configure, initialize, tune the pin interface, and clear; so that
    one can proceed to walk and configure the PCIe trees of peripherals.

    You don't need to configure peripherals before DRAM is initialized
    (other than the DRAM controller itself). All other peripheral
    initialization should be done in loadable firmware or a secure
    monitor, hypervisor or bare-metal kernel.

    ----------------------------
    Guest OS can configure its translation tables to emit {Configuration
    and MM I/O} space accesses. Now that these are so easy to recognize:

    Security. Guest OS should only be able to access resources
    granted to it by the HV.

    Host OS and HyperVisor have the ability to translate Guest Physical
    {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral.

    This seems unnecessarily complicated. Every SR-IOV capable device
    is different and aside the standard PCIe defined configuration space
    registers, everything else is device-specific.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Aug 22 16:17:24 2025
    From Newsgroup: comp.arch


    scott@slp53.sl.home (Scott Lurndal) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:

    Greetings everyone !


    Chip resources {Cores, on-Die Interconnect, {L3, DRAM}, {HostBridge,
    I/O MMU, PCIe Segmenter}} have the first 32 DoubleWords of the
    assigned MM I/O space defined as a "file" containing R0..R31. In all
    cases:
    R0 contains the Voltage and Frequency control terms of the resource,
    R1..R27 contains any general purpose control registers of resource.
    R28..R30 contains the debug port,
    R31 contains the Performance Counter port.
    The remaining 480 DoubleWords are defined by the resource itself
    (or not).

    I'd allow for regions larger than 4096 bytes. It's not uncommmon
    for specialized on-board DMA engines to require 20 bits of
    address space to define the complete set of device resources,
    even for on-chip devices (A DMA engine may support a large number
    of "ring" structures, for example, and one might group the
    ring configuration registers into 4k regions (so they can be assigned
    to a guest in a SRIOV-type device)).

    This is a fair point, but none of my current on-Die resources need more
    than 6 DWs, so allocating a 4096-byte address space to them seems
    generous--remember these are NOT PCIe peripherals, but resources in a
    My 66000 implementation. They all use std PCIe config headers, so a
    STD #-1 to BAR{01} and a LDD tells how big the allocation should be; so
    even though the spaces are sparse, they still follow PCIe Config
    conventions.
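
    {For anyone unfamiliar with the trick: that is the standard PCIe BAR
    sizing dance, sketched in C below. cfg_read64()/cfg_write64() are
    assumed accessors into the resource's configuration space, and BAR0/BAR1
    together form one 64-bit memory BAR.}

        #include <stdint.h>

        extern uint64_t cfg_read64(uint64_t cfg_base, uint32_t offset);
        extern void     cfg_write64(uint64_t cfg_base, uint32_t offset,
                                    uint64_t value);

        #define BAR0_OFFSET 0x10  /* Type 0 header: BAR0 at 0x10, BAR1 at 0x14 */

        static uint64_t bar_size(uint64_t cfg_base)
        {
            uint64_t saved = cfg_read64(cfg_base, BAR0_OFFSET);

            cfg_write64(cfg_base, BAR0_OFFSET, ~0ULL);  /* STD #-1 to BAR{01}  */
            uint64_t rb = cfg_read64(cfg_base, BAR0_OFFSET);
            cfg_write64(cfg_base, BAR0_OFFSET, saved);  /* restore old value   */

            rb &= ~0xFULL;         /* mask the low type/prefetchable bits      */
            return ~rb + 1;        /* size = two's complement of the readback  */
        }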

    I've seen devices with dozens of performance registers (both
    direct-access and indirect-access).



    Because My 66000 ISA has memory instructions that "touch" multiple
    memory locations, these instructions take on special significance
    when using the debug and performance counter ports. Single memory
    instructions access the control registers themselves, while multi-
    memory instructions access "through" the port to the registers
    the port controls.

    That level of indirection may cause difficulties when virtualizing
    a device.

    These are on-Die resources not PCIe peripherals.


    For example: each resource has 8 performance counters and 1 control
    register (R31) governing that port.
    a STB Rd,[R31] writes a selection into the PC selectors
    a STD Rd,[R31] writes 8 selections into the PC selectors
    a LDB Rd,[R31] reads a selection from a PC selectors
    a LDD Rd,[R31] reads 8 selections from the PC selectors
    while:
    a LDM Rd,Rd+7,[R31] reads 8 Performance Counters,
    a STM Rd,Rd+7,[R31] writes 8 Performance Counters,
    a MS #0,[R31],#64 clears 8 Performance Counters.

    The Diagnostic port provides access to storage within the resource.
    R28 is roughly the "address" control register
    R29 is roughly the "data" control register
    R30 is roughly the "other" control register
    For a Core; one can access the following components from this port:
    ICache Tag
    ICache Data
    ICache TLB
    DCache Tag
    DCache Data
    DCache TLB
    Level-1 Miss Buffer
    L2Cache Tag
    L2Cache Data
    L2Cache TLB
    L2Cache MMU
    Level-2 Miss Buffer

    Accesses through this port come in single-memory and multi-memory
    flavors. Accessing these control registers as single memory actions
    allows raw access to the data and associated ECC. Reads tell you
    what HW has stored, writes allow SW to write "bad" ECC, should it
    so choose. Multi-memory accesses allow SW to read or write cache
    line sized chunks. The Core tags are configured so that every line
    has a state where this line neither hits nor participates in set
    allocation (when a line needs allocated on miss or replacement.)
    So, a single bad line in a 16KB cache 4-way set looses 64-bytes
    and one line becomes 3-way set associative.
    ----------------------------

    The KISS principle applies.

    By using the fact that cores come out of reset with MMU turned on,
    and BOOT ROM supplying the translation tables, I was able to achieve
    that all resources come out of reset with all control register flip-
    flops = 0, except for Core[0].Hypervisor_Context.v = 1.

    Where is the ROM? Modern SoCs have an on-board ROM, which
    cannot be changed without a re-spin and new tapeout. That
    ROM needs to be rock-solid and provide just enough capability
    to securely load a trusted blob from a programmable device
    (e.g. SPI flash device).

    ROM is external FLASH in the envisioned implementations.

    I'm really leery about the idea of starting with the MMU enabled;
    I don't see any advantage to doing that.


    Core[0] I$, D$, and L2$ come out of reset in the "allocated" state,
    so Boot SW has a small amount of memory from which to find DRAM,
    configure, initialize, tune the pin interface, and clear; so that
    one can proceed to walk and configure the PCIe trees of peripherals.

    You don't need to configure peripherals before DRAM is initialized
    (other than the DRAM controller itself). All other peripheral initialization should be done in loadable firmware or a secure
    monitor, hypervisor or bare-metal kernel.

    Agreed, you can't use the peripherals until they have DRAM in which
    to perform I/O and send interrupts {thus, acting normally}.

    ----------------------------
    Guest OS can configure its translation tables to emit {Configuration
    and MM I/O} space accesses. Now that these are so easy to recognize:

    Security. Guest OS should only be able to access resources
    granted to it by the HV.

    Yes, Guest physical MM I/O space is translated by Host MM I/O translation
    tables. A Real OS sets up its translation tables to emit MM I/O accesses,
    so a Guest OS should too; then that Guest Physical address is considered
    Host virtual and is translated and protected again.

    As far as I am concerned, Guest OS thinks it has 32 Devices each of which
    have 8 Functions all on Bus 0... So, a Guest OS with fewer than 256 Fctns
    sees only 1 BUS and can short circuit the virtual Config discovery.
    These virtual Guest OS accesses, then, get redistributed to the
    Segments and Busses on which the VFs actually reside by Level-2 SW.

    Host OS and HyperVisor have the ability to translate Guest Physical
    {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral.

    This seems unnecessarily complicated.

    So did IEEE 754 in 1982...

    What I have done is to virtualize Config and MM I/O spaces, so Guest OS
    does not even see that it is not Real OS running on bare metal--and doing
    so without HV intervention on any of the Config or MM I/O accesses.

    Every SR-IOV capable device
    is different and aside the standard PCIe defined configuration space registers, everything else is device-specific.

    Only requires 3 bits in the MM I/O PTE.
    Only requires 1 bit in Config PTE, a bit that already had to be there.



    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From BGB@cr88192@gmail.com to comp.arch on Fri Aug 22 12:47:35 2025
    From Newsgroup: comp.arch

    On 8/22/2025 11:17 AM, MitchAlsup wrote:

    scott@slp53.sl.home (Scott Lurndal) posted:


    <snip>

    Host OS and HyperVisor have the ability to translate Guest Physical
    {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral.

    This seems unnecessarily complicated.

    So did IEEE 754 in 1982...


    Still is...

    Denormals, Inf/NaN, ... tend to accomplish relatively little in
    practice; apart from making FPUs more expensive, often slower, and
    requiring programmers to go through extra hoops to specify DAZ/FTZ in
    cases where they need more performance.

    Likewise, +/- 0.5 ULP, accomplishes little beyond adding cost; whereas
    +/- 0.63 ULP would be a lot cheaper, and accomplishes nearly the same
    effect.

    Well, apart from the seeming failure of being unable to fully converge
    the last few bits of N-R, which seems to depend primarily on sub-ULP bits.


    But, there is a tradeoff:
    Doing a faster FPU which uses trap-and-emulate.

    Still isn't free, as detecting cases that will require trap-and-emulate
    still has a higher cost than merely not bothering in the first place
    (and it now requires the trickery of routing FPSR bits into the
    instruction decoder so that the FPU can detect violations of IEEE
    semantics when it needs to).

    And finding some other issues in the process, ...

    ...


    What I have done is to virtualize Config and MM I/O spaces, so Guest OS
    does not even see that it is not Real OS running on bare metal--and doing
    so without HV intervention on any of the Config or MM I/O accesses.


    Still seems unnecessarily complicated.

    Could be like:
    Machine/ISR Mode: Bare metal, no MMU.
    Supervisor Mode: Full Access, MMU.
    User: Limited Access, MMU

    The VM Guest OS then runs in User Mode, and generates a fault whenever a
    privileged operation is encountered. The VM can then fake the rest of
    the system in software...

    And/Or: Ye Olde Interpreter or JIT compiler (sorta like DOSBox and similar).

    Nested Translation? Fake it in software.
    Unlike real VMs, SW address translation can more easily scale to N
    levels of VM, even if this also means N levels of slow...


    Every SR-IOV capable device
    is different and aside the standard PCIe defined configuration space
    registers, everything else is device-specific.

    Only requires 3 bits in the MM I/O PTE.
    Only requires 1 bit in Config PTE, a bit that already had to be there.




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Aug 22 18:58:59 2025
    From Newsgroup: comp.arch


    BGB <cr88192@gmail.com> posted:

    On 8/22/2025 11:17 AM, MitchAlsup wrote:

    scott@slp53.sl.home (Scott Lurndal) posted:


    <snip>

    Host OS and HyperVisor have the ability to translate Guest Physical
    {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral.

    This seems unnecessarily complicated.

    So did IEEE 754 in 1982...


    Still is...

    Denormals, Inf/NaN, ... tend to accomplish relatively little in
    practice; apart from making FPUs more expensive, often slower, and
    requiring programmers to go through extra hoops to specify DAZ/FTZ in
    cases where they need more performance.

    You can (and have) argued this until the cows come home. But you cannot
    deny that IEEE 754 is here to stay, and was produced by a democratic
    process.

    Likewise, +/- 0.5 ULP, accomplishes little beyond adding cost; whereas
    +/- 0.63 ULP would be a lot cheaper, and accomplishes nearly the same effect.

    I am only partially willing to buy that argument. I was able to get my
    transcendentals down into the 0.502-0.505 ULP range, and the only major
    difference is that the FU also does integer multiplication.

    I have also lived the messes of IBM FP and CDC FP. Luckily I missed
    out on CRAY FP.

    Well, apart from the seeming failure of being unable to fully converge
    the last few bits of N-R, which seems to depend primarily on sub-ULP bits.

    DIV and SQRT in Goldschmidt form require 57×57; in N-R form require 56×57
    in order to get IEEE 754 accuracy.

    J.-M. Muller has a chapter where he investigates how many bits prior to
    rounding are needed in order to achieve IEEE 754 accuracy. It turns out
    that EXP requires 117-118 bits to achieve 0.5 ULP, and there are some
    other nasty transcendentals. This requires something longer than 128-bit
    IEEE FP in order to get properly rounded 64-bit FP transcendentals.

    On the other hand, if the multiplier FU can perform integer ×, then one
    can achieve 0.502-0.505 ULP with just the 64×64 ×, and have 3-4 cycle
    integer multiply {you could say "for free", or you can say "since I×
    is there, transcendental accuracy came for free"}

    But, there is a tradeoff:
    Doing a faster FPU which uses trap-and-emulate.

    All of the big guys do full speed FMUL with IEEE accuracy. Only FPGA implementations have an argument to stop short.

    Still isn't free, as detecting cases tat will require trap-and-emulate
    still has a higher cost than merely not bothering in the first place
    (and now requires trickery of routing FPSR bits into the instruction
    decoder depending on whether they need to be routed in a way that will
    allow the FPU to detect violations of IEEE semantics).

    Oh, BTW, My 66000 transcendentals detect that the rounding might not
    be to IEEE accuracy and have an Enable to trap and emulate the ~1/1273
    that cannot be properly rounded. And, here, that capability is patented.

    And finding some other issues in the process, ...

    ...


    What I have done is to virtualize Config and MM I/O spaces, so Guest OS does not even see that it is not Real OS running on bare metal--and doing so without HV intervention on any of the Config or MM I/O accesses.


    Still seems unnecessarily complicated.

    The MMU does not even have a bit that allows it to be turned off.
    Indeed, turning off the Root Pointer (validity) makes that privilege
    level unable to function--so you could not even Boot.

    The MMU has to work for the CPU to do anything reasonable; I just extend
    this all the way back to the exit from Reset.

    Could be like:
    Machine/ISR Mode: Bare metal, no MMU.
    Supervisor Mode: Full Access, MMU.
    User: Limited Access, MMU

    If you don't trust Boot, how do you get to Secure Boot ?!?

    But if you WANT to appear to have turned off the MMU, you can use a
    single SuperPage PTE to map 8EB {or all of potential Flash ROM}

    VM Guest OS then runs in User Mode. and generates a fault whenever a privileged operation is encountered. The VM can then fake the rest of
    the system in software...

    In My 66000, Config, MM I/O and ROM spaces are not part of the DRAM
    address space. So, without an active MMU, Boot has nowhere to fetch instructions, no way to access Configuration space, and no way to
    get started Booting the system.

    Benefit: DRAM can be as big as 64-bits with no config or MM I/O
    apertures.
    Benefit: Can run Real OS as a Guest OS.
    Complexity: MMU is never turned off. {I would call this a lessening of
    complexity not an increase.}

    And/Or: Ye Olde Interpreter or JIT compiler (sorta like DOSBox and similar).

    Nested Translation? Fake it in software.

    Performance sucks, and you end up having higher-privilege SW look at
    tables set up by lower-privilege software. It is cleaner to run nested
    paging from the exit of Reset.

    Unlike real VMs, SW address translation can more easily scale to N
    levels of VM, even if this also means N levels of slow...

    That is what the Host OS privilege level is for.


    Every SR-IOV capable device
    is different and aside the standard PCIe defined configuration space
    registers, everything else is device-specific.

    Only requires 3 bits in the MM I/O PTE.
    Only requires 1 bit in Config PTE, a bit that already had to be there.




    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Terje Mathisen@terje.mathisen@tmsw.no to comp.arch on Fri Aug 22 21:51:21 2025
    From Newsgroup: comp.arch

    BGB wrote:
    On 8/22/2025 11:17 AM, MitchAlsup wrote:

    scott@slp53.sl.home (Scott Lurndal) posted:


    <snip>

    Host OS and HyperVisor have the ability to translate Guest Physical
    {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral.

    This seems unnecessarily complicated.

    So did IEEE 754 in 1982...


    Still is...

    Denormals, Inf/NaN, ... tend to accomplish relatively little in
    practice; apart from making FPUs more expensive, often slower, and
    requiring programmers to go through extra hoops to specify DAZ/FTZ in
    cases where they need more performance.

    Likewise, +/- 0.5 ULP, accomplishes little beyond adding cost; whereas
    +/- 0.63 ULP would be a lot cheaper, and accomplishes nearly the same effect.

    Well, apart from the seeming failure of being unable to fully converge
    the last few bits of N-R, which seems to depend primarily on sub-ULP bits.

    Having spent ~10 years (very much part time!) working on 754 standards,
    I strongly believe you are wrong:

    Yes, there are a few small issues, some related to grandfather clauses
    that might go away at some point, but the zero/subnorm/normal/inf/nan
    setup is not one of them.

    Personally I think it would have been a huge win if the original
    standard had defined inf/nan a different way:

    What we have is Inf == maximal exponent with an all-zero mantissa, while
    all other mantissa values indicate a NaN.

    For binary FP it is totally up to the CPU vendor how to define Quiet NaN
    vs Signalling NaN, most common seems to be to set the top bit in the
    mantissa.

    What we have been missing for 40 years now is a fourth category:

    None (or Null/Missing)

    This would have simplified all sorts of array/matrix sw where both
    errors (NaN) and missing (None) items are possible.

    The easiest way to implement it would also make the FPU hardware simpler:

    The top two bits of the mantissa define

    11 : SNaN
    10 : QNaN
    01 : None
    00 : Inf

    The rest of the mantissa bits could then carry any payload you want,
    including optional debug info for Infinities.
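
    Under that proposed encoding (not current IEEE 754), classification
    becomes a two-bit test once the exponent field is all ones; a C sketch:

        #include <stdint.h>
        #include <string.h>

        enum special { NOT_SPECIAL, SP_INF, SP_NONE, SP_QNAN, SP_SNAN };

        static enum special classify(double d)
        {
            uint64_t bits;
            memcpy(&bits, &d, sizeof bits);         /* binary64 bit pattern  */

            if (((bits >> 52) & 0x7FF) != 0x7FF)    /* not maximal exponent  */
                return NOT_SPECIAL;

            switch ((bits >> 50) & 0x3) {           /* top two mantissa bits */
            case 0x3: return SP_SNAN;
            case 0x2: return SP_QNAN;
            case 0x1: return SP_NONE;
            default:  return SP_INF;  /* 00: remaining bits free for payload */
            }
        }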

    Terje
    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.arch on Fri Aug 22 19:55:07 2025
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:

    scott@slp53.sl.home (Scott Lurndal) posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> writes:

    Greetings everyone !



    Because My 66000 ISA has memory instructions that "touch" multiple
    memory locations, these instructions take on special significance
    when using the debug and performance counter ports. Single memory
    instructions access the control registers themselves, while multi-
    memory instructions access "through" the port to the registers
    the port controls.

    That level of indirection may cause difficulties when virtualizing
    a device.

    These are on-Die resources not PCIe peripherals.

    If it quacks like a duck - you're making them _look_
    like PCIe peripherals (e.g. with a PCI/PCIe compatible
    configuration space, BARs, MSI-X interrupts, etc.),
    right?


    By using the fact that cores come out of reset with MMU turned on,
    and BOOT ROM supplying the translation tables, I was able to achieve
    that all resources come out of reset with all control register flip-
    flops = 0, except for Core[0].Hypervisor_Context.v = 1.

    Where is the ROM? Modern SoCs have an on-board ROM, which
    cannot be changed without a re-spin and new tapeout. That
    ROM needs to be rock-solid and provide just enough capability
    to securely load a trusted blob from a programmable device
    (e.g. SPI flash device).

    ROM is external FLASH in the envisioned implementations.

    Which raises the question of how the flash controller
    is initialized, if there is no ROM on-chip to do that;
    and which flash controller is used - SPI, embedded MMC,
    I2C/I3C - or do you envision something like the Intel low-pin-count
    devices that are directly exposed in the system address space,
    so the processor can just start fetching instructions
    directly from a custom flash controller, like the 8086?

    Customers often wish to use specific technology in the
    boot path, targeted to their use-case.


    ----------------------------
    Guest OS can configure its translation tables to emit {Configuration
    and MM I/O} space accesses. Now that these are so easy to recognize:

    Security. Guest OS should only be able to access resources
    granted to it by the HV.

    Yes, Guest physical MM I/O Space is translated by Host MM I/O Translation
    tables. Real OS setup its translation tables to emit MM I/O Accesses, so
    Guest OS should too, then that Guest Physical is considered Host virtual
    and translated and protected again.

    As far as I am concerned, Guest OS thinks it has 32 Devices each of which
    have 8 Functions all on Bus 0... So, a Guest OS with fewer than 256 Fctns
    sees only 1 BUS and can short circuit the virtual Config discovery.
    These virtual Guest OS accesses, then, get redistributed to the
    Segments and Busses on which the VFs actually reside by Level-2 SW.

    Generally speaking, the guest device configuration (e.g. ECAM)
    is completely emulated by the hypervisor, either by supporting
    the legacy CF8/CFC Intel peek/poke mechanism (trapping IN/OUT
    instructions on x86) or by taking a page fault to the hypervisor
    when the guest accesses a designated ECAM region (the address of
    which is provided to the guest by the HV via ACPI or Device Tree tables).


    Host OS and HyperVisor have the ability to translate Guest Physical
    {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral.

    This seems unnecessarily complicated.

    So did IEEE 754 in 1982...

    What I have done is to virtualize Config and MM I/O spaces, so Guest OS
    does not even see that it is not Real OS running on bare metal--and doing
    so without HV intervention on any of the Config or MM I/O accesses.

    Config accesses are quite rare. Mainly for device discovery and
    initial BAR setup, there is no benefit to supporting virtualization
    of that in the hardware. All existing hypervisors provide emulated
    ECAM regions to the guest.

    MMI/O is handled efficiently by the HV using a nested page table.


    Every SR-IOV capable device
    is different and aside the standard PCIe defined configuration space
    registers, everything else is device-specific.

    Only requires 3 bits in the MM I/O PTE.
    Only requires 1 bit in Config PTE, a bit that already had to be there.

    You need MY66000 HV/Kernel software to support that.

    I can tell you, from experience, that custom hardware support
    for third-party standard devices in the kernels (Windows, Linux, et al.)
    is frowned upon by the Linux community. They're called quirks; supporting
    them complicates the driver implementations in the operating
    software.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Fri Aug 22 20:28:56 2025
    From Newsgroup: comp.arch


    Terje Mathisen <terje.mathisen@tmsw.no> posted:

    BGB wrote:
    On 8/22/2025 11:17 AM, MitchAlsup wrote:

    scott@slp53.sl.home (Scott Lurndal) posted:


    <snip>

    Host OS and HyperVisor have the ability to translate Guest Physical
    {Configuration and MM I/O} accesses into Universal {Config or MM I/O}
    accesses. This requires that the PTE KNOW how SR-IOV was set up on
    that virtual Peripheral.

    This seems unnecessarily complicated.

    So did IEEE 754 in 1982...


    Still is...

    Denormals, Inf/NaN, ... tend to accomplish relatively little in
    practice; apart from making FPUs more expensive, often slower, and requiring programmers to go through extra hoops to specify DAZ/FTZ in cases where they need more performance.

    Likewise, +/- 0.5 ULP, accomplishes little beyond adding cost; whereas
    +/- 0.63 ULP would be a lot cheaper, and accomplishes nearly the same effect.

    Well, apart from the seeming failure of being unable to fully converge
    the last few bits of N-R, which seems to depend primarily on sub-ULP bits.

    Having spent ~10 years (very much part time!) working on 754 standards I strongly believe you are wrong:

    Not to mention the rest of the FP numerical community--even Posits.

    Yes, there are a few small issues, some related to grandfather clauses
    that might go away at some point, but the zero/subnorm/normal/inf/nan
    setup is not one of them.

    Personally I think it would have been a huge win if the original
    standard had defined inf/nan a different way:

    What we have is Inf == Maximal exponent, all-zero mantissa, while all
    other mantissa values indicates a NaN.

    For binary FP it is totally up to the CPU vendor how to define Quiet NaN
    vs Signalling NaN, most common seems to be to set the top bit in the mantissa.

    What we have been missing for 40 years now is a fourth category:

    None (or Null/Missing)

    This would have simplified all sorts of array/matrix sw where both
    errors (NaN) and missing (None) items are possible.

    The easiest way to implement it would also make the FPU hardware simpler:

    The top two bits of the mantissa define

    11 : SNaN
    10 : QNaN
    01 : None
    00 : Inf

    The rest of the mantissa bits could then carry any payload you want, including optional debug info for Infinities.

    And a way where overflow saturates below infinity and underflow
    saturates above zero. That would make it clear if the operand/
    result was an actual infinity, or whether it overflowed the
    container to become infinity.

    zero < underflowed < denorm < norm < overflowed < infinity

    Terje

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Aug 23 06:05:03 2025
    From Newsgroup: comp.arch

    Terje Mathisen <terje.mathisen@tmsw.no> writes:
    BGB wrote:
    What we have been missing for 40 years now is a fourth category:

    None (or Null/Missing)

    My understanding has been that SNaNs are intended to be used for
    elements that should not be used as computation operands, e.g., for
    otherwise uninitialized array elements.

    This would have simplified all sorts of array/matrix sw where both
    errors (NaN) and missing (None) items are possible.

    In what ways would None behave differently from SNaN?

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21a-Linux NewsLink 1.2