• $0.03 microcontroller

    From Clifford Heath@no.spam@please.net to comp.arch.embedded on Wed Oct 10 12:05:23 2018
    From Newsgroup: comp.arch.embedded

    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From lasselangwadtchristensen@lasselangwadtchristensen@gmail.com to comp.arch.embedded on Wed Oct 10 16:12:50 2018
    From Newsgroup: comp.arch.embedded

On Wednesday, October 10, 2018 at 03:05:27 UTC+2, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath
    https://youtu.be/VYhAGnsnO7w
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Wed Oct 10 17:51:12 2018
    From Newsgroup: comp.arch.embedded

    On Tuesday, October 9, 2018 at 9:05:27 PM UTC-4, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath
    Interesting. They have some very off-brand FPGA type devices as well at very low prices, but they still don't do me any favors with the packages.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From lasselangwadtchristensen@lasselangwadtchristensen@gmail.com to comp.arch.embedded on Wed Oct 10 18:56:13 2018
    From Newsgroup: comp.arch.embedded

On Thursday, October 11, 2018 at 02:51:17 UTC+2, gnuarm.del...@gmail.com wrote:
    On Tuesday, October 9, 2018 at 9:05:27 PM UTC-4, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Interesting. They have some very off-brand FPGA type devices as well at very low prices, but they still don't do me any favors with the packages.

They also do PCBs (jlcpcb.com), and I've heard they also have a dirt-cheap assembly service as long as you only use parts from their list of components, though it seems it
is so far only available in China
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Wed Oct 10 19:29:13 2018
    From Newsgroup: comp.arch.embedded

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Thu Oct 11 11:39:56 2018
    From Newsgroup: comp.arch.embedded

    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept.

There are a lot of operations that will update memory locations, so why
would you need a lot of CPU registers?

    1K of program OTP and 64 bytes of ram,

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.

At least the 8-pin version has both a PWM and a comparator, so
making an ADC wouldn't be too hard.
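
For illustration, a minimal C sketch of that idea - a successive-approximation conversion using the PWM as a crude DAC through an external RC filter, compared against the input by the comparator. PWM_DUTY and COMP_OUT are placeholder names, not the real Padauk registers, and the settling delay is a guess:

#include <stdint.h>

extern volatile uint8_t PWM_DUTY;   /* placeholder: PWM duty-cycle register      */
extern volatile uint8_t COMP_OUT;   /* placeholder: comparator output (0 or 1)   */

static uint8_t adc_read(void)
{
    uint8_t result = 0;
    for (uint8_t bit = 0x80; bit != 0; bit >>= 1) {
        PWM_DUTY = result | bit;              /* trial code on the PWM/RC "DAC"   */
        for (volatile uint8_t i = 0; i < 200; i++)
            ;                                 /* let the RC filter settle (guess) */
        if (COMP_OUT)                         /* input still above the trial level */
            result |= bit;                    /* keep this bit                    */
    }
    return result;                            /* 8-bit approximation of the input */
}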

    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael Kellett@mk@mkesc.co.uk to comp.arch.embedded on Thu Oct 11 14:04:39 2018
    From Newsgroup: comp.arch.embedded

    On 10/10/2018 02:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Has anyone actually used them - or worked out where to get the ICE and
    how much it costs ?

    MK
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Thu Oct 11 15:56:52 2018
    From Newsgroup: comp.arch.embedded

    On 11/10/18 15:04, Michael Kellett wrote:
    On 10/10/2018 02:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Has anyone actually used them - or worked out where to get the ICE and
    how much it costs ?

    MK

    The cost of the ICE is not going to be significant for most people - you usually use a chip like this when you want huge quantities (even though
    it is available in small numbers).

    What turns me off here is the programming procedure for the OTP devices.
    There is no information on it - just a simple one-at-a-time programmer
    device. That is useless for production - you need an automated system,
    or support from existing automated programmers, or at the very least the programming information so that you can build your own specialist
programmer. There is no point in buying a microcontroller for $0.03 if
the time taken to manually take a device out of a tube, manually program
it, and manually put it back in another tube for the pick-and-place
costs you $1 of production time.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Thu Oct 11 16:08:00 2018
    From Newsgroup: comp.arch.embedded

    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.

    At least the 8 pin version has both a PWM as well as a comparator, so
    making an ADC wouldn't be too hard.

    Thanks.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Oct 12 08:50:49 2018
    From Newsgroup: comp.arch.embedded

On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
Reentrant functions will be inefficient: no registers, and no sp-relative addressing mode. One would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Oct 12 09:44:15 2018
    From Newsgroup: comp.arch.embedded

    On 12/10/18 08:50, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.


    It looks like the lowest 16 memory addresses could be considered pseudo-registers - they are the ones that can be used for direct memory
    access rather than needing indirect access.
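
A purely hypothetical sketch of that idea in C - there was no SDCC backend for this part at the time, and the __at() absolute-placement syntax is borrowed from SDCC's existing ports:

#include <stdint.h>

/* "Pseudo-registers" pinned to the directly addressable low RAM locations. */
__at(0x00) volatile uint8_t r0;
__at(0x01) volatile uint8_t r1;
__at(0x02) volatile uint8_t r2;
__at(0x03) volatile uint8_t r3;

/* A code generator (or an assembly programmer) would then keep intermediate
   results and indirect pointers in r0..r3, leaving the rest of RAM for data. */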

    And I don't think inefficient reentrant functions would be much of a
    worry on a device with so little code space!

    Some of the examples in the datasheet were given in C - that implies
    that there already is a C compiler for the device. Has anyone tried the
    IDE?


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Oct 12 10:18:56 2018
    From Newsgroup: comp.arch.embedded

On 10.10.2018 at 03:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

They even make dual-core variants (the parts where the first digit in the
part number is '2'). It seems the program counter, stack pointer, flag
register and accumulator are per-core, while the rest, including the ALU,
is shared. In particular, the I/O registers are also shared, which means
any multiplier registers would also be shared - but currently all variants
with an integrated multiplier are single-core.
Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Fri Oct 12 09:11:02 2018
    From Newsgroup: comp.arch.embedded

    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.
CPUs like this (and others that aren't like this) should be programmed in Forth. It's a great tool for small MCUs and can often be hosted on the target, although not likely in this case. Still, you can bring enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Oct 12 21:30:42 2018
    From Newsgroup: comp.arch.embedded

    On Fri, 12 Oct 2018 09:44:15 +0200, David Brown
    <david.brown@hesbynett.no> wrote:

    On 12/10/18 08:50, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative
    adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.


It looks like the lowest 16 memory addresses could be considered pseudo-registers - they are the ones that can be used for direct memory access rather than needing indirect access.

    And I don't think inefficient reentrant functions would be much of a
    worry on a device with so little code space!

    The real issue would be the small RAM size.

With such small ROM/RAM sizes, who needs reentrant functions?
Possibly only if you think that every problem must be solved by
recursion :-).

Reentrancy is nice when writing run-time library (RTL) routines that
might be called from different contexts, but who in their right mind
would call RTL routines from an ISR?

OK, some might put an RTOS into that processor, but even in that case
the RTOS might consist only of a simple foreground/background monitor,
so it is unlikely to need reentrant routines.

If you insist on using "C", just declare all variables as

    static uint8_t (and a few static uint16_t)

    so no reentrant code is generated.
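
A minimal sketch of that style (the names are made up for illustration):

#include <stdint.h>

/* Everything statically allocated: no stack frames, no reentrancy. */
static uint8_t  blink_count;
static uint8_t  pwm_duty;
static uint16_t elapsed_ticks;

static void timer_tick(void)
{
    elapsed_ticks++;
    if (++blink_count >= 100) {
        blink_count = 0;
        pwm_duty ^= 0xFF;        /* toggle between two duty cycles */
    }
}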

However, you could just as well use one of the old 8-bit languages such as
PL/M-80.



    Some of the examples in the datasheet were given in C - that implies
    that there already is a C compiler for the device. Has anyone tried the
    IDE?


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Oct 12 21:39:06 2018
    From Newsgroup: comp.arch.embedded

    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

On 10.10.2018 at 03:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared byt he two cores, alternating by clock cycle.

    Philipp


Interesting, that would make it easy to run a multitasking RTOS
(foreground/background monitor), which might justify the use of some
reentrant library routines :-). But in reality, the available memory
(ROM/RAM) is so small that you could easily manage this with static
memory allocations.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Oct 12 22:06:02 2018
    From Newsgroup: comp.arch.embedded

On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    The real issue would be the small RAM size.

Devices with this architecture go up to 256 B of RAM (but they then cost
a few cents more).

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Oct 12 23:45:54 2018
    From Newsgroup: comp.arch.embedded

    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost
    a few cent more).

    Philipp

Did you find the binary encoding of the various instruction formats, i.e.
how many bits are allocated to the operation code and how many to the
address field?

My initial guess was that the instruction word is a simple 8-bit opcode
+ 8-bit address, but the bit and word address limits for the smaller
models would suggest that for some op-codes, the op-code field might
be wider than 8 bits and the address field narrower than 8 bits (e.g. bit
and word addressing).

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael Kellett@mk@mkesc.co.uk to comp.arch.embedded on Sat Oct 13 09:35:36 2018
    From Newsgroup: comp.arch.embedded

    On 11/10/2018 14:56, David Brown wrote:
    On 11/10/18 15:04, Michael Kellett wrote:
    On 10/10/2018 02:05, Clifford Heath wrote:
<https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
<http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Has anyone actually used them - or worked out where to get the ICE and
    how much it costs ?

    MK

    The cost of the ICE is not going to be significant for most people - you usually use a chip like this when you want huge quantities (even though
    it is available in small numbers).

    What turns me off here is the programming procedure for the OTP devices.
    There is no information on it - just a simple one-at-a-time programmer device. That is useless for production - you need an automated system,
    or support from existing automated programmers, or at the very least the programming information so that you can build your own specialist
    programmer. There is no point in buying a microcontroller for $0.03 if
    the time taken to manually take a device out a tube, manually program
    it, and manually put it back in another tube for the pick-and-place
    costs you $1 production time.

    My major interest in this part was for fun - hence caring about the cost
    of the ICE. From a business point of view it makes no sense - by the
    time you reach numbers big enough to care about the cost of the micro
    the risk of using a part like this is too great. Different if you are
    next door to the manufacturer.

If you want a hardware-minimal processor, the Maxim MAX32660 looks like fun:
3 mm square, 24-pin Cortex-M4, 96 MHz, 256k flash, 96k RAM, £1.16 (10 off).

    My guess is that you need to be using at least 5k of them before the
    cheaper Padauk part offsets the cost of using one.

    MK
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Oct 13 12:46:15 2018
    From Newsgroup: comp.arch.embedded

    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative
    adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard
    to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 05:06:23 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard
    to get the same results from a chip that doesn't have this kind of stack-based instruction set.
Your point is what exactly? You are comparing running Forth on some other chip to running Forth on this chip. How is that useful? There are many other chips that run very fast. So?
I believe others have said the instruction set is memory-oriented with no registers. I think that means the CPU will generally be slow compared to a register-based design. That actually means it is easier to have a fast Forth implementation compared to other compilers, since there won't be a significant penalty for using a stack.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 18:00:26 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
existing pic14 and pic16 backends. But it surely isn't as nice as stm8
or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many
    debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard
    to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

Depending on how you look at it, you could claim that it has 64 registers
and no RAM. It is a quite orthogonal single-address architecture. You
can do practically all single-operand instructions (like inc/dec,
shift/rotate etc.) either in the accumulator or equally well in any
of the 64 "registers". For two-operand instructions (such as add/sub,
and/or etc.), either the source or the destination can be in the memory
"register".

Both Acc = Acc Op Memory and, alternatively, Memory = Acc Op Memory are
valid.

Thus the accumulator is needed only for two-operand instructions, but
not for single-operand instructions.

    I think that means in general the CPU will be slow compared to a register based design.

What is the difference? You have 64 on-chip RAM bytes or 64 single-byte
on-chip registers. The situation would have been different with
on-chip registers and off-chip RAM, with the memory bottleneck.

Of course, there were odd architectures like the TI 9900 with a set of
sixteen 16-bit general-purpose registers in RAM. The register set could be
switched quickly in interrupts, but that slowed down all general-purpose
register access.

    That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.

For a stack computer you need a pointer register, preferably with
autoincrement/decrement support. This processor has indirect access
and single-instruction increment or decrement support without
disturbing the accumulator. Thus it is not so bad after all for stack
computing.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 18:31:25 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    The data-sheet describes the OTP program memory as "1KW", probably
    meaning 1024 instructions. The length of an instruction is not defined,
    as far as I could see.

    It would be nice to have a C compiler, and registers help with that.

    The data-sheet mentions something they call "Mini-C".

    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative
    adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be programmed
    in Forth.

    I don't think that an interpreted Forth is feasible for this particular
    MCU. Where would the Forth program (= list of pointers to "words") be
    stored? I found no instructions for reading data from the OTP program
    memory, and the 64-byte RAM will not hold a non-trivial program together
    with the data for that program.

    Moreover, there is no indirect jump instruction -- "jump to a computed address". The closest is "pcadd a", which can be used to implement a
    256-entry case statement. You would be limited to a total of 256 words.

    Moreover, each RAM-resident pointer to RAM uses 2 octets of RAM, giving
    a 16-bit RAM address, although for this MCU a 6-bit address would be
    enough. Apparently the same architecture has implementations with more
    RAM and 16-bit RAM addresses.

    That said, one could perhaps implement a compiled Forth for this machine.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Oct 13 18:21:46 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
    Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic
    through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
    commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to make a
    backend for this in SDCC; the architecture looks more
    C-friendly than the existing pic14 and pic16 backends. But it
    surely isn't as nice as stm8 or z80. reentrant functions will
    be inefficent: No registers, and no sp-relative adressing mode.
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many
    times can be hosted on the target although not likely in this
    case. Still, you can bring enough functionality onto the MCU to
    allow direct downloads and many debugging features without an
    ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is. To
    make Forth work well on a small chip you need a Forth-specific
    instruction set to target the stack processing. For example,
    adding two numbers in this chip is two instructions - load
    accumulator from memory X, add accumulator to memory Y. In a Forth
    cpu, you'd have a single instruction that does "pop two numbers,
    add them, push the result". That gives a very efficient and compact
    instruction set. But it is hard to get the same results from a
    chip that doesn't have this kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some
    other chip to running forth on this chip. How is that useful? There
    are many other chips that run very fast. So?

    My point is that /this/ CPU is not a good match for Forth, though many
    other very cheap CPUs are. Whether or not you think that matches "CPUs
    like this should be programmed in Forth" depends on what you mean by
    "CPUs like this", and what you think the benefits of Forth are.


    I believe others have said the instruction set is memory oriented
    with no registers. I think that means in general the CPU will be
    slow compared to a register based design. That actually means it is
    easier to have a fast Forth implementation compared to other
    compilers since there won't be a significant penalty for using a
    stack.


    It has a single register, not unlike the "W" register in small PIC
    devices. Yes, I expect it is going to be slower than you would get from having a few more registers. But it is missing (AFAICS) auto-increment
    and decrement modes, and has only load/store operations with indirect
    access.

    So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

    mov a, y; // 1 clock
    add x, a; // 1 clock

    If you have a data stack pointer "dsp", and want a standard Forth "+" operation, you have:

    idxm a, dsp; // 2 clock
    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock
    idxm dsp, a; // 2 clock

That is 9 clocks instead of 2, and 6 instructions instead of 2.

    Of course you could make a Forth compiler for the device - but you would
    have to make an optimising Forth compiler that avoids needing a data
stack, just as you do on many other small microcontrollers (and just as a
    C compiler would do). This is /not/ a processor that fits well with
    Forth or that would give a clear translation from Forth to assembly, as
    is the case on some very small microcontrollers.






    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Oct 13 18:27:13 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 17:00, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

It would be nice to have a C compiler, and registers help with that.

Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as
pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many
    debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec, shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory "register".

    Not quite, no. Only the first 16 memory addresses are directly
    accessible for most instructions, with the first 32 addresses being
    available for word-based instructions. So you could liken it to a
    device with 16 registers and indirect memory access to the rest of ram.


    Both Acc = Acc Op Memory or alternatively Memory = Acc Op Memory are
    valid.

    Thus the accumulator is needed only for two operand instructions, but
    not for single operand instructions.

    I think that means in general the CPU will be slow compared to a register based design.

    What is the difference, you have 64 on chip RAM bytes or 64 single
    byte on chip registers. The situation would have been different with
    on-chip registers and off chip RAM, with the memory bottleneck.

    Of course, there were odd architectures like the TI 9900 with a set of sixteen 16 bit general purpose register in RAN. The set could be
    switched fast in interrupts, but slowed down any general purpose
    register access.

    That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.

    For a stack computer you need a pointer register with preferably autoincrement/decrement support. This processor has indirect access
    and single instruction increment or decrement support without
    disturbing the accumulator.Thus not so bad after all for stack
    computing.


    But you can't use the indirect memory accesses for any ALU instructions
    - only for loading or saving the accumulator. So all indirect accesses
    need to go via the accumulator - and if you want to operate on two
    indirect accesses (like adding the top two elements on the stack), you
    have to use another "register" address to store one element temporarily.
    Yes, it would be bad for stack computing.



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 19:46:30 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed address".

    Ok, before anyone else notices, I admit I forgot about implementing an indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 19:59:06 2018
    From Newsgroup: comp.arch.embedded

    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed
    address".

    Ok, before anyone else notices, I admit I forgot about implementing an indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4
    working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 20:50:33 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 18:27:13 +0200, David Brown
    <david.brown@hesbynett.no> wrote:

    On 13/10/18 17:00, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

It would be nice to have a C compiler, and registers help with that.

Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
on the target although not likely in this case. Still, you can bring enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


Forth is a good language for very small devices, but there are details that can make a huge difference in how efficient it is. To make Forth work well on a small chip you need a Forth-specific instruction set to target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec,
    shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory
    "register".

    Not quite, no. Only the first 16 memory addresses are directly
accessible for most instructions, with the first 32 addresses being available for word-based instructions. So you could liken it to a
    device with 16 registers and indirect memory access to the rest of ram.

Really?

    In the manual

    M.n Only addressed in 0~0xF (0~15) is allowed

The M.n notation is for bit operations, in which M is the byte address
and n is the bit number within the byte. Restricting M to 4 bits makes sense,
since n requires 3 bits, so the total address size for bit
operations would be 7 bits.

I couldn't find a reference saying that the restriction on M also applies to
byte access. Where is it?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 21:03:28 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 18:31:25 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    The data-sheet describes the OTP program memory as "1KW", probably
    meaning 1024 instructions. The length of an instruction is not defined,
    as far as I could see.

    Yes, I misread the data sheet. It is really 1 kW.

The nice feature about a Harvard architecture is that the data and
instruction sizes can be different.

I have tried to locate the bit allocation of the various fields (opcode,
address etc.) but no luck.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Sat Oct 13 11:06:29 2018
    From Newsgroup: comp.arch.embedded

    gnuarm.deletethisbit@gmail.com writes:
    That actually means it is easier to have a fast Forth implementation
    compared to other compilers since there won't be a significant penalty
    for using a stack.

I think this chip is too small for traditional Forth implementation
methods. Just 64 bytes of RAM and no registers. If you have 16-bit
cells and 8 levels of return and data stacks, half the RAM is already
used by the stacks.
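(For reference: two stacks of 8 cells at 2 bytes per cell is 2 × 8 × 2 = 32 bytes, i.e. exactly half of the 64-byte RAM.)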

An F18 processor (a GA144 node, for those not familiar) has around 3x as
much RAM including the stacks, and it doesn't pretend to be a complete
MCU (you usually split your application across multiple nodes). Plus it
has that very efficient 5-bit instruction encoding. On the other hand,
you have to use RAM as program memory.

    You might be able to concoct some usable Forth dialect compiled with an optimizing compiler and using 8-bit data when possible, but it doesn't
    seem that useful for a chip like this.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 21:31:26 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 19:59:06 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed
    address".

    Ok, before anyone else notices, I admit I forgot about implementing an
    indirect jump by pushing the target address on the stack and executing a
    return instruction. That would work for this machine.

Except that one can only "push" the accumulator and flag registers, combined, and the flag register cannot be set directly, and has only 4 working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.

    Just call a "Jumper" routine, the call pushes the return address on
    stack. In "Jumper" read SP from IO address space, indirectly modify
    the return address on stack as needed and perform a ret instruction,
    causing a jump to the modified return address and it also restores the
    SP to the value before the call.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 22:19:59 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-13 21:31 , upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 19:59:06 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

I don't think that an interpreted Forth is feasible for this particular MCU. ...
Moreover, there is no indirect jump instruction -- "jump to a computed address".

    Ok, before anyone else notices, I admit I forgot about implementing an
indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4
    working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.

    Just call a "Jumper" routine, the call pushes the return address on
    stack. In "Jumper" read SP from IO address space, indirectly modify
    the return address on stack as needed and perform a ret instruction,
    causing a jump to the modified return address and it also restores the
    SP to the value before the call.

    Right, that sounds possible. But wow what a circumlocution.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sat Oct 13 21:24:27 2018
    From Newsgroup: comp.arch.embedded

On 13.10.2018 at 18:59, Niklas Holsti wrote:

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4 working bits.

It seems unclear to me which of acc and the flags is pushed first.
    But if acc is pushed first, one could do

    pushaf;
    mov a, sp;
    inc a;
    mov sp, a;

    to push any desired byte onto the stack.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sat Oct 13 21:47:48 2018
    From Newsgroup: comp.arch.embedded

On 12.10.2018 at 22:45, upsidedown@downunder.com wrote:
    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost
    a few cent more).

    Philipp

    Did you find the binary encoding of various instruction formats, i.e
    how many bits allocated to the operation code and how many for the
    address field ?

    My initial guess was that the instruction word is simple 8 bit opcode
    + 8 bit address, but the bit and word address limits for the smaller
    models would suggest that for some op-codes, the op-code field might
    be wider than 8 bits and address fields narrower than 8 bits (e.g. bit
    and word addressing).


    People have tried before (https://www.mikrocontroller.net/topic/449689, https://stackoverflow.com/questions/49842256/reverse-engineer-assembler-which-probably-encrypts-code).
    Apparently, even with access to the tools it is not obvious.

    However, a Chinese manual contains these examples:

    5E0A MOV A BB1
    1B21 COMP A #0x21
    2040 T0SN CF
    5C0B MOV BB2 A
    C028 GOTO 0x28
    0030 WDRESET
    1F00 MOV A #0x0
    0082 MOV SP A

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 22:50:57 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-13 22:24 , Philipp Klaus Krause wrote:
On 13.10.2018 at 18:59, Niklas Holsti wrote:

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4
    working bits.

    It seems unclear to me which of acc and sp is pushed first.
    But if acc is pushed first, one could do

    pushaf;
    mov a, sp;
    inc a;
    mov sp, a;

    to push any desired byte onto the stack.

    There's also a rule that the sp must always contain an even address, at
    least if interrupts are enabled, as I understand it.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Sat Oct 13 13:06:06 2018
    From Newsgroup: comp.arch.embedded

    Michael Kellett <mk@mkesc.co.uk> writes:
    If you want a hardware minimal processor the Maxim 32660 looks like fun
    3mm square, 24 pin Cortex M4, 96MHz, 256k flash, 96k RAM, £1.16 (10 off).

    That's not minimal ;). More practically, the 3mm square package sounds
    like a WLCSP which I think requires specialized ($$$) board fab
    facilities (it can't be hand soldered or done with normal reflow
    processes). Part of the Padauk part's attraction is the 6-pin SOT23
    package.

    Here's a complete STM8 board for 0.77 USD shipped:

    https://www.aliexpress.com/item//32527571163.html

    It has 8k of program flash and 1k of ram and can run a resident Forth interpreter. I think they also make a SOIC-8 version of the cpu. I
    bought a few of those boards for around 0.50 each last year so I guess
    they have gotten a bit more expensive since then.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim@cpldcpu+usenet@gmail.com to comp.arch.embedded on Sun Oct 14 01:46:37 2018
    From Newsgroup: comp.arch.embedded

    On 10/10/2018 03:05 AM, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>



    This is quite curious. I wonder

    - Has anyone actually received the devices they ordered? The cheaper variants seem to be sold out.
    - Any success in setting up a programmer?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 18:20:56 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 11:00:30 AM UTC-4, upsid...@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    It would be nice to have a C compiler, and registers help with that.

    Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
    reentrant functions will be inefficient: No registers, and no sp-relative
    addressing mode. One would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many
    debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec, shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory "register".

    Both Acc = Acc Op Memory or alternatively Memory = Acc Op Memory are
    valid.

    Thus the accumulator is needed only for two operand instructions, but
    not for single operand instructions.
    How fast are instructions that access memory? Most MCUs will perform register operations in a single cycle. Even though RAM may be on chip, it typically is not as fast as registers because it is usually not multiported. DSP chips are an exception with dual and even triple ported on chip RAM.
    I think that means in general the CPU will be slow compared to a register based design.

    What is the difference, you have 64 on chip RAM bytes or 64 single
    byte on chip registers. The situation would have been different with
    on-chip registers and off chip RAM, with the memory bottleneck.

    Of course, there were odd architectures like the TI 9900 with a set of sixteen 16 bit general purpose registers in RAM. The set could be
    switched fast in interrupts, but slowed down any general purpose
    register access.
    Yeah, I'm familiar with the 9900. In the 990 it worked well because the CPU was TTL and not so fast. Once the CPU was on a single chip the external RAM was not fast enough to keep up really and instruction timings were dominated by the memory.

    That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.

    For a stack computer you need a pointer register with preferably autoincrement/decrement support. This processor has indirect access
    and single instruction increment or decrement support without
    disturbing the accumulator. Thus it is not so bad after all for stack
    computing.
    The stack in memory is usually a bottle neck because memory is typically slow so optimizations would be done to keep operands in registers. In this chip no optimizations are possible, but likely it wouldn't be too bad as long as the stack operations are flexible enough. But then I don't think you said this CPU has the sort of addressing that allows an operand in memory to be used and popped off the stack in one opcode as many, higher level CPUs do. So adding the two numbers on the stack would involve keeping the top of stack in the accumulator, adding the next item on the stack from memory to the accumulator, then another instruction to adjust the stack pointer which is also in memory. So two instructions? How many clock cycles?
    What happens when there is a change in the instruction pointer of the Forth virtual machine? Calling a new word would require saving the current value of the Forth IP on the return stack (separate from the data stack) and loading a new value into the Forth IP? This is a piece of code typically called "next". It varies a bit between indirect and direct threaded code. Then there is subroutine threaded code that just uses the CPU IP as the Forth IP and each address is actually a CPU call instruction.
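    As a rough host-side illustration of that "next" dispatch (all names here are invented for the sketch; nothing comes from the Padauk tools), an indirect-threaded inner interpreter can be written in C roughly like this:

    /* Hypothetical sketch: a compiled word is a list of addresses of
     * primitives; "next" fetches the address the Forth IP points at,
     * advances the IP and executes the primitive. */
    #include <stdio.h>

    typedef void (*prim_t)(void);

    static int  stack[16];
    static int *sp = stack;                  /* data stack pointer */

    static void lit5(void) { *sp++ = 5; }
    static void lit7(void) { *sp++ = 7; }
    static void plus(void) { sp--; sp[-1] += sp[0]; }   /* Forth "+" */
    static void dot (void) { printf("%d\n", *--sp); }   /* Forth "." */

    int main(void)
    {
        /* "compiled" form of: 5 7 + .  -- a list of pointers, not opcodes */
        static const prim_t word[] = { lit5, lit7, plus, dot, NULL };

        /* this loop is "next": fetch through the Forth IP, advance, run */
        for (const prim_t *ip = word; *ip != NULL; ip++)
            (*ip)();
        return 0;
    }

    With subroutine threading the pointer list disappears: the word becomes a sequence of real call instructions and the hardware program counter plays the role of the Forth IP, which is why call/ret is all this chip really needs for that scheme.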
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 18:25:25 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 11:31:30 AM UTC-4, Niklas Holsti wrote:
    On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    The data-sheet describes the OTP program memory as "1KW", probably
    meaning 1024 instructions. The length of an instruction is not defined,
    as far as I could see.

    It would be nice to have a C compiler, and registers help with that.

    The data-sheet mentions something they call "Mini-C".

    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficient: No registers, and no sp-relative addressing mode. One would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be programmed
    in Forth.

    I don't think that an interpreted Forth is feasible for this particular
    MCU. Where would the Forth program (= list of pointers to "words") be stored? I found no instructions for reading data from the OTP program memory, and the 64-byte RAM will not hold a non-trivial program together with the data for that program.

    Moreover, there is no indirect jump instruction -- "jump to a computed address". The closest is "pcadd a", which can be used to implement a 256-entry case statement. You would be limited to a total of 256 words.

    For programs on such a small MCU 256 words is likely much overkill. But you don't need to have the above features for Forth. Subroutine threading uses call and return instructions instead of an address list.


    Moreover, each RAM-resident pointer to RAM uses 2 octets of RAM, giving
    a 16-bit RAM address, although for this MCU a 6-bit address would be
    enough. Apparently the same architecture has implementations with more
    RAM and 16-bit RAM addresses.

    That said, one could perhaps implement a compiled Forth for this machine.

    Yeah, I'm pretty sure it is too small for a resident Forth, so a host would be required and a Forth can be compiled and subroutine threaded.

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 18:32:51 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 12:21:51 PM UTC-4, David Brown wrote:
    On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
    Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic
    through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
    commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to make a
    backend for this in SDCC; the architecture looks more
    C-friendly than the existing pic14 and pic16 backends. But it
    surely isn't as nice as stm8 or z80. reentrant functions will
    be inefficent: No registers, and no sp-relative adressing mode.
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many
    times can be hosted on the target although not likely in this
    case. Still, you can bring enough functionality onto the MCU to
    allow direct downloads and many debugging features without an
    ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is. To
    make Forth work well on a small chip you need a Forth-specific
    instruction set to target the stack processing. For example,
    adding two numbers in this chip is two instructions - load
    accumulator from memory X, add accumulator to memory Y. In a Forth
    cpu, you'd have a single instruction that does "pop two numbers,
    add them, push the result". That gives a very efficient and compact
    instruction set. But it is hard to get the same results from a
    chip that doesn't have this kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some
    other chip to running forth on this chip. How is that useful? There
    are many other chips that run very fast. So?

    My point is that /this/ CPU is not a good match for Forth, though many
    other very cheap CPUs are. Whether or not you think that matches "CPUs
    like this should be programmed in Forth" depends on what you mean by
    "CPUs like this", and what you think the benefits of Forth are.


    I believe others have said the instruction set is memory oriented
    with no registers. I think that means in general the CPU will be
    slow compared to a register based design. That actually means it is
    easier to have a fast Forth implementation compared to other
    compilers since there won't be a significant penalty for using a
    stack.


    It has a single register, not unlike the "W" register in small PIC
    devices. Yes, I expect it is going to be slower than you would get from having a few more registers. But it is missing (AFAICS) auto-increment
    and decrement modes, and has only load/store operations with indirect access.

    So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

    mov a, y; // 1 clock
    add x, a; // 1 clock

    Keep the TOS in the accumulator and I think you end up with

    add a, x; // 1 clock
    inc DSTKPTR; // adjust stack pointer - 1 clock?

    Does that work? Reading below, I guess not.


    If you have a data stack pointer "dsp", and want a standard Forth "+" operation, you have:

    idxm a, dsp; // 2 clock
    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock
    idxm dsp, a; // 2 clock

    That is 9 clocks, instead of 2, and 6 instructions instead of 3.

    What does idxm do? Looks like an indirect load? Can this address mode be combined with any operations? Are operations limited in the addressing modes? This seems like a very, very simple CPU, but for the money, I guess I get it.


    Of course you could make a Forth compiler for the device - but you would have to make an optimising Forth compiler that avoids needing a data
    stack, just as you do on many other small microcontollers (and just as a
    C compiler would do). This is /not/ a processor that fits well with
    Forth or that would give a clear translation from Forth to assembly, as
    is the case on some very small microcontrollers.

    OK

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Sat Oct 13 19:02:38 2018
    From Newsgroup: comp.arch.embedded

    gnuarm.deletethisbit@gmail.com writes:
    Keep the TOS in the accumulator

    Do you mean you want a Forth with 8-bit data cells? What about the
    cells on the return stack, if there is one?

    What does idxm do? Looks like an indirect load?

    Yes.

    Can this address mode be combined with any operations?

    No. Just load or store.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Oct 14 08:53:15 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 03:20 schrieb gnuarm.deletethisbit@gmail.com:

    How fast are instructions that access memory? Most MCUs will perform register operations in a single cycle. Even though RAM may be on
    chip, it typically is not as fast as registers because it is usually
    not multiported. DSP chips are an exception with dual and even
    triple ported on chip RAM.

    All instructions except for jumps are 1 cycle. Jumps if taken are 2
    cycles, 1 otherwise.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Oct 14 08:55:22 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 08:53 schrieb Philipp Klaus Krause:
    Am 14.10.2018 um 03:20 schrieb gnuarm.deletethisbit@gmail.com:

    How fast are instructions that access memory? Most MCUs will perform
    register operations in a single cycle. Even though RAM may be on
    chip, it typically is not as fast as registers because it is usually
    not multiported. DSP chips are an exception with dual and even
    triple ported on chip RAM.

    All instructions except for jumps are 1 cycle. Jumps if taken are 2
    cycles, 1 otherwise.

    Philipp


    idxm and ldxm seem to be 2 cycles, too.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sun Oct 14 11:58:08 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 21:47:48 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 22:45 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost a few cent more).

    Philipp

    Did you find the binary encoding of various instruction formats, i.e
    how many bits allocated to the operation code and how many for the
    address field ?

    My initial guess was that the instruction word is simple 8 bit opcode
    + 8 bit address, but the bit and word address limits for the smaller
    models would suggest that for some op-codes, the op-code field might
    be wider than 8 bits and address fields narrower than 8 bits (e.g. bit
    and word addressing).


    People have tried before (https://www.mikrocontroller.net/topic/449689, https://stackoverflow.com/questions/49842256/reverse-engineer-assembler-which-probably-encrypts-code).
    Apparently, even with access to the tools it is not obvious.

    However, a Chinese manual contains these examples:

    5E0A MOV A BB1
    1B21 COMP A #0x21
    2040 T0SN CF
    5C0B MOV BB2 A
    C028 GOTO 0x28
    0030 WDRESET
    1F00 MOV A #0x0
    0082 MOV SP A

    Philipp

    Interesting, this at least confirms that the instruction word is 16
    bits. In a Harvard architecture, the word length could have been
    13-17 bits, with some dirty encodings in the 13 bit case, but a cleaner
    encoding with 14-17 bit instruction words.

    Assuming one would like to make an encoding for exactly 1024 code
    words and 64 byte data memory, a tighter encoding would be possible.
    Of course, for a manufacturer with both small and larger processors, it would make
    sense to use the same encoding for all of them, even if that is slightly inefficient for the smaller models.

    Anyway, in the 1 kW / 64 byte case, the following code points would be required:

    2048 = 2 x 1024 call, goto
    1792 = 7 x 256 Immediate data (8 bit)
    2304 = 36 x 64 M-reference (6 bit)
    1024 = 8 x 128 Bit ref (M and IO, 3+4 bits)
    plus others.

    That is already 7168 code points, so this might barely fit into the 8192 code points of 13 bits, with some nasty encoding.

    Even limiting the M-reference to 4 bits (0-15), you still can't fit into a 12
    bit instruction length.

    So with a 16 bit word length, I do not understand why the word reference is
    limited to 4-5 bits. The bit address limit makes more sense, so that it
    would not consume 4096 code points.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Theo@theom+news@chiark.greenend.org.uk to comp.arch.embedded on Sun Oct 14 10:55:00 2018
    From Newsgroup: comp.arch.embedded

    Tim <cpldcpu+usenet@gmail.com> wrote:
    This is quite curious. I wonder

    - Has anyone actually received the devices they ordered? The cheaper variants seem to be sold out.

    I think they've sold out since they went viral. EEVblog did a video showing 550 in stock - that's only $16 worth of parts, not hard to imagine they've
    been bought up.

    The other option is they're some kind of EOL part and 3c is the 'reduced to clear' price - which they have done, very successfully.

    Theo
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael Kellett@mk@mkesc.co.uk to comp.arch.embedded on Sun Oct 14 11:30:13 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/2018 21:06, Paul Rubin wrote:
    Michael Kellett <mk@mkesc.co.uk> writes:
    If you want a hardware minimal processor the Maxim 32660 looks like fun
    3mm square, 24 pin Cortex M4, 96MHz, 256k flash, 96k RAM, £1.16 (10 off).

    That's not minimal ;). More practically, the 3mm square package sounds
    like a WLCSP which I think requires specialized ($$$) board fab
    facilities (it can't be hand soldered or done with normal reflow
    processes). Part of the Padauk part's attraction is the 6-pin SOT23
    package.

    Here's a complete STM8 board for 0.77 USD shipped:

    https://www.aliexpress.com/item//32527571163.html

    It has 8k of program flash and 1k of ram and can run a resident Forth interpreter. I think they also make a SOIC-8 version of the cpu. I
    bought a few of those boards for around 0.50 each last year so I guess
    they have gotten a bit more expensive since then.


    No - the BGA part is 1.6mm square (0.3mm pitch) - the 3mm is for 0.4mm
    pitch QFN and there is a 0.5mm pitch QFN part at 4mm square.
    The QFNs are reasonably prototype-able - needing only 0.15mm track and
    gap design rules and no filled vias in pads or other horrors.
    The point about the 32660 is that it is HARDWARE minimal but not
    constrained in software. At low volumes cost of the parts is nothing - a
    day of effort is $500 or more, in that context the difference between a
    free processor and a $2 processor is invisible.



    MK

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:26:00 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 19:50, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 18:27:13 +0200, David Brown
    <david.brown@hesbynett.no> wrote:

    On 13/10/18 17:00, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    It would be nice to have a C compiler, and registers help with that.

    Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
    reentrant functions will be inefficient: No registers, and no sp-relative
    addressing mode. One would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details that can make a huge difference in how efficient it is. To make Forth work well on a small chip you need a Forth-specific instruction set to target the stack processing. For example, adding two numbers in this chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec,
    shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory
    "register".

    Not quite, no. Only the first 16 memory addresses are directly
    accessible for most instructions, with the first 32 addresses being
    available for word-based instructions. So you could liken it to a
    device with 16 registers and indirect memory access to the rest of ram.

    Really ?

    In the manual

    M.n Only addressed in 0~0xF (0~15) is allowed

    The M.n notation is for bit operations, in which M is the byte address
    and n is the bit number in byte. Restricting M to 4 bits makes sense,
    since n requires 3 bits, thus the total address size for bit
    operations would be 7 bits.

    I couldn't find a reference that the restriction on M also applies to
    byte access. Where is it ?


    My interpretation of the manual was that you only had access to the
    first 16 addresses with the M instructions. But it is entirely possible
    that I am wrong and your interpretation is right. I haven't tried the devices, or the IDE, and the manual does not have details of things like instruction format.

    Certainly it would be nicer for the chip if you are right!

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:32:32 2018
    From Newsgroup: comp.arch.embedded

    On 14/10/18 03:20, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 11:00:30 AM UTC-4,
    upsid...@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp
    Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU
    registers.

    Being able to (say) add register to register saves
    traffic through the accumulator and therefore
    instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages
    of commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to
    make a backend for this in SDCC; the architecture looks
    more C-friendly than the existing pic14 and pic16 backends.
    But it surely isn't as nice as stm8 or z80. reentrant
    functions will be inefficent: No registers, and no
    sp-relative adressing mode. On would want to reserve a few
    memory locations as pseudo-registers to help with that, but
    that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and
    many times can be hosted on the target although not likely in
    this case. Still, you can bring enough functionality onto the
    MCU to allow direct downloads and many debugging features
    without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is.
    To make Forth work well on a small chip you need a
    Forth-specific instruction set to target the stack processing.
    For example, adding two numbers in this chip is two
    instructions - load accumulator from memory X, add accumulator
    to memory Y. In a Forth cpu, you'd have a single instruction
    that does "pop two numbers, add them, push the result". That
    gives a very efficient and compact instruction set. But it is
    hard to get the same results from a chip that doesn't have this
    kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on
    some other chip to running forth on this chip. How is that
    useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented
    with no registers.

    Depending how you look at it, you could claim that it has 64
    registers and no RAM. It is a quite orthogonal single address
    architecture. You can do practically all single operand
    instructions (like inc/dec, shift/rotate etc.) either in the
    accumulator but equally well in any of the 64 "registers". For two
    operand instructions (such as add/sub, and/or etc,), either the
    source or destination can be in the memory "register".

    Both Acc = Acc Op Memory or alternatively Memory = Acc Op Memory
    are valid.

    Thus the accumulator is needed only for two operand instructions,
    but not for single operand instructions.

    How fast are instructions that access memory? Most MCUs will perform register operations in a single cycle. Even though RAM may be on
    chip, it typically is not as fast as registers because it is usually
    not multiported. DSP chips are an exception with dual and even
    triple ported on chip RAM.

    Single cycle, according to the manual. Instructions involving 16-bit
    values are two cycle, the conditional branch instructions may be one or
    two cycles, and everything else is one cycle.

    It is not so hard to make the RAM dual ported when there is only 64
    bytes of it. Or perhaps the core is clocked on both falling and rising
    edges, so that the instructions are effectively 2/4 clocks rather than 1
    or two. We can only guess.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:37:05 2018
    From Newsgroup: comp.arch.embedded

    On 14/10/18 03:32, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 12:21:51 PM UTC-4, David Brown wrote:
    On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
    Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic
    through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
    commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to make a
    backend for this in SDCC; the architecture looks more
    C-friendly than the existing pic14 and pic16 backends. But it
    surely isn't as nice as stm8 or z80. reentrant functions will
    be inefficent: No registers, and no sp-relative adressing mode.
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many
    times can be hosted on the target although not likely in this
    case. Still, you can bring enough functionality onto the MCU to
    allow direct downloads and many debugging features without an
    ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is. To
    make Forth work well on a small chip you need a Forth-specific
    instruction set to target the stack processing. For example,
    adding two numbers in this chip is two instructions - load
    accumulator from memory X, add accumulator to memory Y. In a Forth
    cpu, you'd have a single instruction that does "pop two numbers,
    add them, push the result". That gives a very efficient and compact
    instruction set. But it is hard to get the same results from a
    chip that doesn't have this kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some
    other chip to running forth on this chip. How is that useful? There
    are many other chips that run very fast. So?

    My point is that /this/ CPU is not a good match for Forth, though many
    other very cheap CPUs are. Whether or not you think that matches "CPUs
    like this should be programmed in Forth" depends on what you mean by
    "CPUs like this", and what you think the benefits of Forth are.


    I believe others have said the instruction set is memory oriented
    with no registers. I think that means in general the CPU will be
    slow compared to a register based design. That actually means it is
    easier to have a fast Forth implementation compared to other
    compilers since there won't be a significant penalty for using a
    stack.


    It has a single register, not unlike the "W" register in small PIC
    devices. Yes, I expect it is going to be slower than you would get from
    having a few more registers. But it is missing (AFAICS) auto-increment
    and decrement modes, and has only load/store operations with indirect
    access.

    So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

    mov a, y; // 1 clock
    add x, a; // 1 clock

    Keep the TOS in the accumulator and I think you end up with

    add a, x; // 1 clock
    inc DSTKPTR; // adjust stack pointer - 1 clock?

    Does that work? Reading below, I guess not.


    If you have a data stack pointer "dsp", and want a standard Forth "+"
    operation, you have:

    idxm a, dsp; // 2 clock
    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock
    idxm dsp, a; // 2 clock

    That is 9 clocks, instead of 2, and 6 instructions instead of 3.

    What does idxm do? Looks like an indirect load? Can this address
    mode be combined with any operations? Are operations limited in the addressing modes? This seems like a very, very simple CPU, but for the
    money, I guess I get it.

    "idxm" is an indirect load or store (depending on the order of the
    operands). No, there are no other operations that can be combined with indirect accesses.

    If you want to keep the TOS in the accumulator, then Forth "+" becomes:

    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock

    5 clocks is a good deal better than 9 clocks, but still a good deal
    worse than 2 clocks.



    Of course you could make a Forth compiler for the device - but you would
    have to make an optimising Forth compiler that avoids needing a data
    stack, just as you do on many other small microcontollers (and just as a
    C compiler would do). This is /not/ a processor that fits well with
    Forth or that would give a clear translation from Forth to assembly, as
    is the case on some very small microcontrollers.

    OK


    A stack-based system is often a good choice for very small cpus - it is certainly popular for 4-bit microcontrollers. But it seems that the
    designers of this device simply haven't considered support for
    Forth-style coding to be important.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:39:07 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 18:59, Niklas Holsti wrote:
    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed
    address".

    Ok, before anyone else notices, I admit I forgot about implementing an
    indirect jump by pushing the target address on the stack and executing a
    return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers, combined, and the flag register cannot be set directly, and has only 4 working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.


    Or you could read the SP, put that address into a different word memory location, and use that for indirect access to write to the stack.

    It is all possible, but not particularly efficient.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Oct 14 14:59:48 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 14:37 schrieb David Brown:

    A stack-based system is often a good choice for very small cpus - it is certainly popular for 4-bit microcontrollers.  But it seems that the designers of this device simply haven't considered support for
    Forth-style coding to be important.

    Efficient stack access is important for C, too. Putting local variables
    on the stack makes functions reentrant (not so important for small
    devices), and also saves memory (very important for small devices).

    The STM8 and S08 with their efficient sp-relative addressing and the Z80
    with the index registers thus make better targets for C compilers than
    the MCS-51 and HC08.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 08:14:22 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 10:58 schrieb upsidedown@downunder.com:

    Interesting, this at least confirms that the instruction word is 16
    bits. In a Harvard architecture, the word length could have been
    13-17 bits, with some dirty encodings in 113 bit case., but a cleaner encoding with 14-17 bit instruction words.

    Assuming one would like to make an encoding for exactly 1024 code
    words and 64 byte data memory, a tighter encoding would be possible.
    Of course a manufacturer with small and larger processors, would make
    sense to use the same encoding for all processors, which is slightly inefficient for smaller models.

    Indeed Padauk makes variants with up to 256 B of RAM.


    Anyway 1 kW/64 byes case, the following code points would be required:

    2048 = 2 x 1024 call, goto
    1792 = 7 x 256 Immediate data (8 bit)
    2304 = 36 x 64 M-referense (6 bit)
    1024 = 8 x 128 Bit ref (M and IO 3+4 bits
    others

    This might barely fit into 13 bits, with some nasty encoding.

    Limiting M-refeence to 4 bits (0-15), but you still can't fit into 12
    bit instruction length.

    So with 16 bit word length, I do not understand why word reference is
    limited to 4-5 bits.The bit address limit makes more sense, so that it
    would not consume 4096 code points.


    Maybe the M-reference limit only applies to the bit manipulation
    instructions? The line in the manual explains M.n, there is no separate
    line for M; maybe they only documented the restrictions, with M then
    referring to the full 8-bit range outside of bit manipulation instructions?

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 10:44:07 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone. With an efficient stack-pointer-relative addressing mode, you
    put all local variables on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both take space in
    RAM at the same time.

    Compilers can sometimes overlay local variables of non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-time optimization, which is not that common
    in compilers for small µCs.

    Example: main() calls f() and g(); both f() and g() call h(). All four functions are in different translation units, f() and g() both use a lot
    of local variables, while main() and h() use little. Without link-time optimization, the compiler will use about as much RAM as f() and g()
    together, when the local variables are static. When they are put on the
    stack, it will only need as much RAM as either f() or g().
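    As a very rough single-file sketch of that example (the buffer sizes are invented just to make the numbers concrete; in the real case the four functions sit in separate translation units, which is exactly why the compiler cannot overlay them without link-time information):

    #include <string.h>

    static int h(int x) { return x + 1; }      /* tiny locals */

    static int f(void)
    {
        unsigned char buf[40];                 /* large locals of f() */
        memset(buf, 1, sizeof buf);
        return h(buf[0]);
    }

    static int g(void)
    {
        unsigned char buf[40];                 /* large locals of g() */
        memset(buf, 2, sizeof buf);
        return h(buf[0]);
    }

    int main(void)
    {
        /* f() and g() never run at the same time: with stack (or overlaid)
         * allocation the peak RAM need is one 40-byte buffer plus h()'s few
         * bytes; made static, both buffers would claim 80 bytes of a
         * 64-256 byte part all the time. */
        return f() + g();
    }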

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Mon Oct 15 05:11:52 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 14, 2018 at 8:39:10 AM UTC-4, David Brown wrote:
    On 13/10/18 18:59, Niklas Holsti wrote:
    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed address".

    Ok, before anyone else notices, I admit I forgot about implementing an
    indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers, combined, and the flag register cannot be set directly, and has only 4 working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.


    Or you could read the SP, put that address into a different word memory location, and use that for indirect access to write to the stack.

    It is all possible, but not particularly efficient.
    Efficiency has to be relative on such a limited machine. If there are no registers nearly everything is going to be clumsy and slow. I'm not sure using this CPU with Forth would be at all bad even if the CPU is not intended for Forth.
    One of the things that makes Forth so useful is that it can be tailored to the target. Rather than use the standard words you can write your own words that better fit the architecture. I'm not a Forth system designer, but I have designed CPUs in FPGAs and being able to target my CPU design with Forth is great. My CPU uses an 8 or 9 bit instruction size with multibyte instructions by prepending immediate addresses or data. I was able to make that work easily in Forth while it would have been a bear in C or other languages.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 14:19:19 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 10:18 schrieb Philipp Klaus Krause:

    They even make dual-core variants […]

    And there is the MCS11, with 8 cores.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 14:20:23 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 08:50 schrieb Philipp Klaus Krause:
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    On the other hand, saving those pseudo-registers at interrupts and
    across function calls will be painful.

    Philipp

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 14:35:18 2018
    From Newsgroup: comp.arch.embedded

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    If you are willing to pay 0.04$, you can get twice the RAM and program
    memory (not OTP for this one):

    https://detail.1688.com/offer/562502806054.html

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From raimond.dragomir@raimond.dragomir@gmail.com to comp.arch.embedded on Mon Oct 15 06:05:16 2018
    From Newsgroup: comp.arch.embedded

    luni, 15 octombrie 2018, 15:35:22 UTC+3, Philipp Klaus Krause a scris:
    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    If you are willing to pay 0.04$, you can get twice the RAM and program
    memory (not OTP for this one):

    https://detail.1688.com/offer/562502806054.html

    Philipp
    Nah... not sure. 4c is too much... :-D
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Mon Oct 15 07:19:00 2018
    From Newsgroup: comp.arch.embedded

    On Monday, October 15, 2018 at 9:05:23 AM UTC-4, raimond....@gmail.com wrote:
    luni, 15 octombrie 2018, 15:35:22 UTC+3, Philipp Klaus Krause a scris:
    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    If you are willing to pay 0.04$, you can get twice the RAM and program memory (not OTP for this one):

    https://detail.1688.com/offer/562502806054.html

    Philipp

    Nah... not sure. 4c is too much... :-D
    Too much you say? How about THIS deal???
    http://www.youboy.com/s504250937.html
    Three for a penny! But wait, there's MORE!!! It also has more memory and an ADC.
    Not sure how you actually order any of this stuff.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Mon Oct 15 07:26:46 2018
    From Newsgroup: comp.arch.embedded

    gnuarm.deletethisbit@gmail.com writes:
    http://www.youboy.com/s504250937.html
    Three for a penny! But wait, there's MORE!!! It also has more memory
    and an ADC.

    That's 0.35 Chinese Yuan (not Japanese Yen, which uses a similar-looking currency symbol) so about 0.05 USD.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Mon Oct 15 07:29:03 2018
    From Newsgroup: comp.arch.embedded

    Philipp Klaus Krause <pkk@spth.de> writes:
    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    Normally you'd use whole-program optimization, I thought. I don't know
    if SDCC supports that, but GCC does, as do the more serious commercial
    embedded compilers.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Mon Oct 15 07:30:26 2018
    From Newsgroup: comp.arch.embedded

    On Monday, October 15, 2018 at 10:26:51 AM UTC-4, Paul Rubin wrote:
    gnuarm.deletethisbit@gmail.com writes:
    http://www.youboy.com/s504250937.html
    Three for a penny! But wait, there's MORE!!! It also has more memory
    and an ADC.

    That's 0.35 Chinese Yuan (not Japanese Yen, which uses a similar-looking currency symbol) so about 0.05 USD.

    Ok, thanks. So much for using Google to translate currency. You just saved my fledgling import, export, arbitrage business!

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Mon Oct 15 21:34:10 2018
    From Newsgroup: comp.arch.embedded

    On Mon, 15 Oct 2018 10:44:07 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone. With an efficent stack-pointer-relative addresing mode, you
    put all local varibles on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    If you do not have efficient stack pointer relative addressing modes,
    why would you put local variables on stack ?

    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both takespace in
    RAM at the same time.

    Just create global variables Tmp1, Tmp2, Tmp3 ... and use these as
    function local variables. As long as two functions do not call each
    other directly or indirectly, you can safely use these global
    variables as function local variables.

    To make your program even prettier, use function specific aliases for
    Tmp1, Tmp2, etc. by using #define statements in C or multiple labels
    in assembly language storage allocation.
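    A small C sketch of that scheme (the Tmp names are as above, the two example functions are invented), assuming the two functions never call each other:

    #include <stdint.h>

    uint8_t Tmp1, Tmp2, Tmp3;            /* shared scratch bytes in RAM */

    /* per-function aliases, as suggested above */
    #define blink_count  Tmp1
    #define blink_phase  Tmp2

    #define scan_row     Tmp1
    #define scan_col     Tmp2
    #define scan_hits    Tmp3

    void blink(void)
    {
        for (blink_count = 0; blink_count < 10; blink_count++)
            blink_phase ^= 1;
    }

    void scan(void)
    {
        scan_hits = 0;
        for (scan_row = 0; scan_row < 8; scan_row++)
            for (scan_col = 0; scan_col < 8; scan_col++)
                scan_hits++;
    }

    int main(void) { blink(); scan(); return scan_hits; }   /* 64 */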


    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    Why do you need a linker for such a small processor? Since you are
    going to use a cross-compiler on a PC with mega/gigabytes of memory,
    you just compile/assemble everything into binary at once.

    Example: main() calls f() and g(); both f() and g() call h(). All four functions are in different translation units, f() and g() both use a lot
    of local variables, while main() and h() use little. Without link-time optimization, the compiler will use about as much RAM as f() and g() together, when the local variables are static. When they are put on the stack, it will only need as much RAM as either f() or g().

    See above, no need for linker or stack variables.


    The question is, do you even need full scale parameter passing ?

    Function h() could use some predefined memory locations and both f()
    and g() can put the parameter into those memory locations before
    calling h(). Only those parameters that are different when calling
    from f() or g() need to be passed; the parameters that are the same in
    both cases do not need to be passed at all, since h() knows them
    already at startup.
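    A sketch of that calling convention in C (all names invented): h() reads its "arguments" from fixed locations that the callers fill in beforehand, and only the value that actually differs between f() and g() is ever written:

    #include <stdint.h>

    static uint8_t       h_channel;      /* differs between f() and g() */
    static const uint8_t h_gain = 3;     /* same for every caller       */
    static uint8_t       h_result;

    static void h(void) { h_result = (uint8_t)(h_channel * h_gain); }

    static void f(void) { h_channel = 1; h(); }
    static void g(void) { h_channel = 2; h(); }

    int main(void)
    {
        f();                             /* h_result == 3 */
        g();                             /* h_result == 6 */
        return h_result;
    }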


    Of course such tricks become impractical with larger systems, but with
    1 KW / 64 B, this should definitely be doable.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From =?UTF-8?Q?Hans-Bernhard_Br=c3=b6ker?=@HBBroeker@t-online.de to comp.arch.embedded on Mon Oct 15 21:22:37 2018
    From Newsgroup: comp.arch.embedded

    Am 15.10.2018 um 10:44 schrieb Philipp Klaus Krause:
    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to determine the actual stack size usage.

    With an efficent stack-pointer-relative addresing mode, you
    put all local varibles on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    And without such an addressing mode, you don't, because you'll suffer
    badly in every conceivable aspect.

    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both takespace in
    RAM at the same time.

    So don't mark them 'static', unless they actually have to be.

    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    On the contrary: it's precisely the compilers for such stack-starved architectures (e.g. the 8051) that have been coupling behind-the-scenes
    static allocation of automatic variables with whole-program overlay
    analysis since effectively forever. They really had to, because the alternative would be painful to the point of being unusable.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Tue Oct 16 09:19:44 2018
    From Newsgroup: comp.arch.embedded

    Am 15.10.2018 um 16:29 schrieb Paul Rubin:
    Philipp Klaus Krause <pkk@spth.de> writes:
    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    Normally you'd use whole-program optimization, I thought. I don't know
    if SDCC supports that, but GCC does, as do the more serious commercial embedded compilers.


    Does GCC support any of these very simple µC architectures? I thought
    anything supported by GCC tends to have rather powerful instruction sets
    and plenty of registers anyway, so functions could be made reentrant by
    default without any problems resulting.

    While some link-time optimizations are commonly requested features for
    SDCC, currently none are supported. In SDCC, even inter-procedural optimizations within the same translation unit are not as powerful as
    they should be.
    Well, there always is a lot of work to do on SDCC, and there are only a
    few volunteers with time to work on it. So SDCC developers prioritize
    (usually by personal preferences).

    Still, when looking at the big picture, SDCC is doing quite well
    compared to other compilers for the same architectures (see e.g. http://www.colecovision.eu/stm8/compilers.shtml - comparison from early
    2018, around the time of the SDCC 3.7.0 release - current SDCC is 3.8.0).

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Tue Oct 16 09:29:30 2018
    From Newsgroup: comp.arch.embedded

    Am 15.10.2018 um 20:34 schrieb upsidedown@downunder.com:
    On Mon, 15 Oct 2018 10:44:07 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone. With an efficient stack-pointer-relative addressing mode, you
    put all local variables on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    If you do not have efficient stack pointer relative addressing modes,
    why would you put local variables on stack ?

    1) The question was "With such small ROM/RAM sizes, who needs reentrant
    functions ?". And it was a reply to a post where the lack of an efficient
    sp-relative addressing mode was cited as a disadvantage of the Padauk.
    So my reasoning was that one would want local variables on the stack,
    even for small RAM / ROM, so the lack of sp-relative addressing is a
    disadvantage - as one has to either put local variables elsewhere or
    handle stack accesses in an inefficient way.

    2) There will still be some use cases for reentrant functions. And since
    the Padauk has a relatively large ROM - at least compared to the RAM
    (the ROM/RAM ratio seems far higher than on typical STM8 or MCS-51
    devices) - when speed doesn't matter, it might still be worth
    putting variables on the stack. A compiler should provide an option for
    that (as, e.g., SDCC does for architectures without efficient stack
    access, such as MCS-51 or HC08).


    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both take space in
    RAM at the same time.

    Just create global variables Tmp1, Tmp2, Tmp3 ... and use these as
    function local variables. As long as two functions do not call each
    other directly or indirectly, you can safely use these global
    variables as function local variables.

    I'd rather write idiomatic C and leave such optimizations to the compiler.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Tue Oct 16 10:00:59 2018
    From Newsgroup: comp.arch.embedded

    On 15.10.2018 at 21:22, Hans-Bernhard Bröker wrote:
    On 15.10.2018 at 10:44, Philipp Klaus Krause wrote:
    On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to determine the actual stack size usage.

    What is the problem?
    Either you use recursion - in which case the functions need to be
    reentrant, there is no alternative, or you don't. In the latter case
    you'd need to do whole-program analysis to efficiently overlay the
    variables - a very similar analysis could tell you the total stack
    usage.


    With an efficient stack-pointer-relative addressing mode, you
    put all local variables on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    And without such an addressing mode, you don't, because you'll suffer
    badly in every conceivable aspect.

    Yes. So compilers like SDCC when targeting MCS-51 or HC08 don't use the
    stack by default (--stack-auto puts local variables on the stack
    per-file, __reentrant does so per function).
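
    To make the two models concrete, a small sketch (the function names are
    just an example; see the SDCC manual for the full mcs51 syntax):

    /* Default mcs51 code model: parameters and locals are allocated
       statically (and overlaid where the compiler can prove it safe),
       so this function is NOT reentrant. */
    int scale(int x)
    {
        int y = 3 * x;
        return y;
    }

    /* Marked __reentrant: parameters and locals live on the stack, so the
       function can be called recursively or from several contexts, at the
       cost of slow stack-relative access on the 8051. */
    int scale_r(int x) __reentrant
    {
        int y = 3 * x;
        return y;
    }

    Building the whole file with --stack-auto has the same effect as marking
    every function in it __reentrant.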


    Compilers can sometimes overlay local variables of non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-time optimization, which is not that common
    in compilers for small µCs.

    On the contrary: it's precisely the compilers for such stack-starved architectures (e.g. the 8051) that have been coupling behind-the-scenes static allocation of automatic variables with whole-program overlay
    analysis since effectively forever. They really had to, because the alternative would be painful to the point of being unusable.


    Well, SDCC when targeting MCS-51 or HC08 would be the combination that I
    know a bit about (though personally, I mostly use SDCC to target Z80 or
    STM8, which can both use the stack efficiently). SDCC doesn't really
    have link-time optimization yet; compilation units are handled
    independently. Regarding different compilation units, it can still
    overlay the variables of leaf functions - i.e. non-reentrant functions
    that do not call non-reentrant functions - but not much more.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Oct 16 13:48:37 2018
    From Newsgroup: comp.arch.embedded

    On 16/10/18 09:19, Philipp Klaus Krause wrote:
    On 15.10.2018 at 16:29, Paul Rubin wrote:
    Philipp Klaus Krause <pkk@spth.de> writes:
    Compilers can sometimes overlay local variables of non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-time optimization, which is not that common
    in compilers for small µCs.

    Normally you'd use whole-program optimization, I thought. I don't know
    if SDCC supports that, but GCC does, as do the more serious commercial
    embedded compilers.


    Does GCC support any of these very simple µC architectures?

    No.

    I thought
    anything supported by GCC tends to have rather powerful instruction sets
    and plenty of registers anyway, so functions could be made reentrant by default without any problems resulting.

    Most gcc targets are quite powerful, with plenty of registers - and
    re-entrancy is not a problem. Some are a bit weaker, like the 8-bit
    AVR, and get inefficient with complicated stack usage. But it does not
    support the 8-bit CISC accumulator-based devices that SDCC targets.


    While some link-time optimizations are commonly requested features for
    SDCC, currently none are supported. In SDCC, even inter-procedural optimizations within the same translation unit are not as powerful as
    they should be.
    Well, there always is a lot of work to do on SDCC, and there are only a
    few volunteers with time to work on it. So SDCC developers prioritize
    (usually by personal preferences).

    Still, when looking at the big picture, SDCC is doing quite well
    compared to other compilers for the same architectures (see e.g. http://www.colecovision.eu/stm8/compilers.shtml - comparison from early
    2018, around the time of the SDCC 3.7.0 release - current SDCC is 3.8.0).

    Philipp


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From =?UTF-8?Q?Hans-Bernhard_Br=c3=b6ker?=@HBBroeker@t-online.de to comp.arch.embedded on Tue Oct 16 22:52:37 2018
    From Newsgroup: comp.arch.embedded

    On 16.10.2018 at 10:00, Philipp Klaus Krause wrote:
    On 15.10.2018 at 21:22, Hans-Bernhard Bröker wrote:
    On 15.10.2018 at 10:44, Philipp Klaus Krause wrote:
    On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to
    determine the actual stack size usage.

    What is the problem?

    The major part of it is that I mixed up Reentrance with Recursion there
    ... sorry for that.

    OTOH, one does tend to influence the other. Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an
    interrupt handler and the main loop. And frankly: you really don't want
    to do that. If an ISR on this kind of hardware becomes big enough you
    feel the need to split it into sub-functions, that almost certainly
    means you've picked entirely the wrong tool for the job.

    In other words: for this kind of system (very small, with rotten
    stack-based addressing), not only doesn't everyone need re-entrant
    functions, it's more like nobody does.

    On the contrary: it's precisely the compilers for such stack-starved
    architectures (e.g. the 8051) that have been coupling behind-the-scenes
    static allocation of automatic variables with whole-program overlay
    analysis since effectively forever. They really had to, because the
    alternative would be painful to the point of being unusable.


    Well, SDCC when targeting MCS-51 or HC08 would be the combination that I
    know a bit about

    I don't think anyone has ever seriously claimed SDCC to be anywhere near
    the pinnacle of compiler design for the 8051. ;-P

    Frankly, just looking at statements in this thread has me thinking that
    the usual suspects among commercial offerings from 20 years ago might
    still run circles around it today.

    SDCC doesn't really have link-time optimization yet, compilation
    units are handled independently.
    Well, given the gigantic scale differences between the target hardware
    and the build host, just turning the whole thing into a single
    compilation unit (by force, if necessary) should really be a no-brainer.
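
    E.g. the crude "unity build" trick (file names made up for the example):

    /* whole_program.c - force one translation unit by textually including
       every source file, so the compiler sees the complete call graph. */
    #include "timers.c"
    #include "uart.c"
    #include "main.c"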
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Tue Oct 16 14:01:43 2018
    From Newsgroup: comp.arch.embedded

    On Tuesday, October 16, 2018 at 4:52:44 PM UTC-4, Hans-Bernhard Bröker wrote:
    On 16.10.2018 at 10:00, Philipp Klaus Krause wrote:
    On 15.10.2018 at 21:22, Hans-Bernhard Bröker wrote:
    On 15.10.2018 at 10:44, Philipp Klaus Krause wrote:
    On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to
    determine the actual stack size usage.

    What is the problem?

    The major part of it is that I mixed up Reentrance with Recursion there
    ... sorry for that.

    OTOH, one does tend to influence the other. Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an interrupt handler and the main loop.
    I don't believe this is correct. Reentrance is a problem any time a routine is entered again before it is exited from a prior call. This can happen without multiple threads when a routine is called from a routine that was ultimately called from within the routine. I suppose you might consider this to be recursion, but my point is this can happen without the intent of using recursion.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From =?UTF-8?Q?Hans-Bernhard_Br=c3=b6ker?=@HBBroeker@t-online.de to comp.arch.embedded on Wed Oct 17 00:03:51 2018
    From Newsgroup: comp.arch.embedded

    On 16.10.2018 at 23:01, gnuarm.deletethisbit@gmail.com wrote:
    On Tuesday, October 16, 2018 at 4:52:44 PM UTC-4, Hans-Bernhard
    Bröker wrote:

    OTOH, one does tend to influence the other. Without recursion,
    one would only really need reentrance to be able to call the same
    function from separate threads of execution. On controllers this
    small, that would only happen if you're calling the same function
    from inside an interrupt handler and the main loop.

    I don't believe this is correct. Reentrance is a problem any time a
    routine is entered again before it is exited from a prior call. This
    can happen without multiple threads when a routine is called from a
    routine that was ultimately called from within the routine. I
    suppose you might consider this to be recursion,

    Oh, there's no doubt about it: that's recursion all right.

    Some might prefer to qualify it as indirect recursion, a.k.a. a loop in
    the call graph, but it's still recursion.

    but my point is this
    can happen without the intent of using recursion.

    I'll assume we agree on this: unintended recursion is clearly a bug in the
    code, every time.

    That could arguably be classified as an actual benefit of using such a
    stack-starved CPU architecture: any competent C compiler for it will
    have to perform call tree analysis anyway, so it finds that particular
    bug "en passant".

    More typical C toolchains relying on stack-centric calling conventions
    might not bother with such analysis, and thus won't see the bug. Until
    you use the accompanying stack size calculation tool, that is, which
    will barf.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Clifford Heath@no.spam@please.net to comp.arch.embedded on Wed Oct 17 09:22:58 2018
    From Newsgroup: comp.arch.embedded

    On 17/10/18 09:03, Hans-Bernhard Bröker wrote:
    On 16.10.2018 at 23:01, gnuarm.deletethisbit@gmail.com wrote:
    Reentrance is a problem any time a
    routine is entered again before it is exited from a prior call. This
    can happen without multiple threads when a routine is called from a
    routine that was ultimately called from within the routine. I
    suppose you might consider this to be recursion,
    Oh, there's no doubt about it: that's recursion all right.
    Some might prefer to qualify it as indirect recursion, a.k.a. a loop in
    the call graph, but it's still recursion.
    We call it mutual recursion.
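    The textbook illustration of the pattern (nothing Padauk-specific, just a
    minimal made-up example):

    /* is_even() and is_odd() never call themselves directly, but each calls
       the other, so the call graph contains a loop: mutual (indirect)
       recursion. Each activation needs its own copy of n, so both functions
       must be reentrant. */
    static int is_odd(unsigned n);

    static int is_even(unsigned n)
    {
        return (n == 0) ? 1 : is_odd(n - 1);
    }

    static int is_odd(unsigned n)
    {
        return (n == 0) ? 0 : is_even(n - 1);
    }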
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Tue Oct 16 15:29:29 2018
    From Newsgroup: comp.arch.embedded

    On Tuesday, October 16, 2018 at 6:03:55 PM UTC-4, Hans-Bernhard Bröker wrote:
    On 16.10.2018 at 23:01, gnuarm.deletethisbit@gmail.com wrote:
    On Tuesday, October 16, 2018 at 4:52:44 PM UTC-4, Hans-Bernhard
    Bröker wrote:

    OTOH, one does tend to influence the other. Without recursion,
    one would only really need reentrance to be able to call the same
    function from separate threads of execution. On controllers this
    small, that would only happen if you're calling the same function
    from inside an interrupt handler and the main loop.

    I don't believe this is correct. Reentrance is a problem any time a routine is entered again before it is exited from a prior call. This
    can happen without multiple threads when a routine is called from a
    routine that was ultimately called from within the routine. I
    suppose you might consider this to be recursion,

    Oh, there's no doubt about it: that's recursion all right.

    Some might prefer to qualify it as indirect recursion, a.k.a. a loop in
    the call graph, but it's still recursion.

    but my point is this
    can happen without the intent of using recursion.

    I'll assume we agree on this: unintended recursion is clearly a bug in the
    code, every time.
    Clearly there would be a bug, but it is just as much that the routine wasn't designed for recursion, and that would be the most likely fix.
    That could arguably be classified as an actual benefit of using such a stack-starved CPU architecture: any competent C compiler for it will
    have to perform call tree analysis anyway, so it finds that particular
    bug "en passant".
    Are you swearing at me in French? ;)
    More typical C toolchains relying on stack-centric calling conventions
    might not bother with such analysis, and thus won't see the bug. Until
    you use the accompanying stack size calculation tool, that is, which
    will barf.
    Yeah, I'm not much of a C programmer, so I wouldn't know about such tools. What made me think of this is a problem often encountered by novices in Forth. Some system words use globally static data and can be called twice from different code before the first call has ended use of the data structure. Not quite the same thing as recursion, but the same result.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Oct 17 00:46:57 2018
    From Newsgroup: comp.arch.embedded

    On 17/10/2018 00:03, Hans-Bernhard Bröker wrote:

    I'll assume we agree on this: unintended recursion is clearly a bug in the
    code, every time.

    I think we can agree that /any/ unintended action is a clear bug in the
    code!

    But recursion or re-entrancy without a clear purpose and careful limits
    on depths is a bug in the /design/, not just the code.

    When I am faced with someone else's code to examine or maintain, I often
    run it through Doxygen with "generate documentation for /everything/ -
    caller graphs, callee graphs, cross-linked source, etc." It can make it
    quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the
    code is single-threaded - you get loops in the call graphs.

    The only other case is if interrupts call other functions - that won't
    be seen so easily.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Wed Oct 17 08:23:57 2018
    From Newsgroup: comp.arch.embedded

    On 16.10.2018 at 22:52, Hans-Bernhard Bröker wrote:

    Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an interrupt handler and the main loop. And frankly: you really don't want
    to do that. If an ISR on this kind of hardware becomes big enough you
    feel the need to split it into sub-functions, that almost certainly
    means you've picked entirely the wrong tool for the job.

    In other words: for this kind of system (very small, with rotten
    stack-based addressing), not only doesn't everyone need re-entrant
    functions, it's more like nobody does.

    Multithreading matters here. It is not common on such small devices, but
    this one is an exception: Padauk sells multiple dual-core variants of
    this controller and one 8-core variant.
    And there are always the support functions the compilers tend to need on
    small systems (while I assume people would think twice before using an expensive division in an interrupt handler, the situation looks
    different for multithreading).


    I don't think anyone has ever seriously claimed SDCC to be anywhere near
    the pinnacle of compiler design for the 8051. ;-P

    Frankly, just looking at statements in this thread has me thinking that
    the usual suspects among commercial offerings from 20 years ago might
    still run circles around it today.

    For the MCS-51, I do not know of a good current compiler comparison; I did
    some benchmarks a while ago
    (https://sourceforge.net/p/sdcc/mailman/message/36359114/), and SDCC
    still has a bit of catching-up to do.

    On the other hand, for the STM8, SDCC seems to be doing more than just
    okay: http://www.colecovision.eu/stm8/compilers.shtml

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Wed Oct 17 09:35:45 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-17 01:46 , David Brown wrote:
    ...
    When I am faced with someone else's code to examine or maintain, I often
    run it through Doxygen with "generate documentation for /everything/ -
    caller graphs, callee graphs, cross-linked source, etc." It can make it quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the
    code is single-threaded - you get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool to
    someone else's program, the tool found recursion. This surprised the
    people I was working with, because they had generated call graphs for
    the program, analysed them visually, and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the size
    of the window used to display the call-graphs. The tool did as it was
    told, with the result that the line segments on the path for the
    recursive call went down to the bottom edge of the diagram, then
    *merged* with the lower border line of the diagram, followed that lower border, went up one side of the diagram -- still merged with the border
    line -- and then reentered the diagram to point at the source of the
    recursive call, effectively making the loop very hard to see...

    (It turned out that this recursion was intentional. At this point, the
    program was sending an alarm message, but the alarm buffer was full, so
    the alarm routine called itself to send an alarm about the full buffer
    -- and that worked, because one buffer slot was reserved, by design, for
    this "buffer full" alarm.)
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Oct 17 09:31:17 2018
    From Newsgroup: comp.arch.embedded

    On 17/10/18 08:35, Niklas Holsti wrote:
    On 18-10-17 01:46 , David Brown wrote:
    ...
    When I am faced with someone else's code to examine or maintain, I often
    run it through Doxygen with "generate documentation for /everything/ -
    caller graphs, callee graphs, cross-linked source, etc." It can make it
    quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the
    code is single-threaded - you get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool to
    someone else's program, the tool found recursion. This surprised the
    people I was working with, because they had generated call graphs for
    the program, analysed them visually, and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the size
    of the window used to display the call-graphs. The tool did as it was
    told, with the result that the line segments on the path for the
    recursive call went down to the bottom edge of the diagram, then
    *merged* with the lower border line of the diagram, followed that lower border, went up one side of the diagram -- still merged with the border
    line -- and then reentered the diagram to point at the source of the recursive call, effectively making the loop very hard to see...

    Visual tools are helpful, but don't show everything!

    On the other hand, they can show things that can be hard to quantify in
    more rigorous tools. It is easy to look at the call graph of a function
    and say "that function is a bowl of spaghetti, and needs to be restructured" -
    it's harder to define rules or limits for an automatic checker that make
    such judgements.


    (It turned out that this recursion was intentional. At this point, the program was sending an alarm message, but the alarm buffer was full, so
    the alarm routine called itself to send an alarm about the full buffer
    -- and that worked, because one buffer slot was reserved, by design, for
    this "buffer full" alarm.)


    I can appreciate the purpose here, but I would rather have this:

    static bool putAlarmInLog(alarmPtr slot, ...) { ... }

    static alarmSlot_t alarmSlots[maxAlarmSlots];
    static alarmSlot_t emergencyAlarmSlot;

    static alarmPtr findFreeAlarmSlot(void) { ... }

    void logAlarm(...) {
        alarmPtr slot = findFreeAlarmSlot();
        if (slot) {
            putAlarmInLog(slot, ...);
        } else {
            putAlarmInLog(&emergencyAlarmSlot, "Buffer full");
        }
    }


    Hoist the condition checks up a step, and put the actual storage
    mechanism down a step, and you no longer have the re-entrancy. The code
    is a lot easier to write, read, analyse and test.



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Wed Oct 17 07:08:30 2018
    From Newsgroup: comp.arch.embedded

    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas Holsti wrote:
    On 18-10-17 01:46 , David Brown wrote:
    ...
    When I am faced with someone else's code to examine or maintain, I often run it through Doxygen with "generate documentation for /everything/ - caller graphs, callee graphs, cross-linked source, etc." It can make it quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the code is single-threaded - you get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool to someone else's program, the tool found recursion. This surprised the
    people I was working with, because they had generated call graphs for
    the program, analysed them visually, and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the size
    of the window used to display the call-graphs. The tool did as it was
    told, with the result that the line segments on the path for the
    recursive call went down to the bottom edge of the diagram, then
    *merged* with the lower border line of the diagram, followed that lower border, went up one side of the diagram -- still merged with the border
    line -- and then reentered the diagram to point at the source of the recursive call, effectively making the loop very hard to see...

    (It turned out that this recursion was intentional. At this point, the program was sending an alarm message, but the alarm buffer was full, so
    the alarm routine called itself to send an alarm about the full buffer
    -- and that worked, because one buffer slot was reserved, by design, for this "buffer full" alarm.)

    Seems to me what actually failed was that they knew they had recursion in the design, but didn't realize that not seeing the recursion in the call graphs was an error that should have been caught.

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch.embedded on Wed Oct 17 11:31:27 2018
    From Newsgroup: comp.arch.embedded

    On Tue, 16 Oct 2018 22:52:37 +0200, Hans-Bernhard Bröker <HBBroeker@t-online.de> wrote:

    The major part of it is that I mixed up Reentrance with Recursion there
    ... sorry for that.

    OTOH, one does tend to influence the other. Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution.

    Recursion simply is a particular usage of re-entrance: a function
    calling itself (possibly indirectly through other functions).

    You can use re-entrant functions without using recursion, but you
    can't recurse without re-entrant functions.


    Of course, re-entrance doesn't depend on a CPU stack ... it requires
    only that the local variables of each instance be kept separate. That
    can be done with auxiliary data structures.
    [It's interesting to watch programming students reinvent recursion - accidentally, or as an exercise - and realize all the effort saved by
    having it built into the language.]
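
    A contrived sketch of what such an auxiliary structure can look like
    (factorial is a silly thing to compute this way, but it shows that only
    per-instance storage matters, not a hardware stack):

    #include <stdint.h>

    #define MAX_DEPTH 8

    uint32_t factorial(uint8_t n)
    {
        uint8_t  arg[MAX_DEPTH];   /* one saved "local" per pending instance */
        uint8_t  depth = 0;
        uint32_t result = 1;

        while (n > 1 && depth < MAX_DEPTH)   /* "call" phase: push frames */
            arg[depth++] = n--;

        while (depth > 0)                    /* "return" phase: pop, combine */
            result *= arg[--depth];

        return result;
    }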

    George

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Wed Oct 17 18:37:14 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-17 17:08 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas Holsti
    wrote:
    On 18-10-17 01:46 , David Brown wrote: ...
    When I am faced with someone else's code to examine or maintain,
    I often run it through Doxygen with "generate documentation for
    /everything/ - caller graphs, callee graphs, cross-linked source,
    etc." It can make it quick to jump around in the code. And
    recursive (or re-entrant, whichever you prefer) code stands out
    like a sore thumb, as long as the code is single-threaded - you
    get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool
    to someone else's program, the tool found recursion. This surprised
    the people I was working with, because they had generated call
    graphs for the program, analysed them visually, and found no
    recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the
    size of the window used to display the call-graphs. The tool did as
    it was told, with the result that the line segments on the path for
    the recursive call went down to the bottom edge of the diagram,
    then *merged* with the lower border line of the diagram, followed
    that lower border, went up one side of the diagram -- still merged
    with the border line -- and then reentered the diagram to point at
    the source of the recursive call, effectively making the loop very
    hard to see...

    (It turned out that this recursion was intentional. At this point,
    the program was sending an alarm message, but the alarm buffer was
    full, so the alarm routine called itself to send an alarm about the
    full buffer -- and that worked, because one buffer slot was
    reserved, by design, for this "buffer full" alarm.)

    Seems to me what actually failed was that they knew they had
    recursion in the design but didn't realize the fact that they didn't
    see the recursion in the call graphs was an error that should have
    been caught.

    The guys creating and viewing the call-graphs were not the designers of
    the program, either, so they didn't know, but for sure it was something
    they should have discovered and remarked on as part of their work.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Wed Oct 17 19:43:09 2018
    From Newsgroup: comp.arch.embedded

    On Wed, 17 Oct 2018 08:23:57 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    On 16.10.2018 at 22:52, Hans-Bernhard Bröker wrote:

    Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an
    interrupt handler and the main loop. And frankly: you really don't want
    to do that. If an ISR on this kind of hardware becomes big enough you
    feel the need to split it into sub-functions, that almost certainly
    means you've picked entirely the wrong tool for the job.

    In other words: for this kind of system (very small, with rotten
    stack-based addressing), not only doesn't everyone need re-entrant
    functions, it's more like nobody does.

    Multithreading matters here. It is not common on such small devices, but
    this one is an exception: Padauk sells multiple dual-core variants of
    this controller and one 8-core variant.

    While I have been playing around with the idea of making some RTOS for
    such a 1kW/64B machine (realistically supporting 2-3 tasks, such as a
    foreground/background monitor), having 2 or 8 threads is not very
    realistic, even if the hardware supports it.

    The 8-core version might be usable for xCore-style "pseudo-interrupts"
    running a single DSP sample or PLC loop at a time. This would require
    8 input pins, each starting its own thread.

    But of course, the same rules should apply to pseudo-interrupts as
    real interrupts regarding re-entrancy etc.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Wed Oct 17 10:04:33 2018
    From Newsgroup: comp.arch.embedded

    On Wednesday, October 17, 2018 at 11:37:14 AM UTC-4, Niklas Holsti wrote:
    On 18-10-17 17:08 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas Holsti
    wrote:
    On 18-10-17 01:46 , David Brown wrote: ...
    When I am faced with someone else's code to examine or maintain,
    I often run it through Doxygen with "generate documentation for
    /everything/ - caller graphs, callee graphs, cross-linked source,
    etc." It can make it quick to jump around in the code. And
    recursive (or re-entrant, whichever you prefer) code stands out
    like a sore thumb, as long as the code is single-threaded - you
    get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool
    to someone else's program, the tool found recursion. This surprised
    the people I was working with, because they had generated call
    graphs for the program, analysed them visually, and found no
    recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the
    size of the window used to display the call-graphs. The tool did as
    it was told, with the result that the line segments on the path for
    the recursive call went down to the bottom edge of the diagram,
    then *merged* with the lower border line of the diagram, followed
    that lower border, went up one side of the diagram -- still merged
    with the border line -- and then reentered the diagram to point at
    the source of the recursive call, effectively making the loop very
    hard to see...

    (It turned out that this recursion was intentional. At this point,
    the program was sending an alarm message, but the alarm buffer was
    full, so the alarm routine called itself to send an alarm about the
    full buffer -- and that worked, because one buffer slot was
    reserved, by design, for this "buffer full" alarm.)

    Seems to me what actually failed was that they knew they had
    recursion in the design but didn't realize the fact that they didn't
    see the recursion in the call graphs was an error that should have
    been caught.

    The guys creating and viewing the call-graphs were not the designers of
    the program, either, so they didn't know, but for sure it was something
    they should have discovered and remarked on as part of their work.
    Do you know the intended purpose of the call graphs? It seems to me that it would be to match expectations to what was coded. It shouldn't matter who was doing the evaluation, there should have been an accounting of expectations regarding the presence and/or absence of recursion.
    Much like a check list, it doesn't just assure the presence of everything on the list, it can be used to verify the absence of anything not on the list.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Wed Oct 17 20:24:29 2018
    From Newsgroup: comp.arch.embedded

    On 17.10.2018 at 18:43, upsidedown@downunder.com wrote:

    While I have been playing around with the idea of making some RTOS for
    such a 1kW/64B machine (realistically supporting 2-3 tasks, such as a
    foreground/background monitor), having 2 or 8 threads is not very
    realistic, even if the hardware supports it.

    The 8-core version might be usable for xCore-style "pseudo-interrupts"
    running a single DSP sample or PLC loop at a time. This would require
    8 input pins, each starting its own thread.

    But of course, the same rules should apply to pseudo-interrupts as
    real interrupts regarding re-entrancy etc.


    Since the Padauk doesn't have much in terms of integrated peripherals,
    there is another use for hardware threads: have each thread do some I/O
    protocol (I²C, UART, etc.) in software.
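
    As a rough sketch of the kind of loop one such thread could run - a
    bit-banged UART transmitter, 8N1, LSB first. TX_PIN_HIGH(), TX_PIN_LOW()
    and delay_one_bit() are placeholders for whatever the real part provides,
    not actual Padauk names:

    static void soft_uart_tx(unsigned char byte)
    {
        unsigned char i;

        TX_PIN_LOW();                /* start bit */
        delay_one_bit();

        for (i = 0; i < 8; i++) {    /* data bits, LSB first */
            if (byte & 1)
                TX_PIN_HIGH();
            else
                TX_PIN_LOW();
            byte >>= 1;
            delay_one_bit();
        }

        TX_PIN_HIGH();               /* stop bit */
        delay_one_bit();
    }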

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Wed Oct 17 23:07:12 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-17 20:04 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 11:37:14 AM UTC-4, Niklas Holsti
    wrote:
    On 18-10-17 17:08 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas
    Holsti wrote:
    On 18-10-17 01:46 , David Brown wrote: ...
    When I am faced with someone else's code to examine or
    maintain, I often run it through Doxygen with "generate
    documentation for /everything/ - caller graphs, callee
    graphs, cross-linked source, etc." It can make it quick to
    jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as
    long as the code is single-threaded - you get loops in the
    call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis
    tool to someone else's program, the tool found recursion. This
    surprised the people I was working with, because they had
    generated call graphs for the program, analysed them visually,
    and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize
    the size of the window used to display the call-graphs. The
    tool did as it was told, with the result that the line segments
    on the path for the recursive call went down to the bottom edge
    of the diagram, then *merged* with the lower border line of the
    diagram, followed that lower border, went up one side of the
    diagram -- still merged with the border line -- and then
    reentered the diagram to point at the source of the recursive
    call, effectively making the loop very hard to see...

    (It turned out that this recursion was intentional. At this
    point, the program was sending an alarm message, but the alarm
    buffer was full, so the alarm routine called itself to send an
    alarm about the full buffer -- and that worked, because one
    buffer slot was reserved, by design, for this "buffer full"
    alarm.)

    Seems to me what actually failed was that they knew they had
    recursion in the design but didn't realize the fact that they
    didn't see the recursion in the call graphs was an error that
    should have been caught.

    The guys creating and viewing the call-graphs were not the
    designers of the program, either, so they didn't know, but for sure
    it was something they should have discovered and remarked on as
    part of their work.

    Do you know the intended purpose of the call graphs?

    IIRC they were doing independent SW verification & validation of the
    program (and the WCET analysis was also a part of that). But it was many
    years ago, and I don't remember the details well enough to say much
    more, nor can I say why the program was recursive in this way, or if it
    could as easily have been made non-recursive.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sun Oct 21 16:27:31 2018
    From Newsgroup: comp.arch.embedded

    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wondered how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot of 1-bit processors.
    There were at least two types. The PLC (Programmable Logic Controller)
    type replaced relay logic; these typically had at least AND, OR, NOT,
    (XOR) instructions. The other group was used as truly serial computers,
    with the same instructions as the PLC type but also at least 1-bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block
    from the 1970's, which requires quite a lot of support chips (code
    memory, PC, I/O chips) to do some useful work.
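
    A toy sketch of the kind of 1-bit "program" such a part executes - a
    single-bit result register worked through a list of bit operations. The
    opcode set here is invented for the example (the real MC14500B has a
    16-instruction set):

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_LD, OP_AND, OP_OR, OP_ST };            /* made-up subset */

    typedef struct { uint8_t op; uint8_t bit; } insn_t;

    int main(void)
    {
        uint8_t in  = 0x05;   /* input pins  */
        uint8_t out = 0x00;   /* output pins */
        uint8_t rr  = 0;      /* 1-bit result register */
        unsigned pc;

        /* one "rung" of relay logic: out.0 = in.0 AND in.2 */
        static const insn_t prog[] = {
            { OP_LD, 0 }, { OP_AND, 2 }, { OP_ST, 0 },
        };

        for (pc = 0; pc < sizeof prog / sizeof prog[0]; pc++) {
            uint8_t b = (in >> prog[pc].bit) & 1;
            switch (prog[pc].op) {
            case OP_LD:  rr = b;  break;
            case OP_AND: rr &= b; break;
            case OP_OR:  rr |= b; break;
            case OP_ST:
                out = (uint8_t)((out & ~(1u << prog[pc].bit))
                                | ((unsigned)rr << prog[pc].bit));
                break;
            }
        }
        printf("out = 0x%02x\n", out);   /* prints 0x01 */
        return 0;
    }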

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM and four
    banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package.
    For the re-entrance enthusiasts, it contains stack-pointer-relative
    addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803
    Darlington buffers may be needed to drive loads typically found in a PLC
    environment.

    Anyone seen more modern 1-bit chips, either for relay replacement or
    for truly serial computers?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jim.brakefield@jim.brakefield@ieee.org to comp.arch.embedded on Sun Oct 21 07:47:21 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller)
    type replacing relay logic. These had typically at least AND, OR, NOT,
    (XOR) instructions.The other group was used as truly serial computers
    with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block,
    from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?
    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?
    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose
    (Logic Emulation Machine) https://opencores.org/project/lem1_9min
    Jim Brakefield
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Phil Martel@pomartel@comcast.net to comp.arch.embedded on Sun Oct 21 11:03:18 2018
    From Newsgroup: comp.arch.embedded

    On 10/21/2018 09:27, upsidedown@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller)
    type replacing relay logic. These had typically at least AND, OR, NOT,
    (XOR) instructions.The other group was used as truly serial computers
    with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block,
    from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    I have a memory of a 1-bit GPU from the late 70's, but can't pin it
    down. There is an article on Wikipedia https://en.wikipedia.org/wiki/1-bit_architecture
    --
    Best wishes,
    --Phil
    pomartel At Comcast(ignore_this) dot net
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sun Oct 21 08:08:02 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller)
    type replacing relay logic. These had typically at least AND, OR, NOT, (XOR) instructions.The other group was used as truly serial computers
    with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block,
    from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield
    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.
    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jim.brakefield@jim.brakefield@ieee.org to comp.arch.embedded on Sun Oct 21 09:31:29 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM. I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer architecture can be and still do some useful work. In the tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller) type replacing relay logic. These had typically at least AND, OR, NOT, (XOR) instructions.The other group was used as truly serial computers with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block, from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.
    It's hard to picture an application where you couldn't spare a few hundred LUTs.
    There are advantages to using several soft core processors, each sized and customized to the need.
    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.
    There are many under 600 LUTs, including 32-bit. Had hoped the full featured LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs: https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"
    A big rationale for small soft core processors is that they replace LUTs (slow-speed logic) with block RAM (instructions). And they are completely deterministic, as opposed to doing the same by time-slicing an ASIC (ARM) processor.
    Jim Brakefield
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sun Oct 21 10:51:29 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 12:31:34 PM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer architecture can be and still do some useful work. In the tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller) type replacing relay logic. These had typically at least AND, OR, NOT, (XOR) instructions.The other group was used as truly serial computers with the same instructions as the PLC but also at least a 1 bit SUB (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block, from the 1970's, which requires quite lot of support chips (code memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.

    It's hard to picture an application where you couldn't spare a few hundred LUTs.

    There are advantages to using several soft core processors, each sized and customized to the need.

    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    There are many under 600 LUTs, including 32-bit. I had hoped the full-featured LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs: https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"

    A big rationale for small soft core processors is that they replace LUTs (slow-speed logic) with block RAM (instructions). And they are completely deterministic, as opposed to doing the same by time-slicing an ASIC (ARM) processor.
    I won't argue a bit that softcores and especially *customizable* softcore CPUs aren't useful. I was talking about there being at best a very tiny region of utility for 1-bit processors.
    My 600 LUT processor didn't trade off much for performance. It would run pretty fast and was pretty capable. In addition the word size was independent of the instruction set. That said, there are apps where a much less powerful processor would do fine and saving a few more LUTs would be useful.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 21 21:43:43 2018
    From Newsgroup: comp.arch.embedded

    On 21/10/2018 17:08, gnuarm.deletethisbit@gmail.com wrote:


    It is hard for me to imagine applications where a 1 bit processor
    would be useful. A useful N bit processor can be built in a small
    number of LUTs. I've built a 16 bit processor in just 600 LUTs and
    I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the
    processing speed requirement was quite low and you can save LUTs with
    a bit serial processor. I just don't know how many or why it would
    matter. Even the smallest FPGAs have thousands of LUTs. It's hard
    to picture an application where you couldn't spare a few hundred
    LUTs.


    There is not much point in 1-bit processing with modern architectures
    and FPGAs. But it used to be more useful, for cheap and scalable
    solutions. You got systems that scaled in parallel, using bit-slice processors to make cpus as wide as you want. And you got serial
    scaling, giving you practical numbers of bits with minimal die area
    (like the COP8 microcontrollers).

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jim.brakefield@jim.brakefield@ieee.org to comp.arch.embedded on Sun Oct 21 12:44:39 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 12:51:34 PM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 12:31:34 PM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wondered how primitive a computer architecture can be and still do some useful work. In the tube/discrete/SSI times, there were quite a lot of 1-bit processors. There were at least two types: the PLC (Programmable Logic Controller)
    type replacing relay logic, which typically had at least AND, OR, NOT,
    (XOR) instructions. The other group was used as truly serial computers with the same instructions as the PLC but also at least 1-bit SUB (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block,
    from the 1970's, which requires quite a lot of support chips (code memory, PC, I/O chips) to do some useful work.

    After much searching, I found the General Instrument (GI) SBA (Serial Boolean Analyser) http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM, four banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package. For the re-entrancy enthusiasts, it contains stack-pointer-relative addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in a PLC
    environment.

    Anyone seen more modern 1 bit chips either for relay replacement or for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose
    (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.

    It's hard to picture an application where you couldn't spare a few hundred LUTs.

    There are advantages to using several soft core processors, each sized and customized to the need.

    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    There are many under 600 LUTs, including 32-bit. I had hoped the full-featured LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs: https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"

    A big rationale for small soft core processors is that they replace LUTs (slow-speed logic) with block RAM (instructions). And they are completely deterministic, as opposed to doing the same by time-slicing an ASIC (ARM) processor.

    I won't argue a bit that softcores and especially *customizable* softcore CPUs aren't useful. I was talking about there being at best a very tiny region of utility for 1-bit processors.

    My 600 LUT processor didn't trade off much for performance. It would run pretty fast and was pretty capable. In addition the word size was independent of the instruction set. That said, there are apps where a much less powerful processor would do fine and saving a few more LUTs would be useful.

    Rick C.
    there being at best a very tiny region of utility for 1-bit processors
    There are a small number of examples:
    Bit-serial processors such as the DEC PDP-8/S, and early vacuum tube & drum machines, for example the Bendix G-15.
    Bit-serial CORDIC.
    Also telling is that the 4-bit processors for calculators have been replaced by 8-bit processors.
    My inspiration was EDIF, which was/is output from VHDL & Verilog compilers, e.g. using EDIF as a machine language. In the context of logic simulation, greater effective FPGA capacity becomes possible for slow logic.
    This effort also led to a theoretical insight for brain modelling: there is greater information content in the wiring than in the logic. The human brain has 2^36+ neurons, requiring 36 bits of information for each connection, and only 16 or so bits for the state/configuration of each synapse. Also, an FPGA requires 60+ bits to route each LUT input (assuming all LUT inputs are in use), whereas each possible input can be specified by 20 bits or less (1M-LUT FPGA).
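    As a rough sanity check of those bit counts, here is a small sketch (the 1M-LUT and 2^36-neuron figures are just the estimates above, nothing measured):

        #include <math.h>
        #include <stdio.h>

        /* Bits needed to name one source out of n possible sources. */
        static unsigned bits_to_name(double n)
        {
            return (unsigned)ceil(log2(n));
        }

        int main(void)
        {
            /* 1M-LUT FPGA: a LUT input can be driven by any of ~2^20 LUT outputs. */
            printf("FPGA: %u bits to name a LUT-input source\n",
                   bits_to_name(1048576.0));        /* -> 20 */

            /* Brain estimate from above: ~2^36 neurons. */
            printf("Brain: %u bits to name a connection target\n",
                   bits_to_name(68719476736.0));    /* -> 36 */
            return 0;
        }

    The point being that the 60+ routing bits per LUT input are a property of the switch fabric, not of the amount of information actually needed to describe the connection.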
    Of course optimizing simulators convert the EDIF to an existing machine language. Likewise for industrial automation (ladder logic, ...).
    Jim Brakefield
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Brett@ggtgp@yahoo.com to comp.arch.embedded on Mon Oct 22 00:28:51 2018
    From Newsgroup: comp.arch.embedded

    <jim.brakefield@ieee.org> wrote:
    On Sunday, October 21, 2018 at 12:51:34 PM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 12:31:34 PM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM. I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wondered how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot of 1-bit processors.
    There were at least two types: the PLC (Programmable Logic Controller) type replacing relay logic, which typically had at least AND, OR, NOT, (XOR) instructions. The other group was used as truly serial computers with the same instructions as the PLC but also at least 1-bit SUB (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block, from the 1970's, which requires quite a lot of support chips (code memory, PC, I/O chips) to do some useful work.

    After much searching, I found the General Instrument (GI) SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM, four banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package. For the re-entrancy enthusiasts, it contains stack-pointer-relative addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in a PLC environment.

    Anyone seen more modern 1-bit chips either for relay replacement or for truly serial computers?

    Anyone seen more modern 1-bit chips either for relay replacement or for truly serial computers?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor
    would be useful. A useful N bit processor can be built in a small
    number of LUTs. I've built a 16 bit processor in just 600 LUTs and
    I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the
    processing speed requirement was quite low and you can save LUTs with
    a bit serial processor. I just don't know how many or why it would
    matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.

    It's hard to picture an application where you couldn't spare a few hundred LUTs.

    There are advantages to using several soft core processors, each sized
    and customized to the need.

    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    There are many under 600 LUTs, including 32-bit. I had hoped the full-featured
    LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs:
    https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"

    A big rationale for small soft core processors is that they replace LUTs
    (slow-speed logic) with block RAM (instructions). And they are
    completely deterministic, as opposed to doing the same by time-slicing an
    ASIC (ARM) processor.

    I won't argue a bit that softcores and especially *customizable*
    softcore CPUs aren't useful. I was talking about there being at best a
    very tiny region of utility for 1-bit processors.

    My 600 LUT processor didn't trade off much for performance. It would
    run pretty fast and was pretty capable. In addition the word size was
    independent of the instruction set. That said, there are apps where a
    much less powerful processor would do fine and saving a few more LUTs would be useful.

    Rick C.

    there being at best a very tiny region of utility for 1-bit processors

    There are a small number of examples:
    Bit-serial processors such as the DEC PDP-8/S, and early vacuum tube & drum
    machines, for example the Bendix G-15.
    Bit-serial CORDIC.

    Also telling is that the 4-bit processors for calculators have been replaced
    by 8-bit processors.

    My inspiration was EDIF, which was/is output from VHDL & Verilog
    compilers, e.g. using EDIF as a machine language. In the context of logic simulation, greater effective FPGA capacity becomes possible for slow logic.

    This effort also led to a theoretical insight for brain modelling: there
    is greater information content in the wiring than in the logic. The
    human brain has 2^36+ neurons, requiring 36 bits of information for each connection, and only 16 or so bits for the state/configuration of each synapse. Also, an FPGA requires 60+ bits to route each LUT input (assuming
    all LUT inputs are in use), whereas each possible input can be specified by 20 bits or less (1M-LUT FPGA).

    The clock speed is quite low, 2 Hz?
    So the wetware is not quite impossible to emulate with current tech.
    Raising a baby and training the resultant adult to do a task is still many orders of magnitude cheaper.
    ;)

    Of course optimizing simulators convert the EDIF to an existing machine language. Likewise for industrial automation (ladder logic, ...).

    Jim Brakefield




    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch.embedded on Sun Oct 21 20:59:55 2018
    From Newsgroup: comp.arch.embedded

    On Sun, 21 Oct 2018 16:27:31 +0300, upsidedown@downunder.com wrote:

    Slightly OT, but I have often wondered how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot of 1-bit processors.
    There were at least two types: the PLC (Programmable Logic Controller)
    type replacing relay logic, which typically had at least AND, OR, NOT,
    (XOR) instructions. The other group was used as truly serial computers
    with the same instructions as the PLC but also at least 1-bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block,
    from the 1970's, which requires quite a lot of support chips (code
    memory, PC, I/O chips) to do some useful work.

    After much searching, I found the General Instrument (GI) SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM and four
    banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package.
    For the re-entrancy enthusiasts, it contains stack-pointer-relative addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in a PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Circa 1985-1993, Thinking Machines Connection Machine.
    Circa 1987-1996, MasPar MP series.

    The CM-1, 2, 2a, and 200 all were SIMD parallel using 1-bit serial
    integer-only CPUs. Sizes ranged from 8K CPUs at the low end to 64K
    CPUs at the high end. Each CPU had 4K *bits* of private RAM, and the
    CPUs were connected in a multidimensional hypercube network.

    The CM-2, 2a, and 200 were augmented with 32-bit FPUs (1 per 32 CPUs),
    and the 200 featured a higher clock speed.


    The MP-1 was SIMD parallel using 4-bit serial integer-only CPUs in
    sizes from 1K to 16K CPUs. It also had 32-bit FPUs, but I don't
    remember how many / what ratio. I remember that it had an accumulator
    register rather than going memory->memory like the CM.

    [I can't find much information now about the MP-1 ... unfortunately
    MasPar didn't last very long in the marketplace. The Wikipedia
    article has some information about the MP-2, but the MP-2 was a later
    full 32-bit design, very different from the MP-1.]


    My college had both an 8K CM-2 and a 1K MP-1, accessible to those who
    took various parallel processing electives. I never got to use the
    MP-1 much - it was new at the end of my time and I only ever played
    with it a bit. But I spent 2 semesters working with the CM-2.

    Even though the CM's clock speed was only ~8MHz, the performance was
    amazing IF the problem was a good fit to the architecture. E.g., at
    that time, I owned a 66MHz (dx2) i486. Converted for the CM-2
    architecture, O(n^4) array processing on the i486 became O(n) on the
    CM-2. I had a physics simulation that took over 3 hours on my i486
    that ran in ~10 minutes on the CM.

    George
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Wed Oct 24 15:57:55 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 11:55 schrieb Theo:
    Tim <cpldcpu+usenet@gmail.com> wrote:
    This is quite curious. I wonder

    - Has anyone actually received the devices they ordered? The cheaper
    variants seem to be sold out.

    I think they've sold out since they went viral. EEVblog did a video showing 550 in stock - that's only $16 worth of parts, not hard to imagine they've been bought up.

    The other option is they're some kind of EOL part and 3c is the 'reduced to clear' price - which they have done, very successfully.

    Theo


    They're back in stock, though the price rose by 21% to $0.046.
    Also, LCSC seems to now be stocking more Padauk parts, including more
    dual-core devices. Unfortunately, the programmer seems to be out of
    stock, and they have neither the flash nor the DIP variants.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Nov 5 12:41:27 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 09:44 schrieb David Brown:
    On 12/10/18 08:50, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There are a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    Reentrant functions will be inefficient: no registers, and no sp-relative
    addressing mode. One would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.


    It looks like the lowest 16 memory addresses could be considered pseudo-registers - they are the ones that can be used for direct memory access rather than needing indirect access.


    Considering the multi-core variants of the Padauk µCs:
    Those addresses are shared across all cores. Each core only has its own
    A, SP, F, PC.
    How do we handle local variables?

    Option 1: Make functions non-reentrant. This requires duplication of code (we
    need per-thread copies of functions), and link-time analysis to ensure
    that each thread only calls the function implementation meant for it.
    Function pointers get complicated. A rough sketch of this option follows below.

    Option 2: Use an inefficient combination of thread-local storage and stack.
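    To make Option 1 a little more concrete, here is a minimal C sketch (the names and the hand-written per-core dispatch are made up for illustration; this is not what SDCC or a linker would actually emit):

        #include <stdint.h>

        /* Option 1 sketch: one non-reentrant copy of the routine per hardware
         * core, each with its own static locals, so no stack access is needed. */

        static void delay_core0(uint8_t ticks)
        {
            static volatile uint8_t i;      /* static local owned by core 0 */
            for (i = 0; i < ticks; i++)
                ;                           /* busy-wait */
        }

        static void delay_core1(uint8_t ticks)
        {
            static volatile uint8_t i;      /* separate static local for core 1 */
            for (i = 0; i < ticks; i++)
                ;
        }

        /* Each thread must only ever call "its" copy; with function pointers
         * this property has to be checked across the whole call graph. */
        void blink(uint8_t core_id)
        {
            if (core_id == 0)
                delay_core0(100);
            else
                delay_core1(100);
        }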

    Since this is a small µC, we need a lot of support functions, which the compiler inserts (e.g. for multiplication); of course those are affected
    by the same problems.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Thu Nov 8 13:53:48 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp


    Interesting, that would make it easy to run a multitasking RTOS (foreground/background) monitor, which might justify the use of some reentrant library routines :-). But in reality, the available memory (ROM/RAM) is so small that you could easily manage this with static
    memory allocations.



    But static memory allocation would require one copy of each function per thread. And the linker would have to analyze the call graph to always
    call the correct function for each thread. Function pointers get
    complicated.

    Unfortunately, reentrancy becomes even harder with
    hardware multithreading: to access the stack, one has to construct a
    pointer to the stack location in a memory location. That memory location
    (like any pseudo-register) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making
    access even more inefficient. And that spinlock will cause issues with interrupts (a solution might be to heavily restrict interrupt routines, essentially allowing not much more than setting some global variables).
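    Roughly, each access to a stack slot would then look something like the following sketch (purely illustrative; it assumes some way to make the lock acquisition atomic, e.g. a bit-set/test-and-skip sequence or briefly disabled interrupts, which is exactly where the trouble with interrupt routines comes from):

        #include <stdint.h>

        /* Both hardware threads share all of RAM, so this state is shared. */
        static volatile uint8_t fp_lock;     /* 0 = free, 1 = taken             */
        static volatile uint8_t fp_shadow;   /* shared indirect-address scratch */

        /* Hypothetical: on the real part this would have to be made atomic;
         * plain C cannot express that. */
        static uint8_t try_lock(volatile uint8_t *l)
        {
            if (*l)
                return 0;
            *l = 1;
            return 1;
        }

        static uint8_t read_local(uint8_t sp, uint8_t offset)
        {
            uint8_t v;
            while (!try_lock(&fp_lock))
                ;                                  /* spin                        */
            fp_shadow = sp - offset;               /* build pointer to stack slot */
            v = *(volatile uint8_t *)(uintptr_t)fp_shadow;  /* indirect access    */
            fp_lock = 0;                           /* release                     */
            return v;
        }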

    Then there is the trade-off of using one such memory location per
    function vs. per program (the latter reducing memory usage, but
    resulting in less parallelism).

    The pseudo-registers one would want to use are not so much a problem for interrupt routines (they would just need saving and thus increase
    interrupt overhead a bit), but for hardware parallelism. Essentially all
    access to them would again have to be protected by a spinlock.

    All these problems could have relatively easily been avoided by
    providing an efficient stack-pointer-relative addressing mode. Having a
    few general-purpose or index registers would have somewhat helped as well.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tauno Voipio@tauno.voipio@notused.fi.invalid to comp.arch.embedded on Thu Nov 8 15:08:24 2018
    From Newsgroup: comp.arch.embedded

    On 8.11.18 14:53, Philipp Klaus Krause wrote:
    Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU,
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp


    Interesting, that would make it easy to run a multitasking RTOS
    (foreground/background) monitor, which might justify the use of some
    reentrant library routines :-). But in reality, the available memory
    (ROM/RAM) is so small so that you could easily manage this with static
    memory allocations.



    But static memory allocation would require one copy of each function per thread. And the linker would have to analyze the call graph to always
    call the correct function for each thread. Function pointers get
    complicated.

    Unfortunately, reentrancy becomes even harder with
    hardware multithreading: to access the stack, one has to construct a
    pointer to the stack location in a memory location. That memory location
    (like any pseudo-register) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making access even more inefficient. And that spinlock will cause issues with interrupts (a solution might be to heavily restrict interrupt routines, essentially allowing not much more than setting some global variables).

    Then there is the trade-off of using one such memory location per
    function vs. per program (the latter reducing memory usage, but
    resulting in less parallelism).

    The pseudo-registers one would want to use are not so much a problem for interrupt routines (they would just need saving and thus increase
    interrupt overhead a bit), but for hardware parallelism. Essentially all access to them would again have to be protected by a spinlock.

    All these problems could have relatively easily been avoided by
    providing an efficient stack-pointer-relative addressing mode. Having a
    few general-purpose or index registers would have somewhat helped as well.

    Philipp


    And you'll end up with a low-end Cortex ...
    --

    -TV

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Thu Nov 8 14:34:44 2018
    From Newsgroup: comp.arch.embedded

    Am 08.11.18 um 14:08 schrieb Tauno Voipio:


    And you'll end up with a low-end Cortex ...


    A low-end Cortex would still be far heavier than a Padauk variant with
    an sp-relative addressing mode or a few registers added.
    I think a more multithreading-friendly variant of the Padauk would even
    still be simpler than an STM8.
    But one could surely create a nice STM8-like (with a few STM8 weaknesses
    fixed) processor with hardware multithreading.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Thu Nov 8 21:52:49 2018
    From Newsgroup: comp.arch.embedded

    On Thu, 8 Nov 2018 13:53:48 +0100, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU,
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp


    Interesting, that would make it easy to run a multitasking RTOS
    (foreground/background) monitor, which might justify the use of some
    reentrant library routines :-). But in reality, the available memory
    (ROM/RAM) is so small so that you could easily manage this with static
    memory allocations.



    But static memory allocation would require one copy of each function per thread.

    For a foreground/background monitor, the worst case would be two
    copies of static data, if both threads use the same subroutine.

    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    With such a small processor, just track any dependencies manually.

    Function pointers get complicated.

    Do you really insist on using function pointers with such small
    targets?


    Unfortunately, reentrancy becomes even harder with
    hardware-multithreading:

    With two hardware threads, you would need at most two copies of static
    data.

    To access the stack, one has to construct a
    pointer to the stack location in a memory location.

    Why would you want to access the stack ?

    The stack is usable for handling return addresses, but I guess that a
    hardware thread must have its own return address stack pointer.

    In fact many minicomputers from the 1960's did not even have a stack
    at all. The calling program just stored the return address in the
    first word of the subroutine, and at the end of the subroutine,
    performed an indirect jump through the first word of the subroutine to
    return to the calling program. Of course, this is not re-entrant and
    in those days one did not have to worry about multiple CPUs accessing
    the same routines :-).
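    For anyone who never used such machines, the convention can be mimicked in a few lines of C (a toy model only; on the real machines this was a single JMS-style call instruction, and the made-up fixed locations below stand in for the subroutine's first word and its argument/result cells):

        #include <stdio.h>

        static void (*sqr_return)(void);   /* the subroutine's "first word"    */
        static int   sqr_arg, sqr_result;  /* fixed argument/result locations  */

        static void sqr(void)              /* the subroutine body              */
        {
            sqr_result = sqr_arg * sqr_arg;
            sqr_return();                  /* indirect transfer through the first word */
        }

        static void after_call(void)
        {
            printf("result = %d\n", sqr_result);
        }

        int main(void)
        {
            sqr_arg = 7;
            sqr_return = after_call;       /* store the return address ...    */
            sqr();                         /* ... then jump to the subroutine */
            return 0;
        }

    Since the return address lives in a fixed location, a second concurrent caller (or a recursive call) would simply overwrite it - which is exactly why the scheme is not re-entrant.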

    BTW, who needs a program counter (PC), many microprograms run without
    a PC, with the next instruction address stored at the end of the long instruction word :-)


    That memory location
    (like any pseudo-register) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making access even more inefficient. And that spinlock will cause issues with interrupts (a solution might be to heavily restrict interrupt routines, essentially allowing not much more than setting some global variables).

    Disabling all interrupts for the duration of some critical operations
    is often enough, but of course, the number of instructions executed
    while interrupts are disabled should be minimized. In MACRO-11 assembler,
    the standard practice was to start the comment field with a single semicolon,
    with two semicolons when task switching was disabled, and with three
    semicolons when interrupts were disabled; that made it visually easy to
    detect when interrupts were disabled and not mess too much with such
    code sections.


    Then there is the trade-off of using one such memory location per
    function vs. per program (the latter reducing memory usage, but
    resulting in less parallelism).

    The pseudo-registers one would want to use are not so much a problem for interrupt routines (they would just need saving and thus increase
    interrupt overhead a bit), but for hardware parallelism. Essentially all access to them would again have to be protected by a spinlock.

    All these problems could have relatively easily been avoided by
    providing an efficient stack-pointer-relative addressing mode. Having a
    few general-purpose or index registers would have somewhat helped as well.

    Philipp

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Thu Nov 8 21:56:16 2018
    From Newsgroup: comp.arch.embedded

    Am 08.11.18 um 20:52 schrieb upsidedown@downunder.com:

    But static memory allocation would require one copy of each function per
    thread.

    For a foreground/background monitor, the worst case would be two
    copies of static data, if both threads use the same subroutine.

    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    Of course. The support routines the compiler uses reside in some
    library, the linker links them in if necessary. Also, the larger
    variants are not that small, with up to 256 B of RAM and 8 KB of ROM.
    One might want to e.g. have one .c file for handling I²C, one for the
    soft UART, etc.


    With such a small processor, just track any dependencies manually.

    See above.


    Function pointers get complicated.

    Do you really insist on using function pointers with such small
    targets?


    I want to have C, function pointers are part of it.


    Unfortunately, reentrancy becomes even harder with
    hardware-multithreading:

    With two hardware threads, you would need at most two copies of static
    data.

    Padauk still makes one chip with 8 hardware threads (and it looks to me
    as if there were more in the past; though they are not currently listed
    on their website, one can find them e.g. in their IDE).


    To access the stack, one has to construct a
    pointer to the stack location in a memory location.

    Why would you want to access the stack ?

    For reentrancy, so I can use one function implementation for all
    threads. It would also be useful to dynamically assign threads to
    hardware threads (so no thread is tied to specific hardware, and some OS schedules them).


    The stack is usable for handling return addresses, but I guess that a hardware thread must have its own return address stack pointer.

    Each hardware thread has its own flag register (4 bits), accumulator (8
    bits), PC (12 bits) and stack pointer (8 bits).
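    Packed into a struct just for illustration (the bit widths are the ones listed above; this says nothing about how the silicon actually stores them):

        #include <stdint.h>

        /* Per-hardware-thread architectural state of the Padauk multi-core parts. */
        struct hw_thread_ctx {
            uint16_t pc    : 12;   /* program counter */
            uint8_t  flags : 4;    /* flag register   */
            uint8_t  a;            /* accumulator     */
            uint8_t  sp;           /* stack pointer   */
        };

    Everything else - RAM, the I/O registers and the ALU itself - is shared between the threads.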


    That memory location
    (as any pseudo-registers) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making
    access even more inefficient. And that spinlock will cause issues with
    interrupts (a solution might be to heavily restrict interrupt routines,
    essentially allowing not much more than setting some global variables).

    Disabling all interrupts for the duration of some critical operations
    is often enough, but of course, the number of instructions executed
    while interrupts are disabled should be minimized.

    Disabling interrupts any time a spinlock is held or a thread is waiting
    for one might be too much, especially if there are many threads, so the spinlock is held often.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Nov 9 00:35:55 2018
    From Newsgroup: comp.arch.embedded

    On Thu, 8 Nov 2018 21:56:16 +0100, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 08.11.18 um 20:52 schrieb upsidedown@downunder.com:

    But static memory allocation would require one copy of each function per thread.

    For a foreground/background monitor, the worst case would be two
    copies of static data, if both threads use the same subroutine.

    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    Of course. The support routines the compiler uses reside in some
    library, the linker links them in if necessary. Also, the larger
    variants are not that small, with up to 256 B of RAM and 8 KB of ROM.
    One might want to e.g. have one .c file for handling I²C, one for the
    soft UART, etc.

    A linker is required, if the libraries are (for copyright reasons)
    delivered as binary object code only.

    However, if the libraries are delivered as source files and the compiler/assembler has even a rudimentary #include mechanism, just
    include those library files you need. With an include or macro
    processor with parameter passing, just invoke the same include file or
    macro twice with different parameters for different static variable
    instances.
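    A tiny sketch of that include-twice trick in C preprocessor terms (the file name, the INSTANCE() macro and the fg_/bg_ prefixes are all made up for illustration):

        /* delay_impl.h - "library" source, included once per instance.
         * The includer defines INSTANCE(name) to prefix all static state,
         * so each inclusion gets its own copy of the counter. */
        static volatile unsigned char INSTANCE(count);

        static void INSTANCE(delay)(unsigned char ticks)
        {
            for (INSTANCE(count) = 0; INSTANCE(count) < ticks; INSTANCE(count)++)
                ;   /* busy-wait */
        }

        /* main.c - instantiate the routine twice, with separate static data */
        #define INSTANCE(x) fg_ ## x
        #include "delay_impl.h"
        #undef INSTANCE

        #define INSTANCE(x) bg_ ## x
        #include "delay_impl.h"
        #undef INSTANCE

        void foreground(void) { fg_delay(10); }
        void background(void) { bg_delay(50); }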

    Of course, linkers are also needed if very primitive compilation
    machines are used, such as floppy-based Intellecs or Exorcisers. It
    could take a day to compile a large program all the way from sources,
    with multiple floppy changes to get the final absolute file onto a
    single floppy, ready to be burnt into EPROMs for an additional hour or
    two. In such an environment, compiling, linking and burning only the
    source file that changed would speed up program development a lot.

    When using a modern PC for compilation, there are no such issues.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Nov 9 09:00:41 2018
    From Newsgroup: comp.arch.embedded

    Am 08.11.18 um 23:35 schrieb upsidedown@downunder.com:
    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    Of course. The support routines the compiler uses reside in some
    library, the linker links them in if necessary. Also, the larger
    variants are not that small, with up to 256 B of RAM and 8 KB of ROM.
    One might want to e.g. have one .c file for handling I²C, one for the
    soft UART, etc.

    A linker is required, if the libraries are (for copyright reasons)
    delivered as binary object code only.

    However, if the libraries are delivered as source files and the compiler/assembler has even a rudimentary #include mechanism, just
    include those library files you need. With an include or macro
    processor with parameter passing, just invoke the same include file or
    macro twice with different parameters for different static variable instances.

    Of course, linkers are also needed if very primitive compilation
    machines are used, such as floppy-based Intellecs or Exorcisers. It
    could take a day to compile a large program all the way from sources,
    with multiple floppy changes to get the final absolute file onto a
    single floppy, ready to be burnt into EPROMs for an additional hour or
    two. In such an environment, compiling, linking and burning only the
    source file that changed would speed up program development a lot.

    When using a modern PC for compilation, there are no such issues.


    Separate compilation and then linking is the normal thing to do, and a
    common workflow for small devices. This is e.g. how most people use
    SDCC, a mainstream free compiler targeting various 8-bit architectures.

    That doesn't mean it is the only way (and since SDCC does not have
    link-time optimization it might not be the optimal way either). But it
    is something people use and expect to work reasonably well.

    So for anyone designing an architecture it would be wise to not put too
    many obstacles into that workflow.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Nov 11 09:27:20 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.18 um 22:45 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost
    a few cent more).

    Philipp

    Did you find the binary encoding of the various instruction formats, i.e.
    how many bits are allocated to the operation code and how many to the
    address field?

    My initial guess was that the instruction word is simply an 8-bit opcode
    + an 8-bit address, but the bit and word address limits for the smaller
    models would suggest that for some op-codes, the op-code field might
    be wider than 8 bits and the address fields narrower than 8 bits (e.g. bit
    and word addressing).


    It is more complicated. Apparently the encoding changed from a 16-bit instruction word used by older types (https://www.mikrocontroller.net/topic/461002#5616813) to a 14-bit
    instruction word used by newer types (https://www.mikrocontroller.net/topic/461002#5616603).

    Padauk also dropped and added various instructions at some points (e.g.
    ldtabh, ldtabl, mul, pushw, popw).

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114