• $0.03 microcontroller

    From Clifford Heath@no.spam@please.net to comp.arch.embedded on Wed Oct 10 12:05:23 2018
    From Newsgroup: comp.arch.embedded

    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From lasselangwadtchristensen@lasselangwadtchristensen@gmail.com to comp.arch.embedded on Wed Oct 10 16:12:50 2018
    From Newsgroup: comp.arch.embedded

On Wednesday, October 10, 2018 at 03:05:27 UTC+2, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath
    https://youtu.be/VYhAGnsnO7w
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Wed Oct 10 17:51:12 2018
    From Newsgroup: comp.arch.embedded

    On Tuesday, October 9, 2018 at 9:05:27 PM UTC-4, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath
    Interesting. They have some very off-brand FPGA type devices as well at very low prices, but they still don't do me any favors with the packages.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From lasselangwadtchristensen@lasselangwadtchristensen@gmail.com to comp.arch.embedded on Wed Oct 10 18:56:13 2018
    From Newsgroup: comp.arch.embedded

On Thursday, October 11, 2018 at 02:51:17 UTC+2, gnuarm.del...@gmail.com wrote:
    On Tuesday, October 9, 2018 at 9:05:27 PM UTC-4, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>

    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Interesting. They have some very off-brand FPGA type devices as well at very low prices, but they still don't do me any favors with the packages.

They also do PCBs (jlcpcb.com), and I've heard they also have a dirt-cheap assembly service as long as you only use parts from their list of components, though it seems it
is so far only available in China
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Wed Oct 10 19:29:13 2018
    From Newsgroup: comp.arch.embedded

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Thu Oct 11 11:39:56 2018
    From Newsgroup: comp.arch.embedded

    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept.

There are a lot of operations that will update memory locations, so why
would you need a lot of CPU registers?

    1K of program OTP and 64 bytes of ram,

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.

At least the 8-pin version has both a PWM and a comparator, so
making an ADC wouldn't be too hard.
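
For illustration, a minimal C sketch of that idea - a successive-approximation conversion using the PWM as a crude DAC through an external RC filter, compared against the input by the comparator. PWM_DUTY and COMP_OUT are placeholder names, not the real Padauk registers, and the settling delay is a guess:

#include <stdint.h>

extern volatile uint8_t PWM_DUTY;   /* placeholder: PWM duty-cycle register      */
extern volatile uint8_t COMP_OUT;   /* placeholder: comparator output (0 or 1)   */

static uint8_t adc_read(void)
{
    uint8_t result = 0;
    for (uint8_t bit = 0x80; bit != 0; bit >>= 1) {
        PWM_DUTY = result | bit;              /* trial code on the PWM/RC "DAC"   */
        for (volatile uint8_t i = 0; i < 200; i++)
            ;                                 /* let the RC filter settle (guess) */
        if (COMP_OUT)                         /* input still above the trial level */
            result |= bit;                    /* keep this bit                    */
    }
    return result;                            /* 8-bit approximation of the input */
}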

    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael Kellett@mk@mkesc.co.uk to comp.arch.embedded on Thu Oct 11 14:04:39 2018
    From Newsgroup: comp.arch.embedded

    On 10/10/2018 02:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Has anyone actually used them - or worked out where to get the ICE and
    how much it costs ?

    MK
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Thu Oct 11 15:56:52 2018
    From Newsgroup: comp.arch.embedded

    On 11/10/18 15:04, Michael Kellett wrote:
    On 10/10/2018 02:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Has anyone actually used them - or worked out where to get the ICE and
    how much it costs ?

    MK

    The cost of the ICE is not going to be significant for most people - you usually use a chip like this when you want huge quantities (even though
    it is available in small numbers).

    What turns me off here is the programming procedure for the OTP devices.
    There is no information on it - just a simple one-at-a-time programmer
    device. That is useless for production - you need an automated system,
    or support from existing automated programmers, or at the very least the programming information so that you can build your own specialist
programmer. There is no point in buying a microcontroller for $0.03 if
the time taken to manually take a device out of a tube, manually program
it, and manually put it back in another tube for the pick-and-place
costs you $1 of production time.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Thu Oct 11 16:08:00 2018
    From Newsgroup: comp.arch.embedded

    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.

    At least the 8 pin version has both a PWM as well as a comparator, so
    making an ADC wouldn't be too hard.

    Thanks.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Oct 12 08:50:49 2018
    From Newsgroup: comp.arch.embedded

On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
Reentrant functions will be inefficient: no registers, and no sp-relative addressing mode. One would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Oct 12 09:44:15 2018
    From Newsgroup: comp.arch.embedded

    On 12/10/18 08:50, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.


    It looks like the lowest 16 memory addresses could be considered pseudo-registers - they are the ones that can be used for direct memory
    access rather than needing indirect access.
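
A purely hypothetical sketch of that idea in C - there was no SDCC backend for this part at the time, and the __at() absolute-placement syntax is borrowed from SDCC's existing ports:

#include <stdint.h>

/* "Pseudo-registers" pinned to the directly addressable low RAM locations. */
__at(0x00) volatile uint8_t r0;
__at(0x01) volatile uint8_t r1;
__at(0x02) volatile uint8_t r2;
__at(0x03) volatile uint8_t r3;

/* A code generator (or an assembly programmer) would then keep intermediate
   results and indirect pointers in r0..r3, leaving the rest of RAM for data. */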

    And I don't think inefficient reentrant functions would be much of a
    worry on a device with so little code space!

    Some of the examples in the datasheet were given in C - that implies
    that there already is a C compiler for the device. Has anyone tried the
    IDE?


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Oct 12 10:18:56 2018
    From Newsgroup: comp.arch.embedded

On 10.10.2018 at 03:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

They even make dual-core variants (the parts where the first digit in the
part number is '2'). It seems the program counter, stack pointer, flag
register and accumulator are per-core, while the rest, including the ALU,
is shared. In particular, the I/O registers are also shared, which means
any multiplier registers would also be shared - but currently all variants
with an integrated multiplier are single-core.
Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Fri Oct 12 09:11:02 2018
    From Newsgroup: comp.arch.embedded

    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.
CPUs like this (and others that aren't like this) should be programmed in Forth. It's a great tool for small MCUs and can often be hosted on the target, although not likely in this case. Still, you can bring enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Oct 12 21:30:42 2018
    From Newsgroup: comp.arch.embedded

    On Fri, 12 Oct 2018 09:44:15 +0200, David Brown
    <david.brown@hesbynett.no> wrote:

    On 12/10/18 08:50, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative
    adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.


It looks like the lowest 16 memory addresses could be considered pseudo-registers - they are the ones that can be used for direct memory access rather than needing indirect access.

    And I don't think inefficient reentrant functions would be much of a
    worry on a device with so little code space!

    The real issue would be the small RAM size.

With such small ROM/RAM sizes, who needs reentrant functions?
Possibly only if you think that every problem must be solved by
recursion :-).

Reentrancy is nice when writing run-time library (RTL) routines that
might be called from different contexts, but who in their right mind
would call RTL routines from an ISR?

OK, some might put an RTOS into that processor, but even in that case
the RTOS might consist only of a simple foreground/background monitor,
so it is unlikely to need reentrant routines.

If you insist on using "C", just declare all variables as

    static uint8_t (and a few static uint16_t)

    so no reentrant code is generated.
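
A minimal sketch of that style (the names are made up for illustration):

#include <stdint.h>

/* Everything statically allocated: no stack frames, no reentrancy. */
static uint8_t  blink_count;
static uint8_t  pwm_duty;
static uint16_t elapsed_ticks;

static void timer_tick(void)
{
    elapsed_ticks++;
    if (++blink_count >= 100) {
        blink_count = 0;
        pwm_duty ^= 0xFF;        /* toggle between two duty cycles */
    }
}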

However, you could just as well use one of the old 8-bit languages such as
PL/M-80.



    Some of the examples in the datasheet were given in C - that implies
    that there already is a C compiler for the device. Has anyone tried the
    IDE?


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Oct 12 21:39:06 2018
    From Newsgroup: comp.arch.embedded

    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

On 10.10.2018 at 03:05, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared byt he two cores, alternating by clock cycle.

    Philipp


Interesting, that would make it easy to run a multitasking RTOS
(foreground/background monitor), which might justify the use of some
reentrant library routines :-). But in reality, the available memory
(ROM/RAM) is so small that you could easily manage this with static
memory allocations.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Oct 12 22:06:02 2018
    From Newsgroup: comp.arch.embedded

On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    The real issue would be the small RAM size.

Devices with this architecture go up to 256 B of RAM (but they then cost
a few cents more).

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Oct 12 23:45:54 2018
    From Newsgroup: comp.arch.embedded

    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost
    a few cent more).

    Philipp

Did you find the binary encoding of the various instruction formats, i.e.
how many bits are allocated to the operation code and how many to the
address field?

My initial guess was that the instruction word is a simple 8-bit opcode
+ 8-bit address, but the bit and word address limits for the smaller
models would suggest that for some op-codes, the op-code field might
be wider than 8 bits and the address field narrower than 8 bits (e.g. bit
and word addressing).

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael Kellett@mk@mkesc.co.uk to comp.arch.embedded on Sat Oct 13 09:35:36 2018
    From Newsgroup: comp.arch.embedded

    On 11/10/2018 14:56, David Brown wrote:
    On 11/10/18 15:04, Michael Kellett wrote:
    On 10/10/2018 02:05, Clifford Heath wrote:
<https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
<http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    Has anyone actually used them - or worked out where to get the ICE and
    how much it costs ?

    MK

    The cost of the ICE is not going to be significant for most people - you usually use a chip like this when you want huge quantities (even though
    it is available in small numbers).

    What turns me off here is the programming procedure for the OTP devices.
    There is no information on it - just a simple one-at-a-time programmer device. That is useless for production - you need an automated system,
    or support from existing automated programmers, or at the very least the programming information so that you can build your own specialist
    programmer. There is no point in buying a microcontroller for $0.03 if
    the time taken to manually take a device out a tube, manually program
    it, and manually put it back in another tube for the pick-and-place
    costs you $1 production time.

    My major interest in this part was for fun - hence caring about the cost
    of the ICE. From a business point of view it makes no sense - by the
    time you reach numbers big enough to care about the cost of the micro
    the risk of using a part like this is too great. Different if you are
    next door to the manufacturer.

If you want a hardware-minimal processor, the Maxim MAX32660 looks like fun:
3 mm square, 24-pin Cortex-M4, 96 MHz, 256k flash, 96k RAM, £1.16 (10 off).

    My guess is that you need to be using at least 5k of them before the
    cheaper Padauk part offsets the cost of using one.

    MK
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Oct 13 12:46:15 2018
    From Newsgroup: comp.arch.embedded

    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative
    adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard
    to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 05:06:23 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard
    to get the same results from a chip that doesn't have this kind of stack-based instruction set.
Your point is what exactly? You are comparing running Forth on some other chip to running Forth on this chip. How is that useful? There are many other chips that run very fast. So?
I believe others have said the instruction set is memory-oriented with no registers. I think that means the CPU will generally be slow compared to a register-based design. That actually means it is easier to have a fast Forth implementation compared to other compilers, since there won't be a significant penalty for using a stack.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 18:00:26 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
existing pic14 and pic16 backends. But it surely isn't as nice as stm8
or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many
    debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard
    to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

Depending on how you look at it, you could claim that it has 64 registers
and no RAM. It is a quite orthogonal single-address architecture. You
can do practically all single-operand instructions (like inc/dec,
shift/rotate etc.) either in the accumulator or equally well in any
of the 64 "registers". For two-operand instructions (such as add/sub,
and/or etc.), either the source or the destination can be in the memory
"register".

Both Acc = Acc Op Memory and, alternatively, Memory = Acc Op Memory are
valid.

Thus the accumulator is needed only for two-operand instructions, but
not for single-operand instructions.

    I think that means in general the CPU will be slow compared to a register based design.

What is the difference? You have 64 on-chip RAM bytes or 64 single-byte
on-chip registers. The situation would have been different with
on-chip registers and off-chip RAM, with the memory bottleneck.

Of course, there were odd architectures like the TI 9900 with a set of
sixteen 16-bit general-purpose registers in RAM. The register set could be
switched quickly in interrupts, but that slowed down all general-purpose
register access.

    That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.

For a stack computer you need a pointer register, preferably with
autoincrement/decrement support. This processor has indirect access
and single-instruction increment or decrement support without
disturbing the accumulator. Thus it is not so bad after all for stack
computing.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 18:31:25 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    The data-sheet describes the OTP program memory as "1KW", probably
    meaning 1024 instructions. The length of an instruction is not defined,
    as far as I could see.

    It would be nice to have a C compiler, and registers help with that.

    The data-sheet mentions something they call "Mini-C".

    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficent: No registers, and no sp-relative
    adressing mode. On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be programmed
    in Forth.

    I don't think that an interpreted Forth is feasible for this particular
    MCU. Where would the Forth program (= list of pointers to "words") be
    stored? I found no instructions for reading data from the OTP program
    memory, and the 64-byte RAM will not hold a non-trivial program together
    with the data for that program.

    Moreover, there is no indirect jump instruction -- "jump to a computed address". The closest is "pcadd a", which can be used to implement a
    256-entry case statement. You would be limited to a total of 256 words.

    Moreover, each RAM-resident pointer to RAM uses 2 octets of RAM, giving
    a 16-bit RAM address, although for this MCU a 6-bit address would be
    enough. Apparently the same architecture has implementations with more
    RAM and 16-bit RAM addresses.

    That said, one could perhaps implement a compiled Forth for this machine.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Oct 13 18:21:46 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
    Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic
    through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
    commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to make a
    backend for this in SDCC; the architecture looks more
    C-friendly than the existing pic14 and pic16 backends. But it
    surely isn't as nice as stm8 or z80. reentrant functions will
    be inefficent: No registers, and no sp-relative adressing mode.
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many
    times can be hosted on the target although not likely in this
    case. Still, you can bring enough functionality onto the MCU to
    allow direct downloads and many debugging features without an
    ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is. To
    make Forth work well on a small chip you need a Forth-specific
    instruction set to target the stack processing. For example,
    adding two numbers in this chip is two instructions - load
    accumulator from memory X, add accumulator to memory Y. In a Forth
    cpu, you'd have a single instruction that does "pop two numbers,
    add them, push the result". That gives a very efficient and compact
    instruction set. But it is hard to get the same results from a
    chip that doesn't have this kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some
    other chip to running forth on this chip. How is that useful? There
    are many other chips that run very fast. So?

    My point is that /this/ CPU is not a good match for Forth, though many
    other very cheap CPUs are. Whether or not you think that matches "CPUs
    like this should be programmed in Forth" depends on what you mean by
    "CPUs like this", and what you think the benefits of Forth are.


    I believe others have said the instruction set is memory oriented
    with no registers. I think that means in general the CPU will be
    slow compared to a register based design. That actually means it is
    easier to have a fast Forth implementation compared to other
    compilers since there won't be a significant penalty for using a
    stack.


    It has a single register, not unlike the "W" register in small PIC
    devices. Yes, I expect it is going to be slower than you would get from having a few more registers. But it is missing (AFAICS) auto-increment
    and decrement modes, and has only load/store operations with indirect
    access.

    So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

    mov a, y; // 1 clock
    add x, a; // 1 clock

    If you have a data stack pointer "dsp", and want a standard Forth "+" operation, you have:

    idxm a, dsp; // 2 clock
    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock
    idxm dsp, a; // 2 clock

That is 9 clocks instead of 2, and 6 instructions instead of 2.

    Of course you could make a Forth compiler for the device - but you would
    have to make an optimising Forth compiler that avoids needing a data
stack, just as you do on many other small microcontrollers (and just as a
    C compiler would do). This is /not/ a processor that fits well with
    Forth or that would give a clear translation from Forth to assembly, as
    is the case on some very small microcontrollers.






    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Oct 13 18:27:13 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 17:00, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

It would be nice to have a C compiler, and registers help with that.

Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as
pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many
    debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec, shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory "register".

    Not quite, no. Only the first 16 memory addresses are directly
    accessible for most instructions, with the first 32 addresses being
    available for word-based instructions. So you could liken it to a
    device with 16 registers and indirect memory access to the rest of ram.


    Both Acc = Acc Op Memory or alternatively Memory = Acc Op Memory are
    valid.

    Thus the accumulator is needed only for two operand instructions, but
    not for single operand instructions.

    I think that means in general the CPU will be slow compared to a register based design.

    What is the difference, you have 64 on chip RAM bytes or 64 single
    byte on chip registers. The situation would have been different with
    on-chip registers and off chip RAM, with the memory bottleneck.

    Of course, there were odd architectures like the TI 9900 with a set of sixteen 16 bit general purpose register in RAN. The set could be
    switched fast in interrupts, but slowed down any general purpose
    register access.

    That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.

    For a stack computer you need a pointer register with preferably autoincrement/decrement support. This processor has indirect access
    and single instruction increment or decrement support without
    disturbing the accumulator.Thus not so bad after all for stack
    computing.


    But you can't use the indirect memory accesses for any ALU instructions
    - only for loading or saving the accumulator. So all indirect accesses
    need to go via the accumulator - and if you want to operate on two
    indirect accesses (like adding the top two elements on the stack), you
    have to use another "register" address to store one element temporarily.
    Yes, it would be bad for stack computing.



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 19:46:30 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed address".

    Ok, before anyone else notices, I admit I forgot about implementing an indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 19:59:06 2018
    From Newsgroup: comp.arch.embedded

    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed
    address".

    Ok, before anyone else notices, I admit I forgot about implementing an indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4
    working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 20:50:33 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 18:27:13 +0200, David Brown
    <david.brown@hesbynett.no> wrote:

    On 13/10/18 17:00, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

It would be nice to have a C compiler, and registers help with that.

Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
reentrant functions will be inefficent: No registers, and no sp-relative adressing mode. On would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
on the target although not likely in this case. Still, you can bring enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


Forth is a good language for very small devices, but there are details that can make a huge difference in how efficient it is. To make Forth work well on a small chip you need a Forth-specific instruction set to target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec,
    shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory
    "register".

    Not quite, no. Only the first 16 memory addresses are directly
accessible for most instructions, with the first 32 addresses being available for word-based instructions. So you could liken it to a
    device with 16 registers and indirect memory access to the rest of ram.

Really?

    In the manual

    M.n Only addressed in 0~0xF (0~15) is allowed

The M.n notation is for bit operations, in which M is the byte address
and n is the bit number within the byte. Restricting M to 4 bits makes sense,
since n requires 3 bits, so the total address size for bit
operations would be 7 bits.

I couldn't find a reference saying that the restriction on M also applies to
byte access. Where is it?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 21:03:28 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 18:31:25 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
On 12.10.2018 at 01:08, Paul Rubin wrote:
    upsidedown@downunder.com writes:
There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    The data-sheet describes the OTP program memory as "1KW", probably
    meaning 1024 instructions. The length of an instruction is not defined,
    as far as I could see.

    Yes, I misread the data sheet. It is really 1 kW.

The nice feature about a Harvard architecture is that the data and
instruction sizes can be different.

I have tried to locate the bit allocation of the various fields (opcode,
address etc.) but no luck.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Sat Oct 13 11:06:29 2018
    From Newsgroup: comp.arch.embedded

    gnuarm.deletethisbit@gmail.com writes:
    That actually means it is easier to have a fast Forth implementation
    compared to other compilers since there won't be a significant penalty
    for using a stack.

I think this chip is too small for traditional Forth implementation
methods. Just 64 bytes of RAM and no registers. If you have 16-bit
cells and 8 levels of return and data stacks, half the RAM is already
used by the stacks.
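(For reference: two stacks of 8 cells at 2 bytes per cell is 2 × 8 × 2 = 32 bytes, i.e. exactly half of the 64-byte RAM.)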

An F18 processor (a GA144 node, for those not familiar) has around 3x as
much RAM including the stacks, and it doesn't pretend to be a complete
MCU (you usually split your application across multiple nodes). Plus it
has that very efficient 5-bit instruction encoding. On the other hand,
you have to use RAM as program memory.

    You might be able to concoct some usable Forth dialect compiled with an optimizing compiler and using 8-bit data when possible, but it doesn't
    seem that useful for a chip like this.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sat Oct 13 21:31:26 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 19:59:06 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed
    address".

    Ok, before anyone else notices, I admit I forgot about implementing an
    indirect jump by pushing the target address on the stack and executing a
    return instruction. That would work for this machine.

Except that one can only "push" the accumulator and flag registers, combined, and the flag register cannot be set directly, and has only 4 working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.

    Just call a "Jumper" routine, the call pushes the return address on
    stack. In "Jumper" read SP from IO address space, indirectly modify
    the return address on stack as needed and perform a ret instruction,
    causing a jump to the modified return address and it also restores the
    SP to the value before the call.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 22:19:59 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-13 21:31 , upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 19:59:06 +0300, Niklas Holsti <niklas.holsti@tidorum.invalid> wrote:

    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

I don't think that an interpreted Forth is feasible for this particular MCU. ...
Moreover, there is no indirect jump instruction -- "jump to a computed address".

    Ok, before anyone else notices, I admit I forgot about implementing an
indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4
    working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.

    Just call a "Jumper" routine, the call pushes the return address on
    stack. In "Jumper" read SP from IO address space, indirectly modify
    the return address on stack as needed and perform a ret instruction,
    causing a jump to the modified return address and it also restores the
    SP to the value before the call.

    Right, that sounds possible. But wow what a circumlocution.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sat Oct 13 21:24:27 2018
    From Newsgroup: comp.arch.embedded

On 13.10.2018 at 18:59, Niklas Holsti wrote:

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4 working bits.

It seems unclear to me which of acc and the flags is pushed first.
    But if acc is pushed first, one could do

    pushaf;
    mov a, sp;
    inc a;
    mov sp, a;

    to push any desired byte onto the stack.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sat Oct 13 21:47:48 2018
    From Newsgroup: comp.arch.embedded

On 12.10.2018 at 22:45, upsidedown@downunder.com wrote:
    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost
    a few cent more).

    Philipp

    Did you find the binary encoding of various instruction formats, i.e
    how many bits allocated to the operation code and how many for the
    address field ?

    My initial guess was that the instruction word is simple 8 bit opcode
    + 8 bit address, but the bit and word address limits for the smaller
    models would suggest that for some op-codes, the op-code field might
    be wider than 8 bits and address fields narrower than 8 bits (e.g. bit
    and word addressing).


    People have tried before (https://www.mikrocontroller.net/topic/449689, https://stackoverflow.com/questions/49842256/reverse-engineer-assembler-which-probably-encrypts-code).
    Apparently, even with access to the tools it is not obvious.

    However, a Chinese manual contains these examples:

    5E0A MOV A BB1
    1B21 COMP A #0x21
    2040 T0SN CF
    5C0B MOV BB2 A
    C028 GOTO 0x28
    0030 WDRESET
    1F00 MOV A #0x0
    0082 MOV SP A

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Sat Oct 13 22:50:57 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-13 22:24 , Philipp Klaus Krause wrote:
On 13.10.2018 at 18:59, Niklas Holsti wrote:

    Except that one can only "push" the accumulator and flag registers,
    combined, and the flag register cannot be set directly, and has only 4
    working bits.

    It seems unclear to me which of acc and sp is pushed first.
    But if acc is pushed first, one could do

    pushaf;
    mov a, sp;
    inc a;
    mov sp, a;

    to push any desired byte onto the stack.

    There's also a rule that the sp must always contain an even address, at
    least if interrupts are enabled, as I understand it.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Sat Oct 13 13:06:06 2018
    From Newsgroup: comp.arch.embedded

    Michael Kellett <mk@mkesc.co.uk> writes:
    If you want a hardware minimal processor the Maxim 32660 looks like fun
    3mm square, 24 pin Cortex M4, 96MHz, 256k flash, 96k RAM, £1.16 (10 off).

    That's not minimal ;). More practically, the 3mm square package sounds
    like a WLCSP which I think requires specialized ($$$) board fab
    facilities (it can't be hand soldered or done with normal reflow
    processes). Part of the Padauk part's attraction is the 6-pin SOT23
    package.

    Here's a complete STM8 board for 0.77 USD shipped:

    https://www.aliexpress.com/item//32527571163.html

    It has 8k of program flash and 1k of ram and can run a resident Forth interpreter. I think they also make a SOIC-8 version of the cpu. I
    bought a few of those boards for around 0.50 each last year so I guess
    they have gotten a bit more expensive since then.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim@cpldcpu+usenet@gmail.com to comp.arch.embedded on Sun Oct 14 01:46:37 2018
    From Newsgroup: comp.arch.embedded

    On 10/10/2018 03:05 AM, Clifford Heath wrote:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>



    This is quite curious. I wonder

    - Has anyone actually received the devices they ordered? The cheaper variants seem to be sold out.
    - Any success in setting up a programmer?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 18:20:56 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 11:00:30 AM UTC-4, upsid...@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    It would be nice to have a C compiler, and registers help with that.

    Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
    reentrant functions will be inefficient: No registers, and no sp-relative
    addressing mode. One would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring
    enough functionality onto the MCU to allow direct downloads and many
    debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details
    that can make a huge difference in how efficient it is. To make Forth
    work well on a small chip you need a Forth-specific instruction set to
    target the stack processing. For example, adding two numbers in this
    chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec, shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory "register".

    Both Acc = Acc Op Memory or alternatively Memory = Acc Op Memory are
    valid.

    Thus the accumulator is needed only for two operand instructions, but
    not for single operand instructions.
    How fast are instructions that access memory? Most MCUs will perform register operations in a single cycle. Even though RAM may be on chip, it typically is not as fast as registers because it is usually not multiported. DSP chips are an exception with dual and even triple ported on chip RAM.
    I think that means in general the CPU will be slow compared to a register based design.

    What is the difference, you have 64 on chip RAM bytes or 64 single
    byte on chip registers. The situation would have been different with
    on-chip registers and off chip RAM, with the memory bottleneck.

    Of course, there were odd architectures like the TI 9900 with a set of sixteen 16 bit general purpose registers in RAM. The set could be
    switched fast in interrupts, but slowed down any general purpose
    register access.
    Yeah, I'm familiar with the 9900. In the 990 it worked well because the CPU was TTL and not so fast. Once the CPU was on a single chip the external RAM was not fast enough to keep up really and instruction timings were dominated by the memory.

    That actually means it is easier to have a fast Forth implementation compared to other compilers since there won't be a significant penalty for using a stack.

    For a stack computer you need a pointer register with preferably autoincrement/decrement support. This processor has indirect access
    and single instruction increment or decrement support without
    disturbing the accumulator. Thus it is not so bad after all for stack
    computing.
    The stack in memory is usually a bottle neck because memory is typically slow so optimizations would be done to keep operands in registers. In this chip no optimizations are possible, but likely it wouldn't be too bad as long as the stack operations are flexible enough. But then I don't think you said this CPU has the sort of addressing that allows an operand in memory to be used and popped off the stack in one opcode as many, higher level CPUs do. So adding the two numbers on the stack would involve keeping the top of stack in the accumulator, adding the next item on the stack from memory to the accumulator, then another instruction to adjust the stack pointer which is also in memory. So two instructions? How many clock cycles?
    What happens when there is a change in the instruction pointer of the Forth virtual machine? Calling a new word would require saving the current value of the Forth IP on the return stack (separate from the data stack) and loading a new value into the Forth IP? This is a piece of code typically called "next". It varies a bit between indirect and direct threaded code. Then there is subroutine threaded code that just uses the CPU IP as the Forth IP and each address is actually a CPU call instruction.
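    As a rough host-side illustration of that "next" dispatch (all names here are invented for the sketch; nothing comes from the Padauk tools), an indirect-threaded inner interpreter can be written in C roughly like this:

    /* Hypothetical sketch: a compiled word is a list of addresses of
     * primitives; "next" fetches the address the Forth IP points at,
     * advances the IP and executes the primitive. */
    #include <stdio.h>

    typedef void (*prim_t)(void);

    static int  stack[16];
    static int *sp = stack;                  /* data stack pointer */

    static void lit5(void) { *sp++ = 5; }
    static void lit7(void) { *sp++ = 7; }
    static void plus(void) { sp--; sp[-1] += sp[0]; }   /* Forth "+" */
    static void dot (void) { printf("%d\n", *--sp); }   /* Forth "." */

    int main(void)
    {
        /* "compiled" form of: 5 7 + .  -- a list of pointers, not opcodes */
        static const prim_t word[] = { lit5, lit7, plus, dot, NULL };

        /* this loop is "next": fetch through the Forth IP, advance, run */
        for (const prim_t *ip = word; *ip != NULL; ip++)
            (*ip)();
        return 0;
    }

    With subroutine threading the pointer list disappears: the word becomes a sequence of real call instructions and the hardware program counter plays the role of the Forth IP, which is why call/ret is all this chip really needs for that scheme.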
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 18:25:25 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 11:31:30 AM UTC-4, Niklas Holsti wrote:
    On 18-10-12 19:11 , gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    The data-sheet describes the OTP program memory as "1KW", probably
    meaning 1024 instructions. The length of an instruction is not defined,
    as far as I could see.

    It would be nice to have a C compiler, and registers help with that.

    The data-sheet mentions something they call "Mini-C".

    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    reentrant functions will be inefficient: No registers, and no sp-relative addressing mode. One would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be programmed
    in Forth.

    I don't think that an interpreted Forth is feasible for this particular
    MCU. Where would the Forth program (= list of pointers to "words") be stored? I found no instructions for reading data from the OTP program memory, and the 64-byte RAM will not hold a non-trivial program together with the data for that program.

    Moreover, there is no indirect jump instruction -- "jump to a computed address". The closest is "pcadd a", which can be used to implement a 256-entry case statement. You would be limited to a total of 256 words.

    For programs on such a small MCU 256 words is likely much overkill. But you don't need to have the above features for Forth. Subroutine threading uses call and return instructions instead of an address list.


    Moreover, each RAM-resident pointer to RAM uses 2 octets of RAM, giving
    a 16-bit RAM address, although for this MCU a 6-bit address would be
    enough. Apparently the same architecture has implementations with more
    RAM and 16-bit RAM addresses.

    That said, one could perhaps implement a compiled Forth for this machine.

    Yeah, I'm pretty sure it is too small for a resident Forth, so a host would be required and a Forth can be compiled and subroutine threaded.

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Oct 13 18:32:51 2018
    From Newsgroup: comp.arch.embedded

    On Saturday, October 13, 2018 at 12:21:51 PM UTC-4, David Brown wrote:
    On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
    Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic
    through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
    commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to make a
    backend for this in SDCC; the architecture looks more
    C-friendly than the existing pic14 and pic16 backends. But it
    surely isn't as nice as stm8 or z80. reentrant functions will
    be inefficent: No registers, and no sp-relative adressing mode.
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many
    times can be hosted on the target although not likely in this
    case. Still, you can bring enough functionality onto the MCU to
    allow direct downloads and many debugging features without an
    ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is. To
    make Forth work well on a small chip you need a Forth-specific
    instruction set to target the stack processing. For example,
    adding two numbers in this chip is two instructions - load
    accumulator from memory X, add accumulator to memory Y. In a Forth
    cpu, you'd have a single instruction that does "pop two numbers,
    add them, push the result". That gives a very efficient and compact
    instruction set. But it is hard to get the same results from a
    chip that doesn't have this kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some
    other chip to running forth on this chip. How is that useful? There
    are many other chips that run very fast. So?

    My point is that /this/ CPU is not a good match for Forth, though many
    other very cheap CPUs are. Whether or not you think that matches "CPUs
    like this should be programmed in Forth" depends on what you mean by
    "CPUs like this", and what you think the benefits of Forth are.


    I believe others have said the instruction set is memory oriented
    with no registers. I think that means in general the CPU will be
    slow compared to a register based design. That actually means it is
    easier to have a fast Forth implementation compared to other
    compilers since there won't be a significant penalty for using a
    stack.


    It has a single register, not unlike the "W" register in small PIC
    devices. Yes, I expect it is going to be slower than you would get from having a few more registers. But it is missing (AFAICS) auto-increment
    and decrement modes, and has only load/store operations with indirect access.

    So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

    mov a, y; // 1 clock
    add x, a; // 1 clock

    Keep the TOS in the accumulator and I think you end up with

    add a, x; // 1 clock
    inc DSTKPTR; // adjust stack pointer - 1 clock?

    Does that work? Reading below, I guess not.


    If you have a data stack pointer "dsp", and want a standard Forth "+" operation, you have:

    idxm a, dsp; // 2 clock
    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock
    idxm dsp, a; // 2 clock

    That is 9 clocks, instead of 2, and 6 instructions instead of 3.

    What does idxm do? Looks like an indirect load? Can this address mode be combined with any operations? Are operations limited in the addressing modes? This seems like a very, very simple CPU, but for the money, I guess I get it.


    Of course you could make a Forth compiler for the device - but you would have to make an optimising Forth compiler that avoids needing a data
    stack, just as you do on many other small microcontollers (and just as a
    C compiler would do). This is /not/ a processor that fits well with
    Forth or that would give a clear translation from Forth to assembly, as
    is the case on some very small microcontrollers.

    OK

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Sat Oct 13 19:02:38 2018
    From Newsgroup: comp.arch.embedded

    gnuarm.deletethisbit@gmail.com writes:
    Keep the TOS in the accumulator

    Do you mean you want a Forth with 8-bit data cells? What about the
    cells on the return stack, if there is one?

    What does idxm do? Looks like an indirect load?

    Yes.

    Can this address mode be combined with any operations?

    No. Just load or store.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Oct 14 08:53:15 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 03:20 schrieb gnuarm.deletethisbit@gmail.com:

    How fast are instructions that access memory? Most MCUs will perform register operations in a single cycle. Even though RAM may be on
    chip, it typically is not as fast as registers because it is usually
    not multiported. DSP chips are an exception with dual and even
    triple ported on chip RAM.

    All instructions except for jumps are 1 cycle. Jumps if taken are 2
    cycles, 1 otherwise.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Oct 14 08:55:22 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 08:53 schrieb Philipp Klaus Krause:
    Am 14.10.2018 um 03:20 schrieb gnuarm.deletethisbit@gmail.com:

    How fast are instructions that access memory? Most MCUs will perform
    register operations in a single cycle. Even though RAM may be on
    chip, it typically is not as fast as registers because it is usually
    not multiported. DSP chips are an exception with dual and even
    triple ported on chip RAM.

    All instructions except for jumps are 1 cycle. Jumps if taken are 2
    cycles, 1 otherwise.

    Philipp


    idxm and ldxm seem to be 2 cycles, too.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sun Oct 14 11:58:08 2018
    From Newsgroup: comp.arch.embedded

    On Sat, 13 Oct 2018 21:47:48 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 22:45 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost a few cent more).

    Philipp

    Did you find the binary encoding of various instruction formats, i.e
    how many bits allocated to the operation code and how many for the
    address field ?

    My initial guess was that the instruction word is simple 8 bit opcode
    + 8 bit address, but the bit and word address limits for the smaller
    models would suggest that for some op-codes, the op-code field might
    be wider than 8 bits and address fields narrower than 8 bits (e.g. bit
    and word addressing).


    People have tried before (https://www.mikrocontroller.net/topic/449689, https://stackoverflow.com/questions/49842256/reverse-engineer-assembler-which-probably-encrypts-code).
    Apparently, even with access to the tools it is not obvious.

    However, a Chinese manual contains these examples:

    5E0A MOV A BB1
    1B21 COMP A #0x21
    2040 T0SN CF
    5C0B MOV BB2 A
    C028 GOTO 0x28
    0030 WDRESET
    1F00 MOV A #0x0
    0082 MOV SP A

    Philipp

    Interesting, this at least confirms that the instruction word is 16
    bits. In a Harvard architecture, the word length could have been
    13-17 bits, with some dirty encodings in the 13 bit case, but a cleaner
    encoding with 14-17 bit instruction words.

    Assuming one would like to make an encoding for exactly 1024 code
    words and 64 byte data memory, a tighter encoding would be possible.
    Of course, for a manufacturer with both small and larger processors, it would make
    sense to use the same encoding for all of them, even if that is slightly inefficient for the smaller models.

    Anyway, in the 1 kW / 64 byte case, the following code points would be required:

    2048 = 2 x 1024 call, goto
    1792 = 7 x 256 Immediate data (8 bit)
    2304 = 36 x 64 M-reference (6 bit)
    1024 = 8 x 128 Bit ref (M and IO, 3+4 bits)
    plus others.

    That is already 7168 code points, so this might barely fit into the 8192 code points of 13 bits, with some nasty encoding.

    Even limiting the M-reference to 4 bits (0-15), you still can't fit into a 12
    bit instruction length.

    So with a 16 bit word length, I do not understand why the word reference is
    limited to 4-5 bits. The bit address limit makes more sense, so that it
    would not consume 4096 code points.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Theo@theom+news@chiark.greenend.org.uk to comp.arch.embedded on Sun Oct 14 10:55:00 2018
    From Newsgroup: comp.arch.embedded

    Tim <cpldcpu+usenet@gmail.com> wrote:
    This is quite curious. I wonder

    - Has anyone actually received the devices they ordered? The cheaper variants seem to be sold out.

    I think they've sold out since they went viral. EEVblog did a video showing 550 in stock - that's only $16 worth of parts, not hard to imagine they've
    been bought up.

    The other option is they're some kind of EOL part and 3c is the 'reduced to clear' price - which they have done, very successfully.

    Theo
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael Kellett@mk@mkesc.co.uk to comp.arch.embedded on Sun Oct 14 11:30:13 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/2018 21:06, Paul Rubin wrote:
    Michael Kellett <mk@mkesc.co.uk> writes:
    If you want a hardware minimal processor the Maxim 32660 looks like fun
    3mm square, 24 pin Cortex M4, 96MHz, 256k flash, 96k RAM, £1.16 (10 off).

    That's not minimal ;). More practically, the 3mm square package sounds
    like a WLCSP which I think requires specialized ($$$) board fab
    facilities (it can't be hand soldered or done with normal reflow
    processes). Part of the Padauk part's attraction is the 6-pin SOT23
    package.

    Here's a complete STM8 board for 0.77 USD shipped:

    https://www.aliexpress.com/item//32527571163.html

    It has 8k of program flash and 1k of ram and can run a resident Forth interpreter. I think they also make a SOIC-8 version of the cpu. I
    bought a few of those boards for around 0.50 each last year so I guess
    they have gotten a bit more expensive since then.


    No - the BGA part is 1.6mm square (0.3mm pitch) - the 3mm is for 0.4mm
    pitch QFN and there is a 0.5mm pitch QFN part at 4mm square.
    The QFNs are reasonably prototype-able - needing only 0.15mm track and
    gap design rules and no filled vias in pads or other horrors.
    The point about the 32660 is that it is HARDWARE minimal but not
    constrained in software. At low volumes cost of the parts is nothing - a
    day of effort is $500 or more, in that context the difference between a
    free processor and a $2 processor is invisible.



    MK

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:26:00 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 19:50, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 18:27:13 +0200, David Brown
    <david.brown@hesbynett.no> wrote:

    On 13/10/18 17:00, upsidedown@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory locations, so why
    would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented assembly program listing.

    It would be nice to have a C compiler, and registers help with that.

    Looking at the instruction set, it should be possible to make a backend for this in SDCC; the architecture looks more C-friendly than the existing pic14 and pic16 backends. But it surely isn't as nice as stm8 or z80.
    reentrant functions will be inefficient: No registers, and no sp-relative
    addressing mode. One would want to reserve a few memory locations as pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many times can be hosted
    on the target although not likely in this case. Still, you can bring enough functionality onto the MCU to allow direct downloads and many debugging features without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are details that can make a huge difference in how efficient it is. To make Forth work well on a small chip you need a Forth-specific instruction set to target the stack processing. For example, adding two numbers in this chip is two instructions - load accumulator from memory X, add
    accumulator to memory Y. In a Forth cpu, you'd have a single
    instruction that does "pop two numbers, add them, push the result".
    That gives a very efficient and compact instruction set. But it is hard to get the same results from a chip that doesn't have this kind of
    stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some other chip to running forth on this chip. How is that useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented with no registers.

    Depending how you look at it, you could claim that it has 64 registers
    and no RAM. It is a quite orthogonal single address architecture. You
    can do practically all single operand instructions (like inc/dec,
    shift/rotate etc.) either in the accumulator but equally well in any
    of the 64 "registers". For two operand instructions (such as add/sub,
    and/or etc,), either the source or destination can be in the memory
    "register".

    Not quite, no. Only the first 16 memory addresses are directly
    accessible for most instructions, with the first 32 addresses being
    available for word-based instructions. So you could liken it to a
    device with 16 registers and indirect memory access to the rest of ram.

    Really ?

    In the manual

    M.n Only addressed in 0~0xF (0~15) is allowed

    The M.n notation is for bit operations, in which M is the byte address
    and n is the bit number in byte. Restricting M to 4 bits makes sense,
    since n requires 3 bits, thus the total address size for bit
    operations would be 7 bits.

    I couldn't find a reference that the restriction on M also applies to
    byte access. Where is it ?


    My interpretation of the manual was that you only had access to the
    first 16 addresses with the M instructions. But it is entirely possible
    that I am wrong and your interpretation is right. I haven't tried the devices, or the IDE, and the manual does not have details of things like instruction format.

    Certainly it would be nicer for the chip if you are right!

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:32:32 2018
    From Newsgroup: comp.arch.embedded

    On 14/10/18 03:20, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 11:00:30 AM UTC-4,
    upsid...@downunder.com wrote:
    On Sat, 13 Oct 2018 05:06:23 -0700 (PDT),
    gnuarm.deletethisbit@gmail.com wrote:

    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp
    Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU
    registers.

    Being able to (say) add register to register saves
    traffic through the accumulator and therefore
    instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages
    of commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to
    make a backend for this in SDCC; the architecture looks
    more C-friendly than the existing pic14 and pic16 backends.
    But it surely isn't as nice as stm8 or z80. reentrant
    functions will be inefficent: No registers, and no
    sp-relative adressing mode. On would want to reserve a few
    memory locations as pseudo-registers to help with that, but
    that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and
    many times can be hosted on the target although not likely in
    this case. Still, you can bring enough functionality onto the
    MCU to allow direct downloads and many debugging features
    without an ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is.
    To make Forth work well on a small chip you need a
    Forth-specific instruction set to target the stack processing.
    For example, adding two numbers in this chip is two
    instructions - load accumulator from memory X, add accumulator
    to memory Y. In a Forth cpu, you'd have a single instruction
    that does "pop two numbers, add them, push the result". That
    gives a very efficient and compact instruction set. But it is
    hard to get the same results from a chip that doesn't have this
    kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on
    some other chip to running forth on this chip. How is that
    useful? There are many other chips that run very fast. So?

    I believe others have said the instruction set is memory oriented
    with no registers.

    Depending how you look at it, you could claim that it has 64
    registers and no RAM. It is a quite orthogonal single address
    architecture. You can do practically all single operand
    instructions (like inc/dec, shift/rotate etc.) either in the
    accumulator but equally well in any of the 64 "registers". For two
    operand instructions (such as add/sub, and/or etc,), either the
    source or destination can be in the memory "register".

    Both Acc = Acc Op Memory or alternatively Memory = Acc Op Memory
    are valid.

    Thus the accumulator is needed only for two operand instructions,
    but not for single operand instructions.

    How fast are instructions that access memory? Most MCUs will perform register operations in a single cycle. Even though RAM may be on
    chip, it typically is not as fast as registers because it is usually
    not multiported. DSP chips are an exception with dual and even
    triple ported on chip RAM.

    Single cycle, according to the manual. Instructions involving 16-bit
    values are two cycle, the conditional branch instructions may be one or
    two cycles, and everything else is one cycle.

    It is not so hard to make the RAM dual ported when there is only 64
    bytes of it. Or perhaps the core is clocked on both falling and rising
    edges, so that the instructions are effectively 2/4 clocks rather than 1
    or two. We can only guess.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:37:05 2018
    From Newsgroup: comp.arch.embedded

    On 14/10/18 03:32, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 12:21:51 PM UTC-4, David Brown wrote:
    On 13/10/18 14:06, gnuarm.deletethisbit@gmail.com wrote:
    On Saturday, October 13, 2018 at 6:46:20 AM UTC-4, David Brown
    wrote:
    On 12/10/18 18:11, gnuarm.deletethisbit@gmail.com wrote:
    On Friday, October 12, 2018 at 2:50:53 AM UTC-4, Philipp Klaus
    Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There is a lot of operations that will update memory
    locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic
    through the accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of
    commented assembly program listing.

    It would be nice to have a C compiler, and registers help
    with that.


    Looking at the instruction set, it should be possible to make a
    backend for this in SDCC; the architecture looks more
    C-friendly than the existing pic14 and pic16 backends. But it
    surely isn't as nice as stm8 or z80. reentrant functions will
    be inefficent: No registers, and no sp-relative adressing mode.
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    CPUs like this (and others that aren't like this) should be
    programmed in Forth. It's a great tool for small MCUs and many
    times can be hosted on the target although not likely in this
    case. Still, you can bring enough functionality onto the MCU to
    allow direct downloads and many debugging features without an
    ICE.

    Rick C.


    Forth is a good language for very small devices, but there are
    details that can make a huge difference in how efficient it is. To
    make Forth work well on a small chip you need a Forth-specific
    instruction set to target the stack processing. For example,
    adding two numbers in this chip is two instructions - load
    accumulator from memory X, add accumulator to memory Y. In a Forth
    cpu, you'd have a single instruction that does "pop two numbers,
    add them, push the result". That gives a very efficient and compact
    instruction set. But it is hard to get the same results from a
    chip that doesn't have this kind of stack-based instruction set.

    Your point is what exactly? You are comparing running forth on some
    other chip to running forth on this chip. How is that useful? There
    are many other chips that run very fast. So?

    My point is that /this/ CPU is not a good match for Forth, though many
    other very cheap CPUs are. Whether or not you think that matches "CPUs
    like this should be programmed in Forth" depends on what you mean by
    "CPUs like this", and what you think the benefits of Forth are.


    I believe others have said the instruction set is memory oriented
    with no registers. I think that means in general the CPU will be
    slow compared to a register based design. That actually means it is
    easier to have a fast Forth implementation compared to other
    compilers since there won't be a significant penalty for using a
    stack.


    It has a single register, not unlike the "W" register in small PIC
    devices. Yes, I expect it is going to be slower than you would get from
    having a few more registers. But it is missing (AFAICS) auto-increment
    and decrement modes, and has only load/store operations with indirect
    access.

    So if you have two 8-bit bytes x and y, then adding them as "x += y;" is:

    mov a, y; // 1 clock
    add x, a; // 1 clock

    Keep the TOS in the accumulator and I think you end up with

    add a, x; // 1 clock
    inc DSTKPTR; // adjust stack pointer - 1 clock?

    Does that work? Reading below, I guess not.


    If you have a data stack pointer "dsp", and want a standard Forth "+"
    operation, you have:

    idxm a, dsp; // 2 clock
    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock
    idxm dsp, a; // 2 clock

    That is 9 clocks, instead of 2, and 6 instructions instead of 3.

    What does idxm do? Looks like an indirect load? Can this address
    mode be combined with any operations? Are operations limited in the addressing modes? This seems like a very, very simple CPU, but for the
    money, I guess I get it.

    "idxm" is an indirect load or store (depending on the order of the
    operands). No, there are no other operations that can be combined with indirect accesses.

    If you want to keep the TOS in the accumulator, then Forth "+" becomes:

    mov temp, a; // 1 clock
    dec dsp; // 1 clock
    idxm a, dsp; // 2 clock
    add a, temp; // 1 clock

    5 clocks is a good deal better than 9 clocks, but still a good deal
    worse than 2 clocks.



    Of course you could make a Forth compiler for the device - but you would
    have to make an optimising Forth compiler that avoids needing a data
    stack, just as you do on many other small microcontollers (and just as a
    C compiler would do). This is /not/ a processor that fits well with
    Forth or that would give a clear translation from Forth to assembly, as
    is the case on some very small microcontrollers.

    OK


    A stack-based system is often a good choice for very small cpus - it is certainly popular for 4-bit microcontrollers. But it seems that the
    designers of this device simply haven't considered support for
    Forth-style coding to be important.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 14 14:39:07 2018
    From Newsgroup: comp.arch.embedded

    On 13/10/18 18:59, Niklas Holsti wrote:
    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular
    MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed
    address".

    Ok, before anyone else notices, I admit I forgot about implementing an
    indirect jump by pushing the target address on the stack and executing a
    return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers, combined, and the flag register cannot be set directly, and has only 4 working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.


    Or you could read the SP, put that address into a different word memory location, and use that for indirect access to write to the stack.

    It is all possible, but not particularly efficient.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Oct 14 14:59:48 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 14:37 schrieb David Brown:

    A stack-based system is often a good choice for very small cpus - it is certainly popular for 4-bit microcontrollers.  But it seems that the designers of this device simply haven't considered support for
    Forth-style coding to be important.

    Efficient stack access is important for C, too. Putting local variables
    on the stack makes functions reentrant (not so important for small
    devices), and also saves memory (very important for small devices).

    The STM8 and S08 with their efficient sp-relative addressing and the Z80
    with the index registers thus make better targets for C compilers than
    the MCS-51 and HC08.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 08:14:22 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 10:58 schrieb upsidedown@downunder.com:

    Interesting, this at least confirms that the instruction word is 16
    bits. In a Harvard architecture, the word length could have been
    13-17 bits, with some dirty encodings in 113 bit case., but a cleaner encoding with 14-17 bit instruction words.

    Assuming one would like to make an encoding for exactly 1024 code
    words and 64 byte data memory, a tighter encoding would be possible.
    Of course a manufacturer with small and larger processors, would make
    sense to use the same encoding for all processors, which is slightly inefficient for smaller models.

    Indeed Padauk makes variants with up to 256 B of RAM.


    Anyway 1 kW/64 byes case, the following code points would be required:

    2048 = 2 x 1024 call, goto
    1792 = 7 x 256 Immediate data (8 bit)
    2304 = 36 x 64 M-referense (6 bit)
    1024 = 8 x 128 Bit ref (M and IO 3+4 bits
    others

    This might barely fit into 13 bits, with some nasty encoding.

    Limiting M-refeence to 4 bits (0-15), but you still can't fit into 12
    bit instruction length.

    So with 16 bit word length, I do not understand why word reference is
    limited to 4-5 bits.The bit address limit makes more sense, so that it
    would not consume 4096 code points.


    Maybe the M-reference limit only applies to the bit manipulation
    instructions? The line in the manual explains M.n, there is no separate
    line for M; maybe they only documented the restrictions, with M then
    referring to the full 8-bit range outside of bit manipulation instructions?

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 10:44:07 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone. With an efficient stack-pointer-relative addressing mode, you
    put all local variables on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both take space in
    RAM at the same time.

    Compilers can sometimes overlay local variables of non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-time optimization, which is not that common
    in compilers for small µCs.

    Example: main() calls f() and g(); both f() and g() call h(). All four functions are in different translation units, f() and g() both use a lot
    of local variables, while main() and h() use little. Without link-time optimization, the compiler will use about as much RAM as f() and g()
    together, when the local variables are static. When they are put on the
    stack, it will only need as much RAM as either f() or g().
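    As a very rough single-file sketch of that example (the buffer sizes are invented just to make the numbers concrete; in the real case the four functions sit in separate translation units, which is exactly why the compiler cannot overlay them without link-time information):

    #include <string.h>

    static int h(int x) { return x + 1; }      /* tiny locals */

    static int f(void)
    {
        unsigned char buf[40];                 /* large locals of f() */
        memset(buf, 1, sizeof buf);
        return h(buf[0]);
    }

    static int g(void)
    {
        unsigned char buf[40];                 /* large locals of g() */
        memset(buf, 2, sizeof buf);
        return h(buf[0]);
    }

    int main(void)
    {
        /* f() and g() never run at the same time: with stack (or overlaid)
         * allocation the peak RAM need is one 40-byte buffer plus h()'s few
         * bytes; made static, both buffers would claim 80 bytes of a
         * 64-256 byte part all the time. */
        return f() + g();
    }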

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Mon Oct 15 05:11:52 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 14, 2018 at 8:39:10 AM UTC-4, David Brown wrote:
    On 13/10/18 18:59, Niklas Holsti wrote:
    And one more iteration (sorry...)

    On 18-10-13 19:46 , Niklas Holsti wrote:
    On 18-10-13 18:31 , Niklas Holsti wrote:

    I don't think that an interpreted Forth is feasible for this particular MCU. ...
    Moreover, there is no indirect jump instruction -- "jump to a computed address".

    Ok, before anyone else notices, I admit I forgot about implementing an
    indirect jump by pushing the target address on the stack and executing a return instruction. That would work for this machine.

    Except that one can only "push" the accumulator and flag registers, combined, and the flag register cannot be set directly, and has only 4 working bits.

    What would work, as an indirect jump, is to set the Stack Pointer (sp)
    to point at a RAM word that contains the target address, and then
    execute a return. But then one has lost the actual Stack Pointer value.


    Or you could read the SP, put that address into a different word memory location, and use that for indirect access to write to the stack.

    It is all possible, but not particularly efficient.
    Efficiency has to be relative on such a limited machine. If there are no registers nearly everything is going to be clumsy and slow. I'm not sure using this CPU with Forth would be at all bad even if the CPU is not intended for Forth.
    One of the things that makes Forth so useful is that it can be tailored to the target. Rather than use the standard words you can write your own words that better fit the architecture. I'm not a Forth system designer, but I have designed CPUs in FPGAs and being able to target my CPU design with Forth is great. My CPU uses an 8 or 9 bit instruction size with multibyte instructions by prepending immediate addresses or data. I was able to make that work easily in Forth while it would have been a bear in C or other languages.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 14:19:19 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 10:18 schrieb Philipp Klaus Krause:

    They even make dual-core variants […]

    And there is the MCS11, with 8 cores.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 14:20:23 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 08:50 schrieb Philipp Klaus Krause:
    On would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.

    On the other hand, saving those pseudo-registers at interrupts and
    across function calls will be painful.

    Philipp

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Oct 15 14:35:18 2018
    From Newsgroup: comp.arch.embedded

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    If you are willing to pay 0.04$, you can get twice the RAM and program
    memory (not OTP for this one):

    https://detail.1688.com/offer/562502806054.html

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From raimond.dragomir@raimond.dragomir@gmail.com to comp.arch.embedded on Mon Oct 15 06:05:16 2018
    From Newsgroup: comp.arch.embedded

    luni, 15 octombrie 2018, 15:35:22 UTC+3, Philipp Klaus Krause a scris:
    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    If you are willing to pay 0.04$, you can get twice the RAM and program
    memory (not OTP for this one):

    https://detail.1688.com/offer/562502806054.html

    Philipp
    Nah... not sure. 4c is too much... :-D
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Mon Oct 15 07:19:00 2018
    From Newsgroup: comp.arch.embedded

    On Monday, October 15, 2018 at 9:05:23 AM UTC-4, raimond....@gmail.com wrote:
    luni, 15 octombrie 2018, 15:35:22 UTC+3, Philipp Klaus Krause a scris:
    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html> <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    If you are willing to pay 0.04$, you can get twice the RAM and program memory (not OTP for this one):

    https://detail.1688.com/offer/562502806054.html

    Philipp

    Nah... not sure. 4c is too much... :-D
    Too much you say? How about THIS deal???
    http://www.youboy.com/s504250937.html
    Three for a penny! But wait, there's MORE!!! It also has more memory and an ADC.
    Not sure how you actually order any of this stuff.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Mon Oct 15 07:26:46 2018
    From Newsgroup: comp.arch.embedded

    gnuarm.deletethisbit@gmail.com writes:
    http://www.youboy.com/s504250937.html
    Three for a penny! But wait, there's MORE!!! It also has more memory
    and an ADC.

    That's 0.35 Chinese Yuan (not Japanese Yen, which uses a similar-looking currency symbol) so about 0.05 USD.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Mon Oct 15 07:29:03 2018
    From Newsgroup: comp.arch.embedded

    Philipp Klaus Krause <pkk@spth.de> writes:
    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    Normally you'd use whole-program optimization, I thought. I don't know
    if SDCC supports that, but GCC does, as do the more serious commercial
    embedded compilers.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Mon Oct 15 07:30:26 2018
    From Newsgroup: comp.arch.embedded

    On Monday, October 15, 2018 at 10:26:51 AM UTC-4, Paul Rubin wrote:
    gnuarm.deletethisbit@gmail.com writes:
    http://www.youboy.com/s504250937.html
    Three for a penny! But wait, there's MORE!!! It also has more memory
    and an ADC.

    That's 0.35 Chinese Yuan (not Japanese Yen, which uses a similar-looking currency symbol) so about 0.05 USD.

    Ok, thanks. So much for using Google to translate currency. You just saved my fledgling import, export, arbitrage business!

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Mon Oct 15 21:34:10 2018
    From Newsgroup: comp.arch.embedded

    On Mon, 15 Oct 2018 10:44:07 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone. With an efficent stack-pointer-relative addresing mode, you
    put all local varibles on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    If you do not have efficient stack pointer relative addressing modes,
    why would you put local variables on stack ?

    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both takespace in
    RAM at the same time.

    Just create global variables Tmp1, Tmp2, Tmp3 ... and use these as
    function local variables. As long as two functions do not call each
    other directly or indirectly, you can safely use these global
    variables as function local variables.

    To make your program even prettier, use function specific aliases for
    Tmp1, Tmp2, etc. by using #define statements in C or multiple labels
    in assembly language storage allocation.
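    A small C sketch of that scheme (the Tmp names are as above, the two example functions are invented), assuming the two functions never call each other:

    #include <stdint.h>

    uint8_t Tmp1, Tmp2, Tmp3;            /* shared scratch bytes in RAM */

    /* per-function aliases, as suggested above */
    #define blink_count  Tmp1
    #define blink_phase  Tmp2

    #define scan_row     Tmp1
    #define scan_col     Tmp2
    #define scan_hits    Tmp3

    void blink(void)
    {
        for (blink_count = 0; blink_count < 10; blink_count++)
            blink_phase ^= 1;
    }

    void scan(void)
    {
        scan_hits = 0;
        for (scan_row = 0; scan_row < 8; scan_row++)
            for (scan_col = 0; scan_col < 8; scan_col++)
                scan_hits++;
    }

    int main(void) { blink(); scan(); return scan_hits; }   /* 64 */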


    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    Why do you need a linker for such a small processor? Since you are
    going to use a cross-compiler on a PC with mega/gigabytes of memory,
    you just compile/assemble everything into binary at once.

    Example: main() calls f() and g(); both f() and g() call h(). All four functions are in different translation units, f() and g() both use a lot
    of local variables, while main() and h() use little. Without link-time optimization, the compiler will use about as much RAM as f() and g() together, when the local variables are static. When they are put on the stack, it will only need as much RAM as either f() or g().

    See above, no need for linker or stack variables.


    The question is, do you even need full scale parameter passing ?

    Function h() could use some predefined memory locations and both f()
    and g() can put the parameter into those memory locations before
    calling h(). Only those parameters that are different when calling
    from f() or g() need to be passed; the parameters that are the same in
    both cases do not need to be passed at all, since h() knows them
    already at startup.
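    A sketch of that calling convention in C (all names invented): h() reads its "arguments" from fixed locations that the callers fill in beforehand, and only the value that actually differs between f() and g() is ever written:

    #include <stdint.h>

    static uint8_t       h_channel;      /* differs between f() and g() */
    static const uint8_t h_gain = 3;     /* same for every caller       */
    static uint8_t       h_result;

    static void h(void) { h_result = (uint8_t)(h_channel * h_gain); }

    static void f(void) { h_channel = 1; h(); }
    static void g(void) { h_channel = 2; h(); }

    int main(void)
    {
        f();                             /* h_result == 3 */
        g();                             /* h_result == 6 */
        return h_result;
    }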


    Of course such tricks become impractical with larger systems, but with
    1 KW / 64 B, this should definitely be doable.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From =?UTF-8?Q?Hans-Bernhard_Br=c3=b6ker?=@HBBroeker@t-online.de to comp.arch.embedded on Mon Oct 15 21:22:37 2018
    From Newsgroup: comp.arch.embedded

    Am 15.10.2018 um 10:44 schrieb Philipp Klaus Krause:
    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to determine the actual stack size usage.

    With an efficent stack-pointer-relative addresing mode, you
    put all local varibles on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    And without such an addressing mode, you don't, because you'll suffer
    badly in every conceivable aspect.

    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both takespace in
    RAM at the same time.

    So don't mark them 'static', unless they actually have to be.

    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    On the contrary: it's precisely the compilers for such stack-starved architectures (e.g. the 8051) that have been coupling behind-the-scenes
    static allocation of automatic variables with whole-program overlay
    analysis since effectively forever. They really had to, because the alternative would be painful to the point of being unusable.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Tue Oct 16 09:19:44 2018
    From Newsgroup: comp.arch.embedded

    Am 15.10.2018 um 16:29 schrieb Paul Rubin:
    Philipp Klaus Krause <pkk@spth.de> writes:
    Compilers can sometimes overly local variables on non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-timeoptimization, which is not that common
    in compilers for small µCs.

    Normally you'd use whole-program optimization, I thought. I don't know
    if SDCC supports that, but GCC does, as do the more serious commercial embedded compilers.


    Does GCC support any of these very simple µC architectures? I thought
    anything supported by GCC tends to have rather powerful instruction sets
    and plenty of registers anyway, so functions could be made reentrant by
    default without any problems resulting.

    While some link-time optimizations are commonly requested features for
    SDCC, currently none are supported. In SDCC, even inter-procedural optimizations within the same translation unit are not as powerful as
    they should be.
    Well, there always is a lot of work to do on SDCC, and there are only a
    few volunteers with time to work on it. So SDCC developers prioritize
    (usually by personal preferences).

    Still, when looking at the big picture, SDCC is doing quite well
    compared to other compilers for the same architectures (see e.g. http://www.colecovision.eu/stm8/compilers.shtml - comparison from early
    2018, around the time of the SDCC 3.7.0 release - current SDCC is 3.8.0).

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Tue Oct 16 09:29:30 2018
    From Newsgroup: comp.arch.embedded

    Am 15.10.2018 um 20:34 schrieb upsidedown@downunder.com:
    On Mon, 15 Oct 2018 10:44:07 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone. With an efficient stack-pointer-relative addressing mode, you
    put all local variables on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    If you do not have efficient stack pointer relative addressing modes,
    why would you put local variables on stack ?

    1) The question was "With such small ROM/RAM sizes, who needs reentrant
    functions ?". And it was a reply to a post where the lack of an efficient
    sp-relative addressing mode was cited as a disadvantage of the Padauk.
    So my reasoning was that one would want local variables on the stack,
    even for small RAM / ROM, so the lack of sp-relative addressing is a
    disadvantage - as one has to either put local variables elsewhere or
    handle stack accesses in an inefficient way.

    2) There will still be some use cases for reentrant functions. And since
    the Padauk has a relatively large ROM - at least compared to the RAM
    (the ROM/RAM ratio seems far higher than on typical STM8 or MCS-51
    devices) - when speed doesn't matter, it might still be worth
    putting variables on the stack. A compiler should provide an option for
    that (as, e.g., SDCC does for architectures without efficient stack
    access, such as MCS-51 or HC08).


    If your local variables are all static, the local variables of two
    functions that never get called at the same time still both take space in
    RAM at the same time.

    Just create global variables Tmp1, Tmp2, Tmp3 ... and use these as
    function local variables. As long as two functions do not call each
    other directly or indirectly, you can safely use these global
    variables as function local variables.

    I'd rather write idiomatic C and leave such optimizations to the compiler.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Tue Oct 16 10:00:59 2018
    From Newsgroup: comp.arch.embedded

    On 15.10.2018 at 21:22, Hans-Bernhard Bröker wrote:
    On 15.10.2018 at 10:44, Philipp Klaus Krause wrote:
    On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to determine the actual stack size usage.

    What is the problem?
    Either you use recursion - in which case the functions need to be
    reentrant, there is no alternative, or you don't. In the latter case
    you'd need to do whole-program analysis to efficiently overlay the
    variables - a very similar analysis could tell you the total stack
    usage.


    With an efficient stack-pointer-relative addressing mode, you
    put all local variables on the stack and only need as much RAM as the
    local variables along the longest path in the call tree.

    And without such an addressing mode, you don't, because you'll suffer
    badly in every conceivable aspect.

    Yes. So compilers like SDCC when targeting MCS-51 or HC08 don't use the
    stack by default (--stack-auto puts local variables on the stack
    per-file, __reentrant does so per function).
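
    To make the two models concrete, a small sketch (the function names are
    just an example; see the SDCC manual for the full mcs51 syntax):

    /* Default mcs51 code model: parameters and locals are allocated
       statically (and overlaid where the compiler can prove it safe),
       so this function is NOT reentrant. */
    int scale(int x)
    {
        int y = 3 * x;
        return y;
    }

    /* Marked __reentrant: parameters and locals live on the stack, so the
       function can be called recursively or from several contexts, at the
       cost of slow stack-relative access on the 8051. */
    int scale_r(int x) __reentrant
    {
        int y = 3 * x;
        return y;
    }

    Building the whole file with --stack-auto has the same effect as marking
    every function in it __reentrant.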


    Compilers can sometimes overlay local variables of non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-time optimization, which is not that common
    in compilers for small µCs.

    On the contrary: it's precisely the compilers for such stack-starved architectures (e.g. the 8051) that have been coupling behind-the-scenes static allocation of automatic variables with whole-program overlay
    analysis since effectively forever. They really had to, because the alternative would be painful to the point of being unusable.


    Well, SDCC when targeting MCS-51 or HC08 would be the combination that I
    know a bit about (though personally, I mostly use SDCC to target Z80 or
    STM8, which can both use the stack efficiently). SDCC doesn't really
    have link-time optimization yet; compilation units are handled
    independently. Regarding different compilation units, it can still
    overlay the variables of leaf functions - i.e. non-reentrant functions
    that do not call non-reentrant functions - but not much more.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Oct 16 13:48:37 2018
    From Newsgroup: comp.arch.embedded

    On 16/10/18 09:19, Philipp Klaus Krause wrote:
    On 15.10.2018 at 16:29, Paul Rubin wrote:
    Philipp Klaus Krause <pkk@spth.de> writes:
    Compilers can sometimes overlay local variables of non-reentrant
    functions as an optimization, but that will only work for some cases;
    often it would require link-time optimization, which is not that common
    in compilers for small µCs.

    Normally you'd use whole-program optimization, I thought. I don't know
    if SDCC supports that, but GCC does, as do the more serious commercial
    embedded compilers.


    Does GCC support any of these very simple µC architectures?

    No.

    I thought
    anything supported by GCC tends to have rather powerful instruction sets
    and plenty of registers anyway, so functions could be made reentrant by default without any problems resulting.

    Most gcc targets are quite powerful, with plenty of registers - and
    re-entrancy is not a problem. Some are a bit weaker, like the 8-bit
    AVR, and get inefficient with complicated stack usage. But it does not
    support the 8-bit CISC accumulator-based devices that SDCC targets.


    While some link-time optimizations are commonly requested features for
    SDCC, currently none are supported. In SDCC, even inter-procedural optimizations within the same translation unit are not as powerful as
    they should be.
    Well, there always is a lot of work to do on SDCC, and there are only a
    few volunteers with time to work on it. So SDCC developers prioritize
    (usually by personal preferences).

    Still, when looking at the big picture, SDCC is doing quite well
    compared to other compilers for the same architectures (see e.g. http://www.colecovision.eu/stm8/compilers.shtml - comparison from early
    2018, around the time of the SDCC 3.7.0 release - current SDCC is 3.8.0).

    Philipp


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From =?UTF-8?Q?Hans-Bernhard_Br=c3=b6ker?=@HBBroeker@t-online.de to comp.arch.embedded on Tue Oct 16 22:52:37 2018
    From Newsgroup: comp.arch.embedded

    On 16.10.2018 at 10:00, Philipp Klaus Krause wrote:
    On 15.10.2018 at 21:22, Hans-Bernhard Bröker wrote:
    On 15.10.2018 at 10:44, Philipp Klaus Krause wrote:
    On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to
    determine the actual stack size usage.

    What is the problem?

    The major part of it is that I mixed up Reentrance with Recursion there
    ... sorry for that.

    OTOH, one does tend to influence the other. Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an
    interrupt handler and the main loop. And frankly: you really don't want
    to do that. If an ISR on this kind of hardware becomes big enough you
    feel the need to split it into sub-functions, that almost certainly
    means you've picked entirely the wrong tool for the job.

    In other words: for this kind of system (very small, with rotten
    stack-based addressing), not only doesn't everyone need re-entrant
    functions, it's more like nobody does.

    On the contrary: it's precisely the compilers for such stack-starved
    architectures (e.g. the 8051) that have been coupling behind-the-scenes
    static allocation of automatic variables with whole-program overlay
    analysis since effectively forever. They really had to, because the
    alternative would be painful to the point of being unusable.


    Well, SDCC when targeting MCS-51 or HC08 would be the combination that I
    know a bit about

    I don't think anyone has ever seriously claimed SDCC to be anywhere near
    the pinnacle of compiler design for the 8051. ;-P

    Frankly, just looking at statements in this thread has me thinking that
    the usual suspects among commercial offerings from 20 years ago might
    still run circles around it today.

    SDCC doesn't really have link-time optimization yet, compilation
    units are handled independently.
    Well, given the gigantic scale differences between the target hardware
    and the build host, just turning the whole thing into a single
    compilation unit (by force, if necessary) should really be a no-brainer.
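
    E.g. the crude "unity build" trick (file names made up for the example):

    /* whole_program.c - force one translation unit by textually including
       every source file, so the compiler sees the complete call graph. */
    #include "timers.c"
    #include "uart.c"
    #include "main.c"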
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Tue Oct 16 14:01:43 2018
    From Newsgroup: comp.arch.embedded

    On Tuesday, October 16, 2018 at 4:52:44 PM UTC-4, Hans-Bernhard Bröker wrote:
    On 16.10.2018 at 10:00, Philipp Klaus Krause wrote:
    On 15.10.2018 at 21:22, Hans-Bernhard Bröker wrote:
    On 15.10.2018 at 10:44, Philipp Klaus Krause wrote:
    On 12.10.2018 at 20:30, upsidedown@downunder.com wrote:

    With such small ROM/RAM sizes, who needs reentrant functions ?

    Everyone.

    Absolutely not. Reentrant functions are a massive nuisance on fully
    embedded systems, if only because they routinely make it impossible to
    determine the actual stack size usage.

    What is the problem?

    The major part of it is that I mixed up Reentrance with Recursion there
    ... sorry for that.

    OTOH, one does tend to influence the other. Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an interrupt handler and the main loop.
    I don't believe this is correct. Reentrance is a problem any time a routine is entered again before it is exited from a prior call. This can happen without multiple threads when a routine is called from a routine that was ultimately called from within the routine. I suppose you might consider this to be recursion, but my point is this can happen without the intent of using recursion.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From =?UTF-8?Q?Hans-Bernhard_Br=c3=b6ker?=@HBBroeker@t-online.de to comp.arch.embedded on Wed Oct 17 00:03:51 2018
    From Newsgroup: comp.arch.embedded

    On 16.10.2018 at 23:01, gnuarm.deletethisbit@gmail.com wrote:
    On Tuesday, October 16, 2018 at 4:52:44 PM UTC-4, Hans-Bernhard
    Bröker wrote:

    OTOH, one does tend to influence the other. Without recursion,
    one would only really need reentrance to be able to call the same
    function from separate threads of execution. On controllers this
    small, that would only happen if you're calling the same function
    from inside an interrupt handler and the main loop.

    I don't believe this is correct. Reentrance is a problem any time a
    routine is entered again before it is exited from a prior call. This
    can happen without multiple threads when a routine is called from a
    routine that was ultimately called from within the routine. I
    suppose you might consider this to be recursion,

    Oh, there's no doubt about it: that's recursion all right.

    Some might prefer to qualify it as indirect recursion, a.k.a. a loop in
    the call graph, but it's still recursion.

    but my point is this
    can happen without the intent of using recursion.

    I'll assume we agree on this: unintended recursion is clearly a bug in the
    code, every time.

    That could arguably be classified as an actual benefit of using such a
    stack-starved CPU architecture: any competent C compiler for it will
    have to perform call tree analysis anyway, so it finds that particular
    bug "en passant".

    More typical C toolchains relying on stack-centric calling conventions
    might not bother with such analysis, and thus won't see the bug. Until
    you use the accompanying stack size calculation tool, that is, which
    will barf.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Clifford Heath@no.spam@please.net to comp.arch.embedded on Wed Oct 17 09:22:58 2018
    From Newsgroup: comp.arch.embedded

    On 17/10/18 09:03, Hans-Bernhard Bröker wrote:
    On 16.10.2018 at 23:01, gnuarm.deletethisbit@gmail.com wrote:
    Reentrance is a problem any time a
    routine is entered again before it is exited from a prior call. This
    can happen without multiple threads when a routine is called from a
    routine that was ultimately called from within the routine. I
    suppose you might consider this to be recursion,
    Oh, there's no doubt about it: that's recursion all right.
    Some might prefer to qualify it as indirect recursion, a.k.a. a loop in
    the call graph, but it's still recursion.
    We call it mutual recursion.
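    The textbook illustration of the pattern (nothing Padauk-specific, just a
    minimal made-up example):

    /* is_even() and is_odd() never call themselves directly, but each calls
       the other, so the call graph contains a loop: mutual (indirect)
       recursion. Each activation needs its own copy of n, so both functions
       must be reentrant. */
    static int is_odd(unsigned n);

    static int is_even(unsigned n)
    {
        return (n == 0) ? 1 : is_odd(n - 1);
    }

    static int is_odd(unsigned n)
    {
        return (n == 0) ? 0 : is_even(n - 1);
    }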
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Tue Oct 16 15:29:29 2018
    From Newsgroup: comp.arch.embedded

    On Tuesday, October 16, 2018 at 6:03:55 PM UTC-4, Hans-Bernhard Bröker wrote:
    On 16.10.2018 at 23:01, gnuarm.deletethisbit@gmail.com wrote:
    On Tuesday, October 16, 2018 at 4:52:44 PM UTC-4, Hans-Bernhard
    Bröker wrote:

    OTOH, one does tend to influence the other. Without recursion,
    one would only really need reentrance to be able to call the same
    function from separate threads of execution. On controllers this
    small, that would only happen if you're calling the same function
    from inside an interrupt handler and the main loop.

    I don't believe this is correct. Reentrance is a problem any time a routine is entered again before it is exited from a prior call. This
    can happen without multiple threads when a routine is called from a
    routine that was ultimately called from within the routine. I
    suppose you might consider this to be recursion,

    Oh, there's no doubt about it: that's recursion all right.

    Some might prefer to qualify it as indirect recursion, a.k.a. a loop in
    the call graph, but it's still recursion.

    but my point is this
    can happen without the intent of using recursion.

    I'll assume we agree on this: unintended recursion is clearly a bug in the
    code, every time.
    Clearly there would be a bug, but it is just as much that the routine wasn't designed for recursion, and that would be the most likely fix.
    That could arguably be classified as an actual benefit of using such a stack-starved CPU architecture: any competent C compiler for it will
    have to perform call tree analysis anyway, so it finds that particular
    bug "en passant".
    Are you swearing at me in French? ;)
    More typical C toolchains relying on stack-centric calling conventions
    might not bother with such analysis, and thus won't see the bug. Until
    you use the accompanying stack size calculation tool, that is, which
    will barf.
    Yeah, I'm not much of a C programmer, so I wouldn't know about such tools. What made me think of this is a problem often encountered by novices in Forth. Some system words use globally static data and can be called twice from different code before the first call has ended use of the data structure. Not quite the same thing as recursion, but the same result.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Oct 17 00:46:57 2018
    From Newsgroup: comp.arch.embedded

    On 17/10/2018 00:03, Hans-Bernhard Bröker wrote:

    I'll assume we agree on this: unintended recursion is clearly a bug in the
    code, every time.

    I think we can agree that /any/ unintended action is a clear bug in the
    code!

    But recursion or re-entrancy without a clear purpose and careful limits
    on depths is a bug in the /design/, not just the code.

    When I am faced with someone else's code to examine or maintain, I often
    run it through Doxygen with "generate documentation for /everything/ -
    caller graphs, callee graphs, cross-linked source, etc." It can make it
    quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the
    code is single-threaded - you get loops in the call graphs.

    The only other case is if interrupts call other functions - that won't
    be seen so easily.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Wed Oct 17 08:23:57 2018
    From Newsgroup: comp.arch.embedded

    On 16.10.2018 at 22:52, Hans-Bernhard Bröker wrote:

    Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an interrupt handler and the main loop. And frankly: you really don't want
    to do that. If an ISR on this kind of hardware becomes big enough you
    feel the need to split it into sub-functions, that almost certainly
    means you've picked entirely the wrong tool for the job.

    In other words: for this kind of system (very small, with rotten
    stack-based addressing), not only doesn't everyone need re-entrant
    functions, it's more like nobody does.

    Multithreading matters here. It is not common on such small devices, but
    this one is an exception: Padauk sells multiple dual-core variants of
    this controller and one 8-core variant.
    And there are always the support functions the compilers tend to need on
    small systems (while I assume people would think twice before using an expensive division in an interrupt handler, the situation looks
    different for multithreading).


    I don't think anyone has ever seriously claimed SDCC to be anywhere near
    the pinnacle of compiler design for the 8051. ;-P

    Frankly, just looking at statements in this thread has me thinking that
    the usual suspects among commercial offerings from 20 years ago might
    still run circles around it today.

    For the MCS-51, I do not know of a good current compiler comparison; I did
    some benchmarks a while ago
    (https://sourceforge.net/p/sdcc/mailman/message/36359114/), and SDCC
    still has a bit of catching-up to do.

    On the other hand, for the STM8, SDCC seems to be doing more than just
    okay: http://www.colecovision.eu/stm8/compilers.shtml

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Wed Oct 17 09:35:45 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-17 01:46 , David Brown wrote:
    ...
    When I am faced with someone else's code to examine or maintain, I often
    run it through Doxygen with "generate documentation for /everything/ -
    caller graphs, callee graphs, cross-linked source, etc." It can make it quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the
    code is single-threaded - you get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool to
    someone else's program, the tool found recursion. This surprised the
    people I was working with, because they had generated call graphs for
    the program, analysed them visually, and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the size
    of the window used to display the call-graphs. The tool did as it was
    told, with the result that the line segments on the path for the
    recursive call went down to the bottom edge of the diagram, then
    *merged* with the lower border line of the diagram, followed that lower border, went up one side of the diagram -- still merged with the border
    line -- and then reentered the diagram to point at the source of the
    recursive call, effectively making the loop very hard to see...

    (It turned out that this recursion was intentional. At this point, the
    program was sending an alarm message, but the alarm buffer was full, so
    the alarm routine called itself to send an alarm about the full buffer
    -- and that worked, because one buffer slot was reserved, by design, for
    this "buffer full" alarm.)
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Oct 17 09:31:17 2018
    From Newsgroup: comp.arch.embedded

    On 17/10/18 08:35, Niklas Holsti wrote:
    On 18-10-17 01:46 , David Brown wrote:
    ...
    When I am faced with someone else's code to examine or maintain, I often
    run it through Doxygen with "generate documentation for /everything/ -
    caller graphs, callee graphs, cross-linked source, etc." It can make it
    quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the
    code is single-threaded - you get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool to
    someone else's program, the tool found recursion. This surprised the
    people I was working with, because they had generated call graphs for
    the program, analysed them visually, and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the size
    of the window used to display the call-graphs. The tool did as it was
    told, with the result that the line segments on the path for the
    recursive call went down to the bottom edge of the diagram, then
    *merged* with the lower border line of the diagram, followed that lower border, went up one side of the diagram -- still merged with the border
    line -- and then reentered the diagram to point at the source of the recursive call, effectively making the loop very hard to see...

    Visual tools are helpful, but don't show everything!

    On the other hand, they can show things that can be hard to quantify in
    more rigorous tools. It is easy to look at the call graph of a function
    and say "that function is a bowl of spaghetti, and needs to be restructured" -
    it's harder to define rules or limits for an automatic checker that make
    such judgements.


    (It turned out that this recursion was intentional. At this point, the program was sending an alarm message, but the alarm buffer was full, so
    the alarm routine called itself to send an alarm about the full buffer
    -- and that worked, because one buffer slot was reserved, by design, for
    this "buffer full" alarm.)


    I can appreciate the purpose here, but I would rather have this:

    static bool putAlarmInLog(alarmPtr slot, ...) { ... }

    static alarmSlot_t alarmSlots[maxAlarmSlots];
    static alarmSlot_t emergencyAlarmSlot;

    static alarmPtr findFreeAlarmSlot(void) { ... }

    void logAlarm(...) {
        alarmPtr slot = findFreeAlarmSlot();
        if (slot) {
            putAlarmInLog(slot, ...);
        } else {
            putAlarmInLog(&emergencyAlarmSlot, "Buffer full");
        }
    }


    Hoist the condition checks up a step, and put the actual storage
    mechanism down a step, and you no longer have the re-entrancy. The code
    is a lot easier to write, read, analyse and test.



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Wed Oct 17 07:08:30 2018
    From Newsgroup: comp.arch.embedded

    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas Holsti wrote:
    On 18-10-17 01:46 , David Brown wrote:
    ...
    When I am faced with someone else's code to examine or maintain, I often run it through Doxygen with "generate documentation for /everything/ - caller graphs, callee graphs, cross-linked source, etc." It can make it quick to jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as long as the code is single-threaded - you get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool to someone else's program, the tool found recursion. This surprised the
    people I was working with, because they had generated call graphs for
    the program, analysed them visually, and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the size
    of the window used to display the call-graphs. The tool did as it was
    told, with the result that the line segments on the path for the
    recursive call went down to the bottom edge of the diagram, then
    *merged* with the lower border line of the diagram, followed that lower border, went up one side of the diagram -- still merged with the border
    line -- and then reentered the diagram to point at the source of the recursive call, effectively making the loop very hard to see...

    (It turned out that this recursion was intentional. At this point, the program was sending an alarm message, but the alarm buffer was full, so
    the alarm routine called itself to send an alarm about the full buffer
    -- and that worked, because one buffer slot was reserved, by design, for this "buffer full" alarm.)

    Seems to me what actually failed was that they knew they had recursion in the design, but didn't realize that not seeing the recursion in the call graphs was an error that should have been caught.

    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch.embedded on Wed Oct 17 11:31:27 2018
    From Newsgroup: comp.arch.embedded

    On Tue, 16 Oct 2018 22:52:37 +0200, Hans-Bernhard Bröker <HBBroeker@t-online.de> wrote:

    The major part of it is that I mixed up Reentrance with Recursion there
    ... sorry for that.

    OTOH, one does tend to influence the other. Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution.

    Recursion simply is a particular usage of re-entrance: a function
    calling itself (possibly indirectly through other functions).

    You can use re-entrant functions without using recursion, but you
    can't recurse without re-entrant functions.


    Of course, re-entrance doesn't depend on a CPU stack ... it requires
    only that the local variables of each instance be kept separate. That
    can be done with auxiliary data structures.
    [It's interesting to watch programming students reinvent recursion - accidentally, or as an exercise - and realize all the effort saved by
    having it built into the language.]
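
    A contrived sketch of what such an auxiliary structure can look like
    (factorial is a silly thing to compute this way, but it shows that only
    per-instance storage matters, not a hardware stack):

    #include <stdint.h>

    #define MAX_DEPTH 8

    uint32_t factorial(uint8_t n)
    {
        uint8_t  arg[MAX_DEPTH];   /* one saved "local" per pending instance */
        uint8_t  depth = 0;
        uint32_t result = 1;

        while (n > 1 && depth < MAX_DEPTH)   /* "call" phase: push frames */
            arg[depth++] = n--;

        while (depth > 0)                    /* "return" phase: pop, combine */
            result *= arg[--depth];

        return result;
    }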

    George

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Wed Oct 17 18:37:14 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-17 17:08 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas Holsti
    wrote:
    On 18-10-17 01:46 , David Brown wrote: ...
    When I am faced with someone else's code to examine or maintain,
    I often run it through Doxygen with "generate documentation for
    /everything/ - caller graphs, callee graphs, cross-linked source,
    etc." It can make it quick to jump around in the code. And
    recursive (or re-entrant, whichever you prefer) code stands out
    like a sore thumb, as long as the code is single-threaded - you
    get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool
    to someone else's program, the tool found recursion. This surprised
    the people I was working with, because they had generated call
    graphs for the program, analysed them visually, and found no
    recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the
    size of the window used to display the call-graphs. The tool did as
    it was told, with the result that the line segments on the path for
    the recursive call went down to the bottom edge of the diagram,
    then *merged* with the lower border line of the diagram, followed
    that lower border, went up one side of the diagram -- still merged
    with the border line -- and then reentered the diagram to point at
    the source of the recursive call, effectively making the loop very
    hard to see...

    (It turned out that this recursion was intentional. At this point,
    the program was sending an alarm message, but the alarm buffer was
    full, so the alarm routine called itself to send an alarm about the
    full buffer -- and that worked, because one buffer slot was
    reserved, by design, for this "buffer full" alarm.)

    Seems to me what actually failed was that they knew they had
    recursion in the design but didn't realize the fact that they didn't
    see the recursion in the call graphs was an error that should have
    been caught.

    The guys creating and viewing the call-graphs were not the designers of
    the program, either, so they didn't know, but for sure it was something
    they should have discovered and remarked on as part of their work.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Wed Oct 17 19:43:09 2018
    From Newsgroup: comp.arch.embedded

    On Wed, 17 Oct 2018 08:23:57 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    On 16.10.2018 at 22:52, Hans-Bernhard Bröker wrote:

    Without recursion, one
    would only really need reentrance to be able to call the same function
    from separate threads of execution. On controllers this small, that
    would only happen if you're calling the same function from inside an
    interrupt handler and the main loop. And frankly: you really don't want
    to do that. If an ISR on this kind of hardware becomes big enough you
    feel the need to split it into sub-functions, that almost certainly
    means you've picked entirely the wrong tool for the job.

    In other words: for this kind of system (very small, with rotten
    stack-based addressing), not only doesn't everyone need re-entrant
    functions, it's more like nobody does.

    Multithreading matters here. It is not common on such small devices, but
    this one is an exception: Padauk sells multiple dual-core variants of
    this controller and one 8-core variant.

    While I have been playing around with the idea of making some RTOS for
    such a 1kW/64B machine (realistically supporting 2-3 tasks, such as a
    foreground/background monitor), having 2 or 8 threads is not very
    realistic, even if the hardware supports it.

    The 8-core version might be usable for xCore-style "pseudo-interrupts"
    running a single DSP sample or PLC loop at a time. This would require
    8 input pins, each starting its own thread.

    But of course, the same rules should apply to pseudo-interrupts as
    real interrupts regarding re-entrancy etc.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Wed Oct 17 10:04:33 2018
    From Newsgroup: comp.arch.embedded

    On Wednesday, October 17, 2018 at 11:37:14 AM UTC-4, Niklas Holsti wrote:
    On 18-10-17 17:08 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas Holsti
    wrote:
    On 18-10-17 01:46 , David Brown wrote: ...
    When I am faced with someone else's code to examine or maintain,
    I often run it through Doxygen with "generate documentation for
    /everything/ - caller graphs, callee graphs, cross-linked source,
    etc." It can make it quick to jump around in the code. And
    recursive (or re-entrant, whichever you prefer) code stands out
    like a sore thumb, as long as the code is single-threaded - you
    get loops in the call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis tool
    to someone else's program, the tool found recursion. This surprised
    the people I was working with, because they had generated call
    graphs for the program, analysed them visually, and found no
    recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize the
    size of the window used to display the call-graphs. The tool did as
    it was told, with the result that the line segments on the path for
    the recursive call went down to the bottom edge of the diagram,
    then *merged* with the lower border line of the diagram, followed
    that lower border, went up one side of the diagram -- still merged
    with the border line -- and then reentered the diagram to point at
    the source of the recursive call, effectively making the loop very
    hard to see...

    (It turned out that this recursion was intentional. At this point,
    the program was sending an alarm message, but the alarm buffer was
    full, so the alarm routine called itself to send an alarm about the
    full buffer -- and that worked, because one buffer slot was
    reserved, by design, for this "buffer full" alarm.)

    Seems to me what actually failed was that they knew they had
    recursion in the design but didn't realize the fact that they didn't
    see the recursion in the call graphs was an error that should have
    been caught.

    The guys creating and viewing the call-graphs were not the designers of
    the program, either, so they didn't know, but for sure it was something
    they should have discovered and remarked on as part of their work.
    Do you know the intended purpose of the call graphs? It seems to me that it would be to match expectations to what was coded. It shouldn't matter who was doing the evaluation, there should have been an accounting of expectations regarding the presence and/or absence of recursion.
    Much like a check list, it doesn't just assure the presence of everything on the list, it can be used to verify the absence of anything not on the list.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Wed Oct 17 20:24:29 2018
    From Newsgroup: comp.arch.embedded

    On 17.10.2018 at 18:43, upsidedown@downunder.com wrote:

    While I have been playing around with the idea of making some RTOS for
    such a 1kW/64B machine (realistically supporting 2-3 tasks, such as a
    foreground/background monitor), having 2 or 8 threads is not very
    realistic, even if the hardware supports it.

    The 8-core version might be usable for xCore-style "pseudo-interrupts"
    running a single DSP sample or PLC loop at a time. This would require
    8 input pins, each starting its own thread.

    But of course, the same rules should apply to pseudo-interrupts as
    real interrupts regarding re-entrancy etc.


    Since the Padauk doesn't have much in terms of integrated peripherals,
    there is another use for hardware threads: have each thread do some I/O
    protocol (I²C, UART, etc.) in software.
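
    As a rough sketch of the kind of loop one such thread could run - a
    bit-banged UART transmitter, 8N1, LSB first. TX_PIN_HIGH(), TX_PIN_LOW()
    and delay_one_bit() are placeholders for whatever the real part provides,
    not actual Padauk names:

    static void soft_uart_tx(unsigned char byte)
    {
        unsigned char i;

        TX_PIN_LOW();                /* start bit */
        delay_one_bit();

        for (i = 0; i < 8; i++) {    /* data bits, LSB first */
            if (byte & 1)
                TX_PIN_HIGH();
            else
                TX_PIN_LOW();
            byte >>= 1;
            delay_one_bit();
        }

        TX_PIN_HIGH();               /* stop bit */
        delay_one_bit();
    }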

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Wed Oct 17 23:07:12 2018
    From Newsgroup: comp.arch.embedded

    On 18-10-17 20:04 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 11:37:14 AM UTC-4, Niklas Holsti
    wrote:
    On 18-10-17 17:08 , gnuarm.deletethisbit@gmail.com wrote:
    On Wednesday, October 17, 2018 at 2:35:46 AM UTC-4, Niklas
    Holsti wrote:
    On 18-10-17 01:46 , David Brown wrote: ...
    When I am faced with someone else's code to examine or
    maintain, I often run it through Doxygen with "generate
    documentation for /everything/ - caller graphs, callee
    graphs, cross-linked source, etc." It can make it quick to
    jump around in the code. And recursive (or re-entrant,
    whichever you prefer) code stands out like a sore thumb, as
    long as the code is single-threaded - you get loops in the
    call graphs.

    Anecdote: some years ago, when I was applying a WCET analysis
    tool to someone else's program, the tool found recursion. This
    surprised the people I was working with, because they had
    generated call graphs for the program, analysed them visually,
    and found no recursive, looping paths.

    Turned out that they had asked the call-graph tool to optimize
    the size of the window used to display the call-graphs. The
    tool did as it was told, with the result that the line segments
    on the path for the recursive call went down to the bottom edge
    of the diagram, then *merged* with the lower border line of the
    diagram, followed that lower border, went up one side of the
    diagram -- still merged with the border line -- and then
    reentered the diagram to point at the source of the recursive
    call, effectively making the loop very hard to see...

    (It turned out that this recursion was intentional. At this
    point, the program was sending an alarm message, but the alarm
    buffer was full, so the alarm routine called itself to send an
    alarm about the full buffer -- and that worked, because one
    buffer slot was reserved, by design, for this "buffer full"
    alarm.)

    Seems to me what actually failed was that they knew they had
    recursion in the design but didn't realize the fact that they
    didn't see the recursion in the call graphs was an error that
    should have been caught.

    The guys creating and viewing the call-graphs were not the
    designers of the program, either, so they didn't know, but for sure
    it was something they should have discovered and remarked on as
    part of their work.

    Do you know the intended purpose of the call graphs?

    IIRC they were doing independent SW verification & validation of the
    program (and the WCET analysis was also a part of that). But it was many
    years ago, and I don't remember the details well enough to say much
    more, nor can I say why the program was recursive in this way, or if it
    could as easily have been made non-recursive.
    --
    Niklas Holsti
    Tidorum Ltd
    niklas holsti tidorum fi
    . @ .
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Sun Oct 21 16:27:31 2018
    From Newsgroup: comp.arch.embedded

    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wondered how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot of 1-bit processors.
    There were at least two types. The PLC (Programmable Logic Controller)
    type replaced relay logic; these typically had at least AND, OR, NOT,
    (XOR) instructions. The other group was used as truly serial computers,
    with the same instructions as the PLC type but also at least 1-bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block
    from the 1970's, which requires quite a lot of support chips (code
    memory, PC, I/O chips) to do some useful work.
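
    A toy sketch of the kind of 1-bit "program" such a part executes - a
    single-bit result register worked through a list of bit operations. The
    opcode set here is invented for the example (the real MC14500B has a
    16-instruction set):

    #include <stdint.h>
    #include <stdio.h>

    enum { OP_LD, OP_AND, OP_OR, OP_ST };            /* made-up subset */

    typedef struct { uint8_t op; uint8_t bit; } insn_t;

    int main(void)
    {
        uint8_t in  = 0x05;   /* input pins  */
        uint8_t out = 0x00;   /* output pins */
        uint8_t rr  = 0;      /* 1-bit result register */
        unsigned pc;

        /* one "rung" of relay logic: out.0 = in.0 AND in.2 */
        static const insn_t prog[] = {
            { OP_LD, 0 }, { OP_AND, 2 }, { OP_ST, 0 },
        };

        for (pc = 0; pc < sizeof prog / sizeof prog[0]; pc++) {
            uint8_t b = (in >> prog[pc].bit) & 1;
            switch (prog[pc].op) {
            case OP_LD:  rr = b;  break;
            case OP_AND: rr &= b; break;
            case OP_OR:  rr |= b; break;
            case OP_ST:
                out = (uint8_t)((out & ~(1u << prog[pc].bit))
                                | ((unsigned)rr << prog[pc].bit));
                break;
            }
        }
        printf("out = 0x%02x\n", out);   /* prints 0x01 */
        return 0;
    }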

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM and four
    banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package.
    For the re-entrance enthusiasts, it contains stack-pointer-relative
    addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803
    Darlington buffers may be needed to drive loads typically found in a PLC
    environment.

    Anyone seen more modern 1-bit chips, either for relay replacement or
    for truly serial computers?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jim.brakefield@jim.brakefield@ieee.org to comp.arch.embedded on Sun Oct 21 07:47:21 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller)
    type replacing relay logic. These had typically at least AND, OR, NOT,
    (XOR) instructions.The other group was used as truly serial computers
    with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block,
    from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?
    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?
    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose
    (Logic Emulation Machine) https://opencores.org/project/lem1_9min
    Jim Brakefield
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Phil Martel@pomartel@comcast.net to comp.arch.embedded on Sun Oct 21 11:03:18 2018
    From Newsgroup: comp.arch.embedded

    On 10/21/2018 09:27, upsidedown@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller)
    type replacing relay logic. These had typically at least AND, OR, NOT,
    (XOR) instructions.The other group was used as truly serial computers
    with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block,
    from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    I have a memory of a 1-bit GPU from the late 70's, but can't pin it
    down. There is an article on Wikipedia https://en.wikipedia.org/wiki/1-bit_architecture
    --
    Best wishes,
    --Phil
    pomartel At Comcast(ignore_this) dot net
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sun Oct 21 08:08:02 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller)
    type replacing relay logic. These had typically at least AND, OR, NOT, (XOR) instructions.The other group was used as truly serial computers
    with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block,
    from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield
    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.
    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jim.brakefield@jim.brakefield@ieee.org to comp.arch.embedded on Sun Oct 21 09:31:29 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM. I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer architecture can be and still do some useful work. In the tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller) type replacing relay logic. These had typically at least AND, OR, NOT, (XOR) instructions.The other group was used as truly serial computers with the same instructions as the PLC but also at least a 1 bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block, from the 1970's, which requires quite lot of support chips (code
    memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four
    banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.
    It's hard to picture an application where you couldn't spare a few hundred LUTs.
    There are advantages to using several soft core processors, each sized and customized to the need.
    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.
    There are many under 600 LUTs, including 32-bit. Had hoped the full featured LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs: https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"
    A big rationale for small soft core processors is that they replace LUTs (slow-speed logic) with block RAM (instructions). And they are completely deterministic, as opposed to doing the same by time-slicing an ASIC (ARM) processor.
    Jim Brakefield
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From gnuarm.deletethisbit@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sun Oct 21 10:51:29 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 12:31:34 PM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wonder how primitive a computer architecture can be and still do some useful work. In the tube/discrete/SSI times, there were quite a lot 1 bit processors.
    There were at least two types, the PLC (programmable Logic Controller) type replacing relay logic. These had typically at least AND, OR, NOT, (XOR) instructions.The other group was used as truly serial computers with the same instructions as the PLC but also at least a 1 bit SUB (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes in mind is the MC14500B PLC building block, from the 1970's, which requires quite lot of support chips (code memory, PC, /O chips) to do some useful work.

    After much searching, I found the (NI) National Instruments SBA (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with 1024 word instructions (8 bit) ROM and four banks of 30 _bits_ data memory and 30 I/O pins in a 40 pin package.
    For the re-entrance enthusiasts, it contains stack pointer relative addressing :-). THe I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.

    It's hard to picture an application where you couldn't spare a few hundred LUTs.

    There are advantages to using several soft core processors, each sized and customized to the need.

    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    There are many under 600 LUTs, including 32-bit. I had hoped the full-featured LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs: https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"

    A big rationale for small soft core processors is that they replace LUTs (slow-speed logic) with block RAM (instructions). And they are completely deterministic, as opposed to doing the same by time-slicing an ASIC (ARM) processor.
    I won't argue a bit that softcores and especially *customizable* softcore CPUs aren't useful. I was talking about there being at best a very tiny region of utility for 1-bit processors.
    My 600 LUT processor didn't trade off much for performance. It would run pretty fast and was pretty capable. In addition the word size was independent of the instruction set. That said, there are apps where a much less powerful processor would do fine and saving a few more LUTs would be useful.
    Rick C.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Oct 21 21:43:43 2018
    From Newsgroup: comp.arch.embedded

    On 21/10/2018 17:08, gnuarm.deletethisbit@gmail.com wrote:


    It is hard for me to imagine applications where a 1 bit processor
    would be useful. A useful N bit processor can be built in a small
    number of LUTs. I've built a 16 bit processor in just 600 LUTs and
    I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the
    processing speed requirement was quite low and you can save LUTs with
    a bit serial processor. I just don't know how many or why it would
    matter. Even the smallest FPGAs have thousands of LUTs. It's hard
    to picture an application where you couldn't spare a few hundred
    LUTs.


    There is not much point in 1-bit processing with modern architectures
    and FPGAs. But it used to be more useful, for cheap and scalable
    solutions. You got systems that scaled in parallel, using bit-slice processors to make cpus as wide as you want. And you got serial
    scaling, giving you practical numbers of bits with minimal die area
    (like the COP8 microcontrollers).

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jim.brakefield@jim.brakefield@ieee.org to comp.arch.embedded on Sun Oct 21 12:44:39 2018
    From Newsgroup: comp.arch.embedded

    On Sunday, October 21, 2018 at 12:51:34 PM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 12:31:34 PM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just
    an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram,
    enough for plenty of MCU things. Didn't check if it has an ADC or PWM.
    I like that it's in a 6-pin SOT23 package since there aren't many other
    MCUs that small.

    Slightly OT, but I have often wondered how primitive a computer architecture can be and still do some useful work. In the tube/discrete/SSI times, there were quite a lot of 1-bit processors. There were at least two types: the PLC (Programmable Logic Controller)
    type replacing relay logic, which typically had at least AND, OR, NOT,
    (XOR) instructions. The other group was used as truly serial computers with the same instructions as the PLC but also at least 1-bit SUB (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block,
    from the 1970's, which requires quite a lot of support chips (code memory, PC, I/O chips) to do some useful work.

    After much searching, I found the General Instrument (GI) SBA (Serial Boolean Analyser) http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM, four banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package. For the re-entrancy enthusiasts, it contains stack-pointer-relative addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in a PLC
    environment.

    Anyone seen more modern 1 bit chips either for relay replacement or for truly serial computers ?

    Anyone seen more modern 1 bit chips either for relay replacement or for truly serial computers ?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose
    (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor would be useful. A useful N bit processor can be built in a small number of LUTs. I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the processing speed requirement was quite low and you can save LUTs with a bit serial processor. I just don't know how many or why it would matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.

    It's hard to picture an application where you couldn't spare a few hundred LUTs.

    There are advantages to using several soft core processors, each sized and customized to the need.

    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    There are many under 600 LUTs, including 32-bit. I had hoped the full-featured LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs: https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"

    A big rationale for small soft core processors is that they replace LUTs (slow-speed logic) with block RAM (instructions). And they are completely deterministic, as opposed to doing the same by time-slicing an ASIC (ARM) processor.

    I won't argue a bit that softcores and especially *customizable* softcore CPUs aren't useful. I was talking about there being at best a very tiny region of utility for 1-bit processors.

    My 600 LUT processor didn't trade off much for performance. It would run pretty fast and was pretty capable. In addition the word size was independent of the instruction set. That said, there are apps where a much less powerful processor would do fine and saving a few more LUTs would be useful.

    Rick C.
    there being at best a very tiny region of utility for 1-bit processors
    There are a small number of examples:
    Bit-serial processors such as the DEC PDP-8/S, and early vacuum tube & drum machines, for example the Bendix G-15.
    Bit-serial CORDIC.
    Also telling is that the 4-bit processors for calculators have been replaced by 8-bit processors.
    My inspiration was EDIF, which was/is output from VHDL & Verilog compilers, e.g. using EDIF as a machine language. In the context of logic simulation, greater effective FPGA capacity becomes possible for slow logic.
    This effort also led to a theoretical insight for brain modelling: there is greater information content in the wiring than in the logic. The human brain has 2^36+ neurons, requiring 36 bits of information for each connection, and only 16 or so bits for the state/configuration of each synapse. Also, an FPGA requires 60+ bits to route each LUT input (assuming all LUT inputs are in use), whereas each possible input can be specified by 20 bits or less (1M-LUT FPGA).
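    As a rough sanity check of those bit counts, here is a small sketch (the 1M-LUT and 2^36-neuron figures are just the estimates above, nothing measured):

        #include <math.h>
        #include <stdio.h>

        /* Bits needed to name one source out of n possible sources. */
        static unsigned bits_to_name(double n)
        {
            return (unsigned)ceil(log2(n));
        }

        int main(void)
        {
            /* 1M-LUT FPGA: a LUT input can be driven by any of ~2^20 LUT outputs. */
            printf("FPGA: %u bits to name a LUT-input source\n",
                   bits_to_name(1048576.0));        /* -> 20 */

            /* Brain estimate from above: ~2^36 neurons. */
            printf("Brain: %u bits to name a connection target\n",
                   bits_to_name(68719476736.0));    /* -> 36 */
            return 0;
        }

    The point being that the 60+ routing bits per LUT input are a property of the switch fabric, not of the amount of information actually needed to describe the connection.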
    Of course optimizing simulators convert the EDIF to an existing machine language. Likewise for industrial automation (ladder logic, ...).
    Jim Brakefield
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Brett@ggtgp@yahoo.com to comp.arch.embedded on Mon Oct 22 00:28:51 2018
    From Newsgroup: comp.arch.embedded

    <jim.brakefield@ieee.org> wrote:
    On Sunday, October 21, 2018 at 12:51:34 PM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 12:31:34 PM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 10:08:06 AM UTC-5, gnuarm.del...@gmail.com wrote:
    On Sunday, October 21, 2018 at 10:47:26 AM UTC-4, jim.bra...@ieee.org wrote:
    On Sunday, October 21, 2018 at 8:27:35 AM UTC-5, upsid...@downunder.com wrote:
    On Wed, 10 Oct 2018 19:29:13 -0700, Paul Rubin
    <no.email@nospam.invalid> wrote:

    Clifford Heath <no.spam@please.net> writes:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>
    OTP, no SPI, UART or I²C, but still...

    That is impressive! Seems to be an 8-bit RISC with no registers, just an accumulator, a cute concept. 1K of program OTP and 64 bytes of ram, enough for plenty of MCU things. Didn't check if it has an ADC or PWM. I like that it's in a 6-pin SOT23 package since there aren't many other MCUs that small.

    Slightly OT, but I have often wondered how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot of 1-bit processors.
    There were at least two types: the PLC (Programmable Logic Controller) type replacing relay logic, which typically had at least AND, OR, NOT, (XOR) instructions. The other group was used as truly serial computers with the same instructions as the PLC but also at least 1-bit SUB (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block, from the 1970's, which requires quite a lot of support chips (code memory, PC, I/O chips) to do some useful work.

    After much searching, I found the General Instrument (GI) SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM, four banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package. For the re-entrancy enthusiasts, it contains stack-pointer-relative addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in a PLC environment.

    Anyone seen more modern 1-bit chips either for relay replacement or for truly serial computers?

    Anyone seen more modern 1-bit chips either for relay replacement or for truly serial computers?

    LEM1_9 and LEM4_9 are FPGA soft cores that are intended for that purpose (Logic Emulation Machine) https://opencores.org/project/lem1_9min

    Jim Brakefield

    It is hard for me to imagine applications where a 1 bit processor
    would be useful. A useful N bit processor can be built in a small
    number of LUTs. I've built a 16 bit processor in just 600 LUTs and
    I've seen processors in a bit less.

    I discussed this with someone once and he imagined apps where the
    processing speed requirement was quite low and you can save LUTs with
    a bit serial processor. I just don't know how many or why it would
    matter. Even the smallest FPGAs have thousands of LUTs. It's hard to picture an application where you couldn't spare a few hundred LUTs.

    Rick C.

    It's hard to picture an application where you couldn't spare a few hundred LUTs.

    There are advantages to using several soft core processors, each sized
    and customized to the need.

    I've built a 16 bit processor in just 600 LUTs and I've seen processors in a bit less.

    There are many under 600 LUTs, including 32-bit. I had hoped the full-featured
    LEM design would be under 100 LUTs.
    Have done some rough research of what's available for under 600 LUTs:
    https://opencores.org/project/up_core_list/downloads
    select: "By Performance Metric"

    A big rationale for small soft core processors is that they replace LUTs
    (slow-speed logic) with block RAM (instructions). And they are
    completely deterministic, as opposed to doing the same by time-slicing an
    ASIC (ARM) processor.

    I won't argue a bit that softcores and especially *customizable*
    softcore CPUs aren't useful. I was talking about there being at best a
    very tiny region of utility for 1-bit processors.

    My 600 LUT processor didn't trade off much for performance. It would
    run pretty fast and was pretty capable. In addition the word size was
    independent of the instruction set. That said, there are apps where a
    much less powerful processor would do fine and saving a few more LUTs would be useful.

    Rick C.

    there being at best a very tiny region of utility for 1-bit processors

    There are a small number of examples:
    Bit-serial processors such as the DEC PDP-8/S, and early vacuum tube & drum
    machines, for example the Bendix G-15.
    Bit-serial CORDIC.

    Also telling is that the 4-bit processors for calculators have been replaced
    by 8-bit processors.

    My inspiration was EDIF, which was/is output from VHDL & Verilog
    compilers, e.g. using EDIF as a machine language. In the context of logic simulation, greater effective FPGA capacity becomes possible for slow logic.

    This effort also led to a theoretical insight for brain modelling: there
    is greater information content in the wiring than in the logic. The
    human brain has 2^36+ neurons, requiring 36 bits of information for each connection, and only 16 or so bits for the state/configuration of each synapse. Also, an FPGA requires 60+ bits to route each LUT input (assuming
    all LUT inputs are in use), whereas each possible input can be specified by 20 bits or less (1M-LUT FPGA).

    The clock speed is quite low, 2 Hz?
    So the wetware is not quite impossible to emulate with current tech.
    Raising a baby and training the resultant adult to do a task is still many orders of magnitude cheaper.
    ;)

    Of course optimizing simulators convert the EDIF to an existing machine language. Likewise for industrial automation (ladder logic, ...).

    Jim Brakefield




    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From George Neuner@gneuner2@comcast.net to comp.arch.embedded on Sun Oct 21 20:59:55 2018
    From Newsgroup: comp.arch.embedded

    On Sun, 21 Oct 2018 16:27:31 +0300, upsidedown@downunder.com wrote:

    Slightly OT, but I have often wondered how primitive a computer
    architecture can be and still do some useful work. In the
    tube/discrete/SSI times, there were quite a lot of 1-bit processors.
    There were at least two types: the PLC (Programmable Logic Controller)
    type replacing relay logic, which typically had at least AND, OR, NOT,
    (XOR) instructions. The other group was used as truly serial computers
    with the same instructions as the PLC but also at least 1-bit SUB
    (and ADD) instructions to implement all mathematical functions.

    However, in the LSI era, there don't seem to be many implementations.

    One that immediately comes to mind is the MC14500B PLC building block,
    from the 1970's, which requires quite a lot of support chips (code
    memory, PC, I/O chips) to do some useful work.

    After much searching, I found the General Instrument (GI) SBA
    (Serial Boolean Analyser)
    http://www.wass.net/othermanuals/GI%20SBA.pdf
    from the same era, with a 1024-word (8-bit) instruction ROM and four
    banks of 30 _bits_ of data memory and 30 I/O pins in a 40-pin package.
    For the re-entrancy enthusiasts, it contains stack-pointer-relative addressing :-). The I/O pins are 5 V TTL compatible, so a few ULN2803 Darlington buffers may be needed to drive loads typically found in a PLC environment.

    Anyone seen more modern 1 bit chips either for relay replacement or
    for truly serial computers ?

    Circa 1985-1993, Thinking Machines Connection Machine.
    Circa 1987-1996, MasPar MP series.

    The CM-1, 2, 2a, and 200 all were SIMD parallel using 1-bit serial
    integer-only CPUs. Sizes ranged from 8K CPUs at the low end to 64K
    CPUs at the high end. Each CPU had 4K *bits* of private RAM, and the
    CPUs were connected in a multidimensional hypercube network.

    The CM-2, 2a, and 200 were augmented with 32-bit FPUs (1 per 32 CPUs),
    and the 200 featured a higher clock speed.


    The MP-1 was SIMD parallel using 4-bit serial integer-only CPUs in
    sizes from 1K to 16K CPUs. It also had 32-bit FPUs, but I don't
    remember how many / what ratio. I remember that it had an accumulator
    register rather than going memory->memory like the CM.

    [I can't find much information now about the MP-1 ... unfortunately
    MasPar didn't last very long in the marketplace. The Wikipedia
    article has some information about the MP-2, but the MP-2 was a later
    full 32-bit design, very different from the MP-1.]


    My college had both an 8K CM-2 and a 1K MP-1, accessible to those who
    took various parallel processing electives. I never got to use the
    MP-1 much - it was new at the end of my time and I only ever played
    with it a bit. But I spent 2 semesters working with the CM-2.

    Even though the CM's clock speed was only ~8MHz, the performance was
    amazing IF the problem was a good fit to the architecture. E.g., at
    that time, I owned a 66MHz (dx2) i486. Converted for the CM-2
    architecture, O(n^4) array processing on the i486 became O(n) on the
    CM-2. I had a physics simulation that took over 3 hours on my i486
    that ran in ~10 minutes on the CM.

    George
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Wed Oct 24 15:57:55 2018
    From Newsgroup: comp.arch.embedded

    Am 14.10.2018 um 11:55 schrieb Theo:
    Tim <cpldcpu+usenet@gmail.com> wrote:
    This is quite curious. I wonder

    - Has anyone actually received the devices they ordered? The cheaper
    variants seem to be sold out.

    I think they've sold out since they went viral. EEVblog did a video showing 550 in stock - that's only $16 worth of parts, not hard to imagine they've been bought up.

    The other option is they're some kind of EOL part and 3c is the 'reduced to clear' price - which they have done, very successfully.

    Theo


    They're back in stock, though the price rose by 21% to $0.046.
    Also, LCSC seems to now be stocking more Padauk parts, including more
    dual-core devices. Unfortunately, the programmer seems to be out of
    stock, and they have neither the flash nor the DIP variants.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Mon Nov 5 12:41:27 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.2018 um 09:44 schrieb David Brown:
    On 12/10/18 08:50, Philipp Klaus Krause wrote:
    Am 12.10.2018 um 01:08 schrieb Paul Rubin:
    upsidedown@downunder.com writes:
    There are a lot of operations that will update memory locations, so why would you need a lot of CPU registers.

    Being able to (say) add register to register saves traffic through the
    accumulator and therefore instructions.

    1 KiB = 0.5 KiW is quite a lot, it is about 10-15 pages of commented
    assembly program listing.

    It would be nice to have a C compiler, and registers help with that.


    Looking at the instruction set, it should be possible to make a backend
    for this in SDCC; the architecture looks more C-friendly than the
    existing pic14 and pic16 backends. But it surely isn't as nice as stm8
    or z80.
    Reentrant functions will be inefficient: no registers, and no sp-relative
    addressing mode. One would want to reserve a few memory locations as
    pseudo-registers to help with that, but that only goes so far.


    It looks like the lowest 16 memory addresses could be considered pseudo-registers - they are the ones that can be used for direct memory access rather than needing indirect access.


    Considering the multi-core variants of the Padauk µCs:
    Those addresses are shared across all cores. Each core only has its own
    A, SP, F, PC.
    How do we handle local variables?

    Option 1: Make functions non-reentrant. This requires duplication of code (we
    need per-thread copies of functions), and link-time analysis to ensure
    that each thread only calls the function implementation meant for it.
    Function pointers get complicated. A rough sketch of this option follows below.

    Option 2: Use an inefficient combination of thread-local storage and stack.
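    To make Option 1 a little more concrete, here is a minimal C sketch (the names and the hand-written per-core dispatch are made up for illustration; this is not what SDCC or a linker would actually emit):

        #include <stdint.h>

        /* Option 1 sketch: one non-reentrant copy of the routine per hardware
         * core, each with its own static locals, so no stack access is needed. */

        static void delay_core0(uint8_t ticks)
        {
            static volatile uint8_t i;      /* static local owned by core 0 */
            for (i = 0; i < ticks; i++)
                ;                           /* busy-wait */
        }

        static void delay_core1(uint8_t ticks)
        {
            static volatile uint8_t i;      /* separate static local for core 1 */
            for (i = 0; i < ticks; i++)
                ;
        }

        /* Each thread must only ever call "its" copy; with function pointers
         * this property has to be checked across the whole call graph. */
        void blink(uint8_t core_id)
        {
            if (core_id == 0)
                delay_core0(100);
            else
                delay_core1(100);
        }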

    Since this is a small µC, we need a lot of support functions, which the compiler inserts (e.g. for multiplication); of course those are affected
    by the same problems.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Thu Nov 8 13:53:48 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp


    Interesting, that would make it easy to run a multitasking RTOS (foreground/background) monitor, which might justify the use of some reentrant library routines :-). But in reality, the available memory (ROM/RAM) is so small that you could easily manage this with static
    memory allocations.



    But static memory allocation would require one copy of each function per thread. And the linker would have to analyze the call graph to always
    call the correct function for each thread. Function pointers get
    complicated.

    Unfortunately, reentrancy becomes even harder with
    hardware multithreading: to access the stack, one has to construct a
    pointer to the stack location in a memory location. That memory location
    (like any pseudo-register) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making
    access even more inefficient. And that spinlock will cause issues with interrupts (a solution might be to heavily restrict interrupt routines, essentially allowing not much more than setting some global variables).
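    Roughly, each access to a stack slot would then look something like the following sketch (purely illustrative; it assumes some way to make the lock acquisition atomic, e.g. a bit-set/test-and-skip sequence or briefly disabled interrupts, which is exactly where the trouble with interrupt routines comes from):

        #include <stdint.h>

        /* Both hardware threads share all of RAM, so this state is shared. */
        static volatile uint8_t fp_lock;     /* 0 = free, 1 = taken             */
        static volatile uint8_t fp_shadow;   /* shared indirect-address scratch */

        /* Hypothetical: on the real part this would have to be made atomic;
         * plain C cannot express that. */
        static uint8_t try_lock(volatile uint8_t *l)
        {
            if (*l)
                return 0;
            *l = 1;
            return 1;
        }

        static uint8_t read_local(uint8_t sp, uint8_t offset)
        {
            uint8_t v;
            while (!try_lock(&fp_lock))
                ;                                  /* spin                        */
            fp_shadow = sp - offset;               /* build pointer to stack slot */
            v = *(volatile uint8_t *)(uintptr_t)fp_shadow;  /* indirect access    */
            fp_lock = 0;                           /* release                     */
            return v;
        }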

    Then there is the trade-off of using one such memory location per
    function vs. per program (the latter reducing memory usage, but
    resulting in less parallelism).

    The pseudo-registers one would want to use are not so much a problem for interrupt routines (they would just need saving and thus increase
    interrupt overhead a bit), but for hardware parallelism. Essentially all
    access to them would again have to be protected by a spinlock.

    All these problems could have relatively easily been avoided by
    providing an efficient stack-pointer-relative addressing mode. Having a
    few general-purpose or index registers would have somewhat helped as well.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tauno Voipio@tauno.voipio@notused.fi.invalid to comp.arch.embedded on Thu Nov 8 15:08:24 2018
    From Newsgroup: comp.arch.embedded

    On 8.11.18 14:53, Philipp Klaus Krause wrote:
    Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU,
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp


    Interesting, that would make it easy to run a multitasking RTOS
    (foreground/background) monitor, which might justify the use of some
    reentrant library routines :-). But in reality, the available memory
    (ROM/RAM) is so small so that you could easily manage this with static
    memory allocations.



    But static memory allocation would require one copy of each function per thread. And the linker would have to analyze the call graph to always
    call the correct function for each thread. Function pointers get
    complicated.

    Unfortunately, reentrancy becomes even harder with
    hardware multithreading: to access the stack, one has to construct a
    pointer to the stack location in a memory location. That memory location
    (like any pseudo-register) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making access even more inefficient. And that spinlock will cause issues with interrupts (a solution might be to heavily restrict interrupt routines, essentially allowing not much more than setting some global variables).

    Then there is the trade-off of using one such memory location per
    function vs. per program (the latter reducing memory usage, but
    resulting in less parallelism).

    The pseudo-registers one would want to use are not so much a problem for interrupt routines (they would just need saving and thus increase
    interrupt overhead a bit), but for hardware parallelism. Essentially all access to them would again have to be protected by a spinlock.

    All these problems could have relatively easily been avoided by
    providing an efficient stack-pointer-relative addressing mode. Having a
    few general-purpose or index registers would have somewhat helped as well.

    Philipp


    And you'll end up with a low-end Cortex ...
    --

    -TV

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Thu Nov 8 14:34:44 2018
    From Newsgroup: comp.arch.embedded

    Am 08.11.18 um 14:08 schrieb Tauno Voipio:


    And you'll end up with a low-end Cortex ...


    A low-end Cortex would still be far heavier than a Padauk variant with
    an sp-relative addressing mode or a few registers added.
    I think a more multithreading-friendly variant of the Padauk would even
    still be simpler than an STM8.
    But one could surely create a nice STM8-like (with a few STM8 weaknesses
    fixed) processor with hardware multithreading.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Thu Nov 8 21:52:49 2018
    From Newsgroup: comp.arch.embedded

    On Thu, 8 Nov 2018 13:53:48 +0100, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.18 um 20:39 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 10:18:56 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 10.10.2018 um 03:05 schrieb Clifford Heath:
    <https://lcsc.com/product-detail/PADAUK_PADAUK-Tech-PMS150C_C129127.html>
    <http://www.padauk.com.tw/upload/doc/PMS150C%20datasheet%20V004_EN_20180124.pdf>


    OTP, no SPI, UART or I²C, but still...

    Clifford Heath

    They even make dual-core variants (the part where the first digit in the
    part number is '2'). It seems program counter, stack pointer, flag
    register and accumulator are per-core, while the rest, including the ALU,
    is shared. In particular, the I/O registers are also shared, which means
    some multiplier registers would also be - but currently all variants
    with integrated multiplier are single-core.
    Use of the ALU is shared by the two cores, alternating by clock cycle.

    Philipp


    Interesting, that would make it easy to run a multitasking RTOS
    (foreground/background) monitor, which might justify the use of some
    reentrant library routines :-). But in reality, the available memory
    (ROM/RAM) is so small so that you could easily manage this with static
    memory allocations.



    But static memory allocation would require one copy of each function per thread.

    For a foreground/background monitor, the worst case would be two
    copies of static data, if both threads use the same subroutine.

    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    With such a small processor, just track any dependencies manually.

    Function pointers get complicated.

    Do you really insist on using function pointers with such small
    targets?


    Unfortunately, reentrancy becomes even harder with
    hardware-multithreading:

    With two hardware threads, you would need at most two copies of static
    data.

    To access the stack, one has to construct a
    pointer to the stack location in a memory location.

    Why would you want to access the stack ?

    The stack is usable for handling return addresses, but I guess that a
    hardware thread must have its own return address stack pointer.

    In fact many minicomputers from the 1960's did not even have a stack
    at all. The calling program just stored the return address in the
    first word of the subroutine, and at the end of the subroutine,
    performed an indirect jump through the first word of the subroutine to
    return to the calling program. Of course, this is not re-entrant and
    in those days one did not have to worry about multiple CPUs accessing
    the same routines :-).
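    For anyone who never used such machines, the convention can be mimicked in a few lines of C (a toy model only; on the real machines this was a single JMS-style call instruction, and the made-up fixed locations below stand in for the subroutine's first word and its argument/result cells):

        #include <stdio.h>

        static void (*sqr_return)(void);   /* the subroutine's "first word"    */
        static int   sqr_arg, sqr_result;  /* fixed argument/result locations  */

        static void sqr(void)              /* the subroutine body              */
        {
            sqr_result = sqr_arg * sqr_arg;
            sqr_return();                  /* indirect transfer through the first word */
        }

        static void after_call(void)
        {
            printf("result = %d\n", sqr_result);
        }

        int main(void)
        {
            sqr_arg = 7;
            sqr_return = after_call;       /* store the return address ...    */
            sqr();                         /* ... then jump to the subroutine */
            return 0;
        }

    Since the return address lives in a fixed location, a second concurrent caller (or a recursive call) would simply overwrite it - which is exactly why the scheme is not re-entrant.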

    BTW, who needs a program counter (PC), many microprograms run without
    a PC, with the next instruction address stored at the end of the long instruction word :-)


    That memory location
    (like any pseudo-register) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making access even more inefficient. And that spinlock will cause issues with interrupts (a solution might be to heavily restrict interrupt routines, essentially allowing not much more than setting some global variables).

    Disabling all interrupts for the duration of some critical operations
    is often enough, but of course, the number of instructions executed
    while interrupts are disabled should be minimized. In MACRO-11 assembler,
    the standard practice was to start the comment field with a single semicolon,
    with two semicolons when task switching was disabled, and with three
    semicolons when interrupts were disabled; that made it visually easy to
    detect when interrupts were disabled and not mess too much with such
    code sections.


    Then there is the trade-off of using one such memory location per
    function vs. per program (the latter reducing memory usage, but
    resulting in less parallelism).

    The pseudo-registers one would want to use are not so much a problem for interrupt routines (they would just need saving and thus increase
    interrupt overhead a bit), but for hardware parallelism. Essentially all access to them would again have to be protected by a spinlock.

    All these problems could have relatively easily been avoided by
    providing an efficient stack-pointer-relative addressing mode. Having a
    few general-purpose or index registers would have somewhat helped as well.

    Philipp

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Thu Nov 8 21:56:16 2018
    From Newsgroup: comp.arch.embedded

    Am 08.11.18 um 20:52 schrieb upsidedown@downunder.com:

    But static memory allocation would require one copy of each function per
    thread.

    For a foreground/background monitor, the worst case would be two
    copies of static data, if both threads use the same subroutine.

    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    Of course. The support routines the compiler uses reside in some
    library, the linker links them in if necessary. Also, the larger
    variants are not that small, with up to 256 B of RAM and 8 KB of ROM.
    One might want to e.g. have one .c file for handling I²C, one for the
    soft UART, etc.


    With such a small processor, just track any dependencies manually.

    See above.


    Function pointers get complicated.

    Do you really insist on using function pointers with such small
    targets?


    I want to have C, function pointers are part of it.


    Unfortunately, reentrancy becomes even harder with
    hardware-multithreading:

    With two hardware threads, you would need at most two copies of static
    data.

    Padauk still makes one chip with 8 hardware threads (and it looks to me
    as if there were more in the past; though they are not currently listed
    on their website, one can find them e.g. in their IDE).


    To access the stack, one has to construct a
    pointer to the stack location in a memory location.

    Why would you want to access the stack ?

    For reentrancy, so I can use one function implementation for all
    threads. It would also be useful to dynamically assign threads to
    hardware threads (so no thread is tied to specific hardware, and some OS schedules them).


    The stack is usable for handling return addresses, but I guess that a hardware thread must have its own return address stack pointer.

    Each hardware thread has its own flag register (4 bits), accumulator (8
    bits), PC (12 bits) and stack pointer (8 bits).
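    Packed into a struct just for illustration (the bit widths are the ones listed above; this says nothing about how the silicon actually stores them):

        #include <stdint.h>

        /* Per-hardware-thread architectural state of the Padauk multi-core parts. */
        struct hw_thread_ctx {
            uint16_t pc    : 12;   /* program counter */
            uint8_t  flags : 4;    /* flag register   */
            uint8_t  a;            /* accumulator     */
            uint8_t  sp;           /* stack pointer   */
        };

    Everything else - RAM, the I/O registers and the ALU itself - is shared between the threads.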


    That memory location
    (as any pseudo-registers) is then shared among all running instances of
    the function. So it needs to be protected (e.g. with a spinlock), making
    access even more inefficient. And that spinlock will cause issues with
    interrupts (a solution might be to heavily restrict interrupt routines,
    essentially allowing not much more than setting some global variables).

    Disabling all interrupts for the duration of some critical operations
    is often enough, but of course, the number of instructions executed
    while interrupts are disabled should be minimized.

    Disabling interrupts any time a spinlock is held or a thread is waiting
    for one might be too much, especially if there are many threads, so the spinlock is held often.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From upsidedown@upsidedown@downunder.com to comp.arch.embedded on Fri Nov 9 00:35:55 2018
    From Newsgroup: comp.arch.embedded

    On Thu, 8 Nov 2018 21:56:16 +0100, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 08.11.18 um 20:52 schrieb upsidedown@downunder.com:

    But static memory allocation would require one copy of each function per thread.

    For a foreground/background monitor, the worst case would be two
    copies of static data, if both threads use the same subroutine.

    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    Of course. The support routines the compiler uses reside in some
    library, the linker links them in if necessary. Also, the larger
    variants are not that small, with up to 256 B of RAM and 8 KB of ROM.
    One might want to e.g. have one .c file for handling I²C, one for the
    soft UART, etc.

    A linker is required, if the libraries are (for copyright reasons)
    delivered as binary object code only.

    However, if the libraries are delivered as source files and the compiler/assembler has even a rudimentary #include mechanism, just
    include those library files you need. With an include or macro
    processor with parameter passing, just invoke the same include file or
    macro twice with different parameters for different static variable
    instances.
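    A tiny sketch of that include-twice trick in C preprocessor terms (the file name, the INSTANCE() macro and the fg_/bg_ prefixes are all made up for illustration):

        /* delay_impl.h - "library" source, included once per instance.
         * The includer defines INSTANCE(name) to prefix all static state,
         * so each inclusion gets its own copy of the counter. */
        static volatile unsigned char INSTANCE(count);

        static void INSTANCE(delay)(unsigned char ticks)
        {
            for (INSTANCE(count) = 0; INSTANCE(count) < ticks; INSTANCE(count)++)
                ;   /* busy-wait */
        }

        /* main.c - instantiate the routine twice, with separate static data */
        #define INSTANCE(x) fg_ ## x
        #include "delay_impl.h"
        #undef INSTANCE

        #define INSTANCE(x) bg_ ## x
        #include "delay_impl.h"
        #undef INSTANCE

        void foreground(void) { fg_delay(10); }
        void background(void) { bg_delay(50); }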

    Of course, linkers are also needed if very primitive compilation
    machines are used, such as floppy-based Intellecs or Exorcisers. It
    could take a day to compile a large program all the way from sources,
    with multiple floppy changes to get the final absolute file onto a
    single floppy, ready to be burnt into EPROMs for an additional hour or
    two. In such an environment, compiling, linking and burning only the
    source file that changed would speed up program development a lot.

    When using a modern PC for compilation, there are no such issues.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Fri Nov 9 09:00:41 2018
    From Newsgroup: comp.arch.embedded

    Am 08.11.18 um 23:35 schrieb upsidedown@downunder.com:
    And the linker would have to analyze the call graph to always
    call the correct function for each thread.

    Linker for such small target ?

    Of course. The support routines the compiler uses reside in some
    library, the linker links them in if necessary. Also, the larger
    variants are not that small, with up to 256 B of RAM and 8 KB of ROM.
    One might want to e.g. have one .c file for handling I²C, one for the
    soft UART, etc.

    A linker is required, if the libraries are (for copyright reasons)
    delivered as binary object code only.

    However, if the libraries are delivered as source files and the compiler/assembler has even a rudimentary #include mechanism, just
    include those library files you need. With an include or macro
    processor with parameter passing, just invoke the same include file or
    macro twice with different parameters for different static variable instances.

    Of course, linkers are also needed if very primitive compilation
    machines are used, such as floppy-based Intellecs or Exorcisers. It
    could take a day to compile a large program all the way from sources,
    with multiple floppy changes to get the final absolute file onto a
    single floppy, ready to be burnt into EPROMs for an additional hour or
    two. In such an environment, compiling, linking and burning only the
    source file that changed would speed up program development a lot.

    When using a modern PC for compilation, there are no such issues.


    Separate compilation and then linking is the normal thing to do, and a
    common workflow for small devices. This is e.g. how most people use
    SDCC, a mainstream free compiler targeting various 8-bit architectures.

    That doesn't mean it is the only way (and since SDCC does not have
    link-time optimization it might not be the optimal way either). But it
    is something people use and expect to work reasonably well.

    So for anyone designing an architecture it would be wise to not put too
    many obstacles into that workflow.

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Philipp Klaus Krause@pkk@spth.de to comp.arch.embedded on Sun Nov 11 09:27:20 2018
    From Newsgroup: comp.arch.embedded

    Am 12.10.18 um 22:45 schrieb upsidedown@downunder.com:
    On Fri, 12 Oct 2018 22:06:02 +0200, Philipp Klaus Krause <pkk@spth.de>
    wrote:

    Am 12.10.2018 um 20:30 schrieb upsidedown@downunder.com:

    The real issue would be the small RAM size.

    Devices with this architecture go up to 256 B of RAM (but they then cost
    a few cent more).

    Philipp

    Did you find the binary encoding of the various instruction formats, i.e.
    how many bits are allocated to the operation code and how many to the
    address field?

    My initial guess was that the instruction word is simply an 8-bit opcode
    + an 8-bit address, but the bit and word address limits for the smaller
    models would suggest that for some op-codes, the op-code field might
    be wider than 8 bits and the address fields narrower than 8 bits (e.g. bit
    and word addressing).


    It is more complicated. Apparently the encoding changed from a 16-bit instruction word used by older types (https://www.mikrocontroller.net/topic/461002#5616813) to a 14-bit
    instruction word used by newer types (https://www.mikrocontroller.net/topic/461002#5616603).

    Padauk also dropped and added various instructions at some points (e.g.
    ldtabh, ldtabl, mul, pushw, popw).

    Philipp
    --- Synchronet 3.20a-Linux NewsLink 1.114