• Voice compression

    From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Apr 2 18:33:56 2025
    From Newsgroup: comp.arch.embedded

    I need to manage some voice audio streams. They will be stored in non-volatile memory as raw arrays of unsigned char.

    The main goal is to play these audio streams through a DAC/PWM. I'm not interested in high quality; "mid/low" quality would be good enough.

    My hardware is a modest 8-bit AVR MCU.

    I have limited memory space, so I'm looking for a good voice codec with compression. The lower the bitrate, the more streams I can store. 16 kbps would be
    good for my application, 24 kbps at most.

    With 4-bit ADPCM I would get 32 kbps at an 8 kHz sampling rate. With
    3-bit ADPCM I could reach 24 kbps.
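
The figures above follow from bitrate = sampling rate x bits per sample. A quick sketch of that arithmetic is below; the helper names are made up for illustration, and the 6 kHz and 1-bit rows are extra combinations added only for comparison.

```python
def bitrate_bps(sample_rate_hz, bits_per_sample):
    """Raw bitrate of a fixed-width codec, in bits per second."""
    return sample_rate_hz * bits_per_sample


def seconds_per_kib(sample_rate_hz, bits_per_sample):
    """Seconds of audio that fit in 1 KiB (1024 bytes) of storage."""
    return 1024 * 8 / bitrate_bps(sample_rate_hz, bits_per_sample)


# (sampling rate in Hz, bits per sample); the last two rows are illustrative extras
for rate, bits in [(8000, 8), (8000, 4), (8000, 3), (6000, 4), (16000, 1)]:
    kbps = bitrate_bps(rate, bits) / 1000
    print(f"{rate} Hz x {bits} bit = {kbps:g} kbps, "
          f"{seconds_per_kib(rate, bits):.3f} s of audio per KiB")
```

For a 1-bit codec such as a delta modulator, the bitrate equals the sampling rate, which is why the last row uses 16 kHz.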

    I tried to reduce sampling frequency to 4kHz, but the quality is
    drastically reduced.

    I know there are many other voice codecs that reduce the bitrate a lot,
    but their decoders seem too complex to implement on an AVR8.

    Any suggestions?
    --- Synchronet 3.20c-Linux NewsLink 1.2
  • From Rafael Deliano@Rafael_Deliano@arcor.de to comp.arch.embedded on Wed Apr 2 19:55:39 2025
    From Newsgroup: comp.arch.embedded

    CVSD uses a bit-serial data stream. See the Harris datasheets for the
    obsolete HC55516 and HC55532 codecs. The "recording" circuit can be an analog
    hack (comparator, flip-flop, 4-bit shift register) that sends data via SPI.
    The "playback" side would have to emulate this circuit in software and output
    via an 8-bit D/A (an R-2R resistor network, though serial ICs may be easier in
    SMD).
    16 kbit/s gives very moderate quality; 24 kbit/s is more reasonable.
    We used these in the 1980s for digital answering machines in cars, for
    the analog radio telephone system that predated GSM in Germany.
    24 kbit/s was used for incoming messages in RAM, 16 kbit/s for the fixed messages
    from EPROM. CVSD was OK, as the analog radio was a bit noisy
    anyway.
    At 32 kbit/s ADPCM is better, but you probably do not intend to use a
    64 kbit/s PCM codec as a front end. If you use a handset or a digital
    PCM link, the quality of CVSD may not be competitive. For playback via
    a loudspeaker it is sufficient, as there is usually enough background noise.

    Best regards, JRD

  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Thu Apr 3 19:53:28 2025
    From Newsgroup: comp.arch.embedded

    On 02/04/2025 19:55, Rafael Deliano wrote:
    > CVSD uses a bit-serial data stream. See the Harris datasheets for the
    > obsolete HC55516 and HC55532 codecs. The "recording" circuit can be an analog
    > hack (comparator, flip-flop, 4-bit shift register) that sends data via SPI.
    > The "playback" side would have to emulate this circuit in software and output
    > via an 8-bit D/A (an R-2R resistor network, though serial ICs may be easier in
    > SMD).
    > 16 kbit/s gives very moderate quality; 24 kbit/s is more reasonable.
    > We used these in the 1980s for digital answering machines in cars, for
    > the analog radio telephone system that predated GSM in Germany.
    > 24 kbit/s was used for incoming messages in RAM, 16 kbit/s for the fixed messages
    > from EPROM. CVSD was OK, as the analog radio was a bit noisy
    > anyway.

    Thank you for the suggestion. I tried to implement a simple CVSD codec
    in Python just to test the quality, and ended up with these two functions[1].

    I started from this audio[2] and obtained this one[3] after an encoding
    and decoding pass. It's a short speech by an Italian voice. I think
    you can hear how bad the quality of the decoded audio is.

    I suspect I made some mistakes, because I don't think this is the real quality
    of this codec. You said it was used in the past, but even
    if the quality was not high back then, the quality I got from my
    implementation is very poor, quite unusable.

    [2] https://we.tl/t-RmC6EszYRS
    [3] https://we.tl/t-oVbXFy5twW


    > At 32 kbit/s ADPCM is better, but you probably do not intend to use a
    > 64 kbit/s PCM codec as a front end. If you use a handset or a digital
    > PCM link, the quality of CVSD may not be competitive. For playback via
    > a loudspeaker it is sufficient, as there is usually enough background noise.

    My sounds are quite clean: they are generated by a TTS engine and then
    flashed into the chip's memory.


    [1]
    def cvsd_encode(samples):
        prev_sample = 0
        step_size = 16
        STEP_SIZE_MIN = 16
        STEP_SIZE_MAX = 16384

        encoded_stream = bytearray()
        encoded_byte = ""
        last_bits = 0x00
        for sample in samples:
            bit = 1 if sample >= prev_sample else 0

            # Update the previous (reconstructed) sample value
            if bit == 1:
                prev_sample += step_size
            else:
                prev_sample -= step_size

            # Adapt the step size by looking at the last 3 bits
            last_bits = last_bits << 1
            last_bits += 1 if bit == 1 else 0
            last_bits &= 0x07
            if last_bits == 0x00 or last_bits == 0x07:
                step_size = step_size * 2
            else:
                step_size = step_size // 2
            # Limit the step size
            if step_size > STEP_SIZE_MAX:
                step_size = STEP_SIZE_MAX
            elif step_size < STEP_SIZE_MIN:
                step_size = STEP_SIZE_MIN

            # Pack bits MSB first; a trailing group of fewer than 8 bits is dropped
            encoded_byte += "1" if bit == 1 else "0"
            if len(encoded_byte) == 8:
                encoded_stream += bytes([int(encoded_byte, 2)])
                encoded_byte = ""

        return encoded_stream


    def cvsd_decode(bitstream):
        prev_sample = 0
        step_size = 16
        STEP_SIZE_MIN = 16
        STEP_SIZE_MAX = 16384

        samples = []
        last_bits = 0x00
        for byte in bitstream:
            # Unpack each byte MSB first, mirroring the encoder
            for sbit in f"{byte:08b}":
                bit = 1 if sbit == "1" else 0
                if bit == 1:
                    prev_sample += step_size
                else:
                    prev_sample -= step_size

                samples += [prev_sample]

                # Adapt the step size by looking at the last 3 bits
                last_bits = last_bits << 1
                last_bits += 1 if bit == 1 else 0
                last_bits &= 0x07
                if last_bits == 0x00 or last_bits == 0x07:
                    step_size = step_size * 2
                else:
                    step_size = step_size // 2
                # Limit the step size
                if step_size > STEP_SIZE_MAX:
                    step_size = STEP_SIZE_MAX
                elif step_size < STEP_SIZE_MIN:
                    step_size = STEP_SIZE_MIN

        return samples
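
The functions above double or halve the step on every sample, so the step can swing across its whole range within a few bits. That may be one reason for the rough result. CVSD is usually described with a gentler adaptation: the step grows by a fixed increment while the last few bits are all equal, decays slowly otherwise, and the integrator is slightly leaky. The following is a minimal sketch of that variant, not a reference implementation. Every constant in it (SAMPLE_RATE, STEP_MIN, STEP_MAX, STEP_INC, STEP_DECAY, LEAK) is an illustrative assumption, not a value from this thread or from a datasheet, and the function names are made up. It works on plain lists of bits; packing into bytes is left out.

```python
import math

SAMPLE_RATE = 16000   # assumed: CVSD emits 1 bit per sample, so this gives 16 kbit/s
STEP_MIN = 16         # assumed smallest step
STEP_MAX = 2048       # assumed largest step
STEP_INC = 32         # assumed increment applied while the last 3 bits are equal
STEP_DECAY = 0.98     # assumed per-sample decay of the step otherwise
LEAK = 0.995          # assumed per-sample leak of the integrator


def cvsd_update(state, bit):
    """Advance the state shared by encoder and decoder by one bit."""
    estimate, step, history = state
    history = ((history << 1) | bit) & 0x07
    if history in (0x00, 0x07):
        step = min(step + STEP_INC, STEP_MAX)      # run of equal bits: grow
    else:
        step = max(step * STEP_DECAY, STEP_MIN)    # otherwise: decay slowly
    estimate = estimate * LEAK + (step if bit else -step)
    estimate = max(-32768.0, min(32767.0, estimate))
    return estimate, step, history


def cvsd_encode_bits(samples):
    """Encode signed 16-bit samples into a list of 0/1 bits."""
    state = (0.0, float(STEP_MIN), 0)
    bits = []
    for sample in samples:
        bit = 1 if sample >= state[0] else 0
        state = cvsd_update(state, bit)
        bits.append(bit)
    return bits


def cvsd_decode_bits(bits):
    """Rebuild the encoder's running estimate from the bit list."""
    state = (0.0, float(STEP_MIN), 0)
    samples = []
    for bit in bits:
        state = cvsd_update(state, bit)
        samples.append(int(round(state[0])))
    return samples


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB; the reference must not be all zeros."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# 100 ms of a 300 Hz test tone as a rough stand-in for speech
tone = [int(8000 * math.sin(2 * math.pi * 300 * n / SAMPLE_RATE))
        for n in range(SAMPLE_RATE // 10)]
decoded = cvsd_decode_bits(cvsd_encode_bits(tone))
print(f"SNR on the test tone: {snr_db(tone, decoded):.1f} dB")
```

Because the codec emits one bit per input sample, a 16 kbit/s stream implies input sampled at 16 kHz. Feeding 8 kHz audio into a delta modulator gives only 8 kbit/s. An SNR figure like the one printed here is only a coarse check, and listening tests remain the real measure.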

  • From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Fri Apr 4 13:54:22 2025
    From Newsgroup: comp.arch.embedded

    pozz <pozzugno@gmail.com> writes:
    > I tried to reduce sampling frequency to 4kHz, but the quality is
    > drastically reduced.

    Try 6.5 kHz. I'll write a little more later, but I've dealt with this
    problem and there are some reasonable approaches.
  • From Rafael Deliano@Rafael_Deliano@arcor.de to comp.arch.embedded on Sat Apr 5 11:12:45 2025
    From Newsgroup: comp.arch.embedded

    > very poor, quite unusable.

    You are not seriously expecting me to debug your code?

    16 kbit/s CVSD was used in the 1970s for secure military communication.
    The ADM (= CVSD) of the Space Shuttle of that time is a simple digital implementation, 16 kbit/s I guess.
    So CVSD is usable at 16 kbit/s, but not for the public phone system.
    The initial circuits were analog:

    https://get.hidrive.com/5gdAmSyB cvsd-ptarmigan.pdf

    The CML FX209 is an early integrated analog version:

    https://get.hidrive.com/HhS2FWU4 cvsd-steele.pdf

    The Harris HC55564 is a simple digital IC.

    The CML FX609 is the next and final generation with PCM-like
    filter that reduces high frequency noise.

    We used the Harris. When switching to the FX609, we ran a test with
    all the employees in the company, using a handset, asking which they liked
    better: 90:10 for the FX609. The problem with "better" is that
    everyone is accustomed to PCM-filtered speech.

    All these ICs can be bought on ebay.com from China.
    Using two of them on breadboards, one can build a simple channel that
    "distorts" speech for testing. The reference would be an old
    PCM chip.

    As for quality: 64 kbit/s PCM may be the gold standard,
    but cordless DECT phones use ADPCM at 32 kbit/s
    with hardly any loss of quality. This is not the
    original CCITT ADPCM, which was very complex. But I still
    doubt that implementing it on an AVR is easy.

    I implemented CVSD years/decades ago on PICs and the 68HC05.

    Best regards, JRD

  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Mon Apr 7 13:09:15 2025
    From Newsgroup: comp.arch.embedded

    On 04/04/2025 22:54, Paul Rubin wrote:
    > pozz <pozzugno@gmail.com> writes:
    >> I tried to reduce sampling frequency to 4kHz, but the quality is
    >> drastically reduced.
    >
    > Try 6.5 kHz. I'll write a little more later, but I've dealt with this
    > problem and there are some reasonable approaches.

    Yes, reducing the sampling frequency a little is a good solution.

    Going from 8 kHz to 6 kHz, the quality stays acceptable and the bitrate
    decreases from 32 kbps to 24 kbps.
  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Mon Apr 7 13:13:41 2025
    From Newsgroup: comp.arch.embedded

    On 05/04/2025 11:12, Rafael Deliano wrote:
    >> very poor, quite unusable.
    >
    > You are not seriously expecting me to debug your code?

    I didn't write this.


    From your last post, it wasn't clear to me whether CVSD was a serious suggestion for my application. I tried it, but the quality is very bad (assuming my
    code is OK). It isn't a solution for me, given the audio quality we are used
    to these days.

    It's much better to reduce the sampling frequency from 8 kHz to 6 kHz
    without touching anything else in the ADPCM playback system.
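
As a rough illustration of that approach, the sketch below converts 8 kHz samples to 6 kHz on the PC side before ADPCM encoding, using plain linear interpolation. The function name is hypothetical, and this is only a sketch: a real conversion should low-pass filter the signal below 3 kHz first to limit aliasing, which is omitted here. The playback timer on the MCU would also have to run at 6 kHz.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source stream
        k = int(pos)
        frac = pos - k
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append(round(samples[k] * (1 - frac) + nxt * frac))
    return out


# One second of a simple ramp at 8 kHz, converted to 6 kHz
src = [3 * n for n in range(8000)]
dst = resample_linear(src, 8000, 6000)
print(len(src), "samples in,", len(dst), "samples out")
print("4-bit ADPCM bitrate at 6 kHz:", 6000 * 4, "bit/s")
```

With 4 bits per sample the stream then costs 6000 x 4 = 24000 bit/s, while the ADPCM decoder itself stays unchanged.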