Forum: War Ensemble BBS

Voice compression

From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Apr 2 18:33:56 2025

From Newsgroup: comp.arch.embedded

I need to manage some voice audio streams. They will be stored in
non-volatile memory as raw arrays of unsigned char.

The main goal is to play these audio streams through a DAC/PWM. I'm not interested in high quality; "mid/low" quality would be fine.

My hardware is a humble 8-bit AVR MCU.

I have limited memory space, so I'm looking for a good voice codec with
compression. The lower the bitrate, the more streams I can store. 16kbps
would be good for my application, 24kbps at most.

With 4-bit ADPCM I would have 32kbps at an 8kHz sampling rate. With
3-bit ADPCM I could reach 24kbps.
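
The arithmetic behind these figures can be sketched in a few lines of Python. The helper names and the 64 KiB memory size are assumptions made only for illustration; the thread does not say how much storage is available.

```python
def bitrate_bps(sample_rate_hz, bits_per_sample):
    """Bitrate of a fixed-rate codec that emits one code word per sample."""
    return sample_rate_hz * bits_per_sample


def seconds_stored(memory_bytes, rate_bps):
    """Seconds of audio that fit in memory_bytes at rate_bps."""
    return memory_bytes * 8 / rate_bps


MEMORY_BYTES = 64 * 1024  # hypothetical storage size, not from the thread

for rate_hz, bits in [(8000, 4), (8000, 3), (8000, 2)]:
    bps = bitrate_bps(rate_hz, bits)
    print(f"{bits}-bit ADPCM at {rate_hz} Hz: {bps / 1000:g} kbps, "
          f"{seconds_stored(MEMORY_BYTES, bps):.1f} s in 64 KiB")
```

The same two functions also cover the other trade-off: lowering the sampling rate instead of the word size.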

I tried to reduce sampling frequency to 4kHz, but the quality is
drastically reduced.

I know there are many other voice codecs that reduce the bitrate a lot,
but their decoders seem too complex to implement on an AVR8.

Any suggestions?
--- Synchronet 3.20c-Linux NewsLink 1.2

From Rafael Deliano@Rafael_Deliano@arcor.de to comp.arch.embedded on Wed Apr 2 19:55:39 2025

From Newsgroup: comp.arch.embedded

CVSD uses a bit-serial data stream. See the Harris datasheets for the
obsolete HC55516 and HC55532 codecs. The "recording" circuit can be an
analog hack (comparator, flip-flop, 4-bit shift register) that sends data
via SPI. The "playback" side would have to emulate this circuit in
software and output via an 8-bit D/A (R-2R resistor network, but serial
ICs may be easier in SMD).
16 kbit/s gives very moderate quality; 24 kbit/s is more reasonable.
We used these in the 1980s for digital answering machines in cars, for
the analog radio telephone system that predated GSM in Germany.
24 kbit/s was for incoming messages in RAM, 16 kbit/s for the fixed
messages from EPROM. CVSD was OK, as the analog radio was a bit noisy
anyway.
At 32 kbit/s ADPCM is better, but you probably do not intend to use a
64 kbit/s PCM codec as a front end. If you use a handset or a digital
PCM link, the quality of CVSD may not be competitive. For playback via
a loudspeaker it is sufficient, as there is usually enough background
noise.

Regards, JRD

--- Synchronet 3.20c-Linux NewsLink 1.2

From pozz@pozzugno@gmail.com to comp.arch.embedded on Thu Apr 3 19:53:28 2025

From Newsgroup: comp.arch.embedded

On 02/04/2025 19:55, Rafael Deliano wrote:

> CVSD uses a bit-serial data stream. See the Harris datasheets for the
> obsolete HC55516 and HC55532 codecs. The "recording" circuit can be an
> analog hack (comparator, flip-flop, 4-bit shift register) that sends data
> via SPI. The "playback" side would have to emulate this circuit in
> software and output via an 8-bit D/A (R-2R resistor network, but serial
> ICs may be easier in SMD).
> 16 kbit/s gives very moderate quality; 24 kbit/s is more reasonable.
> We used these in the 1980s for digital answering machines in cars, for
> the analog radio telephone system that predated GSM in Germany.
> 24 kbit/s was for incoming messages in RAM, 16 kbit/s for the fixed
> messages from EPROM. CVSD was OK, as the analog radio was a bit noisy
> anyway.

Thank you for the suggestion. I tried to implement a simple CVSD codec
in Python just to test the quality. I ended up with these two functions[1].

I started from this audio[2] and obtained this one[3] after an encoding
and decoding pass. It's a short speech sample from an Italian voice. I
think you can hear how bad the quality of the decoded audio is.

I suspect I made some errors, because I don't think this is the real
quality of this codec. You said this codec was used in the past, but even
if the quality expected back then wasn't high, the quality I reached with
my implementation is very poor, quite unusable.

[2] https://we.tl/t-RmC6EszYRS
[3] https://we.tl/t-oVbXFy5twW

> At 32 kbit/s ADPCM is better, but you probably do not intend to use a
> 64 kbit/s PCM codec as a front end. If you use a handset or a digital
> PCM link, the quality of CVSD may not be competitive. For playback via
> a loudspeaker it is sufficient, as there is usually enough background
> noise.

My sounds are quite clean: they are generated by a TTS engine and then
flashed to the chip memory.

[1]
def cvsd_encode(samples):
    prev_sample = 0
    step_size = 16
    STEP_SIZE_MIN = 16
    STEP_SIZE_MAX = 16384

    encoded_stream = bytearray()
    encoded_byte = ""
    last_bits = 0x00
    for sample in samples:
        bit = 1 if sample >= prev_sample else 0

        # Update the previous sample value
        if bit == 1:
            prev_sample += step_size
        else:
            prev_sample -= step_size

        # Adapt the step size by looking at the last 3 bits
        last_bits = last_bits << 1
        last_bits += 1 if bit == 1 else 0
        last_bits &= 0x07
        if last_bits == 0x00 or last_bits == 0x07:
            step_size = step_size * 2
        else:
            step_size = step_size // 2
        # Limit the step size
        if step_size > STEP_SIZE_MAX:
            step_size = STEP_SIZE_MAX
        elif step_size < STEP_SIZE_MIN:
            step_size = STEP_SIZE_MIN

        encoded_byte += "1" if bit == 1 else "0"
        if len(encoded_byte) == 8:
            encoded_stream += bytes([int(encoded_byte, 2)])
            encoded_byte = ""

    return encoded_stream

def cvsd_decode(bitstream):
    prev_sample = 0
    step_size = 16
    STEP_SIZE_MIN = 16
    STEP_SIZE_MAX = 16384

    samples = []
    last_bits = 0x00
    for byte in bitstream:
        for sbit in f"{byte:08b}":
            bit = 1 if sbit == "1" else 0
            if bit == 1:
                prev_sample += step_size
            else:
                prev_sample -= step_size

            samples += [prev_sample]

            # Adapt the step size by looking at the last 3 bits
            last_bits = last_bits << 1
            last_bits += 1 if bit == 1 else 0
            last_bits &= 0x07
            if last_bits == 0x00 or last_bits == 0x07:
                step_size = step_size * 2
            else:
                step_size = step_size // 2
            # Limit the step size
            if step_size > STEP_SIZE_MAX:
                step_size = STEP_SIZE_MAX
            elif step_size < STEP_SIZE_MIN:
                step_size = STEP_SIZE_MIN

    return samples
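
For comparison, here is a minimal sketch of a more conventional CVSD loop; it may or may not explain the poor result. It differs from the two functions above in three ways: the step grows by a fixed increment on a run of equal bits and otherwise decays slowly (instead of doubling or halving on every bit), the integrator is leaky, and encoder and decoder share one state-update routine so they cannot drift apart. All constants and the names CvsdState, cvsd_encode_bits, cvsd_decode_bits and snr_db are illustrative guesses, not values from any datasheet. Note also that CVSD produces one bit per input sample, so a 16 kbit/s stream needs input sampled at 16 kHz; feeding it 8 kHz samples yields only 8 kbit/s.

```python
import math

STEP_MIN = 10       # assumed minimum step, in 16-bit sample units
STEP_MAX = 1280     # assumed maximum step
STEP_INC = 30       # assumed step increment on a run of equal bits
STEP_DECAY = 0.99   # assumed per-bit decay of the step (syllabic filter)
LEAK = 0.98         # assumed per-bit leak of the integrator
RUN = 3             # run length of equal bits that signals slope overload


class CvsdState:
    """State shared by encoder and decoder so that both stay in sync."""

    def __init__(self):
        self.estimate = 0.0
        self.step = float(STEP_MIN)
        self.history = 0

    def update(self, bit):
        """Apply one bit and return the new signal estimate."""
        mask = (1 << RUN) - 1
        self.history = ((self.history << 1) | bit) & mask
        if self.history in (0, mask):
            # Run of equal bits: the estimate is lagging, so grow the step.
            self.step = min(self.step + STEP_INC, STEP_MAX)
        else:
            # Otherwise let the step decay slowly toward its minimum.
            self.step = max(self.step * STEP_DECAY, STEP_MIN)
        delta = self.step if bit else -self.step
        self.estimate = self.estimate * LEAK + delta
        self.estimate = max(-32768.0, min(32767.0, self.estimate))
        return self.estimate


def cvsd_encode_bits(samples):
    """Encode signed 16-bit samples into a list of bits, one per sample."""
    state = CvsdState()
    bits = []
    for sample in samples:
        bit = 1 if sample >= state.estimate else 0
        bits.append(bit)
        state.update(bit)
    return bits


def cvsd_decode_bits(bits):
    """Decode a list of bits back into estimated sample values."""
    state = CvsdState()
    return [state.update(bit) for bit in bits]


def snr_db(reference, decoded):
    """Signal-to-noise ratio of decoded against reference, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    return 10 * math.log10(signal / noise)


# Demo: a 300 Hz tone sampled at 16 kHz, i.e. a 16 kbit/s CVSD stream.
tone = [8000 * math.sin(2 * math.pi * 300 * n / 16000) for n in range(16000)]
decoded = cvsd_decode_bits(cvsd_encode_bits(tone))
print(f"SNR: {snr_db(tone, decoded):.1f} dB")
```

The bits can be packed into bytes exactly as in cvsd_encode above. The constants would need tuning by ear on real speech.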

--- Synchronet 3.20c-Linux NewsLink 1.2

From Paul Rubin@no.email@nospam.invalid to comp.arch.embedded on Fri Apr 4 13:54:22 2025

From Newsgroup: comp.arch.embedded

pozz <pozzugno@gmail.com> writes:

> I tried to reduce sampling frequency to 4kHz, but the quality is
> drastically reduced.

Try 6.5 kHz. I'll write a little more later, but I've dealt with this
problem and there are some reasonable approaches.
--- Synchronet 3.20c-Linux NewsLink 1.2

From Rafael Deliano@Rafael_Deliano@arcor.de to comp.arch.embedded on Sat Apr 5 11:12:45 2025

From Newsgroup: comp.arch.embedded

> very poor, quite unusable.

You are not seriously expecting me to debug your code?

CVSD at 16 kbit/s was used in the 1970s for secure military communication.
The Space Shuttle ADM (= CVSD) of that era is a simple digital
implementation, at 16 kbit/s I guess.
So CVSD at 16 kbit/s is usable, but not for the public phone system.
The initial circuits were analog:

https://get.hidrive.com/5gdAmSyB cvsd-ptarmigan.pdf

The CML FX209 is an early integrated analog version:

https://get.hidrive.com/HhS2FWU4 cvsd-steele.pdf

The Harris HC55564 is a simple digital IC.

The CML FX609 is the next and final generation, with a PCM-like
filter that reduces high-frequency noise.

We did use the Harris. When switching to the FX609 we ran a test with
all the employees in the company, using a handset, to see which they
liked better: 90:10 for the FX609. The problem with "better" is that
everyone is accustomed to PCM-filtered speech.

All of these ICs can be obtained from China via ebay.com.
Using two on breadboards, one can build a simple channel that
"distorts" speech for testing. The reference would be an old
PCM chip.

As for quality: 64 kbit/s PCM may be the gold standard,
but cordless DECT phones use ADPCM at 32 kbit/s
with hardly any loss of quality. This is not the
original CCITT ADPCM, which was very complex. But I still
doubt that an implementation on an AVR is easy.

I did CVSDs years/decades ago on PICs/68HC05.

Regards, JRD

--- Synchronet 3.20c-Linux NewsLink 1.2

From pozz@pozzugno@gmail.com to comp.arch.embedded on Mon Apr 7 13:09:15 2025

From Newsgroup: comp.arch.embedded

On 04/04/2025 22:54, Paul Rubin wrote:

> pozz <pozzugno@gmail.com> writes:
>
>> I tried to reduce sampling frequency to 4kHz, but the quality is
>> drastically reduced.
>
> Try 6.5 kHz. I'll write a little more later, but I've dealt with this
> problem and there are some reasonable approaches.

Yes, reducing the sampling frequency a little is a good solution.

From 8kHz to 6kHz the quality stays acceptable and the bitrate
decreases from 32kbps to 24kbps.
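
As a quick way to try this offline, here is a minimal sketch that resamples 8 kHz material to 6 kHz by linear interpolation. The function name resample_linear is hypothetical. A real conversion should first low-pass the signal below 3 kHz (the Nyquist limit at 6 kHz), which is omitted here, so content above that limit will alias.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (no anti-alias filter)."""
    if not samples:
        return []
    n_out = (len(samples) - 1) * dst_rate // src_rate + 1
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in source samples
        k = int(pos)                       # sample at or before pos
        frac = pos - k                     # fractional distance past it
        nxt = samples[min(k + 1, len(samples) - 1)]
        out.append(samples[k] + (nxt - samples[k]) * frac)
    return out


ramp = list(range(9))                      # stand-in for 8 kHz audio
print(resample_linear(ramp, 8000, 6000))   # 7 output samples for 9 inputs
print(6000 * 4, "bps with 4-bit ADPCM at 6 kHz")
```

If the TTS engine can render directly at 6 kHz, that avoids the conversion step altogether.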
--- Synchronet 3.20c-Linux NewsLink 1.2

From pozz@pozzugno@gmail.com to comp.arch.embedded on Mon Apr 7 13:13:41 2025

From Newsgroup: comp.arch.embedded

On 05/04/2025 11:12, Rafael Deliano wrote:

>> very poor, quite unusable.
>
> You are not seriously expecting me to debug your code?

I didn't write this.

> CVSD at 16 kbit/s was used in the 1970s for secure military communication.
> The Space Shuttle ADM (= CVSD) of that era is a simple digital
> implementation, at 16 kbit/s I guess.
> So CVSD at 16 kbit/s is usable, but not for the public phone system.
> The initial circuits were analog:
>
> https://get.hidrive.com/5gdAmSyB cvsd-ptarmigan.pdf
>
> The CML FX209 is an early integrated analog version:
>
> https://get.hidrive.com/HhS2FWU4 cvsd-steele.pdf
>
> The Harris HC55564 is a simple digital IC.
>
> The CML FX609 is the next and final generation, with a PCM-like
> filter that reduces high-frequency noise.
>
> We did use the Harris. When switching to the FX609 we ran a test with
> all the employees in the company, using a handset, to see which they
> liked better: 90:10 for the FX609. The problem with "better" is that
> everyone is accustomed to PCM-filtered speech.
>
> All of these ICs can be obtained from China via ebay.com.
> Using two on breadboards, one can build a simple channel that
> "distorts" speech for testing. The reference would be an old
> PCM chip.
>
> As for quality: 64 kbit/s PCM may be the gold standard,
> but cordless DECT phones use ADPCM at 32 kbit/s
> with hardly any loss of quality. This is not the
> original CCITT ADPCM, which was very complex. But I still
> doubt that an implementation on an AVR is easy.
>
> I did CVSDs years/decades ago on PICs/68HC05.

From your last post, it wasn't clear to me whether CVSD was a serious
suggestion for my application. I tried it, but the quality is very bad
(if my code is OK). It isn't a solution for me, given the audio quality
we are used to these days.

It's much better to reduce the sampling frequency from 8kHz to 6kHz
without touching anything else in the ADPCM playback system.
--- Synchronet 3.20c-Linux NewsLink 1.2
