Forum: War Ensemble BBS

How to add a second (or further) language

From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Feb 12 17:26:26 2025

From Newsgroup: comp.arch.embedded

I have an embedded project that runs on a platform without a full OS
(bare metal). The application interacts with humans through Italian
messages. These messages are displayed on a touch screen or sent in the
payload of SMS or push notifications.

I used a very stupid approach: sprintf() with hard-coded constant
strings. For example:

void
display_event(Event *ev)
{
    if (ev->type == EVENT_TYPE_ON) {
        display_printf("Evento %d: accensione", ev->idx);
    } else ...
    ...
}

Now I want to add a new language.

I could create a new build that replaces the constant strings at
preprocessor time:

#if LANGUAGE_ITALIAN
# define STRING123 "Evento %d: accensione"
#elif LANGUAGE_ENGLISH
# define STRING123 "Event %d: power up"
#endif

void
display_event(Event *ev)
{
    if (ev->type == EVENT_TYPE_ON) {
        display_printf(STRING123, ev->idx);
    } else ...
    ...
}

This way I can save some memory, but I will have two completely
different production binaries, one for each language.

Another approach is giving the user the possibility to change the
language at runtime, maybe with an option on the display. In some cases,
I have enough memory to store all the strings in all languages.

I know there are many possible solutions, but I'd like to hear your
suggestions. For example, it would be nice if there were a tool that
automatically extracted all the strings used in the source code and
helped to manage multiple languages.
--- Synchronet 3.20c-Linux NewsLink 1.2

From Stefan Reuther@stefan.news@arcor.de to comp.arch.embedded on Wed Feb 12 18:14:26 2025

On 12.02.2025 at 17:26, pozz wrote:

#if LANGUAGE_ITALIAN
# define STRING123 "Evento %d: accensione"
#elif LANGUAGE_ENGLISH
# define STRING123 "Event %d: power up"
#endif

[...]

Another approach is giving the user the possibility to change the
language at runtime, maybe with an option on the display. In some cases,
I have enough memory to store all the strings in all languages.

Put the strings into a structure.

struct Strings {
const char* power_up_message;
};

I hate global variables, so I pass a pointer to the structure to every
function that needs it (but of course you can also make a global variable).

Then, on language change, just point your structure pointer elsewhere,
or load the strings from secondary storage.

One disadvantage is that this loses you the compiler warnings for
mismatching printf specifiers.

I know there are many possible solutions, but I'd like to know some suggestions from you. For example, it could be nice if there was some
tool that automatically extracts all the strings used in the source code
and helps managing more languages.

There are packages like gettext. You tag your strings as
'printf(_("Event %d"), e)', and the 'xgettext' command will extract them
all into a .po file. Other tools help you manage these files (e.g.
'msgmerge'; Emacs 'po-mode'), and gcc knows how to do proper printf
warnings.

The .po file is a mapping from English to Whateverish strings. So you
would convert that into some space-efficient resource file, and
implement the '_' macro/function to perform the mapping. The
disadvantage is that this takes a lot of memory because your app needs to
have both the English and the translated strings in memory. But unless
you also use a fancy preprocessor that translates your code to
'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
you might come up with some compile-time hashing...

I wouldn't use that on a microcontroller, but it's nice for desktop apps.

Stefan

From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Feb 12 20:50:18 2025

On 12/02/2025 18:14, Stefan Reuther wrote:

The .po file is a mapping from English to Whateverish strings. So you
would convert that into some space-efficient resource file, and
implement the '_' macro/function to perform the mapping. The
disadvantage is that this takes lot of memory because your app needs to
have both the English and the translated strings in memory. But unless
you also use a fancy preprocessor that translates your code to 'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
you might come up with some compile-time hashing...

I wouldn't use that on a microcontroller, but it's nice for desktop apps.

Stefan

You don't need a very fancy pre-processor to handle this yourself, if
you are happy to make a few changes to the code. Have your code use
something like :

#define DisplayPrintf(id, desc, args...) \
display_printf(strings[language][string_ ## id], ## x)

Use it like :

DisplayPrintf(event_type_on, "Event on", ev->idx);

A little Python preprocessor script can chew through all your C files
and identify each call to "DisplayPrintf". It can collect together all
the id's and generate a header with something like :

typedef enum {
    string_event_type_on, ...
} string_index;
enum { no_of_strings = ... };

typedef enum {
    lang_English, lang_Italian, ...
} language_index;
enum { no_of_languages = ... };

extern language_index language; // global var :-)
extern const char* strings[no_of_languages][no_of_strings];

Then it will have a C file :

#include "language.h"

language_index language;
const char* strings[no_of_languages][no_of_strings] = {
    { // English
        "Event %d: power up", // Event on
        ...
    },
    { // Italian
        "Evento %d: accensione", // Event on
    }
};

It would generate the strings based on language files:

# english.txt
event_type_on : Event %d: power up
...

If the preprocessor finds a use of DisplayPrintf where the id (which can
be as long or short as you want, but can't have spaces or awkward
characters) does not match the description, it should give an error - duplicate uses of the same pair are skipped. (You could just use an id
and no description if you prefer.)

Any ids that are not in the language files will be printed out or put in
a file, ids that are in the language files but not used in the program
will give warnings, etc.

It can all be done in a manner that makes it easy to get right, hard to
get wrong, and will not cause trouble as strings are added or removed.

It would be a lot simpler than gettext, and use minimal runtime space
and time. And it should be straightforward to change if you want to
have string tables stored externally or something like that. (I've made systems with string tables in an external serial eprom, for example.)


From Niocláiſín Cóilín de Ġloſtéir@Master_Fontaine_is_dishonest@Strand_in_London.Gov.UK to comp.arch.embedded on Thu Feb 13 22:51:10 2025

Pozz wrote:
"Another approach is giving the user the possibility to change the
language at runtime, maybe with an option on the display."

Ciao!

This is a good idea.

From pozz@pozzugno@gmail.com to comp.arch.embedded on Sun Feb 16 19:59:58 2025

On 12/02/2025 20:50, David Brown wrote:

You don't need a very fancy pre-processor to handle this yourself, if
you are happy to make a few changes to the code. Have your code use something like :

#define DisplayPrintf(id, desc, args...) \
    display_printf(strings[language][string_ ## id], ## x)

Use it like :

    DisplayPrintf(event_type_on, "Event on", ev->idx);

A little Python preprocessor script can chew through all your C files
and identify each call to "DisplayPrintf".

Little... yes, it would be little, but not simple, at least for me. How
would I write a correct C preprocessor in Python?

This preprocessor should ingest a C source file after it is preprocessed
by the standard C preprocessor for the specific build you are doing.

For example, you could have a C source file that contains:

#if BUILD == BUILD_FULL
DisplayPrintf(msg, "Press (1) for simple process, (2) for advanced process");
x = wait_keypress();
if (x == '1') do_simple();
if (x == '2') do_adv();
#elif BUILD == BUILD_LIGHT
do_simple();
#endif

If I'm building the project as BUILD_FULL, there's at least one
additional string to translate.

Another big problem is that the Python preprocessor would have to
understand C syntax; it can't simply search for DisplayPrintf occurrences.
For example:

/* DisplayPrintf(old_string, "This is an old message"); */
DisplayPrintf(new_string, "This is a new message");

Of course, only one string is present in the source file, but it's not
simple to extract it.

Thanks for the suggestion, the idea is great. However, I'm not able to
write a Python preprocessor that works well.


From pozz@pozzugno@gmail.com to comp.arch.embedded on Sun Feb 16 22:56:55 2025

On 12/02/2025 20:50, David Brown wrote:

You don't need a very fancy pre-processor to handle this yourself, if
you are happy to make a few changes to the code. Have your code use something like :

#define DisplayPrintf(id, desc, args...) \
    display_printf(strings[language][string_ ## id], ## x)

What is the final "## x"?

Use it like :

    DisplayPrintf(event_type_on, "Event on", ev->idx);

Other problems came to my mind.

There are many functions that accept "translatable" strings, not only
DisplayPrintf(). OK, I can write a macro for each of these functions.

There could be other C statements that make the task more complex. For
example:

char mymsg[32];
sprintf(mymsg, "Ciao mondo");
DisplayPrintf(hello_msg, mymsg);

A Python preprocessor can't detect where the string to translate is.


From pozz@pozzugno@gmail.com to comp.arch.embedded on Sun Feb 16 23:15:21 2025

On 12/02/2025 18:14, Stefan Reuther wrote:

The .po file is a mapping from English to Whateverish strings. So you
would convert that into some space-efficient resource file, and
implement the '_' macro/function to perform the mapping. The
disadvantage is that this takes lot of memory because your app needs to
have both the English and the translated strings in memory. But unless
you also use a fancy preprocessor that translates your code to 'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
you might come up with some compile-time hashing...

I wouldn't use that on a microcontroller, but it's nice for desktop apps.

In some projects keeping all the translated strings is not a problem.

All the gettext tools seem good (xgettext, marking strings to translate
in the source code, .pot file, msginit, msgmerge, msgfmt, .po files, .mo
files, ...) except the final step.

The .mo files have to be installed in a file system, and the gettext
library automatically loads the correct .mo file from a suitable path.
All of this is impractical on microcontroller systems.

Is it so difficult to import .mo files as C const unsigned char arrays
and implement a gettext() function that looks strings up in them?

Another approach could be to write a custom msgfmt-like tool that converts
a .po file into a simpler .mo file (or directly a .c file) that can be
used by a custom gettext() function.
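
A minimal sketch of the "directly a .c file" variant, written in Python. It relies on the fact that .po strings already use C-style escapes, so the quoted pieces can be copied into the C source unchanged (adjacent literals are concatenated by the C compiler). The function name po_to_c and the table layout are hypothetical, and plural forms and msgctxt are deliberately not handled:

```python
def po_to_c(po_text, name="translations"):
    """Turn simple msgid/msgstr pairs into a C table (no plurals, no msgctxt)."""
    entries, current = [], None
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            entries.append({"msgid": [line[6:].strip()], "msgstr": []})
            current = "msgid"
        elif line.startswith("msgstr ") and entries:
            entries[-1]["msgstr"].append(line[7:].strip())
            current = "msgstr"
        elif line.startswith('"') and current:
            # Continuation line: another quoted chunk of the current field.
            entries[-1][current].append(line)
        else:
            current = None  # comment, blank line or unsupported keyword
    rows = [
        f"    {{ {' '.join(e['msgid'])}, {' '.join(e['msgstr'])} }},"
        for e in entries
        # Skip the header entry (empty msgid) and untranslated strings.
        if e["msgid"] != ['""'] and e["msgstr"] not in ([], ['""'])
    ]
    return "\n".join([f"const char *const {name}[][2] = {{", *rows, "};"]) + "\n"


sample_po = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    "\n"
    "#: main.c:12\n"
    'msgid "Event %d: power up"\n'
    'msgstr "Evento %d: accensione"\n'
)
c_source = po_to_c(sample_po, "strings_it")
print(c_source)
```

A custom gettext() on the target would then only need to search the first column of the generated table and return the second.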

From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Mon Feb 17 09:51:05 2025

On 16/02/2025 19:59, pozz wrote:

On 12/02/2025 20:50, David Brown wrote:

You don't need a very fancy pre-processor to handle this yourself, if
you are happy to make a few changes to the code. Have your code use
something like :

#define DisplayPrintf(id, desc, args...) \
display_printf(strings[language][string_ ## id], ## x)

Use it like :

DisplayPrintf(event_type_on, "Event on", ev->idx);

A little Python preprocessor script can chew through all your C files
and identify each call to "DisplayPrintf".

Little... yes, it would be little, but not simple, at least for me. How
to write a correct C preprocessor in Python?

You don't write a C preprocessor - that's the point.

Tools like gettext have to handle any C code. That means they need to
deal with situations with complicated macros, include files, etc.

You don't need to do that when you make your own tools. You make the
rules - /you/ decide what limitations you will accept in order to
simplify the pre-processing script.

So you would typically decide that you only put these DisplayPrintf calls
in C files, not headers, that you ignore all normal C preprocessor stuff,
that you keep each call entirely on one line, and that you'll never
use the sequence "DisplayPrintf" for anything else. Then your Python
preprocessor becomes :

    for line in open(filename) :
        if "DisplayPrintf" in line :
            handle(line)

This is /vastly/ simpler than dealing with more general C code, without
significant restrictions on you as the programmer using the system.

If you /really/ want to handle include files, conditional compilation
and all the rest of it, get the C compiler to handle that - use "gcc -E"
and use the output of that. Trying to duplicate that in your own Python
code would be insane.
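
A minimal sketch of what handle() could grow into under the rules above (one call per line, a plain identifier as id, a description without embedded quotes). The name collect_ids and the regular expression are assumptions for illustration only:

```python
import re

# Assumed call shape: DisplayPrintf(some_id, "description", ...), all on one line.
# The lookbehind keeps longer names such as XDisplayPrintf from matching.
CALL_RE = re.compile(r'(?<![A-Za-z0-9_])DisplayPrintf\(\s*(\w+)\s*,\s*"([^"]*)"')


def collect_ids(lines):
    """Return {id: description}; raise if one id is used with two descriptions."""
    found = {}
    for number, line in enumerate(lines, start=1):
        for ident, desc in CALL_RE.findall(line):
            # Duplicate uses of the same pair are fine; a mismatch is an error.
            if found.setdefault(ident, desc) != desc:
                raise ValueError(
                    f"line {number}: id {ident!r} already has description "
                    f"{found[ident]!r}, got {desc!r}"
                )
    return found


source = [
    'DisplayPrintf(event_type_on, "Event on", ev->idx);',
    'DisplayPrintf(event_type_off, "Event off", ev->idx);',
    'DisplayPrintf(event_type_on, "Event on", other->idx);',
]
ids = collect_ids(source)
print(ids)
```

To honour conditional compilation, the same function could be fed the lines of "gcc -E" output instead of the raw source file.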

This preprocessor should ingest a C source file after it is preprocessed
by the standard C preprocessor for the specific build you are doing.

For example, you could have a C source file that contains:

#if BUILD == BUILD_FULL
DisplayPrintf(msg, "Press (1) for simple process, (2) for advanced process");
x = wait_keypress();
if (x == '1') do_simple();
if (x == '2') do_adv();
#elif BUILD == BUILD_LIGHT
do_simple();
#endif

The really simple answer is, don't do that.

If I'm building the project as BUILD_FULL, there's at least one
additional string to translate.

The slightly more complex answer is that you end up with an extra string
in one build or the other. Almost certainly, this is not worth
bothering about. And if it is - say you have a large number of extra
strings in a debug test build - then I'm sure you can find convenient
ways to handle that. At a minimum, you'd probably not bother having translated versions but fall back to English.

Another big problem is the Python preprocessor should understand C
syntax; it shouldn't simply search for DisplayPrintf occurrences.

Why not?

For example:

/* DisplayPrintf(old_string, "This is an old message"); */
DisplayPrintf(new_string, "This is a new message");

Of course, only one string is present in the source file, but it's not
simple to extract it.

It's extremely simple to extract it. Remember - /you/ make the rules.
If you want to skip such commented-out lines without parsing C comments,
/you/ pick a convenient rule for doing so. For example, you could decide
that the opening comment token must be at the start of the white-space
stripped line :

    if line.strip().startswith("/*") :
        return False

    if line.strip().startswith("//") :
        return False

(I've been talking about Python here, because that's the language I use
for such tools, and it's a very common choice. If you are not familiar
with Python then you can obviously use any other language you like.)

Or alternatively, have :

#define XDisplayPrintf(...)

And now your commenting system becomes :

XDisplayPrintf(old_string, "This is an old message");
DisplayPrintf(new_string, "This is a new message");

The "XDisplayPrintf" can be inside comments or conditionally uncompiled
code if you like. (You do have to filter out XDisplayPrintf bits from
the earlier check for DisplayPrintf.)
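
Both rules can be combined into one small filter. This is only a sketch; the function name is_live_call is made up for illustration:

```python
import re

# Matches DisplayPrintf( only when it is not the tail of a longer identifier.
LIVE_CALL_RE = re.compile(r"(?<![A-Za-z0-9_])DisplayPrintf\(")


def is_live_call(line):
    """True if the line holds a DisplayPrintf call that should be collected."""
    stripped = line.strip()
    # Rule 1: a commented-out call starts its line with the comment token.
    if stripped.startswith(("/*", "//")):
        return False
    # Rule 2: "XDisplayPrintf" contains "DisplayPrintf", so a plain substring
    # test would wrongly accept it; the lookbehind rejects it.
    return LIVE_CALL_RE.search(stripped) is not None


for text in (
    'DisplayPrintf(new_string, "This is a new message");',
    '/* DisplayPrintf(old_string, "This is an old message"); */',
    'XDisplayPrintf(old_string, "This is an old message");',
):
    print(is_live_call(text), text)
```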

Thanks for the suggestion, the idea is great. However I'm not able to
write a Python preprocessor that works well.

Sure you can. You just have to redefine what you mean by "works well"
to suit what you can write :-)

For my own use, I probably wouldn't even bother handling commented-out strings. I have used this kind of technique for message translation and
a variety of other situations.

For more fun, you could switch to modern C++ and use user-defined
literals combined with constexpr template variables to put together a
system that is all within the one source language and is fully checked
at compile-time. I'm not sure it would be clearer, however!


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Mon Feb 17 09:57:35 2025

On 16/02/2025 22:56, pozz wrote:

On 12/02/2025 20:50, David Brown wrote:

You don't need a very fancy pre-processor to handle this yourself, if
you are happy to make a few changes to the code. Have your code use
something like :

#define DisplayPrintf(id, desc, args...) \
     display_printf(strings[language][string_ ## id], ## x)

What is the final "## x"?

It's a gcc extension that skips the extra comma if args is empty
(combined with a typo in my post - "x" should have been "args").

If you want to stick to standard C, C23 introduced the __VA_OPT__
feature to handle this in a less convenient manner.

Use it like :

     DisplayPrintf(event_type_on, "Event on", ev->idx);

Other problems that came to my mind.

There are many functions that accept "translatable" strings, not only DisplayPrintf(). Ok, I can write a macro for each of these functions.

Yes.

Or write a single macro for the translation, and use that within those functions:

DisplayPrintf(trans(event_type_on, "Event on"), ev->idx);

There could be other C statements that make the task more complex. For
example:

char mymsg[32];
sprintf(mymsg, "Ciao mondo");
DisplayPrintf(hello_msg, mymsg);

A Python preprocessor can't detect where the string to translate is.

So don't write your code that way.


From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Mon Feb 17 09:59:50 2025

On 16/02/2025 23:15, pozz wrote:

In some projects keeping all the translated strings is not a problem.

All the gettext tools seem good (xgettext, marking strings to translate
in the source code, .pot file, msginit, msgmerge, msgfmt, .po files, .mo
files, ...) except the final step.

The .mo files have to be installed in a file system, and the gettext
library automatically loads the correct .mo file from a suitable path.
All of this is impractical on microcontroller systems.

Is it so difficult to import .mo files as C const unsigned char arrays
and implement a gettext() function that looks strings up in them?

You know the answer... a little Python script that reads .mo files and
generates files with C constant arrays. You'd also probably need to
make a few changes to the gettext language choice functions. (I've used
gettext with big Python programs, but never in embedded C code.)
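
A sketch of such a script, assuming the standard GNU .mo layout (a magic number, then the string count and the offsets of two length/offset tables). The names read_mo, c_bytes and mo_to_c, and the shape of the generated C table, are made up for illustration:

```python
import struct

MO_MAGIC = 0x950412DE  # magic number at the start of a GNU .mo file


def read_mo(data):
    """Return a list of (msgid, msgstr) byte-string pairs from .mo file data."""
    if struct.unpack_from("<I", data, 0)[0] == MO_MAGIC:
        order = "<"
    elif struct.unpack_from(">I", data, 0)[0] == MO_MAGIC:
        order = ">"
    else:
        raise ValueError("not a .mo file")
    _rev, count, orig_tab, trans_tab = struct.unpack_from(order + "4I", data, 4)
    pairs = []
    for n in range(count):
        # Each table entry is a (length, offset) pair of 32-bit integers.
        o_len, o_off = struct.unpack_from(order + "2I", data, orig_tab + 8 * n)
        t_len, t_off = struct.unpack_from(order + "2I", data, trans_tab + 8 * n)
        pairs.append((data[o_off:o_off + o_len], data[t_off:t_off + t_len]))
    return pairs


def c_bytes(raw):
    """Render bytes as a C string literal; unusual bytes become octal escapes."""
    body = "".join(
        chr(b) if 32 <= b < 127 and chr(b) not in '"\\?' else f"\\{b:03o}"
        for b in raw
    )
    return f'"{body}"'


def mo_to_c(data, name="translations"):
    """Emit a C table of {msgid, msgstr} pairs; the layout is hypothetical."""
    lines = [f"const char *const {name}[][2] = {{"]
    for msgid, msgstr in read_mo(data):
        if msgid:  # the empty msgid carries the catalogue header; skip it
            lines.append(f"    {{ {c_bytes(msgid)}, {c_bytes(msgstr)} }},")
    lines.append("};")
    return "\n".join(lines) + "\n"


# Usage (the file name is a placeholder):
#     with open("it.mo", "rb") as f:
#         print(mo_to_c(f.read(), "strings_it"))
```

Three-digit octal escapes are used rather than hex escapes because a C hex escape has no length limit and would swallow any hex digit that follows it.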

Another approach could be to rewrite a custom msgfmt tool that converts
a .po file into a simpler .mo file (or directly a .c file) that can be
used by a custom gettext() function.


From pozz@pozzugno@gmail.com to comp.arch.embedded on Mon Feb 17 16:05:30 2025

On 17/02/2025 09:51, David Brown wrote:

On 16/02/2025 19:59, pozz wrote:

On 12/02/2025 20:50, David Brown wrote:

You don't need a very fancy pre-processor to handle this yourself, if
you are happy to make a few changes to the code. Have your code use
something like :

#define DisplayPrintf(id, desc, args...) \
     display_printf(strings[language][string_ ## id], ## args)

Use it like :

     DisplayPrintf(event_type_on, "Event on", ev->idx);

A little Python preprocessor script can chew through all your C files
and identify each call to "DisplayPrintf".

Little... yes, it would be little, but not simple, at least for me.
How to write a correct C preprocessor in Python?

You don't write a C preprocessor - that's the point.

Tools like gettext have to handle any C code. That means they need to
deal with situations with complicated macros, include files, etc.

You don't need to do that when you make your own tools. You make the
rules - /you/ decide what limitations you will accept in order to
simplify the pre-processing script.

So you would typically decide you only put these DisplayPrintf calls in
C files, not headers, that you ignore all normal C preprocessor stuff,
and that you keep each call entirely on one line, and that you'll never
use the sequence "DisplayPrintf" for anything else. Then your Python preprocessor becomes :

    for line in open(filename).readlines() :
        if "DisplayPrintf" in line :
            handle(line)

This is /vastly/ simpler than dealing with more general C code, without significant restrictions to you as the programmer using the system.
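
To make the idea concrete, here is one possible shape for such a script, assuming the rules listed above (calls on one line, first argument a plain identifier, second argument a string literal). The names CALL, extract and emit_header are invented for this sketch, and the generated enum is only one guess at how the string_<id> constants used by the macro could be defined.

```python
import re

# Matches: DisplayPrintf(some_id, "English text"
# \b keeps longer names that merely end in "DisplayPrintf" from matching.
CALL = re.compile(r'\bDisplayPrintf\(\s*(\w+)\s*,\s*"((?:[^"\\]|\\.)*)"')

def extract(lines):
    """Return {id: english_text} for every DisplayPrintf call found."""
    found = {}
    for line in lines:
        for ident, text in CALL.findall(line):
            # The same id may be reused, but only with the same text.
            if found.setdefault(ident, text) != text:
                raise ValueError("id %r used with two different texts" % ident)
    return found

def emit_header(found):
    """Emit an enum with one string_<id> constant per extracted id."""
    lines = ["enum string_id {"]
    lines += ["    string_%s," % ident for ident in sorted(found)]
    lines += ["    string_count", "};"]
    return "\n".join(lines) + "\n"
```

Calling extract(open(filename)) on each C file and merging the results gives the list of ids and English texts to hand to a translator. The macro definition itself is not picked up, because its second argument is not a string literal.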

If you /really/ want to handle include files, conditional compilation
and all the rest of it, get the C compiler to handle that - use "gcc -E" and
use the output of that. Trying to duplicate that in your own Python
code would be insane.

And this is the reason why it appeared to me a complex task :-)

You're right, this is my own tool and I decide the rules. Many times I
try to solve the complete and general problem when, in reality, the
scope of the problem is much smaller.

The only drawback is that YOU (and all the developers that work on the
project now and in the future) have to remember your own rules forever
for that project.

This preprocessor should ingest a C source file after it is
preprocessed by the standard C preprocessor for the specific build you
are doing.

For example, you could have a C source file that contains:

#if BUILD == BUILD_FULL
   DisplayPrintf(msg, "Press (1) for simple process, (2) for advanced process");
   x = wait_keypress();
   if (x == '1') do_simple();
   if (x == '2') do_adv();
#elif BUILD == BUILD_LIGHT
   do_simple();
#endif

The really simple answer is, don't do that.

If I'm building the project as BUILD_FULL, there's at least one
additional string to translate.

The slightly more complex answer is that you end up with an extra string
in one build or the other. Almost certainly, this is not worth
bothering about.

Oh yes, but that was only an example. We can think of other scenarios
where the preprocessor could change the string depending on the build.

And if it is - say you have a large number of extra
strings in a debug test build - then I'm sure you can find convenient
ways to handle that. At a minimum, you'd probably not bother having translated versions but fall back to English.
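
The fall-back-to-English idea could look something like this in the table generator. It is a sketch only: build_rows and emit_table are hypothetical names, the texts are assumed to be valid C string contents already (for example, taken verbatim from the source), and the order of the rows is assumed to match whatever enum the firmware uses for "language".

```python
def build_rows(english, translations):
    """Return {language: [text, ...]}, one text per id in sorted-id order.

    english:      {id: text} extracted from the C sources
    translations: {language: {id: text}}, possibly with missing ids
    """
    ids = sorted(english)
    rows = {"english": [english[i] for i in ids]}
    for lang, table in translations.items():
        # A missing or empty translation falls back to the English text.
        rows[lang] = [table.get(i) or english[i] for i in ids]
    return rows

def emit_table(rows):
    """Emit the strings[language][id] table as C source."""
    width = len(next(iter(rows.values())))
    lines = ["const char *const strings[][%d] = {" % width]
    for lang, texts in rows.items():
        lines.append("    /* %s */ {" % lang)
        lines += ['        "%s",' % t for t in texts]
        lines.append("    },")
    lines.append("};")
    return "\n".join(lines) + "\n"
```

With this arrangement an untranslated debug string costs nothing extra in translator effort; it simply shows up in English in every language.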

Another big problem is the Python preprocessor should understand C
syntax; it shouldn't simply search for DisplayPrintf occurrences.

Why not?

For example:

/* DisplayPrintf(old_string, "This is an old message"); */
DisplayPrintf(new_string, "This is a new message");

Of course, only one string is present in the source file, but it's not
simple to extract it.

It's extremely simple to extract it. Remember - /you/ make the rules.
If you don't want to bother skipping such commented-out lines, /you/
pick a convenient way to do so. For example, you would decide that the opening comment token must be at the start of the white-space stripped
line :

    if line.strip().startswith("/*") :
        return False

    if line.strip().startswith("//") :
        return False

I see, other rules: don't use multi-line comments, comments that start
in the middle of a line...

(I've been talking about Python here, because that's the language I use
for such tools, and it's a very common choice. If you are not familiar with Python then you can obviously use any other language you like.)

Python is fine for me too :-)

Or alternatively, have :

    #define XDisplayPrintf(...)

And now your commenting system becomes :

    XDisplayPrintf(old_string, "This is an old message");
    DisplayPrintf(new_string, "This is a new message");

The "XDisplayPrintf" can be inside comments or conditionally uncompiled
code if you like. (You do have to filter out XDisplayPrintf bits from
the earlier check for DisplayPrintf.)
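
Both filtering rules (skip lines that start with a comment token, and do not count XDisplayPrintf) fit in a few lines. This is a minimal sketch under exactly those rules; is_live_call and LIVE_CALL are names invented here.

```python
import re

# "DisplayPrintf(" not preceded by a word character, so "XDisplayPrintf("
# and similar longer names are not counted.
LIVE_CALL = re.compile(r'(?<!\w)DisplayPrintf\s*\(')

def is_live_call(line):
    """True if the line holds a DisplayPrintf call that should be extracted."""
    stripped = line.strip()
    # Rule: a commented-out call has its comment token at the start of the line.
    if stripped.startswith("/*") or stripped.startswith("//"):
        return False
    return LIVE_CALL.search(line) is not None
```

A comment that starts in the middle of a line, or a multi-line comment, is not detected; that is the price of the simple rule.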

We are always talking about rules. In this case: if you comment out a DisplayPrintf() call, add a leading X.

Thanks for the suggestion, the idea is great. However I'm not able to
write a Python preprocessor that works well.

Sure you can. You just have to redefine what you mean by "works well"
to suit what you can write :-)

For my own use, I probably wouldn't even bother handling commented-out strings. I have used this kind of technique for message translation and
a variety of other situations.

For more fun, you could switch to modern C++ and use user-defined
literals combined with constexpr template variables to put together a
system that is all within the one source language and is fully checked
at compile-time. I'm not sure it would be clearer, however!


From Stefan Reuther@stefan.news@arcor.de to comp.arch.embedded on Mon Feb 17 19:00:43 2025

From Newsgroup: comp.arch.embedded

Am 16.02.2025 um 23:15 schrieb pozz:

Another approach could be to rewrite a custom msgfmt tool that converts
a .po file into a simpler .mo file (or directly a .c file) that can be
used by a custom gettext() function.

That's precisely what I tried to suggest (and personally use).

Stefan

From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Mon Feb 17 19:09:12 2025

From Newsgroup: comp.arch.embedded

On 17/02/2025 16:05, pozz wrote:

Il 17/02/2025 09:51, David Brown ha scritto:

On 16/02/2025 19:59, pozz wrote:

Il 12/02/2025 20:50, David Brown ha scritto:

You don't need a very fancy pre-processor to handle this yourself,
if you are happy to make a few changes to the code. Have your code
use something like :

#define DisplayPrintf(id, desc, args...) \
     display_printf(strings[language][string_ ## id], ## args)

Use it like :

     DisplayPrintf(event_type_on, "Event on", ev->idx);

A little Python preprocessor script can chew through all your C
files and identify each call to "DisplayPrintf".

Little... yes, it would be little, but not simple, at least for me.
How to write a correct C preprocessor in Python?

You don't write a C preprocessor - that's the point.

Tools like gettext have to handle any C code. That means they need to
deal with situations with complicated macros, include files, etc.

You don't need to do that when you make your own tools. You make the
rules - /you/ decide what limitations you will accept in order to
simplify the pre-processing script.

So you would typically decide you only put these DisplayPrintf calls
in C files, not headers, that you ignore all normal C preprocessor
stuff, and that you keep each call entirely on one line, and that
you'll never use the sequence "DisplayPrintf" for anything else. Then
your Python preprocessor becomes :

     for line in open(filename).readlines() :
         if "DisplayPrintf" in line :
             handle(line)

This is /vastly/ simpler than dealing with more general C code,
without significant restrictions to you as the programmer using the
system.

If you /really/ want to handle include files, conditional compilation
and all the rest of it, get the C compiler to handle that - use "gcc -E"
and use the output of that. Trying to duplicate that in your own
Python code would be insane.

And this is the reason why it appeared to me a complex task :-)

You're right, this is my own tool and I decide the rules. Many times I
try to solve the complete and general problem when, in reality, the
scope of the problem is much smaller.

The only drawback is that YOU (and all the developers that work on the project now and in the future) have to remember your own rules forever
for that project.

This is embedded development. It is not always easy or straightforward.
When a problem seems difficult, re-arrange it or subdivide it into
things that you /can/ solve. Here I've given one solution (of many
possible solutions) - it makes some things easier, but also requires
other changes. You can use a big, general solution like gettext and
document how that should work in your development, or you can make a
much smaller and simpler, but more limited, custom solution and document /that/. There are /always/ pros and cons, tradeoffs and balances in
this game.

This preprocessor should ingest a C source file after it is
preprocessed by the standard C preprocessor for the specific build
you are doing.

For example, you could have a C source file that contains:

#if BUILD == BUILD_FULL
   DisplayPrintf(msg, "Press (1) for simple process, (2) for advanced process");
   x = wait_keypress();
   if (x == '1') do_simple();
   if (x == '2') do_adv();
#elif BUILD == BUILD_LIGHT
   do_simple();
#endif

The really simple answer is, don't do that.

If I'm building the project as BUILD_FULL, there's at least one
additional string to translate.

The slightly more complex answer is that you end up with an extra
string in one build or the other. Almost certainly, this is not worth
bothering about.

Oh yes, but that was only an example. We can think of other scenarios
where the preprocessor could change the string depending on the build.

As the saying goes, you can burn that bridge when you come to it.
Imagining all the possible ways things can go wrong or be complicated
can be a lot more effort than getting a solution for the actual
practical situation.

I am not guaranteeing that my ideas here will be ideal for your needs.
But it is roughly in the direction of a system that I have used
successfully myself, and it's where I would start out in the situation
you described. Hopefully it gives you a good starting point for your
own solution - or at least something to compare to other potential
solutions when judging them.

