• Conversion between wchar_t and UTF-8 on C level

    From Torsten@Torsten@example.com to comp.lang.tcl on Fri Nov 28 15:41:02 2025
    From Newsgroup: comp.lang.tcl

    Hi,

    I am working on a Tcl wrapper for a library written in C. It must run on Windows, Linux, and macOS. My last C development experience was a good
    30 years ago, but with the help of AI, I already have the wrapper running on Windows
    and Linux.

    The library uses wchar_t for strings, so I need to convert between these and Tcl's UTF-8 strings. AI tools can generate suitable functions for me that (should) run on all three platforms.

    However, I wondered whether the Tcl C API already provides suitable conversion functions that work on all platforms; the problem is unlikely to be new.
    But I couldn't find anything.

    Have I overlooked something, or do I actually have to implement my own conversion functions?

    Torsten
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Harald Oehlmann@wortkarg3@yahoo.com to comp.lang.tcl on Fri Nov 28 16:19:38 2025

    Yes, of course, everything is there ;-).
    Is it Tcl 9?
    Note that Tcl 8.6 internally uses CESU-8. What is your format?
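    To illustrate the CESU remark, here is a minimal C sketch of how CESU-8 differs from standard UTF-8 for a character outside the BMP. The helper names utf8_encode and cesu8_encode are hypothetical and not part of the Tcl API. CESU-8 first splits such a character into a UTF-16 surrogate pair and then encodes each half as a separate 3-byte sequence.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point (at most 0x10FFFF) as standard UTF-8.
 * Returns the number of bytes written to out (1 to 4). Surrogate values
 * D800..DFFF are written as plain 3-byte sequences, which the CESU-8
 * helper below relies on. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode one code point as CESU-8: a character above the BMP is split
 * into a UTF-16 surrogate pair, and each surrogate is then written as its
 * own 3-byte sequence (6 bytes in total instead of 4). */
size_t cesu8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x10000)
        return utf8_encode(cp, out);
    cp -= 0x10000;
    uint32_t high = 0xD800 + (cp >> 10);   /* high (lead) surrogate */
    uint32_t low  = 0xDC00 + (cp & 0x3FF); /* low (trail) surrogate */
    size_t n = utf8_encode(high, out);
    return n + utf8_encode(low, out + n);
}
```

    For U+1F600, utf8_encode writes F0 9F 98 80, whereas cesu8_encode writes ED A0 BD ED B8 80. For BMP characters both produce the same bytes.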

    In Tcl 9:
    Tcl_Size size;
    const char *str = Tcl_GetStringFromObj(Obj, &size);
    wchar_t *wstr = Tcl_UtfToWCharDString(str, size, dsPtr);

    See also the migration wiki, which gives more options.

    Harald
  • From Torsten@Torsten@example.com to comp.lang.tcl on Fri Nov 28 19:14:11 2025

    Thank you for the answer. I'm still on 8.6.
    So my options are either to stay on 8.6 and write my own conversion functions, or to switch to 9.
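    For the "self-written functions" route on 8.6, here is a minimal sketch of the direction needed for a library's messages, wchar_t to UTF-8. The name wchar_to_utf8 is hypothetical. The sketch assumes wchar_t holds UTF-16 when it is 16 bits wide (Windows) and UTF-32 otherwise (Linux, macOS), and it replaces unpaired surrogates with U+FFFD rather than reporting an error.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated wchar_t string into a newly
 * malloc'ed, NUL-terminated UTF-8 string that the caller must free().
 * Assumes UTF-16 for 16-bit wchar_t and UTF-32 otherwise. Unpaired
 * surrogates and out-of-range values become U+FFFD.
 * Returns NULL on allocation failure. */
char *wchar_to_utf8(const wchar_t *ws)
{
    size_t len = wcslen(ws);
    /* At most 4 UTF-8 bytes per wchar_t unit, plus the terminator. */
    if (len > (SIZE_MAX - 1) / 4)
        return NULL;
    unsigned char *out = malloc(len * 4 + 1);
    if (out == NULL)
        return NULL;

    unsigned char *p = out;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = (uint32_t)ws[i];
#if WCHAR_MAX <= 0xFFFF
        /* 16-bit wchar_t: join a high/low surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < len) {
            uint32_t lo = (uint32_t)ws[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i++;
            }
        }
#endif
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD; /* replacement character */

        if (cp < 0x80) {
            *p++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *p++ = (unsigned char)(0xC0 | (cp >> 6));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *p++ = (unsigned char)(0xE0 | (cp >> 12));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            *p++ = (unsigned char)(0xF0 | (cp >> 18));
            *p++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *p++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *p++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *p = '\0';
    return (char *)out;
}
```

    On Tcl 8.6 the result might then be passed to Tcl_NewStringObj(utf8, -1) and released with free(). For characters beyond the BMP, the surrogate caveats discussed in this thread still apply on the Tcl side.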

    Torsten
  • From undroidwish@undroidwish@googlemail.com to comp.lang.tcl on Fri Nov 28 21:43:39 2025

    On 11/28/25 19:14, Torsten wrote:

    ...
    > Thank you for the answer. I'm still on 8.6.
    > So the options are to stay with 8.6 with self-written functions or switch to 9.

    These things are somewhat complicated. Please compare
    sizeof(wchar_t) on Win32 and on non-Win32 platforms (the rest of the
    world, e.g. POSIX). You will likely find a significant difference:
    wchar_t is 16 bits wide on Windows but 32 bits wide on Linux and macOS.
    The consequences show up when representing characters beyond the BMP,
    such as the much-loved emojis.
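    A quick way to check this on each platform is a tiny C sketch like the following; the function names are hypothetical. It assumes the compiler stores wide string literals as UTF-16 where wchar_t is 16 bits and as UTF-32 where it is 32 bits, which is the usual behaviour of MSVC, GCC and Clang.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t in bits on this platform. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* Number of wchar_t code units used for U+1F600, an emoji outside the
 * Basic Multilingual Plane: 2 (a surrogate pair) where wchar_t is 16-bit
 * UTF-16, 1 where wchar_t is 32-bit. */
size_t emoji_wchar_units(void)
{
    static const wchar_t emoji[] = L"\U0001F600";
    return sizeof emoji / sizeof emoji[0] - 1; /* drop the terminator */
}
```

    Built on Windows, one would expect 16 bits and 2 units; on Linux or macOS, 32 bits and 1 unit. Conversion code that works on one side therefore needs surrogate handling to be correct on the other.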

    Good luck,
    Christian
  • From Harald Oehlmann@wortkarg3@yahoo.com to comp.lang.tcl on Sun Nov 30 18:32:54 2025

    Torsten,
    Note that Tcl 8.6 has special functions for Windows to do that
    (e.g. Tcl_WinUtfToTChar and Tcl_WinTCharToUtf). But you are not on Windows, right?

    But as long as you don't encode any NUL bytes, you may simply use Tcl's
    native string encoding.
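    Background for the NUL-byte caveat, as a hedged sketch: Tcl's internal string representation writes U+0000 as the two bytes C0 80 so that strings can stay NUL-terminated, whereas standard UTF-8 uses a single 0x00 byte. The helper utf8_escape_nul below is a hypothetical name, not a Tcl API; it only illustrates that one rewrite for a buffer of known length.

```c
#include <stddef.h>

/* Hypothetical helper: copy a standard UTF-8 buffer of known length, which
 * may contain embedded 0x00 bytes, into out while rewriting every U+0000 as
 * the two-byte sequence C0 80, then NUL-terminate the result.
 * out needs room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t utf8_escape_nul(const char *in, size_t len, char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\0') {
            out[j++] = '\xC0';
            out[j++] = '\x80';
        } else {
            out[j++] = in[i]; /* every other byte is copied unchanged */
        }
    }
    out[j] = '\0';
    return j;
}
```

    For text without NUL characters the output is byte-for-byte identical to the input, which is why the NUL case is the one that needs attention.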

    And, as magic Christian pointed out, non-BMP characters are represented there using surrogates.
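    The surrogate point matters most in the UTF-8 to wchar_t direction. Here is a minimal sketch (utf8_to_wchar is a hypothetical name) that emits a surrogate pair for a non-BMP character when wchar_t is 16 bits and a single unit otherwise. It assumes strictly standard UTF-8 input, so CESU-8 surrogate sequences and the C0 80 form of NUL are treated as invalid.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into a newly
 * malloc'ed, NUL-terminated wchar_t string that the caller must free().
 * Where wchar_t is 16 bits (WCHAR_MAX <= 0xFFFF), characters beyond the BMP
 * are emitted as a UTF-16 surrogate pair. Invalid bytes, overlong forms and
 * encoded surrogates (as in CESU-8) are replaced by U+FFFD characters.
 * Returns NULL on allocation failure. */
wchar_t *utf8_to_wchar(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len = 0;
    while (p[len] != 0)
        len++;

    /* Each input byte yields at most one wchar_t unit (a 4-byte sequence
     * yields at most two), so len + 1 units are always enough. */
    wchar_t *out = malloc((len + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    size_t i = 0, j = 0;
    while (i < len) {
        unsigned char c = p[i];
        uint32_t cp;
        size_t n; /* expected sequence length */
        if (c < 0x80)                    { cp = c;        n = 1; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; n = 2; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; n = 3; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; n = 4; }
        else { out[j++] = 0xFFFD; i++; continue; }

        /* Accumulate continuation bytes (10xxxxxx); the NUL terminator
         * stops this loop early on a truncated sequence. */
        size_t k = 1;
        while (k < n && (p[i + k] & 0xC0) == 0x80) {
            cp = (cp << 6) | (p[i + k] & 0x3F);
            k++;
        }
        if (k < n || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF) ||
            (n == 3 && cp < 0x800) || (n == 4 && cp < 0x10000)) {
            out[j++] = 0xFFFD; /* replace the lead byte, resync on the next */
            i++;
            continue;
        }
        i += n;
#if WCHAR_MAX <= 0xFFFF
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out[j++] = (wchar_t)(0xD800 + (cp >> 10));
            out[j++] = (wchar_t)(0xDC00 + (cp & 0x3FF));
            continue;
        }
#endif
        out[j++] = (wchar_t)cp;
    }
    out[j] = L'\0';
    return out;
}
```

    Accepting Tcl 8.6's internal form as well would need extra cases for C0 80 and for surrogate sequences, which this sketch deliberately leaves out.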

    Take care,
    Harald

  • From Torsten@Torsten@example.com to comp.lang.tcl on Tue Dec 2 16:33:21 2025

    Hi Harald,

    The program runs on Windows, Linux, and macOS. Windows and Linux are already working; I haven't
    tested macOS yet.

    Gemini used the special functions for Windows. I only need to convert the library's error messages,
    and those strings should contain mostly ASCII characters, so I don't expect any surprises
    here.

    Regards,
    Torsten