Forum: War Ensemble BBS

Re: Unicode...

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Tue Nov 18 20:40:58 2025

From Newsgroup: comp.lang.c

On Tue, 18 Nov 2025 20:17:24 -0000 (UTC), Michael Sanders wrote:

If you manage an improvement, please do post it here in the group
so I can learn more too.

/*
* Robust UTF-8 capability test?
*
* This test checks:
* 1. Locale reports UTF-8
* 2. Wide-character conversion works (mbrtowc)
* 3. Terminal accepts UTF-8 output (optional: write test char)
*
* Result returned:
* 0 = No UTF-8 support detected
* 1 = UTF-8 *likely* supported
*/

#include <stdio.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>
#include <errno.h>

/* return 1 if UTF-8 capable, else 0 */

int utf8_capable(void) {
/* 1 check locale */
const char *loc = setlocale(LC_CTYPE, "");
if (!loc) return 0;
if (!strstr(loc, "UTF-8") && !strstr(loc, "utf8")) return 0;

/* 2 check UTF-8 decoding with mbrtowc */
{
const char *test = "✓"; /* E2 9C 93 */
wchar_t wc = 0;
mbstate_t st;
memset(&st, 0, sizeof(st));

size_t n = mbrtowc(&wc, test, strlen(test), &st);

if (n == (size_t)-1 || n == (size_t)-2) return 0; /* decode error */
if (wc == 0) return 0; /* returned null? impossible for ✓ */
}

/* 3 ensure terminal accepts UTF-8 by writing */
if (fwrite("✓", 3, 1, stdout) != 1) {
return 0; /* write failed — suppress error message, just say no */
}

return 1;
}

int main(void) {
if (utf8_capable()) printf("\nUTF-8 OK\n");
else printf("\nNOT UTF-8\n");
return 0;
}
--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Wed Nov 19 11:56:55 2025

From Newsgroup: comp.lang.c

Am 15.11.2025 um 20:28 schrieb Michael Sanders:

On Sat, 15 Nov 2025 06:24:39 +0100, Bonita Montero wrote:

A little bugfix and a perfect style:

#include <iostream>
#include <bit>
#include 
#include <optional>

using namespace std;

optional<size_t> utf8Width( u8string_view str )
{
size_t w = 0;
for( auto it = str.begin(); it != str.end(); ++w ) [[likely]]
if( size_t head = countl_zero( (unsigned char)~*it ); head <= 4
&& (size_t)(str.end() - it) >= head + 1 ) [[likely]]
it += head + 1;
else
return nullopt;
return w;
}

int main()
{
cout << *utf8Width( u8"Hello, 世界!" ) << endl;
}

Very nice!

#include <iostream>
#include <string_view>
#include <bit>

using namespace std;

template<bool Validate = false, typename View>
requires std::same_as<View, string_view> || std::same_as<View, u8string_view>
size_t utf8Width( View str )
{
size_t rem = str.end() - str.begin(), w = 0, chunk;
for( auto it = str.begin(); rem; rem -= chunk, ++w ) [[likely]]
{
chunk = countl_one( (unsigned char)*it ) + 1;
if constexpr( Validate )
if( (*it & 0xC0) == 0x80 || chunk > 5 || rem < chunk ) [[unlikely]]
return -1;
auto end = it + chunk;
if constexpr( !Validate )
it = end;
else
while( ++it != end )
if( (unsigned char)(*it & 0xC0) != 0x80 )
return -1;
}
return w;
}

int main()
{
char8_t strU8[] = u8"Hello, 世界!";
string_view sv( (char *)strU8 );
cout << utf8Width<false>( sv ) << endl;
cout << utf8Width<true>( sv ) << endl;
u8string_view svU8( strU8 );
cout << utf8Width<false>( svU8 ) << endl;
cout << utf8Width<true>( svU8 ) << endl;
}

Even cooler. Now the code accepts usual string_views as well as u8string_views.
And if you supply a boolean temlpate parameter before the ()-parameter which
is true the data is verified to be a valid UTF-8 string. If you supply false
or omit the parameter the string isn't valiedated.

--- Synchronet 3.21a-Linux NewsLink 1.2

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Wed Nov 19 09:08:10 2025

From Newsgroup: comp.lang.c

On 2025-11-18 15:17, Michael Sanders wrote:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode locale
contains "UTF-8"? Do you know what the domain of applicability of that
document is? It apparently does not cover my Ubuntu Linux system. The
command "locale -a" provides a list of all supported locales. Here's
what it says:

[...]

Hi James, umm 'guarantees'? No no... It does NOT verify:

- whether the environment actually supports UTF8 fully
- whether multibyte functions are enabled
- whether the terminal supports UTF8
- whether the C library supports UTF8 normalization
(combining characters, etc. but it seems to work well here)

To be sure: It's not a UTF-8 capability test. It's only a
locale-string check. So it likely misses many valid UTF8
locale variants...

If intended for use by anyone other than yourself, you should document
it's limitations in that regard, either with in-code comments or in user documentation.

Here I'm running any mixture of: Windows/BSD/Linix Mint LMDE.

The best I can tell you at this stage is that it works on my end,
not a very satisfying reply I'm sure you'd agree. But till I learn
more about the issue that's the best I can offer.

If you manage an improvement, please do post it here in the group
so I can learn more too.

There might be documents specifying locale naming standards, but I'm not
aware of any. In the absence of such standards, or on systems not
covered by such standards, there's not much you can do about this.

If your targets include Linux Mint, there's a chance the locale names
might be similar to those on my Ubuntu Linux system - but I'm no expert
on the differences between Linux distributions. If so, you should make
the "UTF" search case-insensitive, and make the '-' optional, which
would add considerable complexity to what is currently a very simple
routine.

I'm curious - if you're interested in Unicode, why are you not making
any use of the Unicode support available in the current version of C?
Does your code need to work under older versions of C?

Since C2023, a conforming implementation of C is required to support
character constants and string literals that use UTF-8, UTF-16, and
UTF-32 encodings when prefixed with u8, u or U, respectively. Those use
the char8_t, char16_t, and char32_t types. Also new in C2023 is
mbrtoc8() and c8rtomb().
Those prefixes and types go back to C2011, where it was optional whether
they used those encodings. There were pre#defined macros which could be
queried to determine whether or not they did. Routines for converting
between those types and multi-byte strings or wchar_t also go back to
that time.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael =?ISO-8859-1?Q?B=E4uerle?=@michael.baeuerle@stz-e.de to comp.lang.c on Wed Nov 19 15:29:37 2025

From Newsgroup: comp.lang.c

James Kuyper wrote:

On 2025-11-18 15:17, Michael Sanders wrote:

On Tue, 18 Nov 2025 14:27:53 -0500, James Kuyper wrote:

Could you identify which document guarantees that every Unicode locale contains "UTF-8"? Do you know what the domain of applicability of that document is? It apparently does not cover my Ubuntu Linux system. The command "locale -a" provides a list of all supported locales. Here's
what it says:

[...]

Hi James, umm 'guarantees'? No no... It does NOT verify:

- whether the environment actually supports UTF8 fully
- whether multibyte functions are enabled
- whether the terminal supports UTF8
- whether the C library supports UTF8 normalization
(combining characters, etc. but it seems to work well here)

To be sure: It's not a UTF-8 capability test. It's only a
locale-string check. So it likely misses many valid UTF8
locale variants...

If intended for use by anyone other than yourself, you should document
it's limitations in that regard, either with in-code comments or in user documentation.

Here I'm running any mixture of: Windows/BSD/Linix Mint LMDE.

The best I can tell you at this stage is that it works on my end,
not a very satisfying reply I'm sure you'd agree. But till I learn
more about the issue that's the best I can offer.

If you manage an improvement, please do post it here in the group
so I can learn more too.

There might be documents specifying locale naming standards, but I'm not aware of any. [...]

POSIX.1-2024 documents one for the XSI extension in Section 8.2: <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap08.html#tag_08_02>
|
| If the locale value has the form:
|
| language[_territory][.codeset]
|
| it refers to an implementation-provided locale, where settings of
| language, territory, and codeset are implementation-defined.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Wed Nov 19 19:22:52 2025

From Newsgroup: comp.lang.c

On Wed, 19 Nov 2025 09:08:10 -0500, James Kuyper wrote:

If your targets include Linux Mint, there's a chance the locale names
might be similar to those on my Ubuntu Linux system - but I'm no expert
on the differences between Linux distributions. If so, you should make
the "UTF" search case-insensitive, and make the '-' optional, which
would add considerable complexity to what is currently a very simple
routine.

Thanks for the info. Cases-insensitive routines were posted in this
sub-thread yesterday. But I'll repost it here.

James straight up, I'm simply trying to learn. I appreciate your
comments but I need examples of what to do - that's how my mind works...

/*
* Robust UTF-8 capability test?
*
* This test checks:
* 1. Locale reports UTF-8
* 2. Wide-character conversion works (mbrtowc)
* 3. Terminal accepts UTF-8 output (optional: write test char)
*
* Result returned:
* 0 = No UTF-8 support detected
* 1 = UTF-8 *likely* supported
*/

#include <stdio.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>
#include <errno.h>

/* return 1 if UTF-8 capable, else 0 */

int utf8_capable(void) {
/* 1 check locale */
const char *loc = setlocale(LC_CTYPE, "");
if (!loc) return 0;
if (!strstr(loc, "UTF-8") && !strstr(loc, "utf8")) return 0;

/* 2 check UTF-8 decoding with mbrtowc */
{
const char *test = "✓"; /* E2 9C 93 */
wchar_t wc = 0;
mbstate_t st;
memset(&st, 0, sizeof(st));

size_t n = mbrtowc(&wc, test, strlen(test), &st);

if (n == (size_t)-1 || n == (size_t)-2) return 0; /* decode error */
if (wc == 0) return 0; /* returned null? impossible for ✓ */
}

/* 3 ensure terminal accepts UTF-8 by writing */
if (fwrite("✓", 3, 1, stdout) != 1) {
return 0; /* write failed — suppress error message, just say no */
}

return 1;
}

int main(void) {
if (utf8_capable()) printf("\nUTF-8 OK\n");
else printf("\nNOT UTF-8\n");
return 0;
}
--
:wq
Mike Sanders

--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Fri Nov 21 02:21:19 2025

From Newsgroup: comp.lang.c

On Wed, 19 Nov 2025 11:56:55 +0100, Bonita Montero wrote:

[...]

Even cooler. Now the code accepts usual string_views as well as u8string_views.
And if you supply a boolean temlpate parameter before the ()-parameter which is true the data is verified to be a valid UTF-8 string. If you supply false or omit the parameter the string isn't valiedated.

Hi Bonita! These are nice c++/c examples you've provided.

Thanks for your input, I appreciate your code & remarks.
--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Nov 21 11:10:39 2025

From Newsgroup: comp.lang.c

Am 21.11.2025 um 03:21 schrieb Michael Sanders:

Hi Bonita! These are nice c++/c examples you've provided.
Thanks for your input, I appreciate your code & remarks.

That's an even cooler solution with AVX-512 and without validation.

size_t utf8Width( const char *s )
{
__m512i
zero = _mm512_setzero_si512(),
oneMask = _mm512_set1_epi8( (char)0x80 ),
oneHead = zero,
twoMask = _mm512_set1_epi8( (char)0xE0 ),
twoHead = _mm512_set1_epi8( (char)0xC0 ),
threeMask = _mm512_set1_epi8( (char)0xF0 ),
threeHead = _mm512_set1_epi8( (char)0xE0 ),
fourMask = _mm512_set1_epi8( (char)0xF8 ),
fourHead = _mm512_set1_epi8( (char)0xF0 );
uintptr_t
up = (uintptr_t)s,
base = up & ~(uintptr_t)63;
unsigned skip = (unsigned)(up - base);
s = (char *)base;
size_t count = 0;
size_t i = 0;
uint64_t nzMatches;
do
{
__m512i chunk = _mm512_loadu_si512( (void *)s );
nzMatches = ~(_mm512_cmpeq_epi8_mask( chunk, zero ) >> skip);
nzMatches = nzMatches != -1 ? (1ull << countr_one( nzMatches ))
- 1 : -1;
uint64_t
one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, oneMask ), oneHead ) >> skip & nzMatches,
two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, twoMask ), twoHead ) >> skip & nzMatches,
three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, threeMask ), threeHead ) >> skip & nzMatches,
four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, fourMask ), fourHead ) >> skip & nzMatches;
count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
skip = 0;
} while( nzMatches == -1 );
return count;
}

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Fri Nov 21 17:03:10 2025

From Newsgroup: comp.lang.c

On 15/11/2025 05:24, Bonita Montero wrote:

A little bugfix and a perfect style:

#include <iostream>
#include <bit>
#include 
#include <optional>

using namespace std;

optional<size_t> utf8Width( u8string_view str )
{
size_t w = 0;
for( auto it = str.begin(); it != str.end(); ++w ) [[likely]]
if( size_t head = countl_zero( (unsigned char)~*it ); head <= 4 && (size_t)(str.end() - it) >= head + 1 ) [[likely]]
it += head + 1;
else
return nullopt;
return w;
}

int main()
{
cout << *utf8Width( u8"Hello, 世界!" ) << endl;
}

The trouble with this is that I haven't a clue how it works or what
those extras do, or how they impact on performance.

A version in C is given below. This is much more straightforward. It
doesn't verify anything, but then I don't know if yours does either.

As for performance: I duplicated that test string to form one 104 times
as long, then called that function one million times. Here are the timings:

C gcc-O2 1.06 seconds
C bcc 1.17 seconds
C tcc 2.81 seconds

C++ g++-O2 4.6 seconds
C++ g++-O0 19 seconds

--------------------------

size_t utf8width(char* s) {
size_t length;
int c, n;

length=0;
while (c=*s) {
if ((c & 0x80) == 0) n = 1;
else if ((c & 0xE0) == 0xC0) n = 2;
else if ((c & 0xF0) == 0xE0) n = 3;
else n = 4;
s += n;
++length;
}
return length;
}

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Fri Nov 21 17:39:45 2025

From Newsgroup: comp.lang.c

On Fri, 21 Nov 2025 17:03:10 +0000, bart wrote:

size_t utf8width(char* s) {
size_t length;
int c, n;

length=0;
while (c=*s) {
if ((c & 0x80) == 0) n = 1;
else if ((c & 0xE0) == 0xC0) n = 2;
else if ((c & 0xF0) == 0xE0) n = 3;
else n = 4;
s += n;
++length;
}
return length;
}

A variant based on your take:

size_t utf8width(char *s) {
size_t len = 0;
unsigned char c;

while ((c = (unsigned char)*s))
s += (c < 0x80) ? 1 :
(c < 0xE0) ? 2 :
(c < 0xF0) ? 3 : 4,
++len;

return len;
}
--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 06:39:20 2025

From Newsgroup: comp.lang.c

Am 21.11.2025 um 18:03 schrieb bart:

On 15/11/2025 05:24, Bonita Montero wrote:

A little bugfix and a perfect style:

#include <iostream>
#include <bit>
#include 
#include <optional>

using namespace std;

optional<size_t> utf8Width( u8string_view str )
{
 size_t w = 0;
 for( auto it = str.begin(); it != str.end(); ++w ) [[likely]]
 if( size_t head = countl_zero( (unsigned char)~*it ); head
<= 4 && (size_t)(str.end() - it) >= head + 1 ) [[likely]]
 it += head + 1;
 else
 return nullopt;
 return w;
}

int main()
{
 cout << *utf8Width( u8"Hello, 世界!" ) << endl;
}

The trouble with this is that I haven't a clue how it works or what
those extras do, or how they impact on performance.

A version in C is given below. This is much more straightforward. It
doesn't verify anything, but then I don't know if yours does either.

As for performance: I duplicated that test string to form one 104
times as long, then called that function one million times. Here are
the timings:

C gcc-O2 1.06 seconds
C bcc 1.17 seconds
C tcc 2.81 seconds

C++ g++-O2 4.6 seconds
C++ g++-O0 19 seconds

--------------------------

size_t utf8width(char* s) {
 size_t length;
 int c, n;

 length=0;
 while (c=*s) {
 if ((c & 0x80) == 0) n = 1;
 else if ((c & 0xE0) == 0xC0) n = 2;
 else if ((c & 0xF0) == 0xE0) n = 3;
 else n = 4;
 s += n;
 ++length;
 }
 return length;
}

Take a string of a number of UTF-8 characters with a proper
mixed chunk-lengths.

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sat Nov 22 11:55:32 2025

From Newsgroup: comp.lang.c

On 22/11/2025 05:39, Bonita Montero wrote:

Am 21.11.2025 um 18:03 schrieb bart:

On 15/11/2025 05:24, Bonita Montero wrote:

A little bugfix and a perfect style:

#include <iostream>
#include <bit>
#include 
#include <optional>

using namespace std;

optional<size_t> utf8Width( u8string_view str )
{
 size_t w = 0;
 for( auto it = str.begin(); it != str.end(); ++w ) [[likely]]
 if( size_t head = countl_zero( (unsigned char)~*it ); head >>> <= 4 && (size_t)(str.end() - it) >= head + 1 ) [[likely]]
 it += head + 1;
 else
 return nullopt;
 return w;
}

int main()
{
 cout << *utf8Width( u8"Hello, 世界!" ) << endl;
}

The trouble with this is that I haven't a clue how it works or what
those extras do, or how they impact on performance.

A version in C is given below. This is much more straightforward. It
doesn't verify anything, but then I don't know if yours does either.

As for performance: I duplicated that test string to form one 104
times as long, then called that function one million times. Here are
the timings:

C gcc-O2 1.06 seconds
C bcc 1.17 seconds
C tcc 2.81 seconds

C++ g++-O2 4.6 seconds
C++ g++-O0 19 seconds

--------------------------

size_t utf8width(char* s) {
 size_t length;
 int c, n;

 length=0;
 while (c=*s) {
 if ((c & 0x80) == 0) n = 1;
 else if ((c & 0xE0) == 0xC0) n = 2;
 else if ((c & 0xF0) == 0xE0) n = 3;
 else n = 4;
 s += n;
 ++length;
 }
 return length;
}

Take a string of a number of UTF-8 characters with a proper
mixed chunk-lengths.

OK. I took the Wikipedia article on China, /in Chinese/ and extract the
source text.

This was a file of 2179489 bytes. I got these results:

My C: 1969415 chars (reading text from a file)
Your C++: 1956525 chars (embedded u8"" string)

There's a small discrepancy, so I used an online service, which gave me:

1965068 chars

My figure is somewhat closer than yours. However this site also reported
the size of the pasted-in text slightly differently, so I looked for a source-code version, which I found at:

https://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html

That gave me:

1969415 chars

Which is exactly my figure.

BTW, repeating the count 1000 times, took this amount of time:

C++ -O2 8.5 seconds
C gcc-O2 2.1 seconds (my version)
C gcc-O2 0.4 seconds (fast online version)
C tcc 1.1 seconds (fast version)

So even Tiny C wipes the floor with optimised C++ with its fancy
libraries and advanced string types, and using ordinary C strings and
standard C (that __builtin_prefetch line wasn't used).
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 14:10:46 2025

From Newsgroup: comp.lang.c

This code with AVX512BW and BMI1 is 13,5 times faster than yours on my Zen4-PC.

size_t utf8Width2( const char *s )

{
__m512i const
ZERO = _mm512_setzero_si512(),
ONE_MASK = _mm512_set1_epi8( (char)0x80 ),
ONE_HEAD = ZERO,
TWO_MASK = _mm512_set1_epi8( (char)0xE0 ),
TWO_HEAD = _mm512_set1_epi8( (char)0xC0 ),
THREE_MASK = _mm512_set1_epi8( (char)0xF0 ),
THREE_HEAD = _mm512_set1_epi8( (char)0xE0 ),
FOUR_MASK = _mm512_set1_epi8( (char)0xF8 ),
FOUR_HEAD = _mm512_set1_epi8( (char)0xF0 );
uintptr_t
begin = (uintptr_t)s,
base = begin & -64;
s = (char *)base;
size_t count = 0;
__m512i chunk;
uint64_t nzMask;
auto doChunk = [&]() L_FORCEINLINE
{
uint64_t
one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
};
chunk = _mm512_loadu_si512( s );
unsigned head = (unsigned)(begin - base);
nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO ) >> head;
unsigned ones = countr_one( nzMask );
nzMask &= ones < 64 ? (1ull << ones) - 1 : -1;
nzMask <<= head;
doChunk();
if( (int64_t)nzMask >= 0 )
return count;
for( ; ; )
{
s += 64;
chunk = _mm512_loadu_si512( s );
nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO );
ones = countr_one( nzMask );
nzMask = ones < 64 ? (1ull << ones) - 1 : -1;
if( !nzMask )
break;
doChunk();
}
return count;
}
--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sat Nov 22 13:38:27 2025

From Newsgroup: comp.lang.c

On 22/11/2025 13:10, Bonita Montero wrote:

This code with AVX512BW and BMI1 is 13,5 times faster than yours on my Zen4-PC.

size_t utf8Width2( const char *s )

{
__m512i const
ZERO = _mm512_setzero_si512(),
ONE_MASK = _mm512_set1_epi8( (char)0x80 ),
ONE_HEAD = ZERO,
TWO_MASK = _mm512_set1_epi8( (char)0xE0 ),
TWO_HEAD = _mm512_set1_epi8( (char)0xC0 ),
THREE_MASK = _mm512_set1_epi8( (char)0xF0 ),
THREE_HEAD = _mm512_set1_epi8( (char)0xE0 ),
FOUR_MASK = _mm512_set1_epi8( (char)0xF8 ),
FOUR_HEAD = _mm512_set1_epi8( (char)0xF0 );
uintptr_t
begin = (uintptr_t)s,
base = begin & -64;
s = (char *)base;
size_t count = 0;
__m512i chunk;
uint64_t nzMask;
auto doChunk = [&]() L_FORCEINLINE
{
uint64_t
one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
};
chunk = _mm512_loadu_si512( s );
unsigned head = (unsigned)(begin - base);
nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO ) >> head;
unsigned ones = countr_one( nzMask );
nzMask &= ones < 64 ? (1ull << ones) - 1 : -1;
nzMask <<= head;
doChunk();
if( (int64_t)nzMask >= 0 )
return count;
for( ; ; )
{
s += 64;
chunk = _mm512_loadu_si512( s );
nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO );
ones = countr_one( nzMask );
nzMask = ones < 64 ? (1ull << ones) - 1 : -1;
if( !nzMask )
break;
doChunk();
}
return count;
}

Doesn't compile, even after I add suitable *intrin headers.

I took out L_FORCEINLINE (not recognised); added std:: to countr_one,
but it still gave me errors like this:

C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In
lambda function: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~

You have to give complete compilable code or have only simple
dependencies like stdio.h.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 15:08:16 2025

From Newsgroup: comp.lang.c

Am 22.11.2025 um 14:38 schrieb bart:

Doesn't compile, even after I add suitable *intrin headers.
I took out L_FORCEINLINE (not recognised); added std:: to countr_one,
but it still gave me errors like this: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In
lambda function: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~
You have to give complete compilable code or have only simple
dependencies like stdio.h.

Try __attribute__((always_inline)) instead. The code requires enabled
AVX512 compilation
with g++ and a AVX512-compatible CPU (Intel since Skylake-X Xeons, AMD
since Zen4).
If you want to test for an older CPU you can stick with the below code,
which is AVX2.
On my CPU this is only seven times faster than yours. AVX-512 really
rocks the house.

size_t utf8Width256( const char *s )
{
__m256i const
ZERO = _mm256_setzero_si256(),
ONE_MASK = _mm256_set1_epi8( (char)0x80 ),
ONE_HEAD = ZERO,
TWO_MASK = _mm256_set1_epi8( (char)0xE0 ),
TWO_HEAD = _mm256_set1_epi8( (char)0xC0 ),
THREE_MASK = _mm256_set1_epi8( (char)0xF0 ),
THREE_HEAD = _mm256_set1_epi8( (char)0xE0 ),
FOUR_MASK = _mm256_set1_epi8( (char)0xF8 ),
FOUR_HEAD = _mm256_set1_epi8( (char)0xF0 );
uintptr_t
begin = (uintptr_t)s,
base = begin & -32;
s = (char *)base;
size_t count = 0;
__m256i chunk;
uint32_t nzMask;
auto doChunk = [&]() L_FORCEINLINE
{
uint32_t
one = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, ONE_MASK ), ONE_HEAD ) ) & nzMask,
two = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, TWO_MASK ), TWO_HEAD ) ) & nzMask,
three = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, THREE_MASK ), THREE_HEAD ) ) & nzMask,
four = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, FOUR_MASK ), FOUR_HEAD ) ) & nzMask;
count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
};

chunk = _mm256_loadu_si256( (__m256i *)s );
unsigned head = (unsigned)(begin - base);
nzMask = ~_mm256_movemask_epi8( _mm256_cmpeq_epi8( chunk, ZERO ) )

head;

unsigned ones = countr_one( nzMask );
nzMask &= ones < 32 ? (1ull << ones) - 1 : -1;
nzMask <<= head;
doChunk();
if( (int32_t)nzMask >= 0 )
return count;
for( ; ; )
{
s += 32;
chunk = _mm256_loadu_si256( (__m256i *)s );
nzMask = ~_mm256_movemask_epi8( _mm256_cmpeq_epi8( chunk, ZERO ) );
ones = countr_one( nzMask );
nzMask = ones < 32 ? (1ull << ones) - 1 : -1;
if( !nzMask )
break;
doChunk();
}
return count;
}

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sat Nov 22 14:28:11 2025

From Newsgroup: comp.lang.c

On 22/11/2025 14:08, Bonita Montero wrote:

Am 22.11.2025 um 14:38 schrieb bart:

Doesn't compile, even after I add suitable *intrin headers.
I took out L_FORCEINLINE (not recognised); added std:: to countr_one,
but it still gave me errors like this:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In
lambda function:
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int
_mm_popcnt_u64(long long unsigned int)': target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~
You have to give complete compilable code or have only simple
dependencies like stdio.h.

Try __attribute__((always_inline)) instead. The code requires enabled
AVX512 compilation
with g++ and a AVX512-compatible CPU (Intel since Skylake-X Xeons, AMD
since Zen4).
If you want to test for an older CPU you can stick with the below code, which is AVX2.

Still doesn't work. I'm using g++ 14.1.0. It doesn't like 'countr_one'
with or without std::

Would it hurt to post a complete, compilable program? Plus the compiler invocation if it needs anything unusual.

It only needs a minimal main() routine which I can tweak to my test
input. Unless all it needs to use it is a call to utf8Width("abc") which returns a simple integer.

But ATM my C version is still faster!

--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 15:51:07 2025

From Newsgroup: comp.lang.c

Am 22.11.2025 um 15:28 schrieb bart:

Still doesn't work. I'm using g++ 14.1.0. It doesn't like 'countr_one'
with or without std::

-std=c++20

Would it hurt to post a complete, compilable program? Plus the
compiler invocation if it needs anything unusual.

I'm using Visual C++ or clang-cl (MSVC-compatible clang).
I guess with g++ / clang you need <x86intrin.h>

It only needs a minimal main() routine which I can tweak to my test
input. Unless all it needs to use it is a call to utf8Width("abc")
which returns a simple integer.

It works the same as your code, i.e. it takes a char-pointer.

But ATM my C version is still faster!

For sure not that fast as my AVX (seven times) / AVX-512 (13,5 times)
version.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 16:05:51 2025

From Newsgroup: comp.lang.c

Take this and -mavx512bw and -std=c++23.

#include <iostream>
#include <string_view>
#include <bit>
#include <algorithm>
#include <random>
#include <array>
#include 
#include <chrono>
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__GNUC__) || defined(__clang__)
#include <x86intrin.h>
#endif
#include "inline.h"

#if defined(_MSC_VER) && !defined(__clang__)
#pragma warning(disable: 26815) // dangling pointer
#endif

using namespace std;
using namespace chrono;

template<bool Validate = false, typename View>
requires std::same_as<View, string_view> || std::same_as<View, u8string_view>
NOINLINE size_t utf8Width( View str )
{
ptrdiff_t rem = str.end() - str.begin(), w = 0, width;
for( auto it = str.begin(); rem > 0; rem -= width, ++w ) [[likely]]
{
width = countl_one( (unsigned char)*it );
width += (size_t)!width;
if constexpr( Validate )
if( (*it & 0xC0) == 0x80 || width > min( 4Z, rem ) ) [[unlikely]]
return -1;
auto end = it + width;
if constexpr( !Validate )
it = end;
else
while( ++it != end )
if( (*it & 0xC0) != 0x80 )
return -1;
}
if constexpr( Validate )
if( rem )
return -1;
return w;
}

NOINLINE size_t utf8widthC( char const *str )
{
size_t length = 0, n;
for( char8_t c; (c = *str); ++length )
{
if( (c & 0x80) == 0 )
n = 1;
else if( (c & 0xE0) == 0xC0 )
n = 2;
else if( (c & 0xF0) == 0xE0 )
n = 3;
else
n = 4;
n += (size_t)!n;
str += n;
}
return length;
}

NOINLINE size_t utf8Width512( const char *s )
{
__m512i const
ZERO = _mm512_setzero_si512(),
ONE_MASK = _mm512_set1_epi8( (char)0x80 ),
ONE_HEAD = ZERO,
TWO_MASK = _mm512_set1_epi8( (char)0xE0 ),
TWO_HEAD = _mm512_set1_epi8( (char)0xC0 ),
THREE_MASK = _mm512_set1_epi8( (char)0xF0 ),
THREE_HEAD = _mm512_set1_epi8( (char)0xE0 ),
FOUR_MASK = _mm512_set1_epi8( (char)0xF8 ),
FOUR_HEAD = _mm512_set1_epi8( (char)0xF0 );
uintptr_t
begin = (uintptr_t)s,
base = begin & -64;
s = (char *)base;
size_t count = 0;
__m512i chunk;
uint64_t nzMask;
auto doChunk = [&]() L_FORCEINLINE
{
uint64_t
one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
};
chunk = _mm512_loadu_si512( s );
unsigned head = (unsigned)(begin - base);
nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO ) >> head;
unsigned ones = countr_one( nzMask );
nzMask &= ones < 64 ? (1ull << ones) - 1 : -1;
nzMask <<= head;
doChunk();
if( (int64_t)nzMask >= 0 )
return count;
for( ; ; )
{
s += 64;
chunk = _mm512_loadu_si512( s );
nzMask = ~_mm512_cmpeq_epi8_mask( chunk, ZERO );
ones = countr_one( nzMask );
nzMask = ones < 64 ? (1ull << ones) - 1 : -1;
if( !nzMask )
break;
doChunk();
}
return count;
}

NOINLINE size_t utf8Width256( const char *s )
{
__m256i const
ZERO = _mm256_setzero_si256(),
ONE_MASK = _mm256_set1_epi8( (char)0x80 ),
ONE_HEAD = ZERO,
TWO_MASK = _mm256_set1_epi8( (char)0xE0 ),
TWO_HEAD = _mm256_set1_epi8( (char)0xC0 ),
THREE_MASK = _mm256_set1_epi8( (char)0xF0 ),
THREE_HEAD = _mm256_set1_epi8( (char)0xE0 ),
FOUR_MASK = _mm256_set1_epi8( (char)0xF8 ),
FOUR_HEAD = _mm256_set1_epi8( (char)0xF0 );
uintptr_t
begin = (uintptr_t)s,
base = begin & -32;
s = (char *)base;
size_t count = 0;
__m256i chunk;
uint32_t nzMask;
auto doChunk = [&]() L_FORCEINLINE
{
uint32_t
one = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, ONE_MASK ), ONE_HEAD ) ) & nzMask,
two = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, TWO_MASK ), TWO_HEAD ) ) & nzMask,
three = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, THREE_MASK ), THREE_HEAD ) ) & nzMask,
four = _mm256_movemask_epi8( _mm256_cmpeq_epi8( _mm256_and_si256( chunk, FOUR_MASK ), FOUR_HEAD ) ) & nzMask;
count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two ) + _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
};
chunk = _mm256_loadu_si256( (__m256i *)s );
unsigned head = (unsigned)(begin - base);
nzMask = ~_mm256_movemask_epi8( _mm256_cmpeq_epi8( chunk, ZERO ) )

head;

unsigned ones = countr_one( nzMask );
nzMask &= ones < 32 ? (1ull << ones) - 1 : -1;
nzMask <<= head;
doChunk();
if( (int32_t)nzMask >= 0 )
return count;
for( ; ; )
{
s += 32;
chunk = _mm256_loadu_si256( (__m256i *)s );
nzMask = ~_mm256_movemask_epi8( _mm256_cmpeq_epi8( chunk, ZERO ) );
ones = countr_one( nzMask );
nzMask = ones < 32 ? (1ull << ones) - 1 : -1;
if( !nzMask )
break;
doChunk();
}
return count;
}

int main()
{
constexpr unsigned
TYPE1_BITS = 7,
TYPE2_BITS = 11,
TYPE3_BITS = 16,
TYPE4_BITS = 21;
constexpr char32_t
TYPE1_END = 1 << TYPE1_BITS,
TYPE2_END = 1 << TYPE2_BITS,
TYPE3_END = 1 << TYPE3_BITS,
TYPE4_END = 1 << TYPE4_BITS;
using urand = uniform_int_distribution<unsigned>;
mt19937_64 mt;
uniform_int_distribution<size_t> rndWidth( 1, 4 );
urand rawRanges[4] =
{
urand( 1, TYPE1_END - 1 ),
urand( TYPE1_END, TYPE2_END - 1 ),
urand( TYPE2_END, TYPE3_END - 1 ),
urand( TYPE3_END, TYPE4_END - 1 )
};
span ranges( rawRanges );
char8_t rawTypeHeads[4] { 0, 0xC0, 0xE0, 0xF0 };
span typeHeads( rawTypeHeads );
constexpr size_t BUF_MIN = 0x10000;
u8string u8Str( BUF_MIN + 3, (char8_t)0 );
using u8s_it = u8string::iterator;
u8s_it
itChar = u8Str.begin(),
itCharEnd = itChar + BUF_MIN;
for( size_t width, type; itChar < itCharEnd; itChar += width )
{
width = rndWidth( mt );
type = width - 1;
char32_t c = (ranges[type])( mt );
for( u8s_it itTail = itChar + width; --itTail > itChar; c >>= 6 )
*itTail = (char8_t)(0x80 | c & 0x3F);
*itChar = typeHeads[type] | (char8_t)c;
}
u8Str.resize( itChar - u8Str.begin() );
#if defined(NDEBUG)
constexpr size_t ROUNDS = 100'000;
#else
constexpr size_t ROUNDS = 1'000;
#endif
auto bench = [&]( char const *what, auto fn )
{
auto start = high_resolution_clock::now();
for( size_t r = ROUNDS; r; --r )
fn( u8Str );
double secs = (double)duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0e9;
cout << what << secs << endl;
};
size_t (*volatile utf8widthC)( char const * ) = ::utf8widthC;
size_t (*volatile utf8Width256)(const char *s) = ::utf8Width256;
size_t (*volatile utf8Width512)(const char *s) = ::utf8Width512;
size_t total = 0;
bench( "my: ", [&]( u8string const &str ) { total += utf8Width256( (char *)str.c_str() ); } );
bench( "nerd: ", [&]( u8string const &str ) { total += utf8widthC( (char *)str.c_str() ); } );
return (int)total;

}

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sat Nov 22 16:35:13 2025

From Newsgroup: comp.lang.c

On 22/11/2025 15:05, Bonita Montero wrote:

Take this and -mavx512bw and -std=c++23.

#include <iostream>
#include <string_view>
#include <bit>
#include <algorithm>
#include <random>
#include <array>
#include 
#include <chrono>
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__GNUC__) || defined(__clang__)
#include <x86intrin.h>
#endif
#include "inline.h"

I don't have 'inline.h'. If I comment that out, then I get the errors
below from 'g++ -std=c++23 prog.c', also with -Wno-inline.

Your code seems incredibly fragile.

c.cpp: In function 'size_t utf8Width512(const char*)':
c.cpp:72:37: warning: AVX512F vector return without AVX512F enabled
changes the ABI [-Wpsabi]
72 | ZERO = _mm512_setzero_si512(),
| ^
c.cpp: In function 'size_t utf8Width256(const char*)':
c.cpp:123:37: warning: AVX vector return without AVX enabled changes the
ABI [-Wpsabi]
123 | ZERO = _mm256_setzero_si256(),
| ^
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86gprintrin.h:73,
from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:27,
from c.cpp:13: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In
lambda function: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~
c.cpp:95:106: note: called from here
95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
|
~~~~~~~~~~~~~~^~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~
c.cpp:95:80: note: called from here
95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
|
~~~~~~~~~~~~~~^~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~
c.cpp:95:56: note: called from here
95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
| ~~~~~~~~~~~~~~^~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~
c.cpp:95:32: note: called from here
95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
| ~~~~~~~~~~~~~~^~~~~~~
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:65,
from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:32: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:94:42: note: called from here
94 | four = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:55: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~
c.cpp:94:42: note: called from here
94 | four = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:93:43: note: called from here
93 | three = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~
c.cpp:93:43: note: called from here
93 | three = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:92:41: note: called from here
92 | two = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~
c.cpp:92:41: note: called from here
92 | two = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:91:41: note: called from here
91 | one = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
| ^~~~~~~~~~~~~~~~
c.cpp:91:41: note: called from here
91 | one = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 18:13:07 2025

From Newsgroup: comp.lang.c

You can compile the code with -mavx512bw.
This is "inline.h":

#if !defined(INLINE_HEADER)
#define INLINE_HEADER

#if !defined(NOINLINE)
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif
#define NOINLINE
#endif
#endif

#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wattributes"
#endif

#if !defined(FORCEINLINE)
#if (defined(__GNUC__) || defined(__clang__))
#define FORCEINLINE __attribute__((always_inline)) inline
#elif defined(_MSC_VER)
#define FORCEINLINE __forceinline
#elif
#define FORCEINLINE inline
#endif
#endif

#if !defined(L_FORCEINLINE)
#if defined(__GNUC__) || defined(__clang__)
#define L_FORCEINLINE __attribute__((always_inline))
#elif defined(_MSC_VER)
#define L_FORCEINLINE [[msvc::forceinline]]
#elif
#define L_FORCEINLINE
#endif
#endif
#endif

Am 22.11.2025 um 17:35 schrieb bart:

On 22/11/2025 15:05, Bonita Montero wrote:

Take this and -mavx512bw and -std=c++23.

#include <iostream>
#include <string_view>
#include <bit>
#include <algorithm>
#include <random>
#include <array>
#include 
#include <chrono>
#if defined(_MSC_VER)
 #include <intrin.h>
#elif defined(__GNUC__) || defined(__clang__)
 #include <x86intrin.h>
#endif
#include "inline.h"

I don't have 'inline.h'. If I comment that out, then I get the errors
below from 'g++ -std=c++23 prog.c', also with -Wno-inline.

Your code seems incredibly fragile.

c.cpp: In function 'size_t utf8Width512(const char*)':
c.cpp:72:37: warning: AVX512F vector return without AVX512F enabled
changes the ABI [-Wpsabi]
 72 | ZERO = _mm512_setzero_si512(),
 | ^
c.cpp: In function 'size_t utf8Width256(const char*)':
c.cpp:123:37: warning: AVX vector return without AVX enabled changes
the ABI [-Wpsabi]
123 | ZERO = _mm256_setzero_si256(),
 | ^
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86gprintrin.h:73, from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:27, from c.cpp:13: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h: In
lambda function: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
 42 | _mm_popcnt_u64 (unsigned long long __X)
 | ^~~~~~~~~~~~~~
c.cpp:95:106: note: called from here
 95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
 | ~~~~~~~~~~~~~~^~~~~~~~
C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
 42 | _mm_popcnt_u64 (unsigned long long __X)
 | ^~~~~~~~~~~~~~
c.cpp:95:80: note: called from here
 95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
 | ~~~~~~~~~~~~~~^~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
 42 | _mm_popcnt_u64 (unsigned long long __X)
 | ^~~~~~~~~~~~~~
c.cpp:95:56: note: called from here
 95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
 | ~~~~~~~~~~~~~~^~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/popcntintrin.h:42:1:
error: inlining failed in call to 'always_inline' 'long long int _mm_popcnt_u64(long long unsigned int)': target specific option mismatch
 42 | _mm_popcnt_u64 (unsigned long long __X)
 | ^~~~~~~~~~~~~~
c.cpp:95:32: note: called from here
 95 | count += _mm_popcnt_u64( one ) + _mm_popcnt_u64( two )
+ _mm_popcnt_u64( three ) + _mm_popcnt_u64( four );
 | ~~~~~~~~~~~~~~^~~~~~~
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:65, from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:32: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option
mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:94:42: note: called from here
 94 | four = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:55: C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~
c.cpp:94:42: note: called from here
 94 | four = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, FOUR_MASK ), FOUR_HEAD ) & nzMask;
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option
mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:93:43: note: called from here
 93 | three = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~
c.cpp:93:43: note: called from here
 93 | three = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, THREE_MASK ), THREE_HEAD ) & nzMask,
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option
mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:92:41: note: called from here
 92 | two = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~
c.cpp:92:41: note: called from here
 92 | two = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, TWO_MASK ), TWO_HEAD ) & nzMask,
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512bwintrin.h:1716:1: error: inlining failed in call to 'always_inline' '__mmask64 _mm512_cmpeq_epi8_mask(__m512i, __m512i)': target specific option
mismatch
1716 | _mm512_cmpeq_epi8_mask (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~~~~~~~
c.cpp:91:41: note: called from here
 91 | one = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C:/tdm/lib/gcc/x86_64-w64-mingw32/14.1.0/include/avx512fintrin.h:10651:1: error: inlining failed in call to 'always_inline' '__m512i _mm512_and_si512(__m512i, __m512i)': target specific option mismatch
10651 | _mm512_and_si512 (__m512i __A, __m512i __B)
 | ^~~~~~~~~~~~~~~~
c.cpp:91:41: note: called from here
 91 | one = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, ONE_MASK ), ONE_HEAD ) & nzMask,
 | ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sat Nov 22 17:35:12 2025

From Newsgroup: comp.lang.c

On 22/11/2025 17:13, Bonita Montero wrote:

You can compile the code with -mavx512bw.
This is "inline.h":

But I now get, from:

g++ =std=c++23 -mavx512bw -O2 c.cpp

the errors shown below. I tried -fconcepts too.

So, what also do I need? (So far you're not selling C++ very well!)

---------------------------------

c.cpp:33:54: warning: use of C++23 'make_signed_t<size_t>' integer constant
33 | if( (*it & 0xC0) == 0x80 || width > min( 4Z, rem )
) [[unlikely]]
| ^~
c.cpp:24:5: error: 'requires' does not name a type
24 | requires std::same_as<View, string_view> ||
std::same_as<View, u8string_view>
| ^~~~~~~~
c.cpp:24:5: note: 'requires' only available with '-std=c++20' or
'-fconcepts'
c.cpp: In function 'size_t utf8widthC(const char*)':
c.cpp:52:10: error: 'char8_t' was not declared in this scope; did you
mean 'wchar_t'?
52 | for( char8_t c; (c = *str); ++length )
| ^~~~~~~
| wchar_t
c.cpp:52:22: error: 'c' was not declared in this scope
52 | for( char8_t c; (c = *str); ++length )
| ^
c.cpp: In function 'size_t utf8Width512(const char*)':
c.cpp:99:21: error: 'countr_one' was not declared in this scope
99 | unsigned ones = countr_one( nzMask );
| ^~~~~~~~~~
c.cpp: In function 'size_t utf8Width256(const char*)':
c.cpp:150:21: error: 'countr_one' was not declared in this scope
150 | unsigned ones = countr_one( nzMask );
| ^~~~~~~~~~
c.cpp: In function 'int main()':
c.cpp:192:5: error: 'span' was not declared in this scope
192 | span ranges( rawRanges );
| ^~~~
c.cpp:192:5: note: 'std::span' is only available from C++20 onwards c.cpp:193:5: error: 'char8_t' was not declared in this scope; did you
mean 'wchar_t'?
193 | char8_t rawTypeHeads[4] { 0, 0xC0, 0xE0, 0xF0 };
| ^~~~~~~
| wchar_t
c.cpp:194:9: error: expected ';' before 'typeHeads'
194 | span typeHeads( rawTypeHeads );
| ^~~~~~~~~~
| ;
c.cpp:196:5: error: 'u8string' was not declared in this scope
196 | u8string u8Str( BUF_MIN + 3, (char8_t)0 );
| ^~~~~~~~
c.cpp:196:5: note: 'std::u8string' is only available from C++20 onwards c.cpp:197:20: error: 'u8string' does not name a type
197 | using u8s_it = u8string::iterator;
| ^~~~~~~~
c.cpp:198:5: error: 'u8s_it' was not declared in this scope
198 | u8s_it
| ^~~~~~
c.cpp:201:30: error: 'itChar' was not declared in this scope
201 | for( size_t width, type; itChar < itCharEnd; itChar += width )
| ^~~~~~
c.cpp:201:39: error: 'itCharEnd' was not declared in this scope
201 | for( size_t width, type; itChar < itCharEnd; itChar += width )
| ^~~~~~~~~
c.cpp:205:23: error: 'ranges' was not declared in this scope; did you
mean 'rawRanges'?
205 | char32_t c = (ranges[type])( mt );
| ^~~~~~
| rawRanges
c.cpp:206:20: error: expected ';' before 'itTail'
206 | for( u8s_it itTail = itChar + width; --itTail > itChar;
c >>= 6 )
| ^~~~~~~
| ;
c.cpp:206:48: error: 'itTail' was not declared in this scope
206 | for( u8s_it itTail = itChar + width; --itTail > itChar;
c >>= 6 )
| ^~~~~~
c.cpp:208:19: error: 'typeHeads' was not declared in this scope
208 | *itChar = typeHeads[type] | (char8_t)c;
| ^~~~~~~~~
c.cpp:210:5: error: 'u8Str' was not declared in this scope
210 | u8Str.resize( itChar - u8Str.begin() );
| ^~~~~
c.cpp:210:19: error: 'itChar' was not declared in this scope
210 | u8Str.resize( itChar - u8Str.begin() );
| ^~~~~~
c.cpp:228:25: error: 'u8string' is not a type
228 | bench( "my: ", [&]( u8string const &str ) { total += utf8Width256( (char *)str.c_str() ); } );
| ^~~~~~~~
c.cpp: In lambda function:
c.cpp:228:84: error: request for member 'c_str' in 'str', which is of non-class type 'const int'
228 | bench( "my: ", [&]( u8string const &str ) { total += utf8Width256( (char *)str.c_str() ); } );
|
^~~~~
c.cpp: In function 'int main()':
c.cpp:229:27: error: 'u8string' is not a type
229 | bench( "nerd: ", [&]( u8string const &str ) { total += utf8widthC( (char *)str.c_str() ); } );
| ^~~~~~~~
c.cpp: In lambda function:
c.cpp:229:84: error: request for member 'c_str' in 'str', which is of non-class type 'const int'
229 | bench( "nerd: ", [&]( u8string const &str ) { total += utf8widthC( (char *)str.c_str() ); } );
|
^~~~~

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sat Nov 22 17:39:19 2025

From Newsgroup: comp.lang.c

On 22/11/2025 17:35, bart wrote:

On 22/11/2025 17:13, Bonita Montero wrote:

You can compile the code with -mavx512bw.
This is "inline.h":

But I now get, from:

g++ =std=c++23 -mavx512bw -O2 c.cpp

the errors shown below. I tried -fconcepts too.

So, what also do I need? (So far you're not selling C++ very well!)

Wait, there's a "=std" in that command line instead of "-std".
Apparently it is not an error (?).

Anyway, it now compiles, and I can do some tests.
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 18:44:02 2025

From Newsgroup: comp.lang.c

A lot of errors look like that you haven't enable at C++23 properly.
Can you install a current g++ ? Maybe the newest from the repository
is sufficient.

Am 22.11.2025 um 18:35 schrieb bart:

On 22/11/2025 17:13, Bonita Montero wrote:

You can compile the code with -mavx512bw.
This is "inline.h":

But I now get, from:

g++ =std=c++23 -mavx512bw -O2 c.cpp

the errors shown below. I tried -fconcepts too.

So, what also do I need? (So far you're not selling C++ very well!)

---------------------------------

c.cpp:33:54: warning: use of C++23 'make_signed_t<size_t>' integer
constant
 33 | if( (*it & 0xC0) == 0x80 || width > min( 4Z, rem )
) [[unlikely]]
 | ^~
c.cpp:24:5: error: 'requires' does not name a type
 24 | requires std::same_as<View, string_view> || std::same_as<View, u8string_view>
 | ^~~~~~~~
c.cpp:24:5: note: 'requires' only available with '-std=c++20' or '-fconcepts'
c.cpp: In function 'size_t utf8widthC(const char*)':
c.cpp:52:10: error: 'char8_t' was not declared in this scope; did you
mean 'wchar_t'?
 52 | for( char8_t c; (c = *str); ++length )
 | ^~~~~~~
 | wchar_t
c.cpp:52:22: error: 'c' was not declared in this scope
 52 | for( char8_t c; (c = *str); ++length )
 | ^
c.cpp: In function 'size_t utf8Width512(const char*)':
c.cpp:99:21: error: 'countr_one' was not declared in this scope
 99 | unsigned ones = countr_one( nzMask );
 | ^~~~~~~~~~
c.cpp: In function 'size_t utf8Width256(const char*)':
c.cpp:150:21: error: 'countr_one' was not declared in this scope
150 | unsigned ones = countr_one( nzMask );
 | ^~~~~~~~~~
c.cpp: In function 'int main()':
c.cpp:192:5: error: 'span' was not declared in this scope
192 | span ranges( rawRanges );
 | ^~~~
c.cpp:192:5: note: 'std::span' is only available from C++20 onwards c.cpp:193:5: error: 'char8_t' was not declared in this scope; did you
mean 'wchar_t'?
193 | char8_t rawTypeHeads[4] { 0, 0xC0, 0xE0, 0xF0 };
 | ^~~~~~~
 | wchar_t
c.cpp:194:9: error: expected ';' before 'typeHeads'
194 | span typeHeads( rawTypeHeads );
 | ^~~~~~~~~~
 | ;
c.cpp:196:5: error: 'u8string' was not declared in this scope
196 | u8string u8Str( BUF_MIN + 3, (char8_t)0 );
 | ^~~~~~~~
c.cpp:196:5: note: 'std::u8string' is only available from C++20 onwards c.cpp:197:20: error: 'u8string' does not name a type
197 | using u8s_it = u8string::iterator;
 | ^~~~~~~~
c.cpp:198:5: error: 'u8s_it' was not declared in this scope
198 | u8s_it
 | ^~~~~~
c.cpp:201:30: error: 'itChar' was not declared in this scope
201 | for( size_t width, type; itChar < itCharEnd; itChar += width )
 | ^~~~~~ c.cpp:201:39: error: 'itCharEnd' was not declared in this scope
201 | for( size_t width, type; itChar < itCharEnd; itChar += width )
 | ^~~~~~~~~
c.cpp:205:23: error: 'ranges' was not declared in this scope; did you
mean 'rawRanges'?
205 | char32_t c = (ranges[type])( mt );
 | ^~~~~~
 | rawRanges c.cpp:206:20: error: expected ';' before 'itTail'
206 | for( u8s_it itTail = itChar + width; --itTail > itChar; c >>= 6 )
 | ^~~~~~~
 | ;
c.cpp:206:48: error: 'itTail' was not declared in this scope
206 | for( u8s_it itTail = itChar + width; --itTail > itChar; c >>= 6 )
 | ^~~~~~
c.cpp:208:19: error: 'typeHeads' was not declared in this scope
208 | *itChar = typeHeads[type] | (char8_t)c;
 | ^~~~~~~~~
c.cpp:210:5: error: 'u8Str' was not declared in this scope
210 | u8Str.resize( itChar - u8Str.begin() );
 | ^~~~~
c.cpp:210:19: error: 'itChar' was not declared in this scope
210 | u8Str.resize( itChar - u8Str.begin() );
 | ^~~~~~
c.cpp:228:25: error: 'u8string' is not a type
228 | bench( "my: ", [&]( u8string const &str ) { total += utf8Width256( (char *)str.c_str() ); } );
 | ^~~~~~~~
c.cpp: In lambda function:
c.cpp:228:84: error: request for member 'c_str' in 'str', which is of non-class type 'const int'
228 | bench( "my: ", [&]( u8string const &str ) { total += utf8Width256( (char *)str.c_str() ); } );
 | ^~~~~
c.cpp: In function 'int main()':
c.cpp:229:27: error: 'u8string' is not a type
229 | bench( "nerd: ", [&]( u8string const &str ) { total += utf8widthC( (char *)str.c_str() ); } );
 | ^~~~~~~~ c.cpp: In lambda function:
c.cpp:229:84: error: request for member 'c_str' in 'str', which is of non-class type 'const int'
229 | bench( "nerd: ", [&]( u8string const &str ) { total += utf8widthC( (char *)str.c_str() ); } );
 | ^~~~~

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sat Nov 22 19:28:32 2025

From Newsgroup: comp.lang.c

On 22/11/2025 17:44, Bonita Montero wrote:

A lot of errors look like that you haven't enable at C++23 properly.
Can you install a current g++ ? Maybe the newest from the repository
is sufficient.

I said in a followup that I'd typed =std instead of -std, which didn't generate any error from the compiler.

But I managed to compile it. However the long program with a complicated main() just crashed trying to run it, sometime before it got to the
actual UTF8 bit.

So I applied those headers and options to the first mm512
single-function version you posted. There I only had to add std:: to
those countr.one's.

I used this test driver

int main() {
size_t n = 0;
n = utf8Width("Hello, 世界!" );
printf("%zu\n", n);
}

And it crashes inside that function.

It's all just too damn complicated, sorry. It might well be fast, but
that's no good if it is troublesome to build and run for anyone else.

Another factor is this: each build, even at -O0, takes 3 whole seconds
on my machine. That must be a huge pile of junk it is including.

Building my C version takes some 1/20th of a second (even gcc takes only
0.3 seconds).

--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Nov 22 20:59:08 2025

From Newsgroup: comp.lang.c

For me the following code works:

size_t n = 0;
n = utf8Width( string_view( "Hello, 世界!" ) );
printf( "%zu\n", n );
return 0;

But this is the templated code for the Non-AVX-version.
Try utf8Width256 for the AVX version and utf8Width56
for the AVX-512 version. Do you have any IDE like CLion ?

Am 22.11.2025 um 20:28 schrieb bart:

On 22/11/2025 17:44, Bonita Montero wrote:

A lot of errors look like that you haven't enable at C++23 properly.
Can you install a current g++ ? Maybe the newest from the repository
is sufficient.

I said in a followup that I'd typed =std instead of -std, which didn't generate any error from the compiler.

But I managed to compile it. However the long program with a
complicated main() just crashed trying to run it, sometime before it
got to the actual UTF8 bit.

So I applied those headers and options to the first mm512
single-function version you posted. There I only had to add std:: to
those countr.one's.

I used this test driver

int main() {
      size_t n = 0;
      n = utf8Width("Hello, 世界!" );
      printf("%zu\n", n);
}

And it crashes inside that function.

It's all just too damn complicated, sorry. It might well be fast, but
that's no good if it is troublesome to build and run for anyone else.

Another factor is this: each build, even at -O0, takes 3 whole seconds
on my machine. That must be a huge pile of junk it is including.

Building my C version takes some 1/20th of a second (even gcc takes
only 0.3 seconds).

--- Synchronet 3.21a-Linux NewsLink 1.2

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sat Nov 22 15:24:54 2025

From Newsgroup: comp.lang.c

bart <bc@freeuk.com> writes:

On 22/11/2025 17:35, bart wrote:

On 22/11/2025 17:13, Bonita Montero wrote:

You can compile the code with -mavx512bw.
This is "inline.h":

But I now get, from:
g++ =std=c++23 -mavx512bw -O2 c.cpp
the errors shown below. I tried -fconcepts too.
So, what also do I need? (So far you're not selling C++ very well!)

Wait, there's a "=std" in that command line instead of
"-std". Apparently it is not an error (?).

[...]

It seems that gcc and g++ interpret any unrecognized command line
argument as the name of a "linker input file".

BTW, comp.lang.c++ is down the hall, just past the water cooler.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Sun Nov 23 00:14:58 2025

From Newsgroup: comp.lang.c

On 22/11/2025 23:24, Keith Thompson wrote:

bart <bc@freeuk.com> writes:

On 22/11/2025 17:35, bart wrote:

On 22/11/2025 17:13, Bonita Montero wrote:

You can compile the code with -mavx512bw.
This is "inline.h":

But I now get, from:
g++ =std=c++23 -mavx512bw -O2 c.cpp
the errors shown below. I tried -fconcepts too.
So, what also do I need? (So far you're not selling C++ very well!)

Wait, there's a "=std" in that command line instead of
"-std". Apparently it is not an error (?).

[...]

It seems that gcc and g++ interpret any unrecognized command line
argument as the name of a "linker input file".

It looks like it compiles any source code first, so won't get around to reporting an error if that compilation fails.

BTW, comp.lang.c++ is down the hall, just past the water cooler.

This was supposed be about comparing a C approach to C++. Except there
were problems in getting the 'fast' C++ code to compile and then to run.

I think I'll stick with the simple C version which can also be trivially ported to any language as there are no heavy dependencies.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Philipp Klaus Krause@pkk@spth.de to comp.lang.c on Sun Nov 23 12:42:20 2025

From Newsgroup: comp.lang.c

Am 14.11.25 um 22:03 schrieb Michael Sanders:

static int utf8_width(const char *s) {
int w = 0;
const unsigned char *p = (const unsigned char *)s;

while (*p) {
if (*p < 0x80) { w++; p++; } // ASCII 1-byte
else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8
else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8
else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8
else { w++; p++; } // fallback
}

return w;
}

Do you need this to work under non-UTF-8 locales? If you only need that
length when the locale is UTF-8, why not just use mblen from stdlib.h?

Philipp

--- Synchronet 3.21a-Linux NewsLink 1.2

From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Nov 23 13:32:18 2025

From Newsgroup: comp.lang.c

On 23/11/2025 01:14, bart wrote:

On 22/11/2025 23:24, Keith Thompson wrote:

bart <bc@freeuk.com> writes:

On 22/11/2025 17:35, bart wrote:

On 22/11/2025 17:13, Bonita Montero wrote:

You can compile the code with -mavx512bw.
This is "inline.h":

But I now get, from:
g++ =std=c++23 -mavx512bw -O2 c.cpp
the errors shown below. I tried -fconcepts too.
So, what also do I need? (So far you're not selling C++ very well!)

Wait, there's a "=std" in that command line instead of
"-std". Apparently it is not an error (?).

[...]

It seems that gcc and g++ interpret any unrecognized command line
argument as the name of a "linker input file".

It looks like it compiles any source code first, so won't get around to reporting an error if that compilation fails.

Correct. Compile first, then link - that has to be the order of
business. So if the compilation fails, gcc or g++ (which are just
"driver" programs that start the real compiler, assembler, and linker)
doesn't get a far as trying to link, and thus doesn't get as far as
looking to see if this mysterious "=std=c++23" file exists or not.

gcc will happily complain when you give it an incorrect or unknown
option, but it has to recognise that it /is/ an option!

BTW, comp.lang.c++ is down the hall, just past the water cooler.

This was supposed be about comparing a C approach to C++. Except there
were problems in getting the 'fast' C++ code to compile and then to run.

I think I'll stick with the simple C version which can also be trivially ported to any language as there are no heavy dependencies.

Bonita has a years-long habit of interrupting C discussions with C++ distractions. I agree that sometimes a comparison to different
languages is relevant in a C discussion, but for C++ details it is
better to move over to c.l.c++.

However, this code is neither C nor C++ - it is x86 assembly, wrapped in
some C++ and a whole lot of Bonita-specific stuff. I've no idea if
there is a suitable forum for that!

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Sun Nov 23 22:05:35 2025

From Newsgroup: comp.lang.c

On Sun, 23 Nov 2025 12:42:20 +0100, Philipp Klaus Krause wrote:

Do you need this to work under non-UTF-8 locales? If you only need that length when the locale is UTF-8, why not just use mblen from stdlib.h?

Two reasons...

- mblen() does not return the display width of utf8 characters

- portability...

What I needed to accomplish was using a printf() expression where
columns were uniformly padded/aligned by the longest string
(whether utf8 or ascii) in the 1st column. Example...

123 foo
1 foo
12345 foo
12 foo

After studying *cough* stealing *cough* any/all examples I could
come across, I cobbled this composite together:

#include <stdio.h>
#include <string.h>

int utf8_display_width(const char *s) {
int w = 0;

while (*s) {
unsigned char b = *s;
unsigned cp;
int n;

// UTF-8 decoder
if (b <= 0x7F) { // 1-byte ASCII
cp = b;
n = 1;
} else if (b >= 0xC0 && b <= 0xDF) { // 2-byte
cp = ((b & 0x1F) << 6) |
(s[1] & 0x3F);
n = 2;
} else if (b >= 0xE0 && b <= 0xEF) { // 3-byte
cp = ((b & 0x0F) << 12) |
((s[1] & 0x3F) << 6) |
(s[2] & 0x3F);
n = 3;
} else if (b >= 0xF0 && b <= 0xF7) { // 4-byte
cp = ((b & 0x07) << 18) |
((s[1] & 0x3F) << 12) |
((s[2] & 0x3F) << 6) |
(s[3] & 0x3F);
n = 4;
} else { // invalid, treat as 1-byte
cp = b;
n = 1;
}

// display width
if (cp >= 0x0300 && cp <= 0x036F) {} // combining marks like é (zero width)
else if ( // double-width characters...
(cp >= 0x1100 && cp <= 0x115F) || // hangul jamo
(cp >= 0x2E80 && cp <= 0xA4CF) || // cjk radicals & unified ideographs
(cp >= 0xAC00 && cp <= 0xD7A3) || // hangul syllables
(cp >= 0xF900 && cp <= 0xFAFF) || // cjk compatibility ideographs
(cp >= 0x1F300 && cp <= 0x1FAFF) // emoji + symbols
) { w += 2; }
// exceptional wide characters (unicode requirement I've read elsewhere)
else if (cp == 0x2329 || cp == 0x232A) { w += 2; }
else { w += 1; } // normal width for everything else

s += n;
}

return w;
}

int main(void) {
const char *tests[] = {
"hello",
"Café",
"漢字",
"✓",
"🙂",
NULL
};

// find maximum display width in 1st column
int maxw = 0;
for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
if (w > maxw) maxw = w;
}

// total padding after each 1st column + 3 spaces
int total_pad = maxw + 3;

for (int i = 0; tests[i]; i++) {

int w = utf8_display_width(tests[i]);
int sl = strlen(tests[i]);
printf("%s", tests[i]);
int pad = total_pad - w;
while (pad-- > 0) putchar(' ');
printf("strlen: %d utf8 display width: %d\n", sl, w);
}

return 0;
}

// eof
--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Wed Nov 26 19:42:09 2025

From Newsgroup: comp.lang.c

I've developed a UTF-8 width function with AVX-512 that can validate
for a proper number of extension bytes after the header bytes. The
validation is done with bit-masks delivered from AVX-intrinsics,
i.e. without loops.
The code accepts a basic_string_view with a chacacter widh of one
byte (all three char-types and char8_t). It's about 20 times faster
than a pure validation basing on non-vectored code.
I'll make an AVX (without 512) version so ghat you can test the code.

template<bool Validate, typename Char, typename Traits>
requires is_integral_v<Char> && (sizeof(Char) == 1)
size_t utf8Width512( basic_string_view<Char, Traits> str )
{
if( str.empty() )
return 0;
constexpr uint64_t ALL_ONES = -1;
__m512i const
oneMask = _mm512_set1_epi8( (char)0x80 ),
oneHead = _mm512_setzero_si512();
uintptr_t
uBegin = (uintptr_t)to_address( str.begin() ),
uEnd = (uintptr_t)to_address( str.end() );
using span_t = span<__m512i>;
span<__m512i> range64( (__m512i *)(uBegin & -64), (__m512i *)(uEnd
+ 63 & -64) );
size_t
head = uBegin & 63,
tail = uEnd & 63;
size_t n = 0;
uint64_t mask;
if constexpr( Validate )
{
__m512i const
extendMask = _mm512_set1_epi8( (char)0xC0 ),
extendHead = _mm512_set1_epi8( (char)0x80 ),
twoMask = _mm512_set1_epi8( (char)0xE0 ),
twoHead = _mm512_set1_epi8( (char)0xC0 ),
threeMask = _mm512_set1_epi8( (char)0xF0 ),
threeHead = _mm512_set1_epi8( (char)0xE0 ),
fourMask = _mm512_set1_epi8( (char)0xF8 ),
fourHead = _mm512_set1_epi8( (char)0xF0 ),
invalid = fourMask;
uint64_t one = 0, extend = 0, extendPrev = 0, two = 0, three =
0, four = 0;
auto doChunk = [&]( span_t::iterator it64 ) L_FORCEINLINE
{
(void)(it64 + 1);
__m512i chunk = _mm512_load_si512( to_address( it64 ) );
if( _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, invalid ), invalid ) & mask ) [[unlikely]]
return false;
one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, oneMask ), oneHead ) & mask;
extend = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, extendMask ), extendHead ) & mask;
two = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, twoMask ), twoHead ) & mask;
three = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, threeMask ), threeHead ) & mask;
four = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk, fourMask ), fourHead ) & mask;
auto shrd = []( uint64_t left, uint64_t right, unsigned n ) L_FORCEINLINE { return left << 64 - n | right >> n; };
uint64_t
extend2 = shrd( extendPrev, extend, 1 ),
extend3 = shrd( extendPrev, extend, 2 ) & extend2,
extend4 = shrd( extendPrev, extend, 3 ) & extend3,
beyond = shrd( extendPrev, extend, 4 ) & extend4,
err;
err = (two & extend2) ^ two;
err |= (three & extend3) ^ three;
err |= (four & extend4) ^ four;
err |= one & extend2;
err |= two & extend3;
err |= three & extend4;
err |= four & beyond;
if( err ) [[unlikely]]
return false;
n += popcount( one | two | three | four );
extendPrev = extend;
return true;
};
span_t::iterator it64 = range64.end();
mask = tail ? ~(ALL_ONES << tail) : ALL_ONES;
while( it64 > range64.begin() + (size_t)(bool)head )
if( doChunk( --it64 ) ) [[likely]]
mask = ALL_ONES;
else
return -1;
if( head ) [[likely]]
{
mask &= ALL_ONES << head;
doChunk( it64 );
}
if( countr_zero( extendPrev ) < countr_zero( one | two | three
| four ) ) [[unlikely]]
return -1;
return n;
}
else
{
__m512i const
mask24 = _mm512_set1_epi8( (char)0xC0 ),
head24 = mask24;
auto doChunk = [&]( span_t::iterator it64 ) L_FORCEINLINE
{
(void)(it64 + 1);
__m512i chunk = _mm512_load_si512( to_address( it64 ) );
uint64_t
one = _mm512_cmpeq_epi8_mask( _mm512_and_si512( chunk,
oneMask ), oneHead ) & mask,
twoAndMore = _mm512_cmpeq_epi8_mask( _mm512_and_si512(
chunk, mask24 ), head24 ) & mask;
n += popcount( one | twoAndMore );
};
span_t::iterator it64 = range64.begin();
mask = ALL_ONES << head;;
for( ; it64 != range64.end() - (bool)tail; ++it64 )
{
doChunk( it64);
mask = -1;
}
if( !tail )
return n;
mask &= ~(ALL_ONES << tail);
doChunk( it64 );
return n;
}
}

--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Wed Dec 3 06:24:23 2025

From Newsgroup: comp.lang.c

Am 18.11.2025 um 21:17 schrieb Michael Sanders:

Hi James, umm 'guarantees'? No no... It does NOT verify:

- whether the environment actually supports UTF8 fully
- whether multibyte functions are enabled
- whether the terminal supports UTF8
- whether the C library supports UTF8 normalization
(combining characters, etc. but it seems to work well here)

To be sure: It's not a UTF-8 capability test. It's only a
locale-string check. So it likely misses many valid UTF8
locale variants...

Here I'm running any mixture of: Windows/BSD/Linix Mint LMDE.

Windows has the ...W() APIs along with codepage-based APIs with
the ...A() Suffix. The W()-APIs support UTF-16, so no need for
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Wed Dec 3 18:33:05 2025

From Newsgroup: comp.lang.c

On Wed, 3 Dec 2025 06:24:23 +0100, Bonita Montero wrote:

Here I'm running any mixture of: Windows/BSD/Linix Mint LMDE.

Windows has the ...W() APIs along with codepage-based APIs with
the ...A() Suffix. The W()-APIs support UTF-16, so no need for

Hi Bonita.

Yes that's correct, but...

- that assumes we know in advance what the character is

- it would only work under Windows

We want portability across diverse OSs. In my case, the program
does NOT care what the character is, it simply needs to be able
to find it when searching data & displaying it in an ordered way.

The code below works perfectly:

#include <stdio.h>
#include <string.h>

int utf8_display_width(const char *s) {
int w = 0;

while (*s) {
unsigned char b = *s;
unsigned cp;
int n;

// UTF-8 decoder
if (b <= 0x7F) { // 1-byte ASCII
cp = b;
n = 1;
} else if (b >= 0xC0 && b <= 0xDF) { // 2-byte
cp = ((b & 0x1F) << 6) |
(s[1] & 0x3F);
n = 2;
} else if (b >= 0xE0 && b <= 0xEF) { // 3-byte
cp = ((b & 0x0F) << 12) |
((s[1] & 0x3F) << 6) |
(s[2] & 0x3F);
n = 3;
} else if (b >= 0xF0 && b <= 0xF7) { // 4-byte
cp = ((b & 0x07) << 18) |
((s[1] & 0x3F) << 12) |
((s[2] & 0x3F) << 6) |
(s[3] & 0x3F);
n = 4;
} else { // invalid, treat as 1-byte
cp = b;
n = 1;
}

// display width
if (cp >= 0x0300 && cp <= 0x036F) {} // combining marks like é (zero width)
else if ( // double-width characters...
(cp >= 0x1100 && cp <= 0x115F) || // hangul jamo
(cp >= 0x2E80 && cp <= 0xA4CF) || // cjk radicals & unified ideographs
(cp >= 0xAC00 && cp <= 0xD7A3) || // hangul syllables
(cp >= 0xF900 && cp <= 0xFAFF) || // cjk compatibility ideographs
(cp >= 0x1F300 && cp <= 0x1FAFF) // emoji + symbols
) { w += 2; }
// exceptional wide characters (unicode requirement I've read elsewhere)
else if (cp == 0x2329 || cp == 0x232A) { w += 2; }
else { w += 1; } // normal width for everything else

s += n;
}

return w;
}

int main(void) {
const char *tests[] = {
"hello",
"Café",
"漢字",
"✓",
"🙂",
NULL
};

// find maximum display width in 1st column
int maxw = 0;
for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
if (w > maxw) maxw = w;
}

// total padding after each 1st column + 3 spaces
int total_pad = maxw + 3;

for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
int sl = strlen(tests[i]);
printf("%s", tests[i]);
int pad = total_pad - w;
while (pad-- > 0) putchar(' ');
printf("strlen: %d utf8 display width: %d\n", sl, w);
}

return 0;
}

// eof
--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Wed Dec 3 14:01:38 2025

From Newsgroup: comp.lang.c

On 2025-12-03 13:33, Michael Sanders wrote:
...

We want portability across diverse OSs. In my case, the program
does NOT care what the character is, it simply needs to be able
to find it when searching data & displaying it in an ordered way.

The code below works perfectly:

#include <stdio.h>
#include <string.h>

int utf8_display_width(const char *s) {
int w = 0;

while (*s) {
unsigned char b = *s;
unsigned cp;
int n;

// UTF-8 decoder
if (b <= 0x7F) { // 1-byte ASCII
cp = b;
n = 1;
} else if (b >= 0xC0 && b <= 0xDF) { // 2-byte
cp = ((b & 0x1F) << 6) |
(s[1] & 0x3F);
n = 2;
} else if (b >= 0xE0 && b <= 0xEF) { // 3-byte
cp = ((b & 0x0F) << 12) |
((s[1] & 0x3F) << 6) |
(s[2] & 0x3F);
n = 3;
} else if (b >= 0xF0 && b <= 0xF7) { // 4-byte
cp = ((b & 0x07) << 18) |
((s[1] & 0x3F) << 12) |
((s[2] & 0x3F) << 6) |
(s[3] & 0x3F);
n = 4;
} else { // invalid, treat as 1-byte
cp = b;
n = 1;
}

// display width
if (cp >= 0x0300 && cp <= 0x036F) {} // combining marks like é (zero
width)
else if ( // double-width characters...
(cp >= 0x1100 && cp <= 0x115F) || // hangul jamo
(cp >= 0x2E80 && cp <= 0xA4CF) || // cjk radicals & unified ideographs
(cp >= 0xAC00 && cp <= 0xD7A3) || // hangul syllables
(cp >= 0xF900 && cp <= 0xFAFF) || // cjk compatibility ideographs
(cp >= 0x1F300 && cp <= 0x1FAFF) // emoji + symbols
) { w += 2; }
// exceptional wide characters (unicode requirement I've read elsewhere)
else if (cp == 0x2329 || cp == 0x232A) { w += 2; }
else { w += 1; } // normal width for everything else

s += n;
}

return w;
}

int main(void) {
const char *tests[] = {
"hello",
"Café",
"漢字",
"✓",
"🙂",
NULL
};

// find maximum display width in 1st column
int maxw = 0;
for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
if (w > maxw) maxw = w;
}

// total padding after each 1st column + 3 spaces
int total_pad = maxw + 3;

for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
int sl = strlen(tests[i]);
printf("%s", tests[i]);
int pad = total_pad - w;
while (pad-- > 0) putchar(' ');
printf("strlen: %d utf8 display width: %d\n", sl, w);
}

return 0;
}

// eof

I find it confusing that this is supposed to "work perfectly" "across
diverse OSs". The amount of space that a character takes up varies
depending upon the installed fonts, especially on whether the font is monospaced or proportional. Those fonts can be different for display on
screen or on a printer. I don't see any query to determine even what the current font is, much less what it's characteristics are. I don't know
of any OS-independent way of collecting such information. Does this
solution "work perfectly" only for your own particular favorite font?

--- Synchronet 3.21a-Linux NewsLink 1.2

From bart@bc@freeuk.com to comp.lang.c on Wed Dec 3 20:15:02 2025

From Newsgroup: comp.lang.c

On 03/12/2025 19:01, James Kuyper wrote:

On 2025-12-03 13:33, Michael Sanders wrote:
...

We want portability across diverse OSs. In my case, the program
does NOT care what the character is, it simply needs to be able
to find it when searching data & displaying it in an ordered way.

The code below works perfectly:

#include <stdio.h>
#include <string.h>

int utf8_display_width(const char *s) {
int w = 0;

while (*s) {
unsigned char b = *s;
unsigned cp;
int n;

// UTF-8 decoder
if (b <= 0x7F) { // 1-byte ASCII
cp = b;
n = 1;
} else if (b >= 0xC0 && b <= 0xDF) { // 2-byte
cp = ((b & 0x1F) << 6) |
(s[1] & 0x3F);
n = 2;
} else if (b >= 0xE0 && b <= 0xEF) { // 3-byte
cp = ((b & 0x0F) << 12) |
((s[1] & 0x3F) << 6) |
(s[2] & 0x3F);
n = 3;
} else if (b >= 0xF0 && b <= 0xF7) { // 4-byte
cp = ((b & 0x07) << 18) |
((s[1] & 0x3F) << 12) |
((s[2] & 0x3F) << 6) |
(s[3] & 0x3F);
n = 4;
} else { // invalid, treat as 1-byte
cp = b;
n = 1;
}

// display width
if (cp >= 0x0300 && cp <= 0x036F) {} // combining marks like é (zero
width)
else if ( // double-width characters...
(cp >= 0x1100 && cp <= 0x115F) || // hangul jamo
(cp >= 0x2E80 && cp <= 0xA4CF) || // cjk radicals & unified ideographs
(cp >= 0xAC00 && cp <= 0xD7A3) || // hangul syllables
(cp >= 0xF900 && cp <= 0xFAFF) || // cjk compatibility ideographs
(cp >= 0x1F300 && cp <= 0x1FAFF) // emoji + symbols
) { w += 2; }
// exceptional wide characters (unicode requirement I've read elsewhere)
else if (cp == 0x2329 || cp == 0x232A) { w += 2; }
else { w += 1; } // normal width for everything else

s += n;
}

return w;
}

int main(void) {
const char *tests[] = {
"hello",
"Café",
"漢字",
"✓",
"🙂",
NULL
};

// find maximum display width in 1st column
int maxw = 0;
for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
if (w > maxw) maxw = w;
}

// total padding after each 1st column + 3 spaces
int total_pad = maxw + 3;

for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
int sl = strlen(tests[i]);
printf("%s", tests[i]);
int pad = total_pad - w;
while (pad-- > 0) putchar(' ');
printf("strlen: %d utf8 display width: %d\n", sl, w);
}

return 0;
}

// eof

I find it confusing that this is supposed to "work perfectly" "across
diverse OSs". The amount of space that a character takes up varies
depending upon the installed fonts, especially on whether the font is monospaced or proportional. Those fonts can be different for display on screen or on a printer. I don't see any query to determine even what the current font is, much less what it's characteristics are. I don't know
of any OS-independent way of collecting such information. Does this
solution "work perfectly" only for your own particular favorite font?

This looks like a solution for a fixed-pitch font. I get this output for
a Windows console display (with - used for space):

hello---strlen: 5 utf8 display width: 5
Café----strlen: 5 utf8 display width: 4
漢字----strlen: 6 utf8 display width: 4
✓-------strlen: 3 utf8 display width: 1
🙂------strlen: 4 utf8 display width: 2

I was hoping this would be lined up, but already, in a Thunderbird edit Window, the last lines aren't lined up properly.

Same problem with Notepad (fixed pitch) and LibreOffice (fixed pitch).

It only looks alright in Windows and WSL consoles/terminals. But maybe
that's all that's needed.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael S@already5chosen@yahoo.com to comp.lang.c on Wed Dec 3 22:43:05 2025

From Newsgroup: comp.lang.c

On Wed, 3 Dec 2025 20:15:02 +0000
bart <bc@freeuk.com> wrote:

This looks like a solution for a fixed-pitch font. I get this output
for a Windows console display (with - used for space):

hello---strlen: 5 utf8 display width: 5
Café----strlen: 5 utf8 display width: 4

It sounds as a luck. é in your text just happened to be encoded as
U+00E9. What if it was encoded as U+0065,U+00B4 ? (Hopefully, I got the
correct code, I can't really distinguish between similar diacritics).

漢字----strlen: 6 utf8 display width: 4
✓-------strlen: 3 utf8 display width: 1
🙂------strlen: 4 utf8 display width: 2

I was hoping this would be lined up, but already, in a Thunderbird
edit Window, the last lines aren't lined up properly.

Same problem with Notepad (fixed pitch) and LibreOffice (fixed pitch).

It only looks alright in Windows and WSL consoles/terminals. But
maybe that's all that's needed.

--- Synchronet 3.21a-Linux NewsLink 1.2

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Dec 3 12:49:23 2025

From Newsgroup: comp.lang.c

bart <bc@freeuk.com> writes:

On 03/12/2025 19:01, James Kuyper wrote:

[...]

I find it confusing that this is supposed to "work perfectly"
"across
diverse OSs". The amount of space that a character takes up varies
depending upon the installed fonts, especially on whether the font is
monospaced or proportional. Those fonts can be different for display on
screen or on a printer. I don't see any query to determine even what the
current font is, much less what it's characteristics are. I don't know
of any OS-independent way of collecting such information. Does this
solution "work perfectly" only for your own particular favorite font?

This looks like a solution for a fixed-pitch font. I get this output
for a Windows console display (with - used for space):

[...]

I think bart is right that this is specific to fixed-width fonts.
For a variable width font, 'W' is going to be wider than '|'.

See also the POSIX `int wcwidth(wchar_t wc)` function, which returns
the "number of column positions of a wide-character code". It does
depend on the current locale.

The assumption seems to be that fixed-width fonts are expected to be
consistent about the widths of characters.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Wed Dec 3 23:23:30 2025

From Newsgroup: comp.lang.c

On Wed, 3 Dec 2025 14:01:38 -0500, James Kuyper wrote:

I find it confusing that this is supposed to "work perfectly" "across
diverse OSs". The amount of space that a character takes up varies
depending upon the installed fonts, especially on whether the font is monospaced or proportional. Those fonts can be different for display on screen or on a printer. I don't see any query to determine even what the current font is, much less what it's characteristics are. I don't know
of any OS-independent way of collecting such information. Does this
solution "work perfectly" only for your own particular favorite font?

Just for use in the terminal & yes it works as advertised.

In my case I simply need to match the character the user passed
to the program when searching for a record. I dont want or need
to know what font is used. If the terminal can display it, then
I want to use it.

Example, user invokes: tinybase -s=漢字 data/*.tbf

Output is...

FILE: data/history.tbf
LINE: 170
BLOCK: 4
CRC-8: 0x30
QUERY: 漢字
MATCH: 漢字

TAGS: China, History, <漢字>, [wrap:66]

Ancient China...

1. Geography and Early Beginnings: Ancient China, a cradle of
civilization, evolved along the Yellow River's fertile plains.
Protected by the Himalayas to the south, the Gobi Desert to the
north, and vast seas to the east, this geographic isolation
allowed for a unique and continuous cultural development spanning
millennia.

...

James, earnestly intending no offense - add something to the
conversion rather than complaining - I want to learn & solve
problems that's where I'm seeking help. Just modify the code,
make it get closer to your ideal. We'll all benefit.
--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Dec 3 18:15:38 2025

From Newsgroup: comp.lang.c

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

bart <bc@freeuk.com> writes:

On 03/12/2025 19:01, James Kuyper wrote:

[...]

I find it confusing that this is supposed to "work perfectly"
"across
diverse OSs". The amount of space that a character takes up varies
depending upon the installed fonts, especially on whether the font is
monospaced or proportional. Those fonts can be different for display on
screen or on a printer. I don't see any query to determine even what the >>> current font is, much less what it's characteristics are. I don't know
of any OS-independent way of collecting such information. Does this
solution "work perfectly" only for your own particular favorite font?

This looks like a solution for a fixed-pitch font. I get this output
for a Windows console display (with - used for space):

[...]

I think bart is right that this is specific to fixed-width fonts.
For a variable width font, 'W' is going to be wider than '|'.

See also the POSIX `int wcwidth(wchar_t wc)` function, which returns
the "number of column positions of a wide-character code". It does
depend on the current locale.

The assumption seems to be that fixed-width fonts are expected to be consistent about the widths of characters.

And in fact Unicode specifies how many cell positions each printable
character occupies, or at least for some of them.

The following is quoted from wcwidth.c in the xterm sources. The text
was originally written by Markus Kuhn.

* For some graphical characters, the Unicode standard explicitly
* defines a character-cell width via the definition of the East Asian
* FullWidth (F), Wide (W), Half-width (H), and Narrow (Na) classes.
* In all these cases, there is no ambiguity about which width a
* terminal shall use. For characters in the East Asian Ambiguous (A)
* class, the width choice depends purely on a preference of backward
* compatibility with either historic CJK or Western practice.
* Choosing single-width for these characters is easy to justify as
* the appropriate long-term solution, as the CJK practice of
* displaying these characters as double-width comes from historic
* implementation simplicity (8-bit encoded characters were displayed
* single-width and 16-bit ones double-width, even for Greek,
* Cyrillic, etc.) and not any typographic considerations.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */
--- Synchronet 3.21a-Linux NewsLink 1.2

From Michael Sanders@porkchop@invalid.foo to comp.lang.c on Thu Dec 4 04:11:35 2025

From Newsgroup: comp.lang.c

Ever worked with binary search trees Bonita?

I've been playing around with them, or was awhile back at least...

My criteria was to build nodes alphabetically:

- Left subtree contains keys less than the node

- Right subtree contains keys greater than the node

INSTRUMENTATION

I
/ \
E N
/ / \
A M S
/ / \
I R T
/ \
N U
\ /
O T
/ \
N T
--
:wq
Mike Sanders
--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Dec 4 14:03:54 2025

From Newsgroup: comp.lang.c

Am 03.12.2025 um 19:33 schrieb Michael Sanders:

On Wed, 3 Dec 2025 06:24:23 +0100, Bonita Montero wrote:

Here I'm running any mixture of: Windows/BSD/Linix Mint LMDE.

Windows has the ...W() APIs along with codepage-based APIs with
the ...A() Suffix. The W()-APIs support UTF-16, so no need for

Hi Bonita.

Yes that's correct, but...

- that assumes we know in advance what the character is

- it would only work under Windows

We want portability across diverse OSs. In my case, the program
does NOT care what the character is, it simply needs to be able
to find it when searching data & displaying it in an ordered way.

VC++ supports C- and C++ locale if you like to have it portable.
Especially the locale-support in C++ with its facets is very nice
to handle: https://en.cppreference.com/w/cpp/locale.html

The code below works perfectly:

#include <stdio.h>
#include <string.h>

int utf8_display_width(const char *s) {
int w = 0;

while (*s) {
unsigned char b = *s;
unsigned cp;
int n;

// UTF-8 decoder
if (b <= 0x7F) { // 1-byte ASCII
cp = b;
n = 1;
} else if (b >= 0xC0 && b <= 0xDF) { // 2-byte
cp = ((b & 0x1F) << 6) |
(s[1] & 0x3F);
n = 2;
} else if (b >= 0xE0 && b <= 0xEF) { // 3-byte
cp = ((b & 0x0F) << 12) |
((s[1] & 0x3F) << 6) |
(s[2] & 0x3F);
n = 3;
} else if (b >= 0xF0 && b <= 0xF7) { // 4-byte
cp = ((b & 0x07) << 18) |
((s[1] & 0x3F) << 12) |
((s[2] & 0x3F) << 6) |
(s[3] & 0x3F);
n = 4;
} else { // invalid, treat as 1-byte
cp = b;
n = 1;
}

// display width
if (cp >= 0x0300 && cp <= 0x036F) {} // combining marks like é (zero width)
else if ( // double-width characters...
(cp >= 0x1100 && cp <= 0x115F) || // hangul jamo
(cp >= 0x2E80 && cp <= 0xA4CF) || // cjk radicals & unified ideographs
(cp >= 0xAC00 && cp <= 0xD7A3) || // hangul syllables
(cp >= 0xF900 && cp <= 0xFAFF) || // cjk compatibility ideographs
(cp >= 0x1F300 && cp <= 0x1FAFF) // emoji + symbols
) { w += 2; }
// exceptional wide characters (unicode requirement I've read elsewhere)
else if (cp == 0x2329 || cp == 0x232A) { w += 2; }
else { w += 1; } // normal width for everything else

s += n;
}

return w;
}

int main(void) {
const char *tests[] = {
"hello",
"Café",
"漢字",
"✓",
"🙂",
NULL
};

// find maximum display width in 1st column
int maxw = 0;
for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
if (w > maxw) maxw = w;
}

// total padding after each 1st column + 3 spaces
int total_pad = maxw + 3;

for (int i = 0; tests[i]; i++) {
int w = utf8_display_width(tests[i]);
int sl = strlen(tests[i]);
printf("%s", tests[i]);
int pad = total_pad - w;
while (pad-- > 0) putchar(' ');
printf("strlen: %d utf8 display width: %d\n", sl, w);
}

return 0;
}

// eof

--- Synchronet 3.21a-Linux NewsLink 1.2

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Thu Dec 4 14:15:35 2025

From Newsgroup: comp.lang.c

Am 03.12.2025 um 20:01 schrieb James Kuyper:

I find it confusing that this is supposed to "work perfectly" "across
diverse OSs". The amount of space that a character takes up varies
depending upon the installed fonts, especially on whether the font is monospaced or proportional. Those fonts can be different for display on screen or on a printer. I don't see any query to determine even what the current font is, much less what it's characteristics are. I don't know
of any OS-independent way of collecting such information. Does this
solution "work perfectly" only for your own particular favorite font?

Can C handle that with those means given by the standard itself.
And is this really necessary to consider. Consoles are almost always
fixed space. I guess the standard output for an laser printer in line
printed mode is also fixed space.

--- Synchronet 3.21a-Linux NewsLink 1.2

Sysop:	DaiTengu
Location:	Appleton, WI
Users:	1,089
Nodes:	10 (0 / 10)
Uptime:	153:54:12
Calls:	13,921
Calls today:	2
Files:	187,021
D/L today:	3,760 files (944M bytes)
Messages:	2,457,163

Re: Unicode...

Who's Online

Recent Visitors

System Info