This is the relevant part of the XS code, which should convert a Perl string from UTF-8 to code points (unsigned 32-bit integers):
UV *
text2UV (SV *sv, STRLEN *lenp)
{
  STRLEN len;
  // char *str = SvPV(foo_sv, strlen);
  // char *s = SvPV (sv, len); // This original version warns
  U8 *s = (U8 *)SvPV (sv, len); // This casts without warning
  // Scratch buffer owned by a mortal SV, so Perl frees it automatically
  UV *r = (UV *)SvPVX (sv_2mortal (NEWSV (0, (len + 1) * sizeof (UV))));
  UV *p = r;

  if (SvUTF8 (sv))
    {
      STRLEN clen;
      while (len)
        {
          // UV utf8_to_uvchr_buf(const U8 *s, const U8 *send, STRLEN *retlen)
          *p++ = utf8n_to_uvchr (s, len, &clen, 0);
          // NOTE: STRLEN is unsigned, so this comparison is never true
          // and the croak below cannot fire as written
          if (clen < 0)
            croak ("illegal unicode character in string");
          s += clen;
          len -= clen;
        }
    }
  else
    // Not flagged as UTF-8: each byte is one code point
    while (len--)
      *p++ = *(unsigned char *)s++;

  *lenp = p - r;
  return r;
}
It throws this warning:
~/github/perl/Text-Levenshtein-BVXS$ make
cp BVXS.pm blib/lib/Text/Levenshtein/BVXS.pm
Running Mkbootstrap for BVXS ()
chmod 644 "BVXS.bs"
"/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/bin/perl" -MExtUtils::Command::MM -e 'cp_nonempty' -- BVXS.bs blib/arch/auto/Text/Levenshtein/BVXS/BVXS.bs 644
"/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/bin/perl" "/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/lib/5.32.0/ExtUtils/xsubpp" -typemap '/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/lib/5.32.0/ExtUtils/typemap' BVXS.xs > BVXS.xsc
mv BVXS.xsc BVXS.c
cc -c -I. -fno-common -DPERL_DARWIN -mmacosx-version-min=10.14 -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -I/opt/local/include -DPERL_USE_SAFE_PUTENV -O3 -DVERSION=\"0.01\" -DXS_VERSION=\"0.01\" "-I/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/lib/5.32.0/darwin-2level/CORE" BVXS.c
BVXS.xs:26:35: warning: passing 'char *' to parameter of type 'const U8 *' (aka 'const unsigned char *') converts between pointers to integer types with different sign [-Wpointer-sign]
*p++ = utf8n_to_uvchr (s, len, &clen, 0);
^
/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/lib/5.32.0/darwin-2level/CORE/utf8.h:74:54: note: expanded from macro 'utf8n_to_uvchr'
utf8n_to_uvchr_error(s, len, lenp, flags, 0)
^
/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/lib/5.32.0/darwin-2level/CORE/utf8.h:76:45: note: expanded from macro 'utf8n_to_uvchr_error'
utf8n_to_uvchr_msgs(s, len, lenp, flags, errors, 0)
^
/Users/helmut/perl5/perlbrew/perls/perl-5.32.0/lib/5.32.0/darwin-2level/CORE/inline.h:1781:36: note: passing argument to parameter 's' here
Perl_utf8n_to_uvchr_msgs(const U8 *s,
^
1 warning generated.
rm -f blib/arch/auto/Text/Levenshtein/BVXS/BVXS.bundle
cc -mmacosx-version-min=10.14 -bundle -undefined dynamic_lookup -L/usr/local/lib -L/opt/local/lib -fstack-protector-strong BVXS.o -o blib/arch/auto/Text/Levenshtein/BVXS/BVXS.bundle \
\
It works and passes my tests, but if I want to release it on CPAN, the distribution should not produce warnings.
Decoding it with my own C code would be a workaround (and faster).
To me it looks like a bug in the XS macros, and/or the example in the documentation is wrong.
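For illustration, the hand-rolled decoder mentioned above might look roughly like the sketch below. It is plain C with no Perl headers; the names decode_utf8_char and decode_utf8 are hypothetical and not part of the Perl API. It is also an assumption that strict checking is wanted: the sketch rejects overlong forms, surrogates, and values above U+10FFFF, whereas Perl's internal encoding is more permissive, so it would need adjusting before being dropped into text2UV.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Perl API function): decode one UTF-8 sequence
 * starting at s, with at most len bytes available. Stores the code point in
 * *cp and returns the number of bytes consumed, or 0 on malformed input. */
static size_t
decode_utf8_char (const unsigned char *s, size_t len, uint32_t *cp)
{
  /* Smallest code point that legitimately needs 2, 3 or 4 bytes */
  static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t need, i;
  uint32_t c;

  if (len == 0)
    return 0;

  if (s[0] < 0x80)                { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { need = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { need = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { need = 4; c = s[0] & 0x07; }
  else
    return 0;                     /* stray continuation or invalid lead byte */

  if (len < need)
    return 0;                     /* sequence is truncated */

  for (i = 1; i < need; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                 /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong encodings, out-of-range values and surrogates */
  if (c < min_cp[need] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;

  *cp = c;
  return need;
}

/* Hypothetical helper: decode a whole buffer into out, which must have room
 * for len entries. Returns the number of code points written, or (size_t)-1
 * if the input is malformed. */
static size_t
decode_utf8 (const unsigned char *s, size_t len, uint32_t *out)
{
  size_t n = 0;

  while (len)
    {
      size_t used = decode_utf8_char (s, len, &out[n]);
      if (used == 0)
        return (size_t)-1;
      s += used;
      len -= used;
      n++;
    }
  return n;
}
```

Because the error value is compared with == rather than < 0, the check still works with an unsigned length type. Since the input pointer is const unsigned char * throughout, a buffer obtained from SvPV would still need the same (U8 *) cast at the call site.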
The interplay of U8 and char in the API is a bit weird. You might ask #p5p to see why it works that way.
Failing that, though, would some plain typecasting suppress the warnings? Is this in a public repository somewhere?
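As a rough illustration of the typecasting idea, the sketch below reproduces the situation without any Perl headers. The U8 typedef mirrors the "aka 'const unsigned char *'" shown in the compiler output, and first_byte is a hypothetical stand-in for an API that, like utf8n_to_uvchr, expects a const U8 * argument.

```c
typedef unsigned char U8;   /* stand-in for Perl's U8, per the warning text */

/* Hypothetical stand-in for a function that takes const U8 *,
 * as utf8n_to_uvchr does according to the compiler note. */
static unsigned
first_byte (const U8 *s)
{
  return s[0];
}

/* Hypothetical caller that holds its data as char *, as SvPV returns it. */
static unsigned
first_byte_of (char *str)
{
  /* first_byte (str) would draw -Wpointer-sign on compilers that enable it;
   * the explicit cast states the intent and silences the warning. */
  return first_byte ((const U8 *)str);
}
```

The `U8 *s = (U8 *)SvPV (sv, len);` line in the question applies the same cast once at the point where the pointer is obtained, which keeps every later call free of casts.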
Aside: SvPV is evil. Its prevalence in XS modules causes quite a lot of pain. Avoid it when possible. See: https://dev.to/fgasper/perl-s-svpv-menace-5515
Update: This looks to be a case where it's necessary to break the abstraction. Alas.