Man page - scan_utf8(3)

NAME

scan_utf8 - decode an unsigned integer from UTF-8 encoding

SYNTAX

#include <libowfat/scan.h>

size_t scan_utf8(const char *src, size_t len, uint32_t *dest);

size_t scan_utf8_sem(const char *src, size_t len, uint32_t *dest);

DESCRIPTION

scan_utf8 decodes an unsigned integer in UTF-8 encoding from a memory area holding binary data. It writes the decoded value to dest and returns the number of bytes it read from src.

scan_utf8 never reads more than len bytes from src. If the encoded sequence would extend beyond len bytes, or the memory area contains an invalid sequence, scan_utf8 returns 0 and does not touch dest.
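The return value convention above lends itself to a simple decoding loop: advance by the number of bytes consumed, and stop on 0. The following is a minimal sketch of that pattern, not libowfat code. demo_scan is a hypothetical stand-in that follows the contract described here, so that the example compiles without libowfat; decode_all is likewise an illustrative name. Real code would include <libowfat/scan.h> and call scan_utf8 in its place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in following the contract described above.
 * It is NOT the libowfat implementation; real code calls scan_utf8. */
size_t demo_scan(const char *src, size_t len, uint32_t *dest) {
    const unsigned char *s = (const unsigned char *)src;
    /* smallest value that needs n bytes; anything lower is overlong */
    static const uint32_t min_for[7] =
        {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
    if (len == 0) return 0;
    uint32_t c = s[0];
    if (c < 0x80) { *dest = c; return 1; }   /* single-byte value */
    if (c < 0xc0 || c >= 0xfe) return 0;     /* stray continuation, 0xfe, 0xff */
    size_t n = 2;                            /* sequence length from lead byte */
    while (n < 6 && (c & (0x80u >> n))) n++;
    if (n > len) return 0;                   /* would read past len */
    uint32_t v = c & (0x7fu >> n);           /* payload bits of the lead byte */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xc0) != 0x80) return 0; /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3f);
    }
    if (v < min_for[n]) return 0;            /* overlong encoding */
    *dest = v;
    return n;
}

/* Decode consecutive values from buf into out (at most cap of them).
 * Returns the number of values stored; *used receives the bytes consumed. */
size_t decode_all(const char *buf, size_t len,
                  uint32_t *out, size_t cap, size_t *used) {
    assert(out != NULL && used != NULL);
    size_t pos = 0, count = 0;
    while (pos < len && count < cap) {
        size_t n = demo_scan(buf + pos, len - pos, &out[count]);
        if (n == 0) break;                   /* invalid or truncated sequence */
        pos += n;
        count++;
    }
    *used = pos;
    return count;
}
```

If *used is smaller than len after the call and out was not full, the byte at offset *used starts an invalid or truncated sequence.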

The longest sequence scan_utf8 accepts is 6 bytes. This follows the original UTF-8 definition; RFC 3629 limits sequences to 4 bytes.

scan_utf8 will reject syntactically invalid encodings, but not semantically invalid ones. scan_utf8_sem will additionally reject surrogates.

NOTE

fmt_utf8 and scan_utf8 implement the UTF-8 encoding scheme, but are meant to be able to store integers, not just Unicode code points. Values above 0x10ffff are not valid Unicode code points, so their encodings are not valid UTF-8. If you are using this function to parse UTF-8, you need to reject them yourself (see RFC 3629).
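One way to apply this is to filter each decoded value before using it as a code point. The sketch below assumes the value was already produced by scan_utf8. The helper name store_if_scalar and the UNICODE_MAX macro are illustrative and not part of libowfat. The surrogate check is only needed when decoding with scan_utf8, since scan_utf8_sem is documented above to reject surrogates itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest Unicode code point; RFC 3629 restricts UTF-8 to this range. */
#define UNICODE_MAX 0x10ffffu

/* Hypothetical helper: accept v only if it is a Unicode scalar value.
 * Returns 1 and stores v in *cp on success; returns 0 and leaves *cp
 * untouched if v is out of range or a surrogate (0xd800..0xdfff). */
int store_if_scalar(uint32_t v, uint32_t *cp) {
    assert(cp != NULL);
    if (v > UNICODE_MAX) return 0;             /* beyond the Unicode range */
    if (v >= 0xd800u && v <= 0xdfffu) return 0; /* UTF-16 surrogate */
    *cp = v;
    return 1;
}
```

A caller would pass the value written to dest by scan_utf8 and treat a return of 0 the same way as a decoding failure.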

SEE ALSO

fmt_utf8(3), scan_utf8_sem(3)