- unicode_map(+In, -Out, +Options) is det
- Perform a Unicode mapping on In, returning Out. Options is a list
that may contain any combination of the flags below; a call is
roughly equivalent to utf8proc_map(In, Options) in the C API.
- stable
- Respect Unicode versioning stability, so that the result does not
depend on which (recent) version of Unicode is in use.
- compat
- Use compatibility decomposition (i.e. formatting information is
lost).
- compose
- Produce a composed result (NFC, or NFKC if compat is also given).
- decompose
- Produce a decomposed result (NFD/NFKD).
- ignore
- Strip "default ignorable" characters (e.g. soft hyphen, zero-width
space).
- rejectna
- Raise an error instead of returning output when the input contains
unassigned code points.
- nlf2ls
- Convert all NLF-sequences (LF, CRLF, CR, NEL) to U+2028 LINE
SEPARATOR.
- nlf2ps
- Convert all NLF-sequences to U+2029 PARAGRAPH SEPARATOR.
- nlf2lf
- Convert all NLF-sequences to U+000A LINE FEED.
- stripcc
- Strip or convert control characters. NLF-sequences become a
space, except if one of the NLF-conversion flags is set; HT and
FF are treated as NLF in this case. All other control
characters are removed.
- casefold
- Apply Unicode case folding (for caseless comparison).
- charbound
- Insert a 0xFF byte at the beginning of every grapheme cluster
(UAX#29). The result can be split on 0xFF to recover the individual
graphemes; atom_graphemes/2 wraps this pattern.
- lump
- Normalise typographic variants to their ASCII equivalents
(see module header for the full list). Combined with
nlf2lf, paragraph and line separators become U+000A as well.
- stripmark
- Strip all combining marks (non-spacing, spacing, enclosing).
Must be combined with compose or decompose.
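
The flags are typically used in small groups rather than one at a time. The following is a minimal sketch of three such groups, not part of the library itself: the helper names caseless_key/2, strip_marks/2 and unix_newlines/2 are hypothetical, and the import assumes that unicode_map/3 is exported by library(unicode), which may differ in your installation.

```prolog
% Assumption: unicode_map/3 is exported by library(unicode); adjust the
% import to wherever the predicate is defined in your installation.
:- use_module(library(unicode)).

%!  caseless_key(+Text, -Key) is det.
%
%   Hypothetical helper: compatibility composition plus case folding,
%   giving a key for caseless, formatting-insensitive comparison.
caseless_key(Text, Key) :-
    unicode_map(Text, Key, [stable, compat, compose, casefold]).

%!  strip_marks(+Text, -Stripped) is det.
%
%   Hypothetical helper: stripmark is paired with decompose here,
%   because it must be combined with compose or decompose.
strip_marks(Text, Stripped) :-
    unicode_map(Text, Stripped, [stable, decompose, stripmark]).

%!  unix_newlines(+Text, -Normalised) is det.
%
%   Hypothetical helper: map every NLF-sequence to U+000A LINE FEED.
unix_newlines(Text, Normalised) :-
    unicode_map(Text, Normalised, [nlf2lf]).

% Example queries (results depend on the installed library, so none
% are shown here):
%   ?- caseless_key('Straße', K1), caseless_key('STRASSE', K2), K1 == K2.
%   ?- strip_marks('café', S).
%   ?- unix_newlines("a\r\nb\rc", N).
```

Each helper passes only one of compose or decompose, since a single call produces either a composed or a decomposed result.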