[det]unicode_map(+In, -Out, +Options)
Perform a Unicode mapping on In, returning Out. Options is a list that may
contain any combination of the flags below. unicode_map/3 roughly corresponds
to utf8proc_map() in the C API, with each flag mapping to the matching
UTF8PROC_* option (e.g. compose to UTF8PROC_COMPOSE).
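The flags combine in a regular way. Each Unicode normalization form is stable plus compose or decompose, optionally with compat, which is also how utf8proc defines its own utf8proc_NFC() family. The sketch below assumes library(unicode) is installed and loadable. The wrapper names nfc/2, nfd/2, nfkc/2 and nfkd/2 are hypothetical and chosen only for illustration; the library may already export its own equivalents.

```prolog
:- use_module(library(unicode)).

% Hypothetical wrappers (not taken from library(unicode)) expressing the
% four Unicode normalization forms as unicode_map/3 flag lists.
nfc(In, Out)  :- unicode_map(In, Out, [stable, compose]).
nfd(In, Out)  :- unicode_map(In, Out, [stable, decompose]).
nfkc(In, Out) :- unicode_map(In, Out, [stable, compose, compat]).
nfkd(In, Out) :- unicode_map(In, Out, [stable, decompose, compat]).
```

For instance, nfc('e\u0301', X) should yield the single precomposed character U+00E9, and nfkc('\uFB01', X) should turn the "fi" ligature into the two letters fi.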
- stable
- Respect Unicode versioning stability: the result does not depend on
which (recent) version of Unicode is in use.
- compat
- Use compatibility decomposition (i.e. formatting information is lost).
- compose
- Produce a composed result (NFC, or NFKC if compat is also given).
- decompose
- Produce a decomposed result (NFD/NFKD).
- ignore
- Strip "default ignorable" characters (e.g. soft hyphen, zero-width
space).
- rejectna
- Raise an error instead of returning output when the input contains
unassigned code points.
- nlf2ls
- Convert all NLF-sequences (LF, CRLF, CR, NEL) to U+2028 LINE SEPARATOR.
- nlf2ps
- Convert all NLF-sequences to U+2029 PARAGRAPH SEPARATOR.
- nlf2lf
- Convert all NLF-sequences to U+000A LINE FEED.
- stripcc
- Strip or convert control characters. NLF-sequences become a space
unless one of the NLF-conversion flags (nlf2ls, nlf2ps, nlf2lf) is set;
under stripcc, HT and FF are also treated as NLF-sequences. All other
control characters are removed.
- casefold
- Apply Unicode case folding (for caseless comparison).
- charbound
- Insert a 0xFF byte at the beginning of every grapheme cluster (UAX #29).
The result can be split on 0xFF to recover individual graphemes;
atom_graphemes/2 wraps this pattern.
- lump
- Normalise typographic variants to their ASCII equivalents (see module
header for the full list). Combined with nlf2lf, paragraph and line
separators become U+000A as well.
- stripmark
- Strip all combining marks (non-spacing, spacing, enclosing). Must be
combined with compose or decompose.
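Several flags are often used together. This sketch, again assuming library(unicode) is loaded, builds a key for caseless matching and a simple accent stripper. The names caseless_key/2, same_caseless/2 and strip_accents/2 are hypothetical helpers, and the key's flag set is modeled on utf8proc's utf8proc_NFKC_Casefold() rather than taken from this library.

```prolog
:- use_module(library(unicode)).

% Hypothetical helper: a key for caseless matching. compose+compat gives
% NFKC, casefold folds case, and ignore drops default-ignorable characters
% such as the soft hyphen.
caseless_key(In, Key) :-
    unicode_map(In, Key, [stable, compose, compat, casefold, ignore]).

% Two texts match caselessly if they map to the same key.
same_caseless(A, B) :-
    caseless_key(A, Key),
    caseless_key(B, Key).

% Hypothetical helper: remove accents. stripmark needs compose or
% decompose, since a mark can only be dropped once it has been split
% off its base character; compose then recombines what remains.
strip_accents(In, Out) :-
    unicode_map(In, Out, [stable, compose, stripmark]).
```

With these definitions, same_caseless('Hello', 'HELLO') and same_caseless('co\u00ADop', 'coop') should both succeed, and strip_accents('Caf\u00E9', X) should give Cafe.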