Re-exported predicates
The following predicates are exported from this file while their implementation is defined in imported modules or non-module files loaded by this module.
unicode_script(+Code:integer, -Script:atom) is semidet
True when Script is the UAX #24 Script_Property of Code. Script is a lower-case atom of the long property value (latin, cyrillic, han, common, inherited, ...). Fails for code points outside the Unicode range or with no entry in Scripts.txt.
unicode_script_extensions(+Code:integer, -Scripts:list(atom)) is semidet
Scripts is the sorted list of UAX #24 Script_Extensions of Code. For most code points this is a singleton [Script]. Fails for code points outside the Unicode range and for code points with no entry in either ScriptExtensions.txt or Scripts.txt.
unicode_identifier_status(+Code:integer, -Status:atom) is semidet
Succeeds, unifying Status with allowed, when Code is listed in UTS #39 IdentifierStatus.txt with status Allowed. Fails otherwise: per UTS #39 every code point not listed there is Restricted by default, and this predicate fails rather than returning restricted for all of them.
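Callers that want an explicit restricted answer can wrap the failure themselves. The following is a minimal sketch, assuming the module that exports unicode_identifier_status/2 is already loaded; identifier_status_or_default/2 is a hypothetical helper name, not part of the library.

```prolog
% identifier_status_or_default(+Code, -Status)
% Hypothetical helper (not part of the library): turns the failure of
% unicode_identifier_status/2 into the UTS #39 default, restricted.
% Note that code points outside the Unicode range also end up as
% restricted here, because the library predicate fails for them too.
identifier_status_or_default(Code, Status) :-
    (   unicode_identifier_status(Code, Status0)
    ->  Status = Status0
    ;   Status = restricted
    ).
```

The intermediate variable Status0 keeps the helper correct when Status is already bound: asking whether an Allowed code point is restricted fails instead of falling through to the default branch.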
unicode_identifier_type(+Code:integer, -Types:list(atom)) is semidet
Types is the sorted list of UTS #39 Identifier_Type atoms for Code (recommended, inclusion, technical, obsolete, limited_use, exclusion, not_nfkc, not_xid, default_ignorable, deprecated, uncommon_use). Fails for code points outside the Unicode range or with no entry in IdentifierType.txt.
unicode_skeleton(+Text, -Skeleton:atom) is det
Compute the UTS #39 §4 skeleton of Text: apply NFD, substitute each code point with its confusables.txt prototype string, then apply NFD again. Two strings are confusable iff their skeletons compare equal.
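The substitution step at the centre of the skeleton computation can be sketched in isolation. This is an illustration only, not the library's implementation: both NFD passes are omitted, toy_prototype/2 is a hypothetical three-entry excerpt of the confusables.txt mapping, and toy_skeleton/2 and code_prototype/2 are hypothetical names.

```prolog
:- use_module(library(apply)).
:- use_module(library(lists)).

% Toy excerpt of the confusables.txt mapping: source code point to
% prototype code points. Hypothetical fact name; the real table is
% far larger.
toy_prototype(0x0430, [0x0061]).          % CYRILLIC SMALL LETTER A -> a
toy_prototype(0x043E, [0x006F]).          % CYRILLIC SMALL LETTER O -> o
toy_prototype(0x006D, [0x0072, 0x006E]).  % m -> rn

% toy_skeleton(+Text, -Skeleton)
% Substitution step only; the NFD passes before and after it are
% left out of this sketch.
toy_skeleton(Text, Skeleton) :-
    text_to_string(Text, String),
    string_codes(String, Codes),
    maplist(code_prototype, Codes, Prototypes),
    append(Prototypes, SkeletonCodes),     % flatten the list of lists
    atom_codes(Skeleton, SkeletonCodes).

% code_prototype(+Code, -Prototype)
% Code points without a table entry map to themselves.
code_prototype(Code, Prototype) :-
    (   toy_prototype(Code, Prototype0)
    ->  Prototype = Prototype0
    ;   Prototype = [Code]
    ).
```

With this toy table, `?- toy_skeleton(pam, S).` yields `S = parn`, and a text whose first letter is Cyrillic а (U+0430) receives the same skeleton as its all-Latin counterpart.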
unicode_confusable(+T1, +T2) is semidet
True when T1 and T2 have equal skeletons as computed by unicode_skeleton/2.
unicode_confusable(+T1, +T2, +Options) is semidet
As unicode_confusable/2. Options:
- ignore_intentional(+Bool)
  If true, skip the per-character substitution when the source and target form a pair listed in UTS #39 intentional.txt (e.g. Latin A versus Greek capital Alpha). Default false.
unicode_resolved_scripts(+Text, -Scripts:list(atom)) is det
Scripts is the UTS #39 §5.1 resolved augmented Script_Extensions set of Text: the intersection of augscx(c) over all non-Common/non-Inherited characters, with the augmentation rules for Han, Hiragana, Katakana, Hangul and Bopomofo applied. The empty list signals a mixed-script string.
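The intersection described above can be sketched over per-character script sets that have already been augmented. This is a hedged illustration, not the library's code: resolve_sets/2 and matches_all/1 are hypothetical names, the script atoms used in the queries are illustrative, and the atom all is a placeholder of this sketch for the case where every character is Common or Inherited (what unicode_resolved_scripts/2 returns in that case is not stated here).

```prolog
:- use_module(library(apply)).
:- use_module(library(ordsets)).

% resolve_sets(+ScriptSets, -Resolved)
% ScriptSets holds one sorted, already augmented script list per
% character. Sets for Common and Inherited characters match every
% script, so they are dropped before intersecting.
resolve_sets(Sets0, Resolved) :-
    exclude(matches_all, Sets0, Sets),
    (   Sets = [First|Rest]
    ->  foldl(ord_intersection, Rest, First, Resolved)
    ;   Resolved = all      % placeholder: nothing constrains the result
    ).

matches_all([common]).
matches_all([inherited]).
```

For example, `?- resolve_sets([[latin],[common],[latin]], R).` gives `R = [latin]`, while `?- resolve_sets([[latin],[cyrillic]], R).` gives `R = []`, the mixed-script signal.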
unicode_restriction_level(+Text, -Level:atom) is det
Classify Text under UTS #39 §5.2 at the most restrictive level for which it qualifies. Level is one of:
- ascii_only: every code point in U+0020..U+007E and Allowed.
- single_script: augmented resolved-script-set non-empty and every code point Allowed.
- highly_restrictive: covered by Latin plus one of Hanb, Jpan or Kore (UTS #39 §5.1 augmented profiles).
- moderately_restrictive: covered by Latin plus a single non-Latin Recommended script (Cyrl or Grek).
- minimally_restrictive: every code point has Identifier_Type in {recommended, inclusion}.
- unrestricted: otherwise.
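Because the levels form an ordered scale, a policy check usually reduces to "no less restrictive than some maximum". The following is a minimal sketch of such a check, assuming the module that exports unicode_restriction_level/2 is already loaded; restriction_levels/1, level_at_most/2 and text_within_level/2 are hypothetical helper names, not part of the library.

```prolog
:- use_module(library(lists)).

% Levels listed from most to least restrictive, in the order given
% in the documentation of unicode_restriction_level/2.
restriction_levels([ ascii_only,
                     single_script,
                     highly_restrictive,
                     moderately_restrictive,
                     minimally_restrictive,
                     unrestricted
                   ]).

% level_at_most(+Level, +Max)
% True when Level is at least as restrictive as Max.
level_at_most(Level, Max) :-
    restriction_levels(Levels),
    once(nth0(I, Levels, Level)),
    once(nth0(J, Levels, Max)),
    I =< J.

% text_within_level(+Text, +Max)
% True when Text classifies at Max or at a more restrictive level.
text_within_level(Text, Max) :-
    unicode_restriction_level(Text, Level),
    level_at_most(Level, Max).
```

A call such as `text_within_level(Name, highly_restrictive)` would then accept identifiers classified as ascii_only, single_script or highly_restrictive and reject the rest.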
A linter that walks source clauses and reports atoms with the
confusability issues above is registered in library(check) itself
(predicate list_confusable_identifiers/0); see the
library(check) documentation for details.