unicode_security.pl

Re-exported predicates

The following predicates are exported from this file while their implementation is defined in imported modules or non-module files loaded by this module.

unicode_script(+Code:integer, -Script:atom) is semidet

True when Script is the UAX #24 Script_Property of Code. Script is a lower-case atom of the long property value (latin, cyrillic, han, common, inherited, ...). Fails for code points outside the Unicode range or with no entry in Scripts.txt.

unicode_script_extensions(+Code:integer, -Scripts:list(atom)) is semidet

Scripts is the sorted list of UAX #24 Script_Extensions of Code. For most code points this is a singleton [Script]. Fails for code points outside the Unicode range and for code points with no entry in either ScriptExtensions.txt or Scripts.txt.

unicode_identifier_status(+Code:integer, -Status:atom) is semidet

Succeeds, unifying Status with allowed, when Code is listed in UTS #39 IdentifierStatus.txt with status Allowed. Fails otherwise: per UTS #39 every code point not listed there is Restricted by default, but this predicate fails for such code points rather than returning restricted.
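
Callers that need a total classification can map the failure onto the UTS #39 default themselves. The following is only a sketch: status_or_restricted/3 and toy_status/2 are hypothetical names, and toy_status/2 is a stand-in for unicode_identifier_status/2 so that the block runs without the library.

```prolog
% toy_status(+Code, -Status): hypothetical stand-in for
% unicode_identifier_status/2 that knows only ASCII lower-case letters.
toy_status(Code, allowed) :-
    integer(Code),
    Code >= 0x61,          % a
    Code =< 0x7A.          % z

% status_or_restricted(:Lookup, +Code, -Status) is det.
% Hypothetical helper: a failing Lookup is reported as restricted,
% which is the UTS #39 default for unlisted code points.
status_or_restricted(Lookup, Code, Status) :-
    (   call(Lookup, Code, Found)
    ->  Status = Found
    ;   Status = restricted
    ).
```

With the toy table, status_or_restricted(toy_status, 0x61, S) gives S = allowed and status_or_restricted(toy_status, 0x0430, S) gives S = restricted. Passing unicode_identifier_status as Lookup should behave the same way against the real data, assuming the library is loaded.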

unicode_identifier_type(+Code:integer, -Types:list(atom)) is semidet

Types is the sorted list of UTS #39 Identifier_Type atoms for Code (recommended, inclusion, technical, obsolete, limited_use, exclusion, not_nfkc, not_xid, default_ignorable, deprecated, uncommon_use). Fails for code points outside the Unicode range or with no entry in IdentifierType.txt.

unicode_skeleton(+Text, -Skeleton:atom) is det

Compute the UTS #39 §4 skeleton of Text: apply NFD, substitute each code point with its confusables.txt prototype string, then apply NFD again. Two strings are confusable iff their skeletons compare equal.
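
The substitution step can be illustrated with a toy table. This is a sketch under stated assumptions, not the library's implementation: toy_prototype/2, code_prototype/2, toy_skeleton/2 and toy_confusable/2 are hypothetical names, the table holds three illustrative entries instead of the full confusables.txt data, and both NFD passes are omitted (the input is assumed to be in NFD already).

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% toy_prototype(+Code, -PrototypeCodes): hypothetical excerpt standing in
% for the confusables.txt mapping.
toy_prototype(0x0430, [0x61]).         % CYRILLIC SMALL LETTER A -> a
toy_prototype(0x043E, [0x6F]).         % CYRILLIC SMALL LETTER O -> o
toy_prototype(0x006D, [0x72, 0x6E]).   % m -> rn (multi-code-point prototype)

% code_prototype(+Code, -Codes): unmapped code points map to themselves.
code_prototype(Code, Codes) :-
    (   toy_prototype(Code, Proto)
    ->  Codes = Proto
    ;   Codes = [Code]
    ).

% toy_skeleton(+Text, -Skeleton): substitution step only.
% Assumption: Text is already NFD and the result needs no renormalising.
toy_skeleton(Text, Skeleton) :-
    text_to_string(Text, String),
    string_codes(String, Codes),
    maplist(code_prototype, Codes, Protos),
    append(Protos, SkeletonCodes),     % flatten the list of prototypes
    atom_codes(Skeleton, SkeletonCodes).

% toy_confusable(+T1, +T2): true when both texts share one skeleton.
toy_confusable(T1, T2) :-
    toy_skeleton(T1, Skeleton),
    toy_skeleton(T2, Skeleton).
```

Under this table, toy_confusable(modern, rnodern) succeeds because both atoms have the skeleton rnodern. A skeleton is only meaningful for comparison with another skeleton; it is not a display form of the input.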

unicode_confusable(+T1, +T2) is semidet

True when the skeletons that unicode_skeleton/2 computes for T1 and T2 are equal.

unicode_confusable(+T1, +T2, +Options) is semidet

As unicode_confusable/2. Options:

ignore_intentional(+Bool): If true, skip the per-character substitution when the source and target form a pair listed in UTS #39 intentional.txt (e.g. Latin A versus Greek capital Alpha). Default false.

unicode_resolved_scripts(+Text, -Scripts:list(atom)) is det

Scripts is the UTS #39 §5.1 resolved augmented Script_Extensions set of Text: the intersection of augscx(c) over all non-Common/non-Inherited characters, with the augmentation rules for Han, Hiragana, Katakana, Hangul and Bopomofo applied. The empty list signals a mixed-script string.
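
The intersection described above can be sketched over a toy table. Everything here is an assumption made for illustration: toy_scx/2 is a hypothetical stand-in for unicode_script_extensions/2 with five entries, the atoms hanb, jpan and kore are guesses at how the augmented values might be spelled, and toy_resolved/2 takes a list of code points rather than text.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).
:- use_module(library(ordsets)).

% toy_scx(+Code, -Scripts): hypothetical Script_Extensions excerpt.
toy_scx(0x0061, [latin]).      % LATIN SMALL LETTER A
toy_scx(0x0430, [cyrillic]).   % CYRILLIC SMALL LETTER A
toy_scx(0x4E00, [han]).        % CJK UNIFIED IDEOGRAPH-4E00
toy_scx(0x3042, [hiragana]).   % HIRAGANA LETTER A
toy_scx(0x002D, [common]).     % HYPHEN-MINUS

% augmentation(+Script, -Extra): UTS #39 section 5.1 augmentation rules.
augmentation(han,      [hanb, jpan, kore]).
augmentation(hiragana, [jpan]).
augmentation(katakana, [jpan]).
augmentation(hangul,   [kore]).
augmentation(bopomofo, [hanb]).

% augmented(+Scripts, -Augmented): Scripts plus the extras, as an ordset.
augmented(Scripts, Augmented) :-
    findall(Extra,
            ( member(S, Scripts),
              augmentation(S, Extras),
              member(Extra, Extras) ),
            Added),
    append(Scripts, Added, All),
    sort(All, Augmented).

% toy_resolved(+Codes, -Resolved): intersect the augmented sets of all
% characters that are neither Common nor Inherited. Code points missing
% from toy_scx/2 are skipped. Fails when no character contributes a set,
% because the text above does not say how "all scripts" is represented.
toy_resolved(Codes, Resolved) :-
    findall(Aug,
            ( member(C, Codes),
              toy_scx(C, Scx),
              \+ memberchk(common, Scx),
              \+ memberchk(inherited, Scx),
              augmented(Scx, Aug) ),
            [First|Rest]),
    foldl(ord_intersection, Rest, First, Resolved).
```

With this table, the codes [0x4E00, 0x3042] resolve to [jpan], while [0x61, 0x0430] resolve to [], the mixed-script case. Unlike the documented predicate, which is det, the sketch is semidet.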

unicode_restriction_level(+Text, -Level:atom) is det

Classify Text under UTS #39 §5.2 at the most restrictive level for which it qualifies. Level is one of:

ascii_only: every code point is in U+0020..U+007E and Allowed.
single_script: the augmented resolved script set is non-empty and every code point is Allowed.
highly_restrictive: covered by Latin plus one of Hanb, Jpan or Kore (UTS #39 §5.1 augmented profiles).
moderately_restrictive: covered by Latin plus a single other Recommended script, excluding Cyrillic and Greek (UTS #39 §5.2 rules out Cyrl and Grek at this level).
minimally_restrictive: every code point has Identifier_Type in {recommended, inclusion}.
unrestricted: otherwise.

A linter that walks source clauses and reports atoms with the confusability issues above is registered in library(check) itself (predicate list_confusable_identifiers/0); see the library(check) documentation for details.