Pack pinyin -- prolog/pinyin.pl
This module implements a grammar that parses and generates words written in Hanyu Pinyin, the standard romanization system for Mandarin Chinese. It also provides a utility to convert whole texts between diacritics and numbers for writing tones.
Maximal substrings of Pinyin letters, the digits 1-4, and the characters ' and - are converted if they can be parsed as lower-case, capitalized, or all-caps Pinyin words in the input format. Everything else is left alone. Case is preserved.
The assumed input format is diacritics if Num (the first argument of num_dia/2) is unbound, and numbers otherwise.
Example usage:
?- set_prolog_flag(double_quotes, codes).
true.

?- num_dia("Wo3 xian4zai4 dui4 jing1ju4 hen3 gan3 xing4qu4.",
|    Codes), atom_codes(Atom, Codes).
Codes = [87, 466, 32, 120, 105, 224, 110, 122, 224|...],
Atom = 'Wǒ xiànzài duì jīngjù hěn gǎn xìngqù.'.

?- num_dia(Codes, "Nǐ ne?"), atom_codes(Atom, Codes).
Codes = [78, 105, 51, 32, 110, 101, 63],
Atom = 'Ni3 ne?'.
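The queries above need the double_quotes flag set to codes because num_dia/2 works on code lists. A minimal sketch of an atom-based wrapper that avoids changing the flag follows. It assumes the pack is installed and loadable as library(pinyin) with num_dia/2 exported; the helper name num_atom_to_dia/2 is hypothetical and not part of the pack.

```prolog
% Assumption: the pinyin pack is installed and exports num_dia/2.
:- use_module(library(pinyin)).

% num_atom_to_dia(+NumAtom, -DiaAtom)
% Hypothetical helper: converts an atom that writes tones as numbers
% into an atom that writes them as diacritics.
num_atom_to_dia(NumAtom, DiaAtom) :-
    atom_codes(NumAtom, NumCodes),   % atom -> code list
    num_dia(NumCodes, DiaCodes),     % first argument bound: input is numbers
    atom_codes(DiaAtom, DiaCodes).   % code list -> atom
```

Because the first argument of num_dia/2 is bound here, the input is treated as the number format, matching the first query in the example usage.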
In word(Morphs, ND), Morphs is a list of "morphs" (not in the strictest linguistic sense) that make up the word. They take one of three forms:
* Initial-Final-Tone, where Initial and Final are atoms. The final takes the "underlying" form, which may be different from the written form. Tone is one of the integers from 0 to 4.
* r (for the erhuayin suffix)
* - (for word-internal hyphens)
ND is either num or dia, depending on how tones are represented (numbers or diacritics).
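The three morph forms can be told apart by plain unification. The following sketch does not depend on the pack; the predicate names morph_kind/2 and word_tones/2 and the kind terms syllable/3, erhua, and hyphen are made up for illustration.

```prolog
:- use_module(library(lists)).

% morph_kind(+Morph, -Kind)
% Classifies one element of a Morphs list (illustrative names only).
morph_kind(Initial-Final-Tone, syllable(Initial, Final, Tone)) :-
    atom(Initial),
    atom(Final),
    integer(Tone),
    between(0, 4, Tone).
morph_kind(r, erhua).        % erhuayin suffix
morph_kind((-), hyphen).     % word-internal hyphen

% word_tones(+Morphs, -Tones)
% Collects the tones of the syllable morphs, in order.
word_tones(Morphs, Tones) :-
    findall(Tone,
            ( member(Morph, Morphs),
              morph_kind(Morph, syllable(_, _, Tone)) ),
            Tones).
```

For the morph list used in the example usage, `?- word_tones([n-ü-3, ''-er-2], Tones).` yields `Tones = [3, 2]`. Since `-` is left-associative, n-ü-3 is read as (n-ü)-3, which is why the first clause head matches it.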
The grammar implements the following tricky aspects of Hanyu Pinyin:
The following aspects are currently not supported:
Example usage:
?- set_prolog_flag(double_quotes, codes).
true.

?- phrase(word([n-ü-3, ''-er-2], ND), Codes), atom_codes(Atom, Codes).
ND = dia,
Codes = [110, 474, 39, 233, 114],
Atom = 'nǚ\'ér' ;
ND = num,
Codes = [110, 252, 51, 39, 101, 114, 50],
Atom = 'nü3\'er2' ;
false.

?- phrase(word(Morphs, ND), "yìjué").
Morphs = [''-i-4, j-üe-2],
ND = dia ;
false.