1% This file is part of the Attempto Parsing Engine (APE).
    2% Copyright 2008-2013, Attempto Group, University of Zurich (see http://attempto.ifi.uzh.ch).
    3%
    4% The Attempto Parsing Engine (APE) is free software: you can redistribute it and/or modify it
    5% under the terms of the GNU Lesser General Public License as published by the Free Software
    6% Foundation, either version 3 of the License, or (at your option) any later version.
    7%
    8% The Attempto Parsing Engine (APE) is distributed in the hope that it will be useful, but WITHOUT
    9% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
   10% PURPOSE. See the GNU Lesser General Public License for more details.
   11%
   12% You should have received a copy of the GNU Lesser General Public License along with the Attempto
   13% Parsing Engine (APE). If not, see http://www.gnu.org/licenses/.
   14
   15
   16:- module(chars, [
   17	is_digit/1,
   18	is_capitalized/1,
   19	to_lowercase/2,
   20	is_lowercase/1,
   21	is_uppercase/1,
   22	is_letter/1,
   23	is_unused/1,
   24	is_sentence_end_symbol/1
   25	]).

Characters

author: - Kaarel Kaljurand; - Tobias Kuhn
version: - 2009-04-29
Note: SWI has char_type/2 which could be used instead of the predicates defined in this module. But Sicstus does not seem to have anything similar. Note also: char_type/2 does not seem to be good because it does not handle anything beyond ASCII-7. */

is_capitalized(+Token:atom) is semidet

Succeeds if Token starts with an uppercase letter.

Arguments:

Token

- is an ACE token

   46is_capitalized(Token) :-
   47	atom(Token),
   48	atom_codes(Token, [Code | _]),
   49	is_uppercase(Code).

to_lower_case(+TokenIn, -TokenOut): Transforms the first character into a lowercase character. The same token is returned if the first character is not an uppercase character.

   57to_lowercase(TokenIn, TokenOut) :-
   58    atom(TokenIn),
   59    atom_codes(TokenIn, [Code|CodesRest]),
   60    is_uppercase(Code),
   61    !,
   62    NewCode is Code + 32,
   63    atom_codes(TokenOut, [NewCode|CodesRest]).
   64
   65to_lowercase(Token, Token).

is_digit(+Code:integer) is semidet

Succeeds if Code corresponds to an ASCII-7 digit symbol.

Arguments:

Code

- is a character code

   74is_digit(Code) :-
   75	48 =< Code, Code =< 57.

is_lowercase(+Code:integer) is semidet

Succeeds iff Code corresponds to a lowercase letter.

Arguments:

Code

- is a character code

We also test lowercase letters from the upper half of latin1.

   86% 0x61 -- 0x7a (a..z)
   87is_lowercase(Code) :-
   88	97 =< Code, Code =< 122.
   89
   90% 0xdf -- 0xf6
   91is_lowercase(Code) :-
   92	223 =< Code, Code =< 246.
   93
   94% 0xf8 -- 0xff
   95is_lowercase(Code) :-
   96	248 =< Code, Code =< 255.

is_uppercase(+Code:integer) is semidet

Succeeds iff Code corresponds to an uppercase letter.

Arguments:

Code

- is a character code

We also test uppercase letters from the upper half of latin1.

  107% 0x41 -- 0x5a (A..Z)
  108is_uppercase(Code) :-
  109	65 =< Code, Code =< 90.
  110
  111% 0xc0 -- 0xd6
  112is_uppercase(Code) :-
  113	192 =< Code, Code =< 214.
  114
  115% 0xd8 -- 0xde
  116is_uppercase(Code) :-
  117	216 =< Code, Code =< 222.

is_letter(+Code:integer) is semidet

Succeeds iff Code corresponds to a lower- or uppercase letter.

Arguments:

Code

- is a character code

  126is_letter(Code) :-
  127	is_lowercase(Code).
  128
  129is_letter(Code) :-
  130	is_uppercase(Code).

is_unused(+Code:integer) is semidet

Succeeds iff Code corresponds to an unused character according to http://www.w3.org/MarkUp/html3/latin1.html In addition: tab, linefeed.

Arguments:

Code

- is a character code

  141is_unused(Code) :-
  142	Code =< 31.
  143
  144is_unused(Code) :-
  145	127 =< Code, Code =< 160.

is_sentence_end_symbol(?Token:atom) is semidet

Tests if Token is an ACE sentence end symbol.

Arguments:

Token

- is a token.

  154is_sentence_end_symbol('.').
  155is_sentence_end_symbol('?').
  156is_sentence_end_symbol('!')