library(semweb/rdf_litindex): Indexing words in literals
library(semweb/rdf_litindex.pl) exploits the
primitives of section 4.5.1 and the
NLP package to provide indexing on words inside literal constants. It
also allows for fuzzy matching using stemming and `sounds-like' based on
the double metaphone algorithm of the NLP package.
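As a rough illustration of the two fuzzy keys mentioned above, the following sketch computes a Porter stem and a double metaphone code for a word with porter_stem/2 and double_metaphone/2 from the NLP package. The helper names fuzzy_keys/3 and same_sound/2 are made up for this example and are not part of rdf_litindex.

```prolog
:- use_module(library(porter_stem)).       % porter_stem/2
:- use_module(library(double_metaphone)).  % double_metaphone/2

% fuzzy_keys(+Word, -Stem, -Sound)   (hypothetical helper)
% Stem is the Porter stem of Word, Sound its primary metaphone code.
fuzzy_keys(Word, Stem, Sound) :-
    porter_stem(Word, Stem),
    double_metaphone(Word, Sound).

% same_sound(+Word1, +Word2)         (hypothetical helper)
% True if both words share the same primary metaphone code.
same_sound(Word1, Word2) :-
    double_metaphone(Word1, Sound),
    double_metaphone(Word2, Sound).
```

Words for which same_sound/2 succeeds are candidates for a `sounds-like' match; words with an equal Stem are candidates for a stem match.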
prefix(Prefix, Words). On compound expressions, only combinations that provide literals are returned. Below is an example after loading the ULAN (Unified List of Artist Names from the Getty Foundation) database and showing all words that sound like `rembrandt' and appear together in a literal with the word `Rijn'. Finding this result from the 228,710 literals contained in ULAN requires 0.54 milliseconds (AMD 1600+).
?- rdf_token_expansions(and('Rijn', sounds(rembrandt)), L).
L = [sounds(rembrandt, ['Rambrandt', 'Reimbrant', 'Rembradt', 'Rembrand',
                        'Rembrandt', 'Rembrandtsz', 'Rembrant', 'Rembrants',
                        'Rijmbrand'])]
Here is another example, illustrating handling of diacritics:
?- rdf_token_expansions(case(cafe), L).
L = [case(cafe, [cafe, café])]
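To try the same kind of query without loading ULAN, a small in-memory data set can be used. The sketch below is an assumption-laden toy: the example.org URIs and the predicate names load_toy_data/0 and rijn_sounds_like_rembrandt/2 are invented for illustration, and the names asserted are not taken from ULAN. It combines rdf_find_literals/2, which returns the matching literals, with rdf_token_expansions/2, which reports the word expansions that contributed.

```prolog
:- use_module(library(semweb/rdf_db)).        % rdf_assert/3
:- use_module(library(semweb/rdf_litindex)).  % rdf_find_literals/2,
                                              % rdf_token_expansions/2

% Toy data; the example.org URIs are placeholders.
load_toy_data :-
    rdf_assert('http://example.org/a1', 'http://example.org/name',
               literal('Rembrandt van Rijn')),
    rdf_assert('http://example.org/a2', 'http://example.org/name',
               literal('Rembrant Harmensz van Rijn')),
    rdf_assert('http://example.org/a3', 'http://example.org/name',
               literal('Vincent van Gogh')).

% Literals that contain 'Rijn' and a word sounding like rembrandt,
% plus the expansions of the fuzzy part of the query.
rijn_sounds_like_rembrandt(Literals, Expansions) :-
    Spec = and('Rijn', sounds(rembrandt)),
    rdf_find_literals(Spec, Literals),
    rdf_token_expansions(Spec, Expansions).
```

After calling load_toy_data, the query ?- rijn_sounds_like_rembrandt(Ls, Es). can be used to inspect both results.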
rdf_litindex:tokenization(Literal, -Tokens). On failure it calls tokenize_atom/2 from the NLP package and deletes the following: atoms of length 1, floats, integers that are out of range and the English words
the. Deletion first calls the hook
rdf_litindex:exclude_from_index(token, X). This hook is called as follows:
no_index_token(X) :-
    exclude_from_index(token, X), !.
no_index_token(X) :-
    ...
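A minimal sketch of how an application might define this hook follows. The word list is purely hypothetical, and the sketch assumes that the hook is declared multifile in rdf_litindex and that a token is dropped whenever the hook succeeds, as the no_index_token/1 clause above suggests.

```prolog
:- use_module(library(semweb/rdf_litindex)).  % rdf_tokenize_literal/2

:- multifile rdf_litindex:exclude_from_index/2.

% Hypothetical policy: keep these catalogue noise words out of the index.
% The atom/1 guard is needed because tokens can also be numbers.
rdf_litindex:exclude_from_index(token, Word) :-
    atom(Word),
    memberchk(Word, [untitled, unknown, anonymous]).
```

The effect can be checked with rdf_tokenize_literal/2, for example ?- rdf_tokenize_literal('untitled portrait', Tokens).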
`Literal maps' provide a relation between literal values, intended to create additional indexes on literals. The current implementation can only deal with integers and atoms (string literals). A literal map maintains an ordered set of keys. The ordering uses the same rules as described in section 4.5. Each key is associated with an ordered set of values. Literal map objects can be shared between threads, using a locking strategy that allows for multiple concurrent readers.
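The following sketch shows one way a literal map could be used as a small word index, using the literal map predicates exported by library(semweb/rdf_db). The predicate name word_map_demo/2 and the keys and values are made up for illustration.

```prolog
:- use_module(library(semweb/rdf_db)).

% word_map_demo(-Both, -Rijn)   (hypothetical helper)
% Builds a map from words (keys) to literals (values) and queries it.
word_map_demo(Both, Rijn) :-
    rdf_new_literal_map(Map),
    rdf_insert_literal_map(Map, rembrandt, 'Rembrandt van Rijn'),
    rdf_insert_literal_map(Map, rijn,      'Rembrandt van Rijn'),
    rdf_insert_literal_map(Map, rijn,      'Titus van Rijn'),
    % Values associated with a single key.
    rdf_find_literal_map(Map, [rijn], Rijn),
    % Values associated with all of the given keys.
    rdf_find_literal_map(Map, [rembrandt, rijn], Both),
    rdf_destroy_literal_map(Map).
```

The map is destroyed at the end because this demo does not share it; a real index would normally keep the map alive and update it as literals come and go.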
Typically, this module is used together with rdf_monitor/2
on the channels
maintain an index of words that appear in a literal. Further abstraction
using Porter stemming or Metaphone can be used to create additional
search indices. These can map either directly to the literal values, or
indirectly to the plain word-map. The SWI-Prolog NLP package provides
complementary building blocks, such as a tokenizer, Porter stem and
not(Key). If not-terms are provided, there must be at least one positive keyword. The negations are tested after establishing the positive matches.
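Assuming the fragment above describes the key list accepted by rdf_find_literal_map/3, a query that combines one positive key with one negation might look like the sketch below. The predicate name rijn_but_not_titus/1 and the data are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).

% rijn_but_not_titus(-Values)   (hypothetical helper)
% Values associated with the key rijn but not with the key titus.
rijn_but_not_titus(Values) :-
    rdf_new_literal_map(Map),
    rdf_insert_literal_map(Map, rijn,  'Rembrandt van Rijn'),
    rdf_insert_literal_map(Map, rijn,  'Titus van Rijn'),
    rdf_insert_literal_map(Map, titus, 'Titus van Rijn'),
    % One positive key (rijn) plus one negated key (titus).
    rdf_find_literal_map(Map, [rijn, not(titus)], Values),
    rdf_destroy_literal_map(Map).
```

A key list that holds only not-terms has no positive match to start from, which is why at least one positive key is required.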