Plugin modules for rdf_db
The library(rdf_db) module provides several hooks for
extending its functionality. Database updates can be monitored and acted
upon through the features described in section
3.4. The predicate rdf_load/2
can be hooked to deal with different formats such as Turtle,
different input sources (e.g. HTTP) and different caching strategies.
The hooks below are used to add new RDF file formats and sources from which to load data to the library. They are used by the modules described below and distributed with the package. Please examine the source-code if you want to add new formats or locations.
Load library(semweb/rdf_http_plugin) to load RDF from HTTPS servers.
The open hook receives an input such as a file, a stream or url(Protocol, URL). If this hook succeeds, the RDF will be read from Stream using rdf_load_stream/3. Otherwise the default open functionality for files and streams is used.
The file-type hook maps a file extension, such as owl, to a format. Format is either a built-in format, such as
triples, or a format understood by the rdf_load_stream/3 hook.
The zlib plugin module uses
library(zlib) to load compressed
files on the fly. The file must carry the compression extension, and
the RDF format is deduced from the extension that remains after stripping it.
The HTTP plugin module allows for loading RDF from HTTP servers.
It exploits the HTTP client library. The
format of the document at a URL is determined from the mime-type returned by the
server if this is one of the RDF mime-types the plugin recognises, such as
application/turtle. As RDF mime-types are not yet widely
supported, the plugin uses the extension of the URL if the claimed
mime-type is not a recognised one. In addition, it recognises
XML content that embeds RDF.
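The mime-type lookup with its extension fallback can be pictured with a small pure-Prolog sketch. It is not the plugin's code: url_format/3 and the tables mime_format/2 and ext_format/2 are hypothetical, only application/turtle comes from the text above, and application/rdf+xml is added as the standard RDF/XML mime-type.

```prolog
% Hypothetical lookup tables; the real plugin's tables may differ.
mime_format('application/rdf+xml', xml).
mime_format('application/turtle', turtle).

ext_format(rdf, xml).
ext_format(owl, xml).
ext_format(ttl, turtle).

% url_format(+URL, +MimeType, -Format)
% Prefer the mime-type claimed by the server ...
url_format(_URL, MimeType, Format) :-
    mime_format(MimeType, Format),
    !.
% ... and fall back to the (lowercased) URL extension otherwise.
url_format(URL, _MimeType, Format) :-
    file_name_extension(_Base, Ext0, URL),
    downcase_atom(Ext0, Ext),
    ext_format(Ext, Format).
```

For example, url_format('http://example.org/onto.OWL', 'text/plain', F) falls back to the extension and yields F = xml.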
The library(semweb/rdf_cache) module defines the
caching strategy for triple sources. When using large RDF sources,
caching triples greatly speeds up loading RDF documents. The cache library
implements two caching strategies that are controlled by rdf_set_cache_options/1.
Local caching: This approach applies to files only. Triples are
cached in a sub-directory of the directory holding the source. The
name of this directory is set with the local_directory option and
differs on Windows. If the cache option create_local_directory is true,
a cache directory is created if possible.
Global caching: This approach applies to all sources, except
for unnamed streams. Triples are cached in the directory defined by the
global_directory option.
When loading an RDF file, the system scans the configured cache files
unless cache(false) is specified as an option to rdf_load/2
or caching is disabled. If caching is enabled but no cache exists, the
system will try to create a cache file. First it will try to do this
locally. On failure it will try the configured global cache.
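The order in which a cache location is chosen can be modelled as follows. This is a hypothetical sketch, not the rdf_cache implementation: choose_cache/4 is an invented name, Enabled plays the role of the enabled option, CanCreate is a closure standing in for "a cache file can be created here", and reuse of an existing cache file is left out.

```prolog
:- use_module(library(lists)).

% choose_cache(+LoadOptions, +Enabled, :CanCreate, -Action)
% Hypothetical model of the decision for a source without a cache file.
% CanCreate is called as call(CanCreate, Where), Where in {local, global}.
choose_cache(LoadOptions, Enabled, CanCreate, Action) :-
    (   ( memberchk(cache(false), LoadOptions) ; Enabled == false )
    ->  Action = no_cache                  % caching switched off
    ;   member(Where, [local, global]),    % try local first, then global
        call(CanCreate, Where)
    ->  Action = create(Where)
    ;   Action = no_cache                  % nowhere to store a cache
    ).
```

With library(yall), choose_cache([], true, [W]>>(W == global), A) yields A = create(global), mirroring a failed local attempt followed by the global cache.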
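rdf_set_cache_options/1 accepts the following options:
enabled(Bool): If true, caching is enabled.
local_directory(Name): Plain name of the local cache directory. The default is platform-dependent.
create_local_directory(Bool): If true, try to create local cache directories.
global_directory(Dir): Writeable directory for storing cached parsed files.
create_global_directory(Bool): If true, try to create the global cache directory.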
rdf_cache_file(URL, ReadWrite, File) finds the cache file for URL. If ReadWrite is
read, it returns the name of an existing file. If it is
write, it returns where a new cache file can be overwritten or created.
library(semweb/rdf_litindex.pl) exploits the
primitives of section 4.5.1 and the
NLP package to provide indexing on words inside literal constants. It
also allows for fuzzy matching using stemming and `sounds-like' based on
the double metaphone algorithm of the NLP package.
Expansions are returned as terms such as sounds(Word, Words) or prefix(Prefix, Words). On compound expressions, only combinations that provide literals are returned. Below is an example after loading the ULAN database (Unified List of Artist Names from the Getty Foundation) and showing all words that sound like `rembrandt' and appear together in a literal with the word `Rijn'. Finding this result from the 228,710 literals contained in ULAN requires 0.54 milliseconds (AMD 1600+).
    ?- rdf_token_expansions(and('Rijn', sounds(rembrandt)), L).
    L = [ sounds(rembrandt,
                 [ 'Rambrandt', 'Reimbrant', 'Rembradt', 'Rembrand',
                   'Rembrandt', 'Rembrandtsz', 'Rembrant', 'Rembrants',
                   'Rijmbrand'
                 ])
        ].
Here is another example, illustrating handling of diacritics:
    ?- rdf_token_expansions(case(cafe), L).
    L = [case(cafe, [cafe, 'café'])].
Tokenization first calls the hook rdf_litindex:tokenization(Literal, -Tokens). On failure it calls tokenize_atom/2 from the NLP package and deletes the following: atoms of length 1, floats, integers that are out of range and English stop words such as `the'. Deletion first calls the hook
rdf_litindex:exclude_from_index(token, X). This hook is called as follows:

    no_index_token(X) :-
        exclude_from_index(token, X), !.
    no_index_token(X) :-
        ...
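The deletion rules and the hook call above can be combined into a self-contained filter. This is a sketch under stated assumptions: the local dynamic exclude_from_index/2 only stands in for the rdf_litindex hook, stop_word/1 lists just the word `the' from the text, and the integer bounds in int_range/2 are placeholders rather than the library's documented range. no_index_token/1 follows the shape shown above; index_tokens/2 is hypothetical.

```prolog
:- use_module(library(apply)).

% Stand-in for the rdf_litindex:exclude_from_index/2 hook (assumption).
:- dynamic exclude_from_index/2.

% Assumption: the real stop-word list contains more English words.
stop_word(the).

% Assumed integer range; the library's actual bounds may differ.
int_range(-1073741824, 1073741823).

no_index_token(X) :- exclude_from_index(token, X), !.   % user hook first
no_index_token(X) :- atom(X), atom_length(X, 1), !.     % single characters
no_index_token(X) :- float(X), !.                       % floats
no_index_token(X) :-                                    % out-of-range integers
    integer(X),
    int_range(Low, High),
    \+ between(Low, High, X),
    !.
no_index_token(X) :- atom(X), stop_word(X).             % stop words

% index_tokens(+Tokens, -Kept): drop tokens that should not be indexed.
index_tokens(Tokens, Kept) :-
    exclude(no_index_token, Tokens, Kept).
```

For example, index_tokens([the, a, 'Rijn', 3.5, 42], Kept) keeps 'Rijn' and 42.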
`Literal maps' provide a relation between literal values, intended to create additional indexes on literals. The current implementation can only deal with integers and atoms (string literals). A literal map maintains an ordered set of keys. The ordering uses the same rules as described in section 4.5. Each key is associated with an ordered set of values. Literal map objects can be shared between threads, using a locking strategy that allows for multiple concurrent readers.
Typically, this module is used together with rdf_monitor/2
on the monitor channels for literals to
maintain an index of words that appear in a literal. Further abstraction
using Porter stemming or Metaphone can be used to create additional
search indices. These can map either directly to the literal values, or
indirectly to the plain word-map. The SWI-Prolog NLP package provides
complementary building blocks, such as a tokenizer, a Porter stemmer and the
double metaphone algorithm.
When searching a literal map, a key in the search list may be written as not(Key). If not-terms are provided, there must be at least one positive key. The negations are tested after establishing the positive matches.
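To make the key/value semantics concrete, here is a pure-Prolog model of a literal map built on library(assoc) and ordered sets. It is a hypothetical stand-in: lm_empty/1, lm_insert/4 and lm_find/3 are invented names, while the library provides these operations natively (for example rdf_insert_literal_map/3 and rdf_find_literal_map/3) and also handles locking between threads, which this sketch ignores.

```prolog
:- use_module(library(assoc)).
:- use_module(library(ordsets)).
:- use_module(library(lists)).
:- use_module(library(apply)).

% Hypothetical model: an AVL tree from each key to an ordered set of
% values. Keys use standard order here, not the library's literal order.
lm_empty(Map) :-
    empty_assoc(Map).

% lm_insert(+Map0, +Key, +Value, -Map): add Value to the set of Key.
lm_insert(Map0, Key, Value, Map) :-
    (   get_assoc(Key, Map0, Set0) -> true ; Set0 = [] ),
    ord_add_element(Set0, Value, Set),
    put_assoc(Key, Map0, Set, Map).

% lm_values(+Map, +Key, -Set): values of Key, or [] if Key is absent.
lm_values(Map, Key, Set) :-
    (   get_assoc(Key, Map, Set0) -> Set = Set0 ; Set = [] ).

% lm_find(+Map, +KeyList, -Values): intersect the sets of the positive
% keys, then remove the values of keys written as not(Key).
% Fails if KeyList contains no positive key.
lm_find(Map, KeyList, Values) :-
    partition(is_not, KeyList, Negs, Pos),
    Pos = [First|Rest],
    lm_values(Map, First, V0),
    foldl(intersect_key(Map), Rest, V0, V1),
    foldl(subtract_key(Map), Negs, V1, Values).

is_not(not(_)).

intersect_key(Map, Key, Acc0, Acc) :-
    lm_values(Map, Key, Set),
    ord_intersection(Acc0, Set, Acc).

subtract_key(Map, not(Key), Acc0, Acc) :-
    lm_values(Map, Key, Set),
    ord_subtract(Acc0, Set, Acc).
```

Applying the negations last, as the text describes, means a query such as lm_find(Map, [not(van)], V) simply fails.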
The persistency module provides reliable persistent storage for the RDF data. The store uses a
directory with files for each source (see rdf_source/1)
present in the database. Each source is represented by two files: one in
binary format (see rdf_save_db/2)
representing the base state, and one consisting of Prolog terms
representing the changes made since the base state. The latter is called
the journal.
By default, the number of concurrent loading jobs is the value of the Prolog flag cpu_count, or 1 (one) on systems where this number is unknown. See also concurrent/3.
If true, suppress loading messages from rdf_attach_db/2.
If true, nested log transactions are added to the journal information. By default
(false), no log-term is added for nested transactions.
The database is locked against concurrent access using a file
lock in Directory. An attempt to attach to a
locked database raises a
permission_error exception. The
error context contains a term
rdf_locked(Args), where Args
is a list describing the lock, such as when and by which process it was created.
The error can be caught by the application. Otherwise it prints:
    ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
    ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
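An application that prefers to handle the lock itself can wrap the attach call. The sketch below assumes, based on the printed message, that the exception has the form error(permission_error(lock, rdf_db, Dir), Context); try_attach/2 is a hypothetical helper, and in real use Goal would be a call such as rdf_attach_db(Dir, []).

```prolog
% try_attach(:Goal, -Result)
% Run Goal (e.g. an rdf_attach_db/2 call). Result is attached on
% success, or locked(Dir) if the store is locked by another process.
% Assumption: the error term matches the message shown above.
try_attach(Goal, Result) :-
    catch(( call(Goal), Result = attached ),
          error(permission_error(lock, rdf_db, Dir), _Context),
          Result = locked(Dir)).
```

Exceptions other than the lock error are not caught and propagate as usual.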
If Bool is false, the journal and snapshot for the database are deleted and further changes to triples associated with DB are not recorded. If Bool is
true, a snapshot is created for the current state and further modifications are monitored. Switching persistency does not affect the triples in the in-memory RDF database.
With the option min_size(KB), only journals larger than KB Kbytes are merged with the base state. Flushing a journal takes several steps, ensuring a stable state can be recovered at any moment. The final step moves the
.new file over the base state.
Note that journals are not merged automatically for two reasons. First of all, some applications may decide never to merge as the journal contains a complete changelog of the database. Second, merging large databases can be slow and the application may wish to schedule such actions at quiet times or scheduled maintenance periods.
The above predicates suffice for most applications. The predicates in
this section provide access to the journal files and the base state
files and are intended to provide additional services, such as reasoning
about the journals, loaded files, etc. (A
library(rdf_history) that exploits these features to support
wiki-style editing of RDF is under development.)
Using rdf_transaction(Goal, log(Message)), we can add
additional records to enrich the journal of affected databases with Message
and some additional bookkeeping information. Such a transaction adds a term
begin(Id, Nest, Time, Message) before the change operations
on each affected database and a term
end(Id, Nest, Others) after
the change operations. Here is an example call and the content of the journal file
mydb.jrn. A full explanation of the terms that
appear in the journal is in the description of rdf_journal_file/2.
    ?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).

    start([time(1183540570)]).
    begin(1, 0, 1183540570.36, by(jan)).
    assert(s, p, o).
    end(1, 0, []).
    end([time(1183540578)]).
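Because the journal consists of Prolog terms, it can be inspected with standard term I/O. The following sketch reads a journal file and lists Id-Message pairs from its begin/4 records; journal_terms/2 and journal_messages/2 are hypothetical helpers, not part of the library.

```prolog
:- use_module(library(lists)).

% journal_terms(+File, -Terms): read all terms from a journal file.
journal_terms(File, Terms) :-
    setup_call_cleanup(open(File, read, In),
                       read_journal_terms(In, Terms),
                       close(In)).

read_journal_terms(In, Terms) :-
    read_term(In, Term, []),
    (   Term == end_of_file
    ->  Terms = []
    ;   Terms = [Term|Rest],
        read_journal_terms(In, Rest)
    ).

% journal_messages(+File, -Pairs): Id-Message for each logged transaction.
journal_messages(File, Pairs) :-
    journal_terms(File, Terms),
    findall(Id-Message,
            member(begin(Id, _Nest, _Time, Message), Terms),
            Pairs).
```

Applied to the journal shown above, journal_messages/2 yields [1-by(jan)].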
Using rdf_transaction(Goal, log(Message, DB)), where DB
is an atom denoting a (possibly empty) named graph, the system
guarantees that a non-empty transaction will leave a possibly empty
transaction record in DB. This feature assumes named graphs are named
after the user making the changes. If a user action does not affect the
user's graph, such as deleting a triple from another graph, we still
find a record of all actions performed by that user in the user's journal.
The term begin(Id, Nest, Time, Message) is written to each database affected by a transaction started with log(Message). Id is an integer counting the logged transactions to this database. Numbers are increasing and designed for binary search within the journal file. Nest is the nesting level, where `0' is a toplevel transaction. Time is a time-stamp, currently using float notation with two fractional digits. Message is the term provided by the user as argument of log(Message).
The term end(Id, Nest, Others) is written after the changes of a transaction started with log(Message). Id and Nest match the begin-term. Others gives a list of other databases affected by this transaction and the Id of these records. The terms in this list have the format DB:Id.
The store uses the file extension .trp for the base state and
.jrn for the journal.
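Given a store file base-name, the two file names differ only in their extension. The helper below is hypothetical; it uses the standard file_name_extension/3, whereas the library's rdf_journal_file/2 relates a database to its journal file directly.

```prolog
% db_files(+FileBase, -Snapshot, -Journal)
% Hypothetical helper: derive the base-state (.trp) and journal (.jrn)
% file names from a store file base-name.
db_files(FileBase, Snapshot, Journal) :-
    file_name_extension(FileBase, trp, Snapshot),
    file_name_extension(FileBase, jrn, Journal).
```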