3 The HTTP server libraries
The HTTP server library consists of two obligatory parts and one
optional part. The first deals with connection management and has three
different implementations depending on the desired type of server. The
second implements a generic wrapper for decoding the HTTP request,
calling user code to handle the request and encode the answer. The
optional http_dispatch module can be used to assign HTTP
locations (paths) to predicates. This design is summarised in
figure 1.
| Figure 1 : Design of the HTTP server |
The functional body of the user's code is independent from the selected server-type, making it easy to switch between the supported server types.
3.1 The `Body'
The server-body is the code that handles the request and formulates a
reply. To facilitate all mentioned setups, the body is driven by
http_wrapper/5.
The goal is called with the parsed request (see
section 3.8) as argument and current_output
set to a temporary buffer. Its task is closely related to the task of a
CGI script; it must write a header holding at least the
Content-type field and a body. Here is a simple body
writing the request as an HTML table.
reply(Request) :-
    format('Content-type: text/html~n~n', []),
    format('<html>~n', []),
    format('<table border=1>~n'),
    print_request(Request),
    format('~n</table>~n'),
    format('</html>~n', []).

print_request([]).
print_request([H|T]) :-
    H =.. [Name, Value],
    format('<tr><td>~w<td>~w~n', [Name, Value]),
    print_request(T).
The infrastructure recognises the header
Transfer-encoding: chunked, causing it to use chunked
encoding if the client allows for it. See also section
4 and the
chunked option in http_handler/3.
Other header lines are passed verbatim to the client. Typical examples
are Set-Cookie and authentication headers (see section
3.5).
3.1.1 Returning special status codes
Besides returning a page by writing it to the current output stream,
the server goal can raise an exception using throw/1
to generate special pages such as not_found, moved,
etc. The defined exceptions are:
- http_reply(+Reply, +HdrExtra)
- Return a result page using http_reply/3. See http_reply/3 for details.
- http_reply(+Reply)
- Equivalent to http_reply(Reply,[]).
- http(not_modified)
- Equivalent to http_reply(not_modified,[]). This exception is for backward compatibility and can be used by the server to indicate that the referenced resource has not been modified since it was last requested.
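As a minimal sketch of raising such an exception, the body below replies normally for known paths and throws http_reply(not_found(Path)) otherwise. The page/2 table and the name page_reply/1 are hypothetical, and not_found(Path) is assumed to be one of the Reply terms accepted by http_reply/3.

```prolog
% page(?Path, ?Text): hypothetical table of the pages this server knows.
page('/', 'Welcome').
page('/about', 'About this server').

% page_reply(+Request): write the page text, or raise the not-found
% exception so that the HTTP wrapper generates the error page.
page_reply(Request) :-
    memberchk(path(Path), Request),
    (   page(Path, Text)
    ->  format('Content-type: text/plain~n~n'),
        format('~w~n', [Text])
    ;   throw(http_reply(not_found(Path)))
    ).
```

Because the exception is an ordinary Prolog term, it can be thrown from any depth of the handler code without threading a status value back to the top.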
3.2 Library http/http_dispatch -- Dispatch requests in the HTTP server
This module can be placed between http_wrapper.pl and
the application code to associate HTTP locations with predicates
that serve the pages. In addition, it associates parameters with
locations that deal with timeout handling and user authentication. The
typical setup is:
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

server(Port, Options) :-
    http_server(http_dispatch,
                [ port(Port)
                | Options
                ]).

:- http_handler('/index.html', write_index, []).

write_index(Request) :-
    ...
- [det]http_handler(+Path, :Closure, +Options)
- Register Closure as a handler for HTTP requests. Path is a specification as provided by http_path.pl. If an HTTP request arrives at the server that matches Path, Closure is called with one extra argument: the parsed HTTP request. Options is a list containing the following options:
- authentication(+Type)
- Demand authentication. Authentication methods are pluggable. The library http_authenticate.pl provides a plugin for user/password based Basic HTTP authentication.
- chunked
- Use Transfer-encoding: chunked if the client allows for it.
- content_type(+Term)
- Specifies the content-type of the reply. This value is currently not used by this library. It enhances the reflexive capabilities of this library through http_current_handler/3.
- id(+Term)
- Identifier of the handler. The default identifier is the predicate name. Used by http_location_by_id/2.
- hide_children(+Bool)
- If true on a prefix-handler (see prefix), possible children are masked. This can be used to (temporarily) overrule part of the tree.
- prefix
- Call Closure on any location that is a specialisation of Path. If multiple handlers match, the one with the longest path is used.
- priority(+Integer)
- If two handlers handle the same path, the one with the highest priority is used. If equal, the last registered is used. Please be aware that the order of clauses in multifile predicates can change due to reloading files. The default priority is 0 (zero).
- spawn(+SpawnOptions)
- Run the handler in a separate thread. If SpawnOptions is an atom, it is interpreted as a thread pool name (see create_thread_pool/3). Otherwise the options are passed to http_spawn/2 and from there to thread_create/3. These options are typically used to set the stack limits.
- time_limit(+Spec)
- One of infinite, default or a positive number (seconds).
Note that http_handler/3 is normally invoked as a directive and processed using term-expansion. Using term-expansion ensures proper update through make/0 when the specification is modified. We do not expand when the cross-referencer is running to ensure proper handling of the meta-call.
- Errors
- existence_error(http_location, Location)
- See also
- http_reply_file/3 and http_redirect/3 are generic handlers to serve files and achieve redirects.
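The following sketch combines some of the options described for http_handler/3. The locations, the handler names docs_index/1 and docs_page/1 and the identifier docs_home are hypothetical; only the options themselves come from the description above.

```prolog
:- use_module(library(http/http_dispatch)).

% Exact match on one location. The id allows other code to find the
% location through http_location_by_id/2.
:- http_handler('/docs/index.html', docs_index, [id(docs_home)]).

% Prefix handler: called for /docs and any location below it that has
% no more specific handler. Each request is limited to 30 seconds.
:- http_handler('/docs', docs_page, [prefix, time_limit(30)]).

docs_index(_Request) :-
    format('Content-type: text/plain~n~n'),
    format('Documentation index~n').

docs_page(Request) :-
    memberchk(path(Path), Request),
    format('Content-type: text/plain~n~n'),
    format('Documentation page ~w~n', [Path]).
```

With both registrations active, a request for /docs/index.html should reach docs_index/1 because the longest matching path wins, while other locations under /docs fall through to docs_page/1.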
- [det]http_delete_handler(+Spec)
- Delete handler for Spec. Typically, this should only be used
for handlers that are registered dynamically. Spec is one of:
- id(Id)
- Delete a handler with the given id. The default id is the handler-predicate-name.
- path(Path)
- Delete handler that serves the given path.
- [det]http_dispatch(Request)
- Dispatch a Request using http_handler/3 registrations.
- [semidet]http_current_handler(+Location, :Closure)
- [nondet]http_current_handler(-Location, :Closure)
- True if Location is handled by Closure.
- [semidet]http_current_handler(+Location, :Closure, -Options)
- [nondet]http_current_handler(?Location, :Closure, ?Options)
- Resolve the current handler and options to execute it.
- [det]http_location_by_id(+ID, -Location)
- Find the HTTP Location of handler with ID. If the
setting (see
setting/2) http:prefix is active, Location
is the handler location prefixed with the prefix setting. Handler IDs
can be specified in two ways:
- id(ID)
- If this appears in the option list of the handler, it is used and takes preference over using the predicate.
- M:PredName
- The module-qualified name of the predicate.
- PredName
- The unqualified name of the predicate.
- Errors
- existence_error(http_handler_id, Id).
- http_link_to_id(+HandleID, +Parameters, -HREF)
- HREF is a link on the local server to a handler with given ID, passing the given Parameters.
- [det]http_reply_file(+FileSpec, +Options, +Request)
- Options is a list of
- cache(+Boolean)
- If true (default), handle If-modified-since and send modification time.
- mime_type(+Type)
- Overrule mime-type guessing from the filename as provided by file_mime_type/2.
- unsafe(+Boolean)
- If false (default), validate that FileSpec does not contain references to parent directories. E.g., specifications such as www('../../etc/passwd') are not allowed.
If caching is not disabled, it processes the request headers If-modified-since and Range.
- throws
- http_reply(not_modified)
- http_reply(file(MimeType, Path))
- [det]http_safe_file(+FileSpec, +Options)
- True if FileSpec is considered safe. If it is an atom,
it cannot be absolute and cannot have references to parent directories.
If it is of the form alias(Sub), then Sub cannot have references to
parent directories.
- Errors
- instantiation_error
- permission_error(read, file, FileSpec)
- [det]http_redirect(+How, +To, +Request)
- Redirect to a new location. The argument order, using the
Request as last argument, allows for calling this directly
from the handler declaration:
:- http_handler(root(.), http_redirect(moved, myapp('index.html')), []).
How is one of moved, moved_temporary or see_other. To is an atom, an aliased path as defined by http_absolute_location/3, or a term location_by_id(Id). If To is not absolute, it is resolved relative to the current location.
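As a hedged sketch of using the generic handlers directly in a handler declaration, the registrations below serve one fixed file and redirect an old location to it. The locations, the file static/robots.txt and the identifier robots are hypothetical.

```prolog
:- use_module(library(http/http_dispatch)).

% Serve a single fixed file. The request is appended as last argument.
% FileSpec is a relative atom without parent references, so the default
% unsafe(false) check accepts it.
:- http_handler('/robots.txt',
                http_reply_file('static/robots.txt', [cache(true)]),
                [id(robots)]).

% Permanently redirect an old location to the handler registered above.
:- http_handler('/robots',
                http_redirect(moved, location_by_id(robots)),
                []).
```

Referring to the target through location_by_id(robots) keeps the redirect valid if the location of the file handler is later changed.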
3.3 Library http/http_dirindex -- HTTP directory listings
- To be done
- Provide more options (sorting, selecting columns, hiding files)
This module provides a simple API to generate an index for a physical directory. The index can be customised by overruling the dirindex.css CSS file and by defining additional rules for icons using the hook http:file_extension_icon/2.
- [det]http_reply_dirindex(+DirSpec, +Options, +Request)
- Provide a directory listing for Request, assuming it is an
index for the physical directory DirSpec. If the request-path does not end
with /, first return a moved (301 Moved Permanently) reply.
The calling convention allows for direct calling from http_handler/3.
3.4 Library http/http_session -- HTTP Session management
This library defines session management based on HTTP cookies. Session management is enabled simply by loading this module. Details can be modified using http_set_session_options/1. If sessions are enabled, http_session_id/1 produces the current session and http_session_assert/1 and friends maintain data about the session. If the session is reclaimed, all associated data is reclaimed too.
Begin and end of sessions can be monitored using library(broadcast). The broadcast messages are:
- http_session(begin(SessionID,Peer))
- Broadcast when a session is started.
- http_session(end(SessionId,Peer))
- Broadcast when a session is ended. See http_close_session/1.
For example, the following calls end_session(SessionId) whenever a session terminates. Please note that session ends are not scheduled to happen at the actual timeout moment of the session. Instead, creating a new session scans the active list for timed-out sessions. This may change in future versions of this library.
:- listen(http_session(end(SessionId, _Peer)),
          end_session(SessionId)).
- [det]http_set_session_options(+Options)
- Set options for the session library. Provided options are:
- timeout(+Seconds)
- Session timeout in seconds. Default is 600 (10 min).
- cookie(+Cookiename)
- Name to use for the cookie to identify the session. Default swipl_session.
- path(+Path)
- Path to which the cookie is associated. Default is /. Cookies are only sent if the HTTP request path is a refinement of Path.
- route(+Route)
- Set the route name. Default is the unqualified hostname. To cancel adding a route, use the empty atom. See route/1.
- enabled(+Boolean)
- Enable/disable session management. Session management is enabled by default after loading this file.
- [det]http_session_id(-SessionId)
- True if SessionId is an identifier for the current session. SessionId is an atom.
- Errors
- existence_error(http_session, _)
- See also
- http_in_session/1 for a version that fails if there is no session.
- [semidet]http_in_session(-SessionId)
- True if SessionId is an identifier for the current session.
The current session is extracted from session(ID) from the current HTTP
request (see http_current_request/1).
The value is cached in a backtrackable global variable http_session_id. Using a backtrackable global variable is safe because continuous worker threads use a failure driven loop and spawned threads start without any global variables. This variable can be set from the commandline to fake running a goal from the commandline in the context of a session.
- See also
- http_session_id/1
- [det]http_session_asserta(+Data)
- [det]http_session_assert(+Data)
- [nondet]http_session_retract(?Data)
- [det]http_session_retractall(?Data)
- Versions of assert/1, retract/1 and retractall/1 that associate data with the current HTTP session.
- [nondet]http_current_session(?SessionID, ?Data)
- Enumerate the current sessions and associated data. There are two pseudo
data elements:
- idle(Seconds)
- Session has been idle for Seconds.
- peer(Peer)
- Peer of the connection.
- [det]http_close_session(+SessionID)
- Closes an HTTP session. This predicate can be called from any thread to
terminate a session. It uses the broadcast/1
service with the message below.
http_session(end(SessionId, Peer))
The broadcast is done before the session data is destroyed and the listen-handlers are executed in context of the session that is being closed. Here is an example that destroys a Prolog thread that is associated to a session:
:- listen(http_session(end(SessionID, _Peer)),
          kill_session_thread(SessionID)).

kill_session_thread(_SessionID) :-
    http_session_data(thread(ThreadID)),
    thread_signal(ThreadID, throw(session_closed)).
Succeed without any effect if SessionID does not refer to an active session.
- Errors
- type_error(atom, SessionID)
- See also
- listen/2 for acting upon closed sessions
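A minimal sketch of keeping per-session data with the predicates above, assuming the handler runs inside a server so that a current session exists. The handler name visit_counter/1, the term visits(N) and the 30 minute timeout are hypothetical choices for illustration.

```prolog
:- use_module(library(http/http_session)).

% Extend the session timeout from the default 600 seconds to 30 minutes.
:- http_set_session_options([timeout(1800)]).

% visit_counter(+Request): count the requests made within this session.
% The old counter is retracted (if any) and the new value is stored.
visit_counter(_Request) :-
    (   http_session_retract(visits(N0))
    ->  N is N0 + 1
    ;   N = 1
    ),
    http_session_assert(visits(N)),
    format('Content-type: text/plain~n~n'),
    format('You visited this page ~d time(s) in this session~n', [N]).
```

The stored visits(N) term is reclaimed together with the session, so no explicit cleanup is needed when the session times out or is closed.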
3.5 HTTP Authentication
The module http/http_authenticate provides the basics to
validate an HTTP Authorization header. User and password
information are read from a Unix/Apache compatible password file. This
information, as well as the validation process, is cached to achieve
optimal performance.
- http_authenticate(+Type, +Request, -User)
- True if Request contains the information to continue according to Type. Type identifies the required authentication technique:
- basic(+PasswordFile)
- Use HTTP Basic authentication and verify the password from PasswordFile. PasswordFile is a file holding usernames and passwords in a format compatible to Unix and Apache. Each line is a record with fields separated by a colon (:). The first field is the username and the second the password hash. Password hashes are validated using crypt/2.
Successful authorization is cached for 60 seconds to avoid overhead of decoding and lookup of the user and password data.
http_authenticate/3 just validates the header. If authorization is not provided, the browser must be challenged, in response to which it normally opens a user-password dialogue. Example code realising this is below. The exception causes the HTTP wrapper code to generate an HTTP 401 reply.
    ...,
    (   http_authenticate(basic(passwd), Request, User)
    ->  true
    ;   throw(http_reply(authorise(basic, Realm)))
    ).
Alternatively basic(+PasswordFile) can be passed as an option to http_handler/3.
3.6 Library http/http_openid -- OpenID consumer and server library
This library implements the OpenID protocol (http://openid.net/). OpenID is a protocol to share identities on the network. The protocol itself uses simple basic HTTP, adding reliability using digitally signed messages.
Steps, as seen from the consumer (or relying party):
- Show login form, asking for openid_identifier
- Get HTML page from openid_identifier and lookup <link rel="openid.server" href="server">
- Associate to server
- Redirect browser (302) to server using mode checkid_setup, asking to validate the given OpenID.
- OpenID server redirects back, providing digitally signed confirmation of the claimed identity.
- Validate signature and redirect to the target location.
A consumer (an application that allows OpenID login) typically uses this library through openid_user/3. In addition, it must implement the hook http_openid:openid_hook(trusted(OpenId, Server)) to define accepted OpenID servers. Typically, this hook is used to provide a white-list of acceptable servers. Note that accepting any OpenID server is possible, but anyone on the internet can set up a dummy OpenID server that simply grants and signs every request. Here is an example:
:- multifile http_openid:openid_hook/1.

http_openid:openid_hook(trusted(_, OpenIdServer)) :-
    (   trusted_server(OpenIdServer)
    ->  true
    ;   throw(http_reply(moved_temporary('/openid/trustedservers')))
    ).

trusted_server('http://www.myopenid.com/server').
By default, information about who is logged on is maintained with the session using http_session_assert/1 with the term openid(Identity). The hooks login/logout/logged_in can be used to provide alternative administration of logged-in users (e.g., based on client-IP, using cookies, etc.).
To create a server, you must do four things: bind the handlers
openid_server/2 and openid_grant/1
to HTTP locations, provide a user-page for registered users and define
the grant(Request, Options) hook to verify your users. An example server
is provided in
<plbase>/doc/packages/examples/demo_openid.pl
- [det]openid_login(+OpenID)
- Associate the current HTTP session with OpenID. If another OpenID is already associated, this association is first removed.
- [det]openid_logout(+OpenID)
- Remove the association of the current session with any OpenID
- [semidet]openid_logged_in(-OpenID)
- True if session is associated with OpenID.
- [det]openid_user(+Request:http_request, -OpenID:url, +Options)
- True if OpenID is a validated OpenID associated
with the current session. The scenario for which this predicate is
designed is to allow an HTTP handler that requires a valid login to use
the transparent code below.
handler(Request) :-
    openid_user(Request, OpenID, []),
    ...
If the user is not yet logged on, a sequence of redirects will follow:
- Show a page for login (default: page /openid/login, predicate reply_openid_login/1)
- Redirect to OpenID server to validate
- Redirect to validation
Options:
- login_url(Login)
- (Local) URL of page to enter OpenID information. Default is
/openid/login.
- See also
- openid_authenticate/4 produces errors if login is invalid or cancelled.
- [det]openid_login_form(+ReturnTo, +Options)//
- Create the OpenID form. This is exported as a separate DCG, allowing applications to redefine /openid/login and reuse this part of the page.
- openid_verify(+Options, +Request)
- Handle the initial login form presented to the user by the relying party
(consumer). This predicate discovers the OpenID server, associates
itself with this server and redirects the user's browser to the OpenID
server, providing the extra openid.X name-value pairs. Options
is, against the conventions, placed in front of the Request
to allow for smooth cooperation with http_dispatch.pl.
The OpenID server will redirect to the openid.return_to URL.
- throws
- http_reply(moved_temporary(Redirect))
- [nondet]openid_server(?OpenIDLogin, ?OpenID, ?Server)
- True if OpenIDLogin is the typed id for OpenID verified by Server.
| OpenIDLogin | ID as typed by user (canonized) |
| OpenID | ID as verified by server |
| Server | URL of the OpenID server |
- openid_current_host(Request, Host, Port)
- Find current location of the server.
- [semidet]openid_authenticate(+Request, -Server:url, -OpenID:url, -ReturnTo:url)
- Succeeds if Request comes from the OpenID server and confirms that OpenID is a verified OpenID user. ReturnTo provides the URL to return to.
After openid_verify/2 has redirected the browser to the OpenID server, and the OpenID server did its magic, it redirects the browser back to this address. The work is fairly trivial. If mode is cancel, the OpenID server denied. If id_res, the OpenID server replied positive, but we must verify what the server told us by checking the HMAC-SHA signature.
This call fails silently if there is no openid.mode field in the request.
- throws
- openid(cancel) if request was cancelled by the OpenID server
- openid(signature_mismatch) if the HMAC signature check failed
- openid_server(+Options, +Request)
- Realise the OpenID server. The protocol demands a POST request here.
- openid_grant(+Request)
- Handle the reply from checkid_setup_server/3. If the reply is yes, check the authority (typically the password) and if all looks good redirect the browser to ReturnTo, adding the OpenID properties needed by the Relying Party to verify the login.
- [det]openid_associate(+URL, -Handle, -Assoc)
- [semidet]openid_associate(?URL, +Handle, -Assoc)
- Associate with an open-id server. We first check for a still valid old
association. If there is none or it is expired, we establish one and
remember it.
- To be done
- Should we store known associations permanently? Where?
3.7 Get parameters from HTML forms
The library library(http/http_parameters) provides two
predicates to fetch HTTP request parameters as a type-checked list
easily. The library transparently handles both GET and POST requests. It
builds on top of the low-level request representation described in
section 3.8.
- http_parameters(+Request, ?Parameters)
- The predicate is passed the Request as provided to the handler goal by http_wrapper/5 as well as a partially instantiated list describing the requested parameters and their types. Each parameter specification in Parameters is a term of the format Name(-Value, +Options). Options is a list of option terms describing the type, default, etc. If no options are specified the parameter must be present and its value is returned in Value as an atom. If a parameter is missing the exception error(existence_error(form_data, Name), _) is thrown. Options fall into three categories: those that handle presence of the parameter, those that guide conversion and restrict types and those that support automatic generation of documentation. First, the presence-options:
- default(Default)
- If the named parameter is missing, Value is unified to Default.
- optional(true)
- If the named parameter is missing, Value is left unbound and no error is generated.
- list(Type)
- The same parameter may not appear or appear multiple times. If this option is present, default and optional are ignored and the value is returned as a list. Type checking options are processed on each value.
- zero_or_more
- Deprecated. Use list(Type).
The type and conversion options are given below. The type-language can be extended by providing clauses for the multifile hook http:convert_parameter/3.
- ;(Type1, Type2)
- Succeed if either Type1 or Type2 applies. It allows for checks such as (nonneg;oneof([infinite])) to specify an integer or a symbolic value.
- oneof(List)
- Succeeds if the value is member of the given list.
- length > N
- Succeeds if value is an atom of more than N characters.
- length >= N
- Succeeds if value is an atom of N or more characters.
- length < N
- Succeeds if value is an atom of less than N characters.
- length =< N
- Succeeds if value is an atom of at most N characters.
- atom
- No-op. Allowed for consistency.
- between(+Low, +High)
- Convert value to a number and if either Low or High is a float, force value to be a float. Then check that the value is in the given range, which includes the boundaries.
- boolean
- Translate true, yes, on and '1' into true; false, no, off and '0' into false and raise an error otherwise.
- float
- Convert value to a float. Integers are transformed into float. Throws a type-error otherwise.
- integer
- Convert value to an integer. Throws a type-error otherwise.
- nonneg
- Convert value to a non-negative integer. Throws a type-error if the value cannot be converted to an integer and a domain-error otherwise.
- number
- Convert value to a number. Throws a type-error otherwise.
The last set of options is to support automatic generation of HTTP API documentation from the sources. (This facility is under development in ClioPatria; see http_help.pl.)
- description(+Atom)
- Description of the parameter in plain text.
- group(+Parameters, +Options)
- Define a logical group of parameters. Parameters are processed as normal. Options may include a description of the group. Groups can be nested.
Below is an example:
reply(Request) :-
    http_parameters(Request,
                    [ title(Title, [ optional(true) ]),
                      name(Name,   [ length >= 2 ]),
                      age(Age,     [ between(0, 150) ])
                    ]),
    ...
Same as http_parameters(Request, Parameters, []).
- http_parameters(+Request, ?Parameters, +Options)
- In addition to http_parameters/2, the following options are defined.
- form_data(-Data)
- Return the entire set of provided Name=Value pairs from the GET or POST request. All values are returned as atoms.
- attribute_declarations(:Goal)
- If a parameter specification lacks the parameter options, call call(Goal, +ParamName, -Options) to find the options. Intended to share declarations over many calls to http_parameters/3. Using this construct the above can be written as below.
reply(Request) :-
    http_parameters(Request,
                    [ title(Title),
                      name(Name),
                      age(Age)
                    ],
                    [ attribute_declarations(param)
                    ]),
    ...

param(title, [optional(true)]).
param(name,  [length >= 2 ]).
param(age,   [integer]).
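The sketch below combines presence and type options in a single call. The predicate name search_parameters/5 and the parameter names q, limit, order and tag are hypothetical; the request is assumed to be a GET request whose search(List) field holds the Name=Value pairs, as described in section 3.8.

```prolog
:- use_module(library(http/http_parameters)).

% search_parameters(+Request, -Query, -Limit, -Order, -Tags)
%   q      must be present and non-empty
%   limit  optional integer in 1..100, defaults to 10
%   order  optional, one of asc or desc, defaults to asc
%   tag    may appear zero or more times; returned as a list of atoms
search_parameters(Request, Query, Limit, Order, Tags) :-
    http_parameters(Request,
                    [ q(Query,     [ length >= 1 ]),
                      limit(Limit, [ default(10), between(1, 100) ]),
                      order(Order, [ default(asc), oneof([asc, desc]) ]),
                      tag(Tags,    [ list(atom) ])
                    ]).
```

Declaring defaults next to the type keeps the handler body free of checks for missing or malformed values; invalid input raises an exception before the application code runs.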
3.8 Request format
The body-code (see section 3.1) is
driven by a Request. This request is generated from http_read_request/2
defined in
library(http/http_header).
- http_read_request(+Stream, -Request)
- Reads an HTTP request from Stream and unifies Request with the parsed request. Request is a list of Name(Value) elements. It provides a number of predefined elements for the result of parsing the first line of the request, followed by the additional request parameters. The predefined fields are:
- host(Host)
- If the request contains Host: Host, Host is unified with the host-name. If Host is of the format <host>:<port> Host only describes <host> and a field port(Port) where Port is an integer is added.
- input(Stream)
- The Stream is passed along, allowing to read more data or requests from the same stream. This field is always present.
- method(Method)
- Method is one of get, put or post. This field is present if the header has been parsed successfully.
- path(Path)
- Path associated to the request. This field is always present.
- peer(Peer)
- Peer is a term ip(A,B,C,D) containing the IP address of the contacting host.
- port(Port)
- Port requested. See host for details.
- request_uri(RequestURI)
- This is the untranslated string that follows the method in the request header. It is used to construct the path and search fields of the Request. It is provided because reconstructing this string from the path and search fields may yield a different value due to different usage of percent encoding.
- search(ListOfNameValue)
- Search-specification of URI. This is the part after the ?, normally used to transfer data from HTML forms that use the `GET' protocol. In the URL it consists of a www-form-encoded list of Name=Value pairs. This is mapped to a list of Prolog Name=Value terms with decoded names and values. This field is only present if the location contains a search-specification.
- http_version(Major-Minor)
- If the first line contains the HTTP/Major.Minor version indicator this element indicates the HTTP version of the peer. Otherwise this field is not present.
- cookie(ListOfNameValue)
- If the header contains a Cookie line, the value of the cookie is broken down in Name=Value pairs, where the Name is the lowercase version of the cookie name as used for the HTTP fields.
- set_cookie(set_cookie(Name, Value, Options))
- If the header contains a Set-Cookie line, the cookie field is broken down into the Name of the cookie, the Value and a list of Name=Value pairs for additional options such as expire, path, domain or secure.
If the first line of the request is tagged with HTTP/Major.Minor, http_read_request/2 reads all input up to the first blank line. This header consists of Name:Value fields. Each such field appears as a term Name(Value) in the Request, where Name is canonised for use with Prolog. Canonisation implies that the Name is converted to lower case and all occurrences of the - are replaced by _. The value for the Content-length fields is translated into an integer.
Here is an example:
?- http_read_request(user, X).
|: GET /mydb?class=person HTTP/1.0
|: Host: gollem
|:
X = [ input(user),
method(get),
search([ class = person
]),
path('/mydb'),
http_version(1-0),
host(gollem)
].
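Because the request is a plain list of Name(Value) terms, fields can be extracted with memberchk/2. The sketch below shows this, together with an illustration of the canonisation rule for header names. Both predicate names are hypothetical helpers and not part of the library.

```prolog
% request_summary(+Request, -Method, -Path, -Search)
% Extract the most commonly used fields. search/1 is only present if
% the location contains a search-specification, so default to [].
request_summary(Request, Method, Path, Search) :-
    memberchk(method(Method), Request),
    memberchk(path(Path), Request),
    (   memberchk(search(Search0), Request)
    ->  Search = Search0
    ;   Search = []
    ).

% canonical_field_name(+Header, -Name)
% Illustrates the canonisation described above: convert to lower case
% and replace each - by _.
canonical_field_name(Header, Name) :-
    downcase_atom(Header, Lower),
    atomic_list_concat(Parts, '-', Lower),
    atomic_list_concat(Parts, '_', Name).
```

Using memberchk/2 rather than positional access keeps handler code independent of the order in which fields appear in the request list.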
3.8.1 Handling POST requests
Where the HTTP GET operation is intended to get a
document, using a path and possibly some additional search
information, the POST operation is intended to hand
potentially large amounts of data to the server for processing.
The Request parameter above contains the term method(post).
The data posted is left on the input stream that is available through
the term input(Stream) from the Request header.
This data can be read using http_read_data/3
from the HTTP client library. Here is a demo implementation simply
returning the parsed posted data as plain text (assuming pp/1
pretty-prints the data).
:- use_module(library(http/http_client)).

reply(Request) :-
    member(method(post), Request), !,
    http_read_data(Request, Data, []),
    format('Content-type: text/plain~n~n', []),
    pp(Data).
If the POST is initiated from a browser, content-type is generally
either application/x-www-form-urlencoded or
multipart/form-data. The latter is broken down
automatically if the plug-in library(http/http_mime_plugin)
is loaded.
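A hedged variant of the demo above that also answers non-POST requests is sketched below. It assumes that http_read_data/3 returns form data posted as application/x-www-form-urlencoded as a list of Name=Value pairs; echo_form/1 and print_fields/1 are hypothetical names.

```prolog
:- use_module(library(http/http_client)).

% echo_form(+Request): echo the posted data as plain text, or explain
% that a POST request is expected.
echo_form(Request) :-
    memberchk(method(post), Request), !,
    http_read_data(Request, Data, []),
    format('Content-type: text/plain~n~n'),
    print_fields(Data).
echo_form(_Request) :-
    format('Content-type: text/plain~n~n'),
    format('This location expects a POST request.~n').

% print_fields(+Data): print Name=Value pairs one per line; print any
% other data term as is.
print_fields(Data) :-
    is_list(Data), !,
    forall(member(Name=Value, Data),
           format('~w = ~w~n', [Name, Value])).
print_fields(Data) :-
    format('~p~n', [Data]).
```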
3.9 Running the server
The functionality of the server should be defined in one Prolog file (of course this file is allowed to load other files). Depending on the desired server setup this `body' is wrapped into a small Prolog file combining the body with the appropriate server interface. There are three supported server-setups. For most applications we advise the multi-threaded server. Examples of this server architecture are the PlDoc documentation system and the SeRQL Semantic Web server infrastructure.
All the server setups may be wrapped in a reverse proxy to make them available from the public web-server as described in section 3.9.7.
- Using library(thread_httpd) for a multi-threaded server
This server exploits the multi-threaded version of SWI-Prolog, running the user's body code in parallel from a pool of worker threads. As it avoids the state engine and copying required in the event-driven server it is generally faster and capable of handling multiple requests concurrently.
This server is harder to debug due to the involved threading, although the GUI tracer provides reasonable support for multi-threaded applications using the tspy/1 command. It can provide fast communication to multiple clients and can be used for more demanding servers.
- Using library(xpce_httpd) for an event-driven server
This approach provides a single-threaded event-driven application. The clients talk to XPCE sockets that collect an HTTP request. The server infrastructure can talk to multiple clients simultaneously, but once a request is complete the wrappers call the user's goal and block all further activity until the request is handled. Requests from multiple clients are thus fully serialised in one Prolog process.
This server setup is very suitable for debugging as well as for an embedded server in simple applications in a fairly controlled environment.
- Using library(inetd_httpd) for server-per-client
In this setup the Unix inetd user-daemon is used to initialise a server for each connection. This approach is especially suitable for servers that have a limited startup-time. In this setup a crashing client does not influence other requests.
This server is very hard to debug as the server is not connected to the user environment. It provides a robust implementation for servers that can be started quickly.
3.9.1 Common server interface options
All the server interfaces provide http_server(:Goal, +Options)
to create the server. The list of options differs, but the servers share
common options:
- port(?Port)
- Specify the port to listen to for stand-alone servers. Port is either an integer or unbound. If unbound, it is unified to the selected free port.
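As a minimal sketch of the unbound-port behaviour, the code below starts a server for a trivial body and reports the port that was selected. It assumes the multi-threaded library(http/thread_httpd) is used; hello/1 and start_on_free_port/1 are hypothetical names.

```prolog
:- use_module(library(http/thread_httpd)).

% hello(+Request): trivial body used for this sketch.
hello(_Request) :-
    format('Content-type: text/plain~n~n'),
    format('Hello world~n').

% start_on_free_port(-Port): Port is left unbound in the option list,
% so it is unified with the free port selected for the server.
start_on_free_port(Port) :-
    http_server(hello, [port(Port)]),
    format(user_error, 'Server started on port ~d~n', [Port]).
```

Leaving the port unbound is convenient for tests and embedded servers, where a fixed port number may already be in use.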
3.9.2 Multi-threaded Prolog
The library(http/thread_httpd.pl) provides the
infrastructure to manage multiple clients using a pool of worker-threads.
This realises a popular server design, also seen in Java Tomcat and
Microsoft .NET. As a single persistent server process maintains
communication to all clients, startup time is not an important issue and
the server can easily maintain state-information for all clients.
In addition to the functionality provided by the other (XPCE and
inetd) servers, the threaded server can also be used to realise an HTTPS
server exploiting the library(ssl) library. See option
ssl(+SSLOptions) below.
- http_server(:Goal, +Options)
- Create the server. Options must provide the port(?Port) option to specify the port the server should listen to. If Port is unbound an arbitrary free port is selected and Port is unified to this port-number. The server consists of a small Prolog thread accepting new connections on Port and dispatching these to a pool of workers. Defined Options are:
- port(?Port)
- Port the server should listen to. If unbound Port is unified with the selected free port.
- workers(+N)
- Defines the number of worker threads in the pool. Default is to use two workers. Choosing the optimal value for best performance is a difficult task depending on the number of CPUs in your system and how many resources are required for processing a request. Too high a number makes your system switch too often between threads or even swap if there is not enough memory to keep all threads in memory, while too low a number causes clients to wait unnecessarily for other clients to complete. See also http_workers/2.
- timeout(+SecondsOrInfinite)
- Determines the maximum period of inactivity handling a request. If no
data arrives within the specified time since the last data arrived the
connection raises an exception, the worker discards the client and
returns to the pool-queue for a new client. Default is
infinite, making each worker wait forever for a request to complete. Without a timeout, a worker may wait forever on a client that doesn't complete its request.
- keep_alive_timeout(+SecondsOrInfinite)
- Maximum time to wait for new activity on Keep-Alive connections. Choosing the correct value for this parameter is hard. Disabling Keep-Alive is bad for performance if the clients request multiple documents for a single page. This may, for example, be caused by HTML frames, HTML pages with images, associated CSS files, etc. Keeping a connection open in the threaded model however prevents the thread servicing the client from servicing other clients. The default is 5 seconds.
- local(+KBytes)
- Size of the local-stack for the workers. Default is taken from the commandline option.
- global(+KBytes)
- Size of the global-stack for the workers. Default is taken from the commandline option.
- trail(+KBytes)
- Size of the trail-stack for the workers. Default is taken from the commandline option.
- ssl(+SSLOptions)
- Use SSL (Secure Socket Layer) rather than plain TCP/IP. A server created
this way is accessed using the https:// protocol. SSL allows for encrypted communication to prevent others from tapping the wire as well as improved authentication of client and server. The SSLOptions option list is passed to ssl_init/3. The port option of the main option list is forwarded to the SSL layer. See the library(ssl) library for details.
- http_server_property(?Port, ?Property)
- True if Property is a property of the HTTP server running at
Port. Defined properties are:
- goal(:Goal)
- Goal used to start the server. This is often http_dispatch/1.
- start_time(?Time)
- Time-stamp when the server was created. See format_time/3 for creating a human-readable representation.
- http_workers(:Port, ?Workers)
- Query or manipulate the number of workers of the server identified by
Port. If Workers is unbound it is unified with the
number of running workers. If it is an integer greater than the current
size of the worker pool new workers are created with the same
specification as the running workers. If the number is less than the
current size of the worker pool, this predicate inserts a number of
`quit' requests in the queue, discarding the excess workers as they
finish their jobs (i.e. no worker is abandoned while serving a client).
This can be used to tune the number of workers for performance. Another possible application is to reduce the pool to one worker to facilitate easier debugging.
- http_stop_server(+Port, +Options)
- Stop the HTTP server at Port. Halting a server is done gracefully, which means that requests being processed are not abandoned. The Options list is for future refinements of this predicate such as a forced immediate abort of the server, but is currently ignored.
- http_current_worker(?Port, ?ThreadID)
- True if ThreadID is the identifier of a Prolog thread serving Port. This predicate is intended to allow arbitrary interaction with the worker threads for development and statistics.
- http_spawn(:Goal, +Spec)
- Continue handling this request in a new thread running Goal. After http_spawn/2, the worker returns to the pool to process new requests. In its simplest form, Spec is the name of a thread pool as defined by thread_pool_create/3. Alternatively it is an option list, whose options are passed to thread_create_in_pool/4 if Spec contains pool(Pool) or to thread_create/3 if the pool option is not present. If the dispatch module is used (see section 3.2), spawning is normally specified as an option to the http_handler/3 registration.
We recommend the use of thread pools. They allow registration of a set of threads using common characteristics, specify how many can be active and what to do if all threads are active. A typical application may define a small pool of threads with large stacks for computation-intensive tasks, and a large pool of threads with small stacks to serve media. The declaration could be the one below, allowing for at most 3 concurrent solvers with a maximum backlog of 5, and 30 tasks creating image thumbnails.
:- use_module(library(thread_pool)).

:- thread_pool_create(compute, 3,
                      [ local(20000), global(100000), trail(50000),
                        backlog(5)
                      ]).
:- thread_pool_create(media, 30,
                      [ local(100), global(100), trail(100),
                        backlog(100)
                      ]).

:- http_handler('/solve',     solve,     [spawn(compute)]).
:- http_handler('/thumbnail', thumbnail, [spawn(media)]).
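The predicates of the threaded server combine into a simple life cycle: start, inspect or resize the worker pool, and stop. The following is a minimal sketch, assuming a recent SWI-Prolog with the HTTP package installed. The location /ping and the predicates ping/1, start_demo/1, resize_demo/2 and stop_demo/1 are hypothetical names used for illustration only.

```prolog
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% Hypothetical handler, only here to have something to serve.
:- http_handler('/ping', ping, []).

ping(_Request) :-
    format('Content-type: text/plain~n~n'),
    format('pong~n').

% start_demo(?Port): start a threaded server with 4 workers.
% If Port is unbound it is unified with the selected free port.
start_demo(Port) :-
    http_server(http_dispatch,
                [ port(Port),
                  workers(4),
                  timeout(60)
                ]).

% resize_demo(+Port, +N): report the current pool size and then
% request a pool of N workers.
resize_demo(Port, N) :-
    http_workers(Port, Old),
    format('Pool at port ~w had ~w workers~n', [Port, Old]),
    http_workers(Port, N).

% stop_demo(+Port): stop the server gracefully.
stop_demo(Port) :-
    http_stop_server(Port, []).
```

Calling start_demo(Port) with an unbound Port is convenient during development, as it avoids clashes with other servers on the same machine.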
3.9.3 From an interactive Prolog session using XPCE
The library(http/xpce_httpd.pl) provides the
infrastructure to manage multiple clients with an event-driven
control-structure. This version can be started from an interactive
Prolog session, providing a comfortable infra-structure to debug the
body of your server. It also allows the combination of an (XPCE-based)
GUI with web-technology in one application.
- http_server(:Goal, +Options)
- Create an instance of interactive_httpd. Options must provide the port(?Port) option to specify the port the server should listen to. If Port is unbound an arbitrary free port is selected and Port is unified to this port-number. Currently no other options are defined.
The file demo_xpce gives a typical example of this
wrapper, assuming demo_body defines the predicate reply/1.
:- use_module(xpce_httpd).
:- use_module(demo_body).
server(Port) :-
    http_server(reply, [port(Port)]).
The created server opens a server socket at the selected address and waits for incoming connections. On each accepted connection it collects input until an HTTP request is complete. Then it opens an input stream on the collected data and using the output stream directed to the XPCE socket it calls http_wrapper/5. This approach is fundamentally different compared to the other approaches:
- Server can handle multiple connections
Where inetd starts a server for each client, and CGI starts a server for each request, this approach starts a single server handling multiple clients.
- Requests are serialised
All calls to Goal are fully serialised; processing on behalf of a new client can only start after all previous requests are answered. This is easier and quite acceptable if the server is mostly inactive and requests do not take long to process.
- Lifetime of the server
The server lives as long as Prolog runs.
3.9.4 From (Unix) inetd
All modern Unix systems handle a large number of the services they
run through the super-server inetd. This program reads
/etc/inetd.conf and opens server-sockets on all ports
defined in this file. As a request comes in it accepts it and starts the
associated server such that standard I/O refers to the socket. This
approach has several advantages:
- Simplification of servers
Servers don't have to know about sockets and socket operations.
- Centralised authorisation
Using tcpwrappers, simple and effective firewalling of all services is realised.
- Automatic start and monitor
The inetd automatically starts the server `just-in-time' and starts additional servers or restarts a crashed server according to the specifications.
The very small generic script for handling inetd based connections is
in inetd_httpd, defining http_server/2:
- http_server(:Goal, +Options)
- Initialises and runs http_wrapper/5 in a loop until failure or end-of-file. This server does not support the Port option as the port is specified with the inetd configuration. The only supported option is After.
Here is the example from demo_inetd
#!/usr/bin/pl -t main -q -f
:- use_module(demo_body).
:- use_module(inetd_httpd).
main :-
    http_server(reply, []).
With the above file installed in /home/jan/plhttp/demo_inetd,
the following line in /etc/inetd.conf enables the server at port
4001 guarded by tcpwrappers. After modifying inetd.conf, send the
daemon the HUP signal to make it reload its configuration.
For more information, please check inetd.conf(5).
4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd
3.9.5 MS-Windows
There are rumours that inetd has been ported to Windows.
3.9.6 As CGI script
To be done.
3.9.7 Using a reverse proxy
There are three options for public deployment of a service. One is to run it on a dedicated machine on port 80, the standard HTTP port. The machine may be a virtual machine running, for example, under VMWARE or XEN. The (virtual) machine approach isolates security threats and allows for using a standard port. The server can also be hosted on a non-standard port such as 8000 or 8080. Using non-standard ports however may cause problems with intermediate proxy and/or firewall policies. Isolation can be achieved using a Unix chroot environment. Another option, also recommended for Tomcat servers, is the use of Apache reverse proxies. This causes the main web-server to relay requests below a given URL location to our Prolog based server. This approach has several advantages:
- We can access the server on port 80, just as for a dedicated machine. We do not need a machine though and we only need access to the Apache configuration.
- As Apache is doing the front-line service, the Prolog server is normally protected from malformed HTTP requests that could result in denial of service or otherwise compromise the server. In addition, Apache can provide encodings such as compression to the outside world.
Note that the proxy technology can be combined with isolation methods such as dedicated machines, virtual machines and chroot jails. The proxy can also provide load balancing.
Setting up a reverse proxy
The Apache reverse proxy setup is really simple. Ensure the modules
proxy and proxy_http are loaded. Then add two
simple rules to the server configuration. Below is an example that makes
a PlDoc server on port 4000 available from the main Apache server at
port 80.
ProxyPass        /pldoc/ http://localhost:4000/pldoc/
ProxyPassReverse /pldoc/ http://localhost:4000/pldoc/
Apache rewrites the HTTP headers passing by, but using the above
rules it does not examine the content. This implies that URLs embedded
in the (HTML) content must use relative addressing. If the locations on
the public and Prolog server are the same (as in the example above) it
is allowed to use absolute locations. I.e. /pldoc/search is
ok, but http://myhost.com:4000/pldoc/search is not.
If the locations on the server differ, locations must be relative (i.e. not
start with /).
This problem can also be solved using the contributed Apache module
proxy_html that can be instructed to rewrite URLs embedded
in HTML documents. In our experience, this is not trouble-free as URLs
can appear in many places in generated documents. JavaScript can create
URLs on the fly, which makes rewriting virtually impossible.
3.10 The wrapper library
The body is called by the module library(http/http_wrapper.pl).
This module realises the communication between the I/O streams and the
body described in section 3.1. The
interface is realised by
http_wrapper/5:
- http_wrapper(:Goal, +In, +Out, -Connection, +Options)
- Handle an HTTP request where In is an input stream from the
client, Out is an output stream to the client and Goal
defines the goal realising the body. Connection is unified to
'Keep-alive' if both ends of the connection want to continue the connection or close if either side wishes to close the connection.
This predicate reads an HTTP request-header from In, redirects current output to a memory file and then runs call(Goal, Request), watching for exceptions and failure. If Goal executes successfully it generates a complete reply from the created output. Otherwise it generates an HTTP server error with additional context information derived from the exception.
http_wrapper/5 supports the following options:
- request(-Request)
- Return the executed request to the caller.
- peer(+Peer)
- Add peer(Peer) to the request header handed to Goal. The format of Peer is defined by tcp_accept/3 from the clib package.
- http:request_expansion(+RequestIn, -RequestOut)
- This multifile hook predicate is called just before the goal
that produces the body, while the output is already redirected to
collect the reply. If it succeeds it must return a valid modified
request. It is allowed to throw exceptions as defined in
section 3.1.1. It is intended for
operations such as mapping paths, denying access for certain requests or
managing cookies. If it writes output, this must consist of HTTP header fields
that are added before header fields written by the body. The
example below, from the session management library (see section
3.4), sets a cookie.
...,
format('Set-Cookie: ~w=~w; path=~w~n', [Cookie, SessionID, Path]),
...,
- http_current_request(-Request)
- Get access to the currently executing request. Request is the same as handed to Goal of http_wrapper/5 after applying rewrite rules as defined by http:request_expansion/2. Raises an existence error if there is no request in progress.
- http_relative_path(+AbsPath, -RelPath)
- Convert an absolute path (without host, fragment or search) into a path
relative to the current page, defined as the path component from the
current request (see http_current_request/1).
This call is intended to create reusable components returning relative
paths for easier support of reverse proxies.
If, for whatever reason, the conversion is not possible it simply unifies RelPath to AbsPath.
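To illustrate the http:request_expansion/2 hook described above, the sketch below maps requests for a legacy location to a new one before the body runs. The prefixes /old/ and /new/ are hypothetical. The sketch assumes that the request contains the location as path(Path), and that the library may call the hook again on the rewritten request, which is why the clause only applies to paths that still carry the old prefix.

```prolog
:- use_module(library(http/http_wrapper)).

:- multifile http:request_expansion/2.

% Rewrite /old/<Rest> into /new/<Rest>.  The clause fails for any
% other path, which leaves the request untouched.  After rewriting,
% the path no longer starts with /old/, so the clause cannot apply
% a second time to the same request.
http:request_expansion(Request0, Request) :-
    selectchk(path(Path0), Request0, Request1),
    atom_concat('/old/', Rest, Path0),
    atom_concat('/new/', Rest, Path),
    Request = [path(Path)|Request1].
```

Because the hook runs for every request, it should fail quickly for requests it does not want to change.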
3.11 Library http/http_host -- Obtain public server location
This library finds the public address of the running server. This can
be used to construct URLs that are visible from anywhere on the
internet. This module was introduced to deal with OpenID, where a request
is redirected to the OpenID server, which in turn redirects to our
server (see http_openid.pl).
The address is established from the settings http:public_host and http:public_port if provided. Otherwise it is deduced from the request.
- [det]http_current_host(+Request, -Hostname, -Port, Options)
- Current global host and port of the HTTP server. This is the basis to
form absolute addresses, which we need for redirection-based interaction
such as the OpenID protocol. Options are:
- global(+Bool)
- If true (default false), try to replace a local hostname by a world-wide accessible name.
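As an illustration of how http_current_host/4 may be used to form an absolute address, the sketch below builds a URL for a local path. The predicate public_url/3 is a hypothetical helper, and the sketch assumes plain http rather than https.

```prolog
:- use_module(library(http/http_host)).

% public_url(+Request, +Path, -URL): URL is an absolute http URL
% for the local Path.  The port is omitted if it is the default
% HTTP port 80.  Assumes the server is reached over plain http.
public_url(Request, Path, URL) :-
    http_current_host(Request, Host, Port, [global(true)]),
    (   Port == 80
    ->  format(atom(URL), 'http://~w~w', [Host, Path])
    ;   format(atom(URL), 'http://~w:~w~w', [Host, Port, Path])
    ).
```

A handler would typically pass its own Request argument, for example when computing the return address of a redirection-based login.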
3.12 Library http/http_log -- HTTP Logging module
Simple module for logging HTTP requests to a file. Logging is enabled
by loading this file and ensuring the setting http:logfile is not the
empty atom. The default file for writing the log is httpd.log.
See library(settings) for details.
The level of logging can be modified using the multifile predicate
http_log:nolog/1 to hide HTTP request
fields from the logfile and
http_log:password_field/1 to hide
passwords from HTTP search specifications (e.g. /topsecret?password=secret).
- [semidet]http_log_stream(-Stream)
- Returns handle to open logfile. Fails if no logfile is open and none is defined.
- [det]http_log_close(+Reason)
- If there is a currently open HTTP logfile, close it after adding a term
server(Reason, Time). to the logfile. This call is intended
for cooperation with the Unix logrotate facility using the following
schema:
- Move logfile (the HTTP server keeps writing to the moved file)
- Inform the server using an HTTP request that calls http_log_close/1
- Compress the moved logfile
- author
- Suggested by Jacco van Ossenbruggen
- [det]http_log(+Format, +Args)
- Write message from Format and Args to log-stream. See format/2 for details. Succeed without side effects if logging is not enabled.
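The logrotate schema described for http_log_close/1 needs an HTTP location that closes the log on request. A minimal sketch of such a handler follows. The location /admin/logrotate and the predicate rotate_log/1 are hypothetical, and a real deployment would need to restrict access to this location.

```prolog
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_log)).

% Hypothetical location called by the log rotation script after it
% has moved the logfile.
:- http_handler('/admin/logrotate', rotate_log, []).

% rotate_log(+Request): close the current logfile (if any) and
% confirm this to the client.
rotate_log(_Request) :-
    http_log_close(logrotate),
    format('Content-type: text/plain~n~n'),
    format('log closed~n').
```

Once the handler has replied, the rotation script can compress the moved file.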
3.13 Debugging Servers
The library library(http/http_error.pl) defines a hook
that decorates uncaught exceptions with a stack-trace. This will
generate a 500 internal server error document with a
stack-trace. To enable this feature, simply load this library. Please do
note that providing error information to the user simplifies the job of
a hacker trying to compromise your server. It is therefore not
recommended to load this file by default.
The example program calc.pl has the error handler loaded
which can be triggered by forcing a divide-by-zero in the calculator.
3.14 Handling HTTP headers
The library library(http/http_header) provides
primitives for parsing and composing HTTP headers. Its functionality is
normally hidden by the other parts of the HTTP server and client
libraries. We provide a brief overview of http_reply/3
which can be accessed from the reply body using an exception as explained
in section 3.1.1.
- http_reply(+Type, +Stream, +HdrExtra)
- Compose a complete HTTP reply from the term Type using
additional headers from HdrExtra to the output stream Stream.
HdrExtra is a list of Field(Value). Type is one of:
- html(+HTML)
- Produce a HTML page using print_html/1, normally generated using the library(http/html_write) described in section 3.15.
- file(+MimeType, +Path)
- Reply the content of the given file, indicating the given MIME type.
- tmp_file(+MimeType, +Path)
- Similar to file(+MimeType, +Path), but do not include a modification time header.
- stream(+Stream, +Len)
- Reply using the next Len characters from Stream. The user must provide the MIME type and other attributes through the HdrExtra argument.
- cgi_stream(+Stream, +Len)
- Similar to stream(+Stream, +Len), but the data on Stream must contain an HTTP header.
- moved(+URL)
- Generate a ``301 Moved Permanently'' page with the given target URL.
- moved_temporary(+URL)
- Generate a ``302 Moved Temporary'' page with the given target URL.
- see_other(+URL)
- Generate a ``303 See Other'' page with the given target URL.
- not_found(+URL)
- Generate a ``404 Not Found'' page.
- forbidden(+URL)
- Generate a ``403 Forbidden'' page, denying access without challenging the client.
- authorise(+Method, +Realm)
- Generate a ``401 Authorization Required'', requesting the client to retry using proper credentials (i.e. user and password).
- not_modified
- Generate a ``304 Not Modified'' page, indicating the requested resource has not changed since the indicated time.
- server_error(+Error)
- Generate a ``500 Internal server error'' page with a message generated from a Prolog exception term (see print_message/2).
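From the body, these reply types are normally selected by throwing an exception as described in section 3.1.1. The sketch below assumes the exception term has the form http_reply(Type). The locations and the handlers moved_docs/1 and item/1 are hypothetical.

```prolog
:- use_module(library(http/http_dispatch)).

:- http_handler('/old-docs', moved_docs, []).
:- http_handler('/item',     item,       []).

% Redirect permanently to the new location of the documentation.
moved_docs(_Request) :-
    throw(http_reply(moved('/docs/'))).

% This sketch has no items to serve, so it always reports the
% requested path as not found.
item(Request) :-
    memberchk(path(Path), Request),
    throw(http_reply(not_found(Path))).
```

The wrapper catches the exception and composes the corresponding status page, so the handler does not write a header itself.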
3.15 The library(http/html_write) library
Producing output for the web in the form of an HTML document is a requirement for many Prolog programs. Just using format/2 is not satisfactory as it leads to poorly readable programs generating poor HTML. This library is based on using DCG rules.
The library(http/html_write) structures the generation
of HTML from a program. It is an extensible library, providing a DCG
framework for generating legal HTML under (Prolog) program control. It
is especially useful for the generation of structured pages (e.g. tables)
from Prolog data structures.
The normal way to use this library is through the DCG html//1. This non-terminal provides the central translation from a structured term with embedded calls to additional translation rules to a list of atoms that can then be printed using print_html/[1,2].
- html(:Spec)//
- The DCG non-terminal html//1 is the main predicate of this library. It
translates the specification for an HTML page into a list of atoms that
can be written to a stream using print_html/[1,2].
The expansion rules of this predicate may be extended by defining the
multifile DCG html_write:expand//1. Spec is either a single
specification or a list of single specifications. Using nested lists is
not allowed to avoid ambiguity caused by the atom [].
- Atomic data
  Atomic data is quoted using html_quoted//1.
- Fmt - Args
  Fmt and Args are used as format-specification and argument list to format/3. The result is quoted and added to the output list.
- \List
  Escape sequence to add atoms directly to the output list. This can be used to embed external HTML code or emit script output. List is a list of the following terms:
  - Fmt - Args
    Fmt and Args are used as format-specification and argument list to format/3. The result is added to the output list.
  - Atomic
    Atomic values are added directly to the output list.
- \Term
  Invoke the non-terminal Term in the calling module. This is the common mechanism to realise abstraction and modularisation in generating HTML.
- Module:Term
  Invoke the non-terminal <Module>:<Term>. This is similar to \Term but allows for invoking grammar rules in external packages.
- &(Entity)
  Emit &<Entity>; or &#<Entity>; if Entity is an integer. SWI-Prolog atoms and strings are represented as Unicode. Explicit use of this construct is rarely needed because code-points that are not supported by the output encoding are automatically converted into character-entities.
- Tag(Content)
  Emit HTML element Tag using Content and no attributes. Content is handed to html//1. See section 3.15.4 for details on the automatically generated layout.
- Tag(Attributes, Content)
  Emit HTML element Tag using Attributes and Content. Attributes is either a single attribute or a list of attributes. Each attribute is of the format Name(Value) or Name=Value. Value is the atomic attribute value but allows for a limited functional notation:
  - A + B
    Concatenation of A and B.
  - encode(Atom)
    Use www_form_encode/2 to create a valid URL component.
  - location_by_id(ID)
    HTTP location of the HTTP handler with given ID. See http_location_by_id/2.
  - List
    A list is handled as a URL `search' component. The list members are terms of the format Name = Value or Name(Value). Values are encoded as in the encode option described above.
  The example below generates a URL that references the predicate set_lang/1 in the application with given parameters. The http_handler/3 declaration binds /setlang to the predicate set_lang/1 for which we provide a very simple implementation. The code between ... is part of an HTML page showing the English flag which, when pressed, calls set_lang(Request) where Request contains the search parameter lang=en. Note that the HTTP location (path) /setlang can be moved without affecting this code.
:- http_handler('/setlang', set_lang, []).

set_lang(Request) :-
    http_parameters(Request,
                    [ lang(Lang, [])
                    ]),
    http_session_retractall(lang(_)),
    http_session_assert(lang(Lang)),
    reply_html_page(title('Switched language'),
                    p(['Switch language to ', Lang])).

...
    html(a(href(location_by_id(set_lang) + [lang(en)]),
           img(src('/www/images/flags/en.png')))),
...
- page(:HeadContent, :BodyContent)//
- The DCG non-terminal page//2 generates a complete page, including the SGML DOCTYPE declaration. HeadContent are elements to be placed in the head element and BodyContent are elements to be placed in the body element.
To achieve common style (background, page header and footer), it is possible to define DCG non-terminals head//1 and/or body//1. Non-terminal page//1 checks for the definition of these non-terminals in the module it is called from as well as in the user module. If no definition is found, it creates a head with only the HeadContent (note that the title is obligatory) and a body with bgcolor set to white and the provided BodyContent.
Note that further customisation is easily achieved using html//1 directly as page//2 is (besides handling the hooks) defined as:
page(Head, Body) -->
    html([ \['<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 4.0//EN">\n'],
           html([ head(Head),
                  body(bgcolor(white), Body)
                ])
         ]).
- page(:Contents)//
- This version of page//[1,2] only gives you the SGML DOCTYPE and the HTML element. Contents is used to generate both the head and body of the page.
- html_begin(+Begin)//
- Just open the given element. Begin is either an atom or a compound term. In the latter case the arguments are used as arguments to the begin-tag. Some examples:
html_begin(table)
html_begin(table(border(2), align(center)))
This predicate provides an alternative to using the \Command syntax in the html//1 specification. The following two fragments are the same. The preferred solution depends on your preferences as well as whether the specification is generated or entered by the programmer.
table(Rows) -->
    html(table([border(1), align(center), width('80%')],
               [ \table_header,
                 \table_rows(Rows)
               ])).

% or

table(Rows) -->
    html_begin(table(border(1), align(center), width('80%'))),
    table_header,
    table_rows(Rows),
    html_end(table).
- html_end(+End)//
- End an element. See html_begin//1 for details.
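The specification forms accepted by html//1 are easiest to appreciate in a small runnable rule. The sketch below combines atomic data, a list of specifications, Tag(Attributes, Content), Fmt-Args and the \Term escape. The names greeting//1, footer//0 and show_greeting/0 are hypothetical.

```prolog
:- use_module(library(http/html_write)).

% greeting(+Name)//: a heading, a paragraph and a shared footer.
% Atomic data such as Name is quoted, so '<' and '&' are safe.
greeting(Name) -->
    html([ h1(class(title), 'Hello'),
           p(['Welcome, ', b(Name), ' & friends']),
           \footer
         ]).

% footer//: Fmt-Args is formatted first and quoted afterwards.
footer -->
    html(p(class(footer), 'Generated by ~w'-[html_write])).

% show_greeting: translate the specification into a token list
% and print it to the current output.
show_greeting :-
    phrase(greeting('<World>'), Tokens),
    print_html(Tokens).
```

Because all text passes through html_quoted//1, the name '<World>' appears in the output as character entities rather than as markup.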
3.15.1 Emitting HTML documents
The non-terminal html//1 translates a specification into a list of
atoms and layout instructions. Currently the layout instructions are
terms of the format nl(N), requesting at least N
newlines. Multiple consecutive nl(1) terms are combined to
an atom containing the maximum of the requested number of newline
characters.
To simplify handing the data to a client or storing it into a file, the following predicates are available from this library:
- reply_html_page(:Head, :Body)
- Same as reply_html_page(default, Head, Body).
- reply_html_page(+Style, :Head, :Body)
- Writes an HTML page preceded by an HTTP header as required by library(http_wrapper) (CGI-style). Here is a simple typical example:
reply(Request) :-
    reply_html_page(title('Welcome'),
                    [ h1('Welcome'),
                      p('Welcome to our ...')
                    ]).
The header and footer of the page can be hooked using the grammar-rules user:head//2 and user:body//2. The first argument passed to these hooks is the Style argument of reply_html_page/3 and the second is the 2nd (for head//2) or 3rd (for body//2) argument of reply_html_page/3. These hooks can be used to restyle the page, typically by embedding the real body content in a div. E.g., the following code provides a menu on top of each page that is identified using the style myapp.
:- multifile user:body//2.

user:body(myapp, Body) -->
    html(body([ div(id(top), \application_menu),
                div(id(content), Body)
              ])).
Redefining the head can be used to pull in scripts, but typically html_requires//1 provides a more modular approach for pulling scripts and CSS-files.
- print_html(+List)
- Print the token list to the Prolog current output stream.
- print_html(+Stream, +List)
- Print the token list to the specified output stream.
- html_print_length(+List, -Length)
- When calling print_html/[1,2] on List, Length characters will be produced. Knowing the length is needed to provide the Content-length field of an HTTP reply-header.
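The relation between html_print_length/2 and print_html/1 can be sketched as follows. The predicate emit_sized/2 is a hypothetical helper. It assumes ASCII-only output, so that the number of characters equals the number of bytes expected in the Content-length field.

```prolog
:- use_module(library(http/html_write)).

% emit_sized(+Head, +Body): write a CGI-style header that includes
% Content-length, followed by the page itself.  The length counts
% characters, which only equals bytes for ASCII-only output.
emit_sized(Head, Body) :-
    phrase(page(Head, Body), Tokens),
    html_print_length(Tokens, Length),
    format('Content-type: text/html~n'),
    format('Content-length: ~d~n~n', [Length]),
    print_html(Tokens).
```

A call such as emit_sized([title('Sized')], [p('Hello')]) computes the length from the token list, so the page does not have to be written to a temporary buffer first.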
3.15.2 Repositioning HTML for CSS and javascript links
Modern HTML commonly uses CSS and Javascript. This requires <link> elements in the HTML <head> element or <script> elements in the <body>. Unfortunately this seriously harms re-using HTML DCG rules as components as each of these components may rely on their own style sheets or JavaScript code. We added a `mailing' system to reposition and collect fragments of HTML. This is implemented by html_post/4, html_receive/3 and html_receive/4.
- [det]html_post(+Id, :HTML)//
- Reposition HTML to the receiving Id. The html_post/4 call processes HTML using html/3. Embedded \-commands are executed by mailman/1 from print_html/1 or html_print_length/2. These commands are called in the calling context of the html_post/4 call.
A typical usage scenario is to get required CSS links in the document head in a reusable fashion. First, we define css/3 as:
css(URL) -->
    html_post(css,
              link([ type('text/css'),
                     rel('stylesheet'),
                     href(URL)
                   ])).
Next we insert the unique CSS links in the page head using the following call to reply_html_page/2:
reply_html_page([ title(...),
                  \html_receive(css)
                ],
                ...)
- [det]html_receive(+Id)//
- Receive posted HTML tokens. Unique sequences of tokens posted with html_post/4 are inserted at the location where html_receive/3 appears.
- See also
- The local predicate sorted_html/3 handles the output of html_receive/3.
- html_receive/4 allows for post-processing the posted material.
- [det]html_receive(+Id, :Handler)//
- This extended version of html_receive/3 causes Handler to be called to process all messages posted to the channel at the time output is generated. Handler is a grammar rule that is called with three extra arguments.
- A list of Module:Term, of posted terms. Module is the context module of html_post and Term is the unmodified term. Members are in the order posted and may contain duplicates.
- DCG input list. The final output must be produced by a call to html/3.
- DCG output list.
Typically, Handler collects the posted terms, creating a term suitable for html/3 and finally calls html/3.
The library predefines the receiver channel head at the
end of the
head element for all pages that write the html head
through this library. The following code can be used anywhere inside an
HTML generating rule to demand a javascript in the header:
js_script(URL) -->
html_post(head, script([ src(URL),
type('text/javascript')
], [])).
This mechanism is also exploited to add XML namespace (xmlns)
declarations to the (outer) html element using xhtml_ns/4:
- xhtml_ns(Id, Value)//
- Demand an xmlns:id=Value in the outer html tag. This uses the html_post/2 mechanism to post to the xmlns channel. RDFa (http://www.w3.org/2006/07/SWD/RDFa/syntax/), embedding RDF in (x)html, provides a typical usage scenario where we want to publish the required namespaces in the header. We can define:
rdf_ns(Id) -->
    { rdf_global_id(Id:'', Value) },
    xhtml_ns(Id, Value).
After which we can use rdf_ns/3 as a normal rule in html/3 to publish namespaces from library(semweb/rdf_db). Note that this macro only has effect if the dialect is set to xhtml. In html mode it is silently ignored.
The required xmlns receiver is installed by html_begin/3 using the html tag and thus is present in any document that opens the outer html environment through this library.
3.15.3 Adding rules for html//1
In some cases it is practical to extend the translations imposed by
html//1. When using XPCE for example, it is convenient to be able to
define a default translation to HTML for objects. We also used this
technique to define translation rules for the output of the SWI-Prolog
library(sgml) package.
The html//1 non-terminal first calls the multifile ruleset html_write:expand//1.
- html_write:expand(+Spec)//
- Hook to add additional translation rules for html//1.
- html_quoted(+Atom)//
- Emit the text in Atom, inserting entity-references for the SGML special characters <&>.
- html_quoted_attribute(+Atom)//
- Emit the text in Atom suitable for use as an SGML attribute, inserting entity-references for the SGML special characters <&>".
3.15.4 Generating layout
Though not strictly necessary, the library attempts to generate reasonable layout in SGML output. It does this only by inserting newlines before and after tags, on the basis of the multifile predicate html_write:layout/3:
- html_write:layout(+Tag, -Open, -Close)
- Specify the layout conventions for the element Tag, which is
a lowercase atom. Open is a term Pre-Post.
It defines that the element should have at least Pre newline
characters before and Post after the tag. The Close
specification is similar, but in addition allows for the atom
-, requesting the output generator to omit the close-tag altogether, or empty, telling the library that the element has declared empty content. In this case the close-tag is not emitted either, but in addition html//1 interprets Arg in Tag(Arg) as a list of attributes rather than the content.
A tag that does not appear in this table is emitted without additional layout. See also print_html/[1,2]. Please consult the library source for examples.
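As a sketch of how the layout table may be extended, the clause below adds layout for a hypothetical element note that is not part of HTML. The predicate show_note/0 is likewise hypothetical and only serves to show the effect.

```prolog
:- use_module(library(http/html_write)).

:- multifile html_write:layout/3.

% Hypothetical element <note>: at least two newlines before and one
% after the open tag, one newline before and two after the close tag.
html_write:layout(note, 2-1, 1-2).

% show_note: print a <note> element using the layout defined above.
show_note :-
    phrase(html(note('Remember the milk')), Tokens),
    print_html(Tokens).
```

Without the layout/3 clause the element is still emitted, but on a single line without surrounding newlines.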
3.15.5 Examples
In the following example we generate a table of Prolog predicates found in the SWI-Prolog help system based on a keyword. The primary database is defined by the predicate predicate/5. We make hyperlinks for the predicates pointing to their documentation.
html_apropos(Kwd) :-
    findall(Pred, apropos_predicate(Kwd, Pred), Matches),
    phrase(apropos_page(Kwd, Matches), Tokens),
    print_html(Tokens).

% emit page with title, header and table of matches

apropos_page(Kwd, Matches) -->
    page([ title(['Predicates for ', Kwd])
         ],
         [ h2(align(center),
              ['Predicates for ', Kwd]),
           table([ align(center),
                   border(1),
                   width('80%')
                 ],
                 [ tr([ th('Predicate'),
                        th('Summary')
                      ])
                 | \apropos_rows(Matches)
                 ])
         ]).

% emit the rows for the body of the table.

apropos_rows([]) -->
    [].
apropos_rows([pred(Name, Arity, Summary)|T]) -->
    html([ tr([ td(\predref(Name/Arity)),
                td(em(Summary))
              ])
         ]),
    apropos_rows(T).

% predref(Name/Arity)
%
% Emit Name/Arity as a hyperlink to
%
%       /cgi-bin/plman?name=Name&arity=Arity
%
% we must do form-encoding for the name as it may contain illegal
% characters. www_form_encode/2 is defined in library(url).

predref(Name/Arity) -->
    { www_form_encode(Name, Encoded),
      sformat(Href, '/cgi-bin/plman?name=~w&arity=~w',
              [Encoded, Arity])
    },
    html(a(href(Href), [Name, /, Arity])).

% Find predicates from a keyword. '$apropos_match' is an internal
% undocumented predicate.

apropos_predicate(Pattern, pred(Name, Arity, Summary)) :-
    predicate(Name, Arity, Summary, _, _),
    (   '$apropos_match'(Pattern, Name)
    ->  true
    ;   '$apropos_match'(Pattern, Summary)
    ).
3.15.6 Remarks on the library(http/html_write) library
This library is the result of various attempts to arrive at a more satisfactory and Prolog-minded way to produce HTML text from a program. We have been using Prolog for the generation of web pages in a number of projects. Using format/2 alone was never a real option, as it produces error-prone HTML from clumsy syntax. We started with a layer on top of format/2, keeping track of the current nesting and thus always capable of properly closing the environment.
DCG based translation however naturally exploits Prolog's term-rewriting primitives. If generation fails for whatever reason it is easy to produce an alternative document (for example holding an error message).
The approach presented in this library has been used in combination
with library(http/httpd) in three projects: viewing RDF in a
browser, selecting fragments from an analysed document and presenting
parts of the XPCE documentation using a browser. It has proven to be
able to deal with generating pages quickly and comfortably.
In a future version we will probably define a goal_expansion/2
to do compile-time optimisation of the library. Quotation of known text
and invocation of sub-rules using the \RuleSet and
<Module>:<RuleSet> operators are
costly operations in the analysis that can be done at compile-time.
3.16 Library http/js_write -- Utilities for including javascript
This library is a supplement to library(http/html_write) for producing JavaScript fragments. Its main role is to be able to call JavaScript functions with valid arguments constructed from Prolog data. E.g. suppose you want to call a JavaScript function to process a list of names represented as Prolog atoms. This can be done using the call below, while without this library you would have to be careful to properly escape special characters.
numbers_script(Names) -->
    html(script(type('text/javascript'),
                [ \js_call('ProcessNumbers'(Names))
                ])).
The accepted arguments are described with js_args/3.
- [det]js_call(+Term)//
- Emit a call to a Javascript function. The Prolog functor is the name of
the function. The arguments are converted from Prolog to JavaScript
using js_args/3. Please note that Prolog
functors can be quoted atoms and thus the following is legal:
...
html(script(type('text/javascript'),
            [ \js_call('x.y.z'(hello, 42))
            ])),
- [det]js_new(+Id, +Term)//
- Emit a call to a Javascript object declaration. This is the same as:
['var ', Id, ' = new ', \js_call(Term)]
- [det]js_args(+Args:list)//
- Write javascript function arguments. Each argument is separated by a
comma. Elements of the list may contain the following terms:
- Variable
- Emitted as Javascript null
- List
- Produces a Javascript list, where each element is processed by this library.
- object(Attributes)
- Where Attributes is a Key-Value list where each pair can be written as Key-Value, Key=Value or Key(Value), accommodating all common constructs for this used in Prolog.
- json(Term)
- Emits a term using json_write/3.
- @(true), @(false), @(null)
- Emits these constants without quotes.
- Number
- Emitted literally
- symbol(Atom)
- Emitted without quotes. Can be used for JavaScript symbols (i.e., function and variable-names)
- Atom or String
- Emitted as quoted JavaScript string.
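The argument forms listed above can be combined in a single call. The following is a minimal sketch; initMap and onReady are hypothetical JavaScript names, and map_script//1 is an illustrative rule that is not part of the library.

```prolog
:- use_module(library(http/html_write)).
:- use_module(library(http/js_write)).

% Hypothetical rule: emit a <script> element that calls initMap().
map_script(Names) -->
    html(script(type('text/javascript'),
                [ \js_call(initMap(Names,                    % list of atoms
                                   object([ zoom-12,         % Key-Value
                                            animate(@(true)) % Key(Value)
                                          ]),
                                   symbol(onReady)))         % unquoted name
                ])).
```

It might be used as phrase(map_script([amsterdam, paris]), Tokens) followed by print_html(Tokens), or called with \map_script(Names) from another html//1 specification.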
3.17 Library http/http_path -- Abstract specification of HTTP server locations
- To be done
- - Make this module replace the http:prefix
option.
- Remove hard-wired support for prefix().
This module provides an abstract specification of HTTP server locations that is inspired by absolute_file_name/3. The specification is done by adding rules to the dynamic multifile predicate http:location/3. The specification is very similar to user:file_search_path/2, but takes an additional argument with options. Currently only one option is defined:
- priority(+Integer)
- If two rules match, take the one with highest priority. Using priorities
is needed because we want to be able to overrule paths, but we do not
want to become dependent on clause ordering.
The default priority is 0. Note, however, that libraries in particular may decide to provide a fall-back using a negative priority. We suggest -100 for such cases.
This library predefines three locations at priority -100. The icons
and css aliases are intended for images and css files and
are backed up by a file-search-path that allows finding the icons
and css files that belong to the server infrastructure (e.g., http_dirindex/2).
- root
- The root of the server. Default is /, but this may be overruled by the
setting (see setting/2) http:prefix.
Here is an example that binds /login to login/1.
The user can reuse this application while moving all locations using a
new rule for the admin location with the option [priority(10)].
:- multifile http:location/3.
:- dynamic   http:location/3.

http:location(admin, /, []).

:- http_handler(admin(login), login, []).

login(Request) :-
    ...
- [det]http_absolute_location(+Spec, -Path, +Options)
- Path is the HTTP location for the abstract specification Spec.
Options:
- relative_to(Base)
- Path is made relative to Base. Default is to generate absolute URLs.
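The effect of the priority option can be sketched as follows. The alias admin, the sub-path manage and the helper show_login_path/0 are assumptions made for this illustration; only http:location/3 and http_absolute_location/3 come from the library.

```prolog
:- use_module(library(http/http_path)).

:- multifile http:location/3.
:- dynamic   http:location/3.

% Default as an application might ship it: locations below <root>/admin.
http:location(admin, root(admin), []).

% Site-specific rule that moves all admin locations below <root>/manage.
% It is preferred over the default because of its higher priority.
http:location(admin, root(manage), [priority(10)]).

% Hypothetical helper: print the resolved location of admin(login).
show_login_path :-
    http_absolute_location(admin(login), Path, []),
    format('~w~n', [Path]).
```

Handlers declared with http_handler(admin(login), ...) do not need to change when the second rule is added, which is the point of the abstraction.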
3.18 Library http/html_head -- Automatic inclusion of CSS and scripts links
- To be done
- - Possibly we should add img/4
to include images from symbolic path notation.
- It would be nice if the HTTP file server could use our location declarations.
This library allows for abstract declaration of available CSS and
Javascript resources and their dependencies using html_resource/2.
Based on these declarations, html generating code can declare that it
depends on specific CSS or Javascript functionality, after which this
library ensures that the proper links appear in the HTML head. The
implementation is based on the mail system implemented by html_post/2
of library html_write.pl.
Declarations come in two forms. First of all http locations are
declared using the http_path.pl library. Second, html_resource/2
specifies HTML resources to be used in the head and their
dependencies. Resources are currently limited to Javascript files (.js)
and style sheets (.css). It is trivial to add support for other material
in the head. See
html_include/3.
For usage in HTML generation, there is the DCG rule html_requires/3 that demands named resources in the HTML head.
3.18.1 About resource ordering
All calls to html_requires/3 for the page are collected and duplicates are removed. Next, the following steps are taken:
- Add all dependencies to the set
- Replace multiple members by `aggregate' scripts or css files. See use_agregates/4.
- Order all resources by demanding that their dependencies precede the resource itself. Note that the ordering of resources in the dependency list is ignored. This implies that if the order matters the dependency list must be split and only the primary dependency must be added.
3.18.2 Debugging dependencies
Use ?- debug(html(script)). to see the requested and
final set of resources. All declared resources are in html_resource/3.
The edit/1 command recognises the names of
HTML resources.
- [det]html_resource(+About, +Properties)
- Register an HTML head resource. About is either an atom that
specifies an HTTP location or a term Alias(Sub). This works similarly to absolute_file_name/2.
See http:location_path/2 for details.
Recognised properties are:
- requires(+Requirements)
- Other required script and css files. If this is a plain file name, it is interpreted relative to the declared resource. Requirements can be a list, which is equivalent to multiple requires properties.
- virtual(+Bool)
- If true (default false), do not include About itself, but only its dependencies. This allows for defining an alias for one or more resources.
- aggregate(+List)
- States that About is an aggregate of the resources in List.
- [det]html_requires(+ResourceOrList)//
- Include ResourceOrList and all dependencies derived from it
and add them to the HTML head using html_post/2. The actual dependencies are computed during the HTML output phase by html_insert_resource/3.
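A minimal sketch of how a declaration and a requirement fit together is given below. The style sheet names app.css and base.css and the rule app_widget//0 are hypothetical; the css alias is the predefined location mentioned in the section on library http/http_path.

```prolog
:- use_module(library(http/html_write)).
:- use_module(library(http/html_head)).

% Hypothetical resource: app.css needs base.css to be loaded first.
:- html_resource(css('app.css'),
                 [ requires(css('base.css'))
                 ]).

% Hypothetical page fragment that depends on app.css.
app_widget -->
    html_requires(css('app.css')),
    html(div(class(widget), 'Hello')).
```

The requirement is only posted here. The links themselves are expected to show up when the fragment is part of a complete page generated through this library, for example using reply_html_page/2.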
3.19 Library http/http_pwp -- Serve PWP pages through the HTTP server
- To be done
- - Support elements in the HTML header that
allow controlling the page, such as setting the CGI-header,
authorization, etc.
- Allow external styling. Pass through reply_html_page/2? Allow filtering the DOM before/after PWP?
This module provides convenience predicates to include PWP (Prolog Well-formed Pages) in a Prolog web-server. It provides the following predicates:
- pwp_handler/2
- This is a complete web-server aimed at serving static pages, some of which include PWP. This API is intended to allow for programming the web-server from a hierarchy of pwp files, prolog files and static web-pages.
- reply_pwp_page/3
- Return a single PWP page that is executed in the context of the calling module. This API is intended for individual pages that include so much text that generating from Prolog is undesirable.
- pwp_handler(+Options, +Request)
- Handle PWP files. This predicate is defined to create a simple HTTP
server from a hierarchy of PWP, HTML and other files. The interface is
kept compatible with the library(http/http_dispatch). In the typical
usage scenario, one needs to define an http location and a file-search
path that is used as the root of the server. E.g., the following
declarations create a self-contained web-server for files in
/web/pwp/.
user:file_search_path(pwp, '/web/pwp').
:- http_handler(root(.), pwp_handler([path_alias(pwp)]), [prefix]).
Options include:
- path_alias(+Alias)
- Search for PWP files as Alias(Path). See absolute_file_name/3.
- index(+Index)
- Name of the directory index (pwp) file. This option may appear multiple
times. If no such option is provided, pwp_handler/2 looks for index.pwp.
- view(+Boolean)
- If true (default is false), allow for ?view=source to serve PWP file as source.
- index_hook(:Hook)
- If a directory has no index-file, pwp_handler/2 calls Hook(PhysicalDir, Options, Request). If this semidet predicate succeeds, the request is considered handled.
- hide_extensions(+List)
- Hide files of the given extensions. The default is to hide .pl files.
- Errors
- permission_error(index, http_location, Location) is raised if the handler resolves to a directory that has no index.
- See also
- reply_pwp_page/3
- reply_pwp_page(:File, +Options, +Request)
- Reply a PWP file. This interface is provided to serve individual
locations from PWP files. Using a PWP file rather than generating the
page from Prolog may be desirable because the page contains a lot of
text (which is cumbersome to generate from Prolog) or because the
maintainer is not familiar with Prolog.
Options supported are:
- mime_type(+Type)
- Serve the file using the given mime-type. Default is text/html.
- unsafe(+Boolean)
- Passed to http_safe_file/2 to check for unsafe paths.
- pwp_module(+Boolean)
- If true (default false), process the PWP file in a module constructed from its canonical absolute path. Otherwise, the PWP file is processed in the calling module.
Initial context:
- SCRIPT_NAME
- Virtual path of the script.
- SCRIPT_DIRECTORY
- Physical directory where the script lives
- QUERY
- Var=Value list representing the query-parameters
- REMOTE_USER
- If access has been authenticated, this is the authenticated user.
- REQUEST_METHOD
- One of get, post, put or head
- CONTENT_TYPE
- Content-type provided with HTTP POST and PUT requests
- CONTENT_LENGTH
- Content-length provided with HTTP POST and PUT requests
While processing the script, the file-search-path pwp includes the current location of the script. I.e., the following will find myprolog in the same directory as where the PWP file resides.
pwp:ask="ensure_loaded(pwp(myprolog))"
- See also
- pwp_handler/2.
- To be done
- complete the initial context, as far as possible from CGI variables. See http://hoohoo.ncsa.illinois.edu/docs/cgi/env.html
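For a single location, reply_pwp_page/3 can be registered as an ordinary handler. The sketch below assumes a hypothetical file about.pwp in the directory /web/pwp and a hypothetical location /about; the Request argument is added by the dispatcher.

```prolog
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_pwp)).

:- multifile user:file_search_path/2.

% Hypothetical directory that holds the PWP sources.
user:file_search_path(pwp, '/web/pwp').

% Serve <root>/about from the single file pwp('about.pwp').
:- http_handler(root(about),
                reply_pwp_page(pwp('about.pwp'), []),
                []).
```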
3.20 Security
Writing servers is an inherently dangerous job that should be carried out with some care. You have basically started a program on a public terminal and invited strangers to use it. When using the interactive server or inetd based server, the server runs under your privileges. When run as a CGI script, it runs with the privileges of your web-server. Though it should not be possible to fatally compromise a Unix machine using user privileges, getting unconstrained access to the system is highly undesirable.
Symbolic languages have an additional handicap in their inherent possibilities to modify the running program and dynamically create goals (this also applies to the popular perl and java scripting languages). Here are some guidelines.
- Check your input
Hardly anything can go wrong if you check the validity of query-arguments before formulating an answer.
- Check filenames
If part of the query consists of filenames or directories, check them. This also applies to files you only read. Names such as /etc/passwd, but also ../../../../../etc/passwd, are tried by experienced hackers to learn about the system they want to attack. So, expand provided names using absolute_file_name/[2,3] and verify they are inside a folder reserved for the server. Avoid symbolic links from this subtree to the outside world. The example below checks validity of filenames. The first call ensures proper canonisation of the paths to avoid a mismatch due to symbolic links or other filesystem ambiguities.
check_file(File) :-
    absolute_file_name('/path/to/reserved/area', Reserved),
    absolute_file_name(File, Tried),
    atom_concat(Reserved, _, Tried).
- Check scripts
Should input in any way activate external scripts using shell/1 or open(pipe(Command), ...), verify the argument once more.
- Check meta-calling
The attractive situation for you and your attacker is below:
reply(Query) :-
    member(search(Args), Query),
    member(action=Action, Query),
    member(arg=Arg, Query),
    call(Action, Arg).      % NEVER EVER DO THIS!
All your attacker has to do is specify Action as shell and Arg as /bin/sh and he has an uncontrolled shell!
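One way to keep the convenience of meta-calling without the risk is to call only actions from an explicit list. The following is a sketch under the assumption that the server offers two actions; allowed_action/1, show_user/1 and show_group/1 are hypothetical names.

```prolog
% Hypothetical actions the server is willing to run.
allowed_action(show_user).
allowed_action(show_group).

show_user(Name)  :- format('user: ~w~n', [Name]).
show_group(Name) :- format('group: ~w~n', [Name]).

% Call Action only if it is an atom that is explicitly listed.
% Any other value, such as shell, makes reply/1 fail.
reply(Query) :-
    memberchk(action=Action, Query),
    memberchk(arg=Arg, Query),
    atom(Action),
    allowed_action(Action),
    call(Action, Arg).
```

With this definition reply([action=shell, arg='/bin/sh']) simply fails, while reply([action=show_user, arg=bob]) runs show_user(bob).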
3.21 Tips and tricks
- URL Locations
With an application in mind, it is tempting to make all URL locations short and directly connected to the root (/). This is not a good idea. It is advised to have all locations in a server below a directory with an informative name. Consider making the root location something that can be changed using a global setting.
- Page generating code can easily be reused. Using locations directly below the root however increases the likelihood of conflicts.
- Multiple servers can be placed behind the same public server as explained in section 3.9.7. Using a common and fairly unique root, redirection is much easier and less likely to lead to conflicts.
- Debugging
Please check the section ``Thread-support library(threadutil)'' of the SWI-Prolog reference manual.