This library provides the inverse functionality of the sgml.pl parser
library, writing XML, SGML and HTML documents from the parsed output. It
is intended to allow rewriting in a different dialect or encoding or to
perform document transformation in Prolog on the parsed representation.
The current implementation is particularly keen on getting character
encoding and the use of character entities right. Some work has been
done on providing layout, but space handling in XML and SGML makes this a
very hazardous area.
The Prolog-based low-level character and escape handling is the real
bottleneck in this library and will probably be moved to C at a later
stage.
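As an illustration of the parse, transform, write cycle described above, here is a minimal sketch. It assumes library(sgml) is available for parsing; rename_element/4, rename_node/3 and round_trip/2 are hypothetical helper names introduced only for this example, and the exact whitespace of the output may differ between versions.

```prolog
:- use_module(library(sgml)).        % load_structure/3, the parser side
:- use_module(library(sgml_write)).  % xml_write/3, the writer side
:- use_module(library(apply)).       % maplist/3

% rename_element(+From, +To, +DOM0, -DOM)
% Hypothetical helper: rename every element called From to To.
rename_element(From, To, DOM0, DOM) :-
    maplist(rename_node(From, To), DOM0, DOM).

rename_node(From, To,
            element(Name0, Attrs, Content0),
            element(Name, Attrs, Content)) :-
    !,
    (   Name0 == From
    ->  Name = To
    ;   Name = Name0
    ),
    maplist(rename_node(From, To), Content0, Content).
rename_node(_, _, Node, Node).       % text and other nodes pass through

% round_trip(+XMLText, -Out)
% Parse XMLText, rename <b> to <strong>, and write the result to a string.
round_trip(XMLText, Out) :-
    setup_call_cleanup(
        open_string(XMLText, In),
        load_structure(stream(In), DOM0, [dialect(xml)]),
        close(In)),
    rename_element(b, strong, DOM0, DOM),
    with_output_to(string(Out),
                   xml_write(current_output, DOM, [header(false)])).
```

Calling `round_trip("<p>Hi <b>there</b></p>", Out)` should bind Out to the rewritten document. The header(false) option is used here because the result is captured as a fragment rather than written to a file.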
- See also
- - library(http/html_write) provides a high-level library for
emitting HTML and XHTML.
- xml_write(+Data, +Options) is det
- sgml_write(+Data, +Options) is det
- html_write(+Data, +Options) is det
- xml_write(+Stream, +Data, +Options) is det
- sgml_write(+Stream, +Data, +Options) is det
- html_write(+Stream, +Data, +Options) is det
- Write a term as created by the SGML/XML parser to a stream in
SGML or XML format. Options:
- cleanns(Bool)
- If true (default), remove duplicate xmlns attributes.
- dtd(DTD)
- The DTD. This is needed for SGML documents that contain
elements with content model EMPTY. Characters which may
not be written directly in the Stream's encoding will be
written using character data entities from the DTD if at
all possible, otherwise as numeric character references.
Note that the DTD will NOT be written out at all; as yet
there is no way to write out an internal subset, though
it would not be hard to add one.
- doctype(DocType)
- Document type for the SGML document type declaration.
If omitted it is taken from the root element. There is
never any point in having this disagree with the
root element. A <!DOCTYPE> declaration will be written
if and only if at least one of doctype(_), public(_), or
system(_) is provided in Options.
- public(PubId)
- The public identifier to be written in the <!DOCTYPE> line.
- system(SysId)
- The system identifier to be written in the <!DOCTYPE> line.
- header(Bool)
- If Bool is 'false', do not emit the <?xml ...?> header
line. (xml_write/3 only)
- nsmap(Map:list(Id=URI))
- When emitting embedded XML, assume these namespaces
are already defined from the environment. (xml_write/3
only).
- indent(Indent)
- Indentation of the document (for embedding)
- layout(Bool)
- Emit/do not emit layout characters to make output
readable.
- net(Bool)
- Use/do not use Null End Tags.
For XML, this applies only to empty elements, so you get
    <foo/>       (default, net(true))
    <foo></foo>  (net(false))
For SGML, this applies to empty elements, so you get
    <foo>        (if foo is declared to be EMPTY in the DTD)
    <foo></foo>  (default, net(false))
    <foo//       (net(true))
and also to elements with character content not containing /
    <b>xxx</b>   (default, net(false))
    <b/xxx/      (net(true)).
Note that if the stream is UTF-8, the system will write special
characters as UTF-8 sequences, while if it is ISO Latin-1 it
will use (character) entities if there is a DTD that provides
them, otherwise it will use numeric character references.
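The following sketch shows how a few of the options listed above might be combined. The DOM term, the file name 'doc.dtd', and the helper names write_to_string/3 and demo_options/0 are assumptions made for illustration; the output is captured in a string so the variants can be compared.

```prolog
:- use_module(library(sgml_write)).

% write_to_string(+DOM, +Options, -String)
% Hypothetical helper: capture the output of xml_write/3 in a string.
write_to_string(DOM, Options, String) :-
    with_output_to(string(String),
                   xml_write(current_output, DOM, Options)).

demo_options :-
    DOM = element(doc, [],
                  [ element(br, [], []),
                    element(p, [], ['a < b & c'])
                  ]),
    % Defaults: header line is emitted, empty elements use <br/>.
    write_to_string(DOM, [], S1),
    % Embedded fragment: no header, explicit end tags for empty elements.
    write_to_string(DOM, [header(false), net(false)], S2),
    % Providing system(_) requests a <!DOCTYPE> declaration.
    write_to_string(DOM, [header(false), system('doc.dtd')], S3),
    forall(member(S, [S1, S2, S3]),
           format("~w~n---~n", [S])).
```

Running `demo_options` prints the three variants separated by `---`. In all of them the characters `<` and `&` in the text of the p element are written as entities rather than literally.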
- emit_doctype(+Options, +Data, +Stream)[private]
- Emit the document-type declaration.
There is a problem with the first clause if we are emitting SGML:
the SGML DTDs for HTML up to HTML 4 do not allow any 'version'
attribute; so the only time this is useful is when it is illegal!
- emit(+Element, +Out, +State, +Options)[private]
- Emit a single element
- update_nsmap(+Attributes, -Attributes1, !State)[private]
- Modify the nsmap of State to reflect modifications due to xmlns
attributes.
- Arguments:
- Attributes1: a copy of Attributes with all redundant
namespace attributes deleted.
- content(+Content, +Out, +Element, +State, +Options)[private]
- Emit the content part of a structure as well as the termination
for the content. For empty content we have three versions: XML
style '/>', SGML declared EMPTY element (nothing) or normal SGML
element (we must close with the same element name).
- add_missing_namespaces(+DOM0, +NsMap, -DOM)[private]
- Add xmlns:NS=URI definitions to the toplevel element(s) to
deal with missing namespaces.
- generate_ns(+URI, -NS, -URL) is det[private]
- Generate a namespace (NS) identifier for URI.
- xmlns(?NS, ?URI) is nondet[multifile]
- Hook to define human readable abbreviations for XML namespaces.
xml_write/3 tries these locations:
- This hook
- Defaults (see below)
- rdf_db:ns/2 for RDF-DB integration
Default XML namespaces are:
- See also
- - xml_write/2, rdf_register_ns/2.
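A minimal sketch of using the hook follows. It assumes the hook lives in module sgml_write, so clauses are added as sgml_write:xmlns/2. The prefix `ex`, the namespace URI and demo_ns/1 are made up for this example.

```prolog
:- use_module(library(sgml_write)).

:- multifile sgml_write:xmlns/2.

% Hypothetical abbreviation: suggest the prefix 'ex' for an example namespace.
sgml_write:xmlns(ex, 'http://example.org/ns#').

% demo_ns(-String)
% Write an element whose name is qualified by a namespace URI that is
% not declared by any xmlns attribute in the DOM.
demo_ns(String) :-
    DOM = element('http://example.org/ns#':item, [], [value]),
    with_output_to(string(String),
                   xml_write(current_output, DOM, [header(false)])).
```

Because the namespace is not declared in the DOM, the writer has to add a declaration to the toplevel element. With the hook clause in place it is expected to pick the prefix `ex` rather than a generated one.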
- missing_namespaces(+DOM, +NSMap, -Missing)[private]
- Return a list of URIs appearing in DOM that are not covered
by xmlns definitions.
- writeq(+Text:codes, +Out:stream, +Escape:atom, +Escape:assoc) is det[private]
- empty_element(+State, +Element)[private]
- True if Element is declared with EMPTY content in the (SGML)
DTD.
- dtd_character_entities(+DTD, -Map)[private]
- Return an assoc mapping character entities to their name. Note
that the entity representation is a bit dubious. Entities should
allow for a wide-character version and avoid the &#..; trick.