Loading Structured Documents
SGML or XML files are loaded through the common predicate load_structure/3. This is a predicate with many options. For simplicity a number of commonly used shorthands are provided: load_sgml_file/2, load_xml_file/2, and load_html_file/2.
The source is either a term of the form stream(StreamHandle) or a file-name. Options is a list of options controlling the conversion process.
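As a minimal sketch of such a call, the following parses XML text held in a string by wrapping it in a stream. The helper name parse_xml_text/2 is hypothetical and not part of the library; open_string/2 and setup_call_cleanup/3 are standard SWI-Prolog built-ins, and dialect(xml) is passed to select XML parsing.

```prolog
:- use_module(library(sgml)).

%!  parse_xml_text(+Text, -ListOfContent) is det.
%
%   Hypothetical helper: parse XML held in Text by reading it
%   through a string stream that is closed afterwards.
parse_xml_text(Text, ListOfContent) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), ListOfContent, [dialect(xml)]),
        close(In)).

% ?- parse_xml_text("<note id='n1'>Hello</note>", Content).
% Content = [element(note, [id=n1], ['Hello'])].
```

Passing a file-name instead of stream(In) works the same way, and the shorthands load_xml_file/2, load_sgml_file/2 and load_html_file/2 cover the common cases without an option list.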
A proper XML document contains only a single toplevel element whose name matches the document type. Nevertheless, a list is returned for consistency with the representation of element content. The ListOfContent consists of the following types:
Atoms are used to represent CDATA. Note that this is possible in SWI-Prolog, as there is no length-limit on atoms and atom garbage collection is provided.
ListOfAttributes is a list of Name=Value pairs for attributes. Attributes of type CDATA are returned literal. Multi-valued attributes (NAMES, etc.) are returned as a list of atoms. Handling of attributes of type NUMBERS depends on the number-handling setting of the parser. By default they are returned as atoms, but automatic conversion to Prolog integers is supported. ListOfContent defines the content for the element.
SDATA is encountered, this term is returned holding the data in Text.
NDATA is encountered, this term is returned holding the data in Text.
<?...?>), Text holds the text of the processing instruction. Please note that the <?xml ...?> instruction is handled internally.
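Because the content is an ordinary nested Prolog list, it can be walked with plain recursion. The sketch below collects all text nodes in document order; content_text/2 and its helper non-terminals are hypothetical names. It assumes elements appear as element(Name, ListOfAttributes, ListOfContent) terms and text as atomic values, and it skips every other compound term (such as those returned for SDATA, NDATA and processing instructions).

```prolog
%!  content_text(+ListOfContent, -Texts) is det.
%
%   Hypothetical helper: Texts is the list of text nodes found in
%   ListOfContent, in document order.
content_text(ListOfContent, Texts) :-
    phrase(nodes_text(ListOfContent), Texts).

nodes_text([]) --> [].
nodes_text([Node|Nodes]) -->
    node_text(Node),
    nodes_text(Nodes).

% An element contributes the text of its own content list.
node_text(element(_Name, _Attributes, Content)) --> !,
    nodes_text(Content).
% Text (atom or string) is emitted as is.
node_text(Text) --> { atomic(Text) }, !,
    [Text].
% Any other compound term is ignored.
node_text(_Other) --> [].

% ?- content_text([element(p, [], ['Hello ', element(b, [], [world]), '!'])], T).
% T = ['Hello ', world, '!'].
```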
The Options list controls the conversion process. Currently defined options are below. Other options are passed to sgml_parse/2.
<!DOCTYPE ...> declaration is ignored and the document is parsed and validated against the provided DTD. If provided as a variable, the created DTD is returned. See section 3.5.
xmlns. See the option dialect of set_sgml_parser/2 for details.
/ is accepted with a warning as part of an unquoted attribute-value, though /> still closes the element-tag in XML mode. It may be set to false for parsing HTML documents to allow for unquoted URLs containing /.
xml:space. See section 3.2.
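To illustrate the effect of white-space handling, the sketch below parses the same text under two space modes. The option name space(SpaceMode) and the mode names default and preserve come from the SWI-Prolog sgml library documentation rather than from the text above; parse_with_space/3 is a hypothetical helper.

```prolog
:- use_module(library(sgml)).

%!  parse_with_space(+Text, +SpaceMode, -Content) is det.
%
%   Hypothetical helper: parse XML Text using the given
%   white-space handling mode.
parse_with_space(Text, SpaceMode, Content) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content,
                       [dialect(xml), space(SpaceMode)]),
        close(In)).

% ?- parse_with_space("<a> hi </a>", default, C).
% C = [element(a, [], [hi])].
%
% ?- parse_with_space("<a> hi </a>", preserve, C).
% C = [element(a, [], [' hi '])].
```

An xml:space attribute in the document can override the initial mode for the element that carries it.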
NUMBERS are handled. If token (default) they are passed as an atom. If integer the parser attempts to convert the value to an integer. If successful, the attribute is passed as a Prolog integer. Otherwise it is still passed as an atom. Note that SGML defines a numeric attribute to be a sequence of digits. The - sign is not allowed and 1 is different from 01. For this reason the default is to handle numeric attributes as tokens. If conversion to integer is enabled, negative values are silently accepted.
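The difference between the two modes only shows for attributes that a DTD declares as numeric. The sketch below therefore uses a small SGML document with an internal DTD subset; the document, the element name item, the attribute name count and the helper parse_item/2 are all made up for illustration, and the option name number(NumberMode) is taken from the SWI-Prolog sgml library documentation.

```prolog
:- use_module(library(sgml)).

%!  parse_item(+NumberMode, -Content) is det.
%
%   Hypothetical helper: parse a fixed SGML document whose DTD
%   declares `count` as a NUMBER attribute. NumberMode is `token`
%   or `integer`.
parse_item(NumberMode, Content) :-
    Text = "<!DOCTYPE item [<!ELEMENT item - O EMPTY><!ATTLIST item count NUMBER #IMPLIED>]><item count=42>",
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content,
                       [dialect(sgml), number(NumberMode)]),
        close(In)).

%!  item_count(+NumberMode, -Value) is semidet.
%
%   Value is the `count` attribute as delivered by the parser.
item_count(NumberMode, Value) :-
    parse_item(NumberMode, [element(item, Attributes, _)]),
    memberchk(count=Value, Attributes).
```

Under the behaviour described above, item_count(token, V) should bind V to an atom and item_count(integer, V) to a Prolog integer.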
true for XML and false for SGML and HTML dialects.
false. Setting this option sets the case_sensitive_attributes to the same value. This option was added to support HTML quasi quotations and most likely has little value in other contexts.
false, only the attributes occurring in the source are emitted.
CDATA entities can be specified with this construct. Multiple entity options are allowed.
max_memory(0) (the default) means no resource limit will be enforced.
string. The choice is not obvious. Strings are allocated on the Prolog stacks and subject to normal stack garbage collection. They are quicker to create and avoid memory fragmentation. However, multiple copies of the same string are stored multiple times, while the text is shared if atoms are used. Strings are also useful for security-sensitive information as they are invisible to other threads and cannot be enumerated using, e.g., current_atom/1. Finally, using strings allows for resource usage limits using the global stack limit (see set_prolog_stack/2).
string. See above for the advantages and disadvantages of using strings.
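The following sketch requests strings for both text content and attribute values. The option names cdata(string) and attribute_value(string) are taken from the SWI-Prolog sgml library documentation and assume a reasonably recent SWI-Prolog 7 or later; parse_as_strings/2 is a hypothetical helper.

```prolog
:- use_module(library(sgml)).

%!  parse_as_strings(+Text, -Content) is det.
%
%   Hypothetical helper: parse XML Text, returning CDATA and
%   attribute values as strings instead of atoms.
parse_as_strings(Text, Content) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content,
                       [ dialect(xml),
                         cdata(string),
                         attribute_value(string)
                       ]),
        close(In)).

% ?- parse_as_strings("<a x='1'>hi</a>", C).
% C = [element(a, [x="1"], ["hi"])].
```

Element and attribute names remain atoms; only the values change representation.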
true, xmlns namespaces with prefixes are returned as ns(Prefix, URI) terms. If false (default), the prefix is ignored and the xmlns namespace is returned as just the URI.
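A small sketch of the difference, assuming the option is spelled keep_prefix(Bool) as in the SWI-Prolog sgml library documentation and that namespace-qualified names are delivered as Namespace:LocalName in the xmlns dialect. The helper element_name/3 and the namespace URI are made up for illustration.

```prolog
:- use_module(library(sgml)).

%!  element_name(+Text, +KeepPrefix, -Name) is semidet.
%
%   Hypothetical helper: Name is the name of the document element
%   of Text when parsed in xmlns mode with keep_prefix(KeepPrefix).
element_name(Text, KeepPrefix, Name) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content,
                       [dialect(xmlns), keep_prefix(KeepPrefix)]),
        close(In)),
    Content = [element(Name, _Attributes, _Children)].

% ?- element_name("<x:a xmlns:x='http://example.org/ns'/>", false, Name).
% Name = 'http://example.org/ns':a.
```

With keep_prefix(true) the namespace part of Name is expected to be the term ns(x, 'http://example.org/ns') instead of the bare URI, which lets code that re-serializes the document reuse the original prefix.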