sgml, but implies
shorttag(false) and accepts XML empty element declarations (e.g.,
html, accept attributes named
data- without warning. This value initialises the charset to UTF-8.
xhtml5 accepts attributes named
<?xml ...> is encountered. See section 3.3 for details.
xmlns) mode. The default, which is standard compliant, is not to qualify such attributes. If
true, such attributes are qualified with the namespace of the element they appear in. This option exists for backward compatibility, as this was the behaviour of older versions. In addition, the namespace document suggests that unqualified attributes are often interpreted in the namespace of their element.
token (default), attributes of type number are passed as a Prolog atom. If
integer, such attributes are translated into Prolog integers. If the conversion fails (e.g. due to overflow) a warning is issued and the value is passed as an atom.
encoding= attribute in the header. Explicit use of this option is only required to parse non-conforming documents. Currently accepted values are
<!DOCTYPE declaration has been parsed, the default is the defined doctype. The parser can be instructed to accept the first element encountered as the toplevel using
doctype(_). This feature is especially useful when parsing part of a document (see the
parse option to sgml_parse/2).
on_begin, etc. callbacks from sgml_parse/2.
end) is caused by an element written down using the shorttag notation (
#pcdata is part of Elements. If no element is open, the doctype is returned.
This option is intended to support syntax-sensitive editors. Such an editor should load the DTD, find an appropriate starting point and then feed all data between the starting point and the caret into the parser. Next it can use this option to determine the elements allowed at this point. Below is a code fragment illustrating this use given a parser with loaded DTD, an input stream and a start-location.
    ...,
    seek(In, Start, bof, _),
    set_sgml_parser(Parser, charpos(Start)),
    set_sgml_parser(Parser, doctype(_)),
    Len is Caret - Start,
    sgml_parse(Parser,
               [ source(In),
                 content_length(Len),
                 parse(input)       % do not complete document
               ]),
    get_sgml_parser(Parser, allowed(Allowed)),
    ...
Input is a stream. A full description of the option-list is below.
string. See load_structure/3 for details.
source(Stream), this implies reading is stopped as soon as the element is complete, and another call may be issued on the same stream to read the next element.
element but assumes the element has already been opened. It may be used in a call-back from
call(to parse individual elements after validating their headers.
allowed(Elements) option of get_sgml_parser/2. It disables the parser's default to complete the parse-tree by closing all open elements.
max_errors(-1) makes the parser continue, no matter how many errors it encounters.
error(limit_exceeded(max_errors, Max), _)
quiet, the error is suppressed. Can be used together with
call(urlns, Closure) to provide external expansion of namespaces. See also section 3.3.1.
Handler(+Tag, +Attributes, +Parser).
Handler(+CDATA, +Parser), where CDATA is an atom representing the data.
Handler(+Text, +Parser), where Text is the text of the processing instruction.
<!...>) has been read. The named handler is called with two arguments:
Handler(+Text, +Parser), where Text is the text of the declaration with comments removed.
This option is especially useful for highlighting declarations and comments in editor support, where the location of the declaration is extracted using get_sgml_parser/2.
Handler(+Severity, +Message, +Parser), where Severity is one of
error and Message is an atom representing the diagnostic message. The location of the error can be determined using get_sgml_parser/2.
If this option is present, errors and warnings are not reported using print_message/3.
xmlns mode, a new namespace declaration is pushed on the environment. The named handler is called with three arguments:
Handler(+NameSpace, +URL, +Parser). See section 3.3.1 for details.
xmlns mode, this predicate can be used to map a URL into either a canonical URL for this namespace or another internal identifier. See section 3.3.1 for details.
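The call-back interface described above can be tried with a minimal sketch such as the one below. It registers only a begin handler and records the tag of every element as it is opened. The names collect_tags/2, parse_events/1, note_begin/3 and seen_tag/1 are hypothetical and introduced purely for illustration; the sketch assumes the XML dialect, input held in a string, and handlers defined in the module from which sgml_parse/2 is called.

```prolog
:- use_module(library(sgml)).

:- dynamic seen_tag/1.          % hypothetical store for the tags we observe

%   collect_tags(+Text, -Tags)
%   Parse Text as XML through the call-back interface and return the
%   element names in the order their begin events were reported.
collect_tags(Text, Tags) :-
    retractall(seen_tag(_)),
    setup_call_cleanup(
        open_string(Text, In),
        parse_events(In),
        close(In)),
    findall(Tag, retract(seen_tag(Tag)), Tags).

%   parse_events(+In)
%   Create a parser, run it over the stream and always free it again.
parse_events(In) :-
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        ( set_sgml_parser(Parser, dialect(xml)),
          sgml_parse(Parser,
                     [ source(In),
                       call(begin, note_begin)
                     ])
        ),
        free_sgml_parser(Parser)).

%   Begin handler: called as Handler(+Tag, +Attributes, +Parser).
note_begin(Tag, _Attributes, _Parser) :-
    assertz(seen_tag(Tag)).
```

No document(Term) option is passed, so no parse tree is built; the handlers are the only output. Further handlers, for example call(end, Handler) or call(cdata, Handler), can be added to the same option list.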
In some cases, part of a document needs to be parsed. One option is
to use load_structure/2
or one of its variations and extract the desired elements from the
returned structure. This is a clean solution, especially for small and
medium-sized documents. It is, however, unsuitable for parsing very large
documents, which can only be handled with the call-back output
interface realised by the
call(Event, Action) option of sgml_parse/2.
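For small and medium-sized documents, the first option (load the whole structure, then select from it) might look like the sketch below. The predicate name rdf_children/2 is hypothetical. The sketch assumes the XML dialect, input held in a string, and an RDF element at the top level of the document; it does not search for RDF elements nested deeper in the tree.

```prolog
:- use_module(library(sgml)).
:- use_module(library(lists)).

%   rdf_children(+Text, -Children)
%   Load Text as XML and return the elements that appear directly
%   inside a top-level RDF element.
rdf_children(Text, Children) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM,
                       [ dialect(xml),
                         space(remove)     % drop whitespace-only text
                       ]),
        close(In)),
    findall(element(Tag, Attrs, Content),
            ( member(element('RDF', _, RDFContent), DOM),
              member(element(Tag, Attrs, Content), RDFContent)
            ),
            Children).
```

The complete document is held in memory as a Prolog term before anything is selected, which is what makes this approach unattractive for very large inputs.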
Event-driven processing is not very natural in Prolog.
The SGML2PL library allows for a mixed approach. Consider the case
where we want to process all descriptions from RDF elements in a
document. The code below calls process_rdf_description/1
on each element that is directly inside an RDF element.
:- dynamic
        in_rdf/0.

load_rdf(File) :-
        retractall(in_rdf),
        open(File, read, In),
        new_sgml_parser(Parser, []),
        set_sgml_parser(Parser, file(File)),
        set_sgml_parser(Parser, dialect(xml)),
        sgml_parse(Parser,
                   [ source(In),
                     call(begin, on_begin),
                     call(end, on_end)
                   ]),
        close(In).

% Leaving the RDF element: clear the flag.
on_end('RDF', _) :-
        retractall(in_rdf).

% Entering the RDF element: set the flag.
on_begin('RDF', _, _) :-
        assert(in_rdf).
% Any element opened while inside RDF: read its content as a term.
on_begin(Tag, Attr, Parser) :-
        in_rdf, !,
        sgml_parse(Parser,
                   [ document(Content),
                     parse(content)
                   ]),
        process_rdf_description(element(Tag, Attr, Content)).