This package allows you to parse SGML, XML and HTML data into a
Prolog data structure. The high-level interface defined in library(sgml)
provides access at the file level, while the low-level interface defined
in the foreign module works with Prolog streams. Please use the source of
sgml.pl as a starting point for dealing with data from
sources other than files, such as SWI-Prolog resources, network sockets,
character strings, etc. The first example below loads an HTML file:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

    <html>
    <head>
    <title>Demo</title>
    </head>
    <body>

    <h1 align=center>This is a demo</h1>

    <p>Paragraphs in HTML need not be closed.

    <p>This is called `omitted-tag' handling.
    </body>
    </html>

    ?- load_html('test.html', Term, []),
       pretty_print(Term).

    [ element(html,
              [],
              [ element(head,
                        [],
                        [ element(title, [], [ 'Demo' ])
                        ]),
                element(body,
                        [],
                        [ '\n',
                          element(h1, [ align = center ], [ 'This is a demo' ]),
                          '\n\n',
                          element(p, [], [ 'Paragraphs in HTML need not be closed.\n' ]),
                          element(p, [], [ 'This is called `omitted-tag\' handling.' ])
                        ])
              ])
    ].

The document is represented as a list, each element being an atom
representing CDATA or a term element(Name, Attributes, Content).
Entities (e.g. &lt;) are expanded and
included in the atom representing the element content or attribute
value. Note: up to SWI-Prolog 5.4.x,
Prolog could not represent wide characters, and entities that
did not fit in the Prolog character set were emitted as a separate term.
With the introduction of wide characters in the 5.5 branch this is no
longer needed.
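
Because the parsed document is an ordinary Prolog list of atoms and element/3 terms, it can be walked with plain recursion. The following is a minimal sketch, not part of the library: dom_text/2 and element_attribute/3 are hypothetical helper names, and the code assumes CDATA is represented as atoms (or other atomic text) as described above.

```prolog
%!  dom_text(+DOM, -Text) is det.
%
%   Hypothetical helper: Text is an atom holding all CDATA found
%   in DOM, concatenated in document order.
dom_text(DOM, Text) :-
    phrase(dom_atoms(DOM), Atoms),
    atomic_list_concat(Atoms, Text).

% Walk a content list, collecting the CDATA of every node.
dom_atoms([]) --> [].
dom_atoms([Node|Nodes]) --> node_atoms(Node), dom_atoms(Nodes).

% An element contributes the CDATA of its content; anything else
% in the list is treated as CDATA itself.
node_atoms(element(_Name, _Attributes, Content)) --> !, dom_atoms(Content).
node_atoms(CDATA) --> [CDATA].

%!  element_attribute(+Element, +Name, -Value) is semidet.
%
%   Hypothetical helper: look up attribute Name in the Name=Value
%   list of an element/3 term.
element_attribute(element(_, Attributes, _), Name, Value) :-
    memberchk(Name=Value, Attributes).
```

For instance, `dom_text([element(h1, [align=center], ['This is a demo'])], T)` binds T to 'This is a demo', and `element_attribute(element(h1, [align=center], []), align, V)` binds V to center.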
These predicates are for basic use of the library, converting entire and self-contained files in SGML, HTML, or XML into a structured term. They are based on load_structure/3.
When loading HTML, the option dialect(HTMLDialect) is passed, where HTMLDialect is
html4 or html5 (default), depending on the Prolog flag
html_dialect. Both imply the option
shorttag(false). The option
dtd(DTD) is also passed, where DTD is the HTML DTD as obtained using
dtd(html, DTD). See dtd/2.
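
As a rough illustration of loading HTML from something other than a file, the sketch below parses a string by wrapping it in a stream. It assumes a recent SWI-Prolog in which open_string/2 is available and load_html/3 accepts a stream(Stream) source; html_string_dom/2 is a hypothetical helper name, not a library predicate. The empty option list leaves the default dialect and DTD in place.

```prolog
:- use_module(library(sgml)).

%!  html_string_dom(+String, -DOM) is det.
%
%   Hypothetical helper (not part of library(sgml)): parse the HTML
%   text held in String and unify DOM with the resulting content list.
html_string_dom(String, DOM) :-
    setup_call_cleanup(
        open_string(String, In),          % expose the string as an input stream
        load_html(stream(In), DOM, []),   % [] keeps the default dialect and DTD
        close(In)).                       % always release the stream
```

A call such as `?- html_string_dom("<title>Demo</title><p>Hello", DOM).` should bind DOM to a list of element/3 terms; the exact nesting depends on the active HTML dialect, so no output is shown here.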